j*******s (posts: 81) | 1 A question: I have a large file, a txt table, that needs to be split into several files according to the key in its first column. For example:
g*****g (posts: 34805) | 2 Do something in between: keep a "file pool" of open handles, at most 5000, holding the 5000 most recently used files. Put the handles in a queue; when the queue grows past 5000, pop the head, close it, and append the new handle at the tail. When you write to a file that is already in the queue, remove it and re-append it at the tail. To speed up the lookup, use a hashmap to track which files are currently open.
【Quoting j*******s】
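The "file pool" described above can be sketched in Python (a scripting language is suggested elsewhere in the thread). This is a minimal illustration, not a definitive implementation; the class name and the append-mode reopen are assumptions:

```python
from collections import OrderedDict

class FilePool:
    """Keep at most max_open file handles open; evict the least
    recently used handle when the pool is full (a sketch of the
    'file pool' idea above; names are hypothetical)."""

    def __init__(self, max_open=5000):
        self.max_open = max_open
        self.pool = OrderedDict()  # filename -> handle, in LRU order

    def write(self, filename, line):
        handle = self.pool.get(filename)
        if handle is not None:
            # Already open: move it to the tail (most recently used).
            self.pool.move_to_end(filename)
        else:
            if len(self.pool) >= self.max_open:
                # Pool full: pop the head (least recently used) and close it.
                _, old = self.pool.popitem(last=False)
                old.close()
            # Reopen in append mode so earlier writes are preserved.
            handle = open(filename, "a")
            self.pool[filename] = handle
        handle.write(line)

    def close(self):
        for h in self.pool.values():
            h.close()
        self.pool.clear()
```

The `OrderedDict` plays both roles mentioned in the post: it is the queue (insertion order is eviction order) and the hashmap (O(1) membership check) at once.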
j*******s (posts: 81) | 3 Great approach, many thanks. This stack idea is excellent.
【Quoting g*****g】
j*******s (posts: 81) | 4 Is a queue or a stack better here? The keys in the first column are random, so there should be no difference between FIFO and LIFO, right?
【Quoting g*****g】
g*****g (posts: 34805) | 5 If the keys are truly random, either works. For most practical problems you want first in, first out; combined with moving a reused file back to the tail, that eviction policy is called least recently used (LRU).
【Quoting j*******s】
s***e (posts: 122) | 6 Since you're working on Mac OS, wouldn't it be much simpler to sort the big file first? I'd even go a step further and recommend writing it directly in a scripting language, any of shell/python/perl; that would surely be easier.
【Quoting j*******s】
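The sort-then-split idea can be sketched as follows: once the file is sorted by its first column, each key's rows are contiguous, so only one output file needs to be open at a time. This is a sketch under assumptions; the function name, the tab separator, and the `.txt` suffix are all hypothetical:

```python
import itertools
import os
import subprocess

def split_by_first_column(src, out_dir, sep="\t"):
    """Sort src by its first column with the system `sort`, then split
    it sequentially into one file per key (a sketch of the approach
    suggested above; names and separator are assumptions)."""
    sorted_path = os.path.join(out_dir, "sorted.tmp")
    # External sort keeps memory use low even for very large files.
    subprocess.run(["sort", "-t", sep, "-k1,1", src, "-o", sorted_path],
                   check=True)
    with open(sorted_path) as f:
        # groupby yields each key's rows as one contiguous run.
        for key, rows in itertools.groupby(
                f, key=lambda line: line.split(sep, 1)[0]):
            with open(os.path.join(out_dir, key + ".txt"), "w") as out:
                out.writelines(rows)
    os.remove(sorted_path)
```

Because the input is sorted, this sidesteps the open-file-handle limit entirely: at any moment exactly one output file is open.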
b******y (posts: 9224) | 7 Yeah, I like the sort-first approach better. It seems more methodical.
【Quoting s***e】
F****n (posts: 3271) | 8 If memory can hold it, sorting first is definitely best. Sorting is only O(N log N), far faster than all that repeated I/O.
【Quoting b******y】
A**o (posts: 1550) | 9 Or keep all the file names in memory, and on each pass through the raw file write to only 10k of the output files.
【Quoting g*****g】
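A minimal sketch of this multi-pass idea, assuming a tab-separated file and hypothetical names: collect the distinct keys first, then re-read the raw file once per batch, keeping only that batch's output files open at a time.

```python
import os

def split_in_passes(src, out_dir, batch=10000, sep="\t"):
    """Multi-pass split: re-read the raw file once per batch of keys,
    so at most `batch` output files are open at any time (a sketch of
    the idea above; names and separator are assumptions)."""
    # Pass 0: collect the distinct keys from the first column.
    with open(src) as f:
        keys = sorted({line.split(sep, 1)[0] for line in f})
    # One full pass over the raw file per batch of keys.
    for i in range(0, len(keys), batch):
        active = {k: open(os.path.join(out_dir, k + ".txt"), "w")
                  for k in keys[i:i + batch]}
        with open(src) as f:
            for line in f:
                out = active.get(line.split(sep, 1)[0])
                if out is not None:  # key belongs to this batch
                    out.write(line)
        for h in active.values():
            h.close()
```

The trade-off versus the file-pool approach: no eviction bookkeeping, but the raw file is read ceil(num_keys / batch) times, so it only pays off when the key count is not too far above the batch size.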