Java board - A file-splitting question
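j*******s
Posts: 81
1
I have a question: I have a large file, a txt table, that I want to split into several files by the keyword in its first column.
For example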
g*****g
Posts: 34805
2
Do something in between: keep a "file pool". Say you can have at most
5000 files open, so you keep the 5000 most recently used files open.
Put them in a queue; once it grows past 5000, pop the head (closing that
file) and append the new file at the tail. When you write to a file that
is already in the queue, remove it and re-append it at the tail.
To speed up the lookup, use a hashmap to track which files are open.
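
A minimal Java sketch of the file pool described above, under a few assumptions that are not in the original post: the class name `WriterPool` and its methods are hypothetical, each key is assumed to be safe to use as a file name, and the output files are named `<key>.txt`. It leans on `LinkedHashMap` in access order, which plays the role of both the queue and the hashmap.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: keeps at most maxOpen writers open, evicting the least recently used. */
final class WriterPool implements AutoCloseable {
    private final Path outDir;
    private final LinkedHashMap<String, BufferedWriter> open;

    WriterPool(Path outDir, int maxOpen) {
        if (maxOpen < 1) {
            throw new IllegalArgumentException("maxOpen must be at least 1");
        }
        this.outDir = outDir;
        // accessOrder = true: get() and put() move the entry to the tail of the queue.
        this.open = new LinkedHashMap<String, BufferedWriter>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, BufferedWriter> eldest) {
                if (size() <= maxOpen) {
                    return false;
                }
                try {
                    eldest.getValue().close(); // flush and close the head of the queue
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                return true; // drop the head from the map
            }
        };
    }

    /** Appends one line to the file for this key, (re)opening the file if needed. */
    void append(String key, String line) throws IOException {
        BufferedWriter w = open.get(key); // a hit also marks the file as most recently used
        if (w == null) {
            // APPEND matters: the file may have been evicted and closed earlier.
            w = Files.newBufferedWriter(outDir.resolve(key + ".txt"), StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            open.put(key, w); // may evict and close the least recently used writer
        }
        w.write(line);
        w.newLine();
    }

    /** Number of files currently held open. */
    int openCount() {
        return open.size();
    }

    @Override
    public void close() throws IOException {
        for (BufferedWriter w : open.values()) {
            w.close();
        }
        open.clear();
    }
}
```

The caller would read the big file line by line, extract the first column, and call `append(key, line)` inside a try-with-resources block. Because files are opened in append mode, output left over from an earlier run has to be deleted first.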

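[In reply to j*******s's post:]
: I have a question: I have a large file, a txt table, that I want to split into several files by the keyword in its first column.
: For example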

j*******s
Posts: 81
3
Great approach, many thanks. This stack method is excellent.

[In reply to g*****g's post:]
: Do something in between, let's say you keep a "file pool",
: you can open a maximum of 5000, and you keep the most recent 5000
: open. Put it in a queue, pop the head out and append the new one
: at the tail when it's over 5000. When you write a file and the file
: is already in the queue, remove it and append it to the tail.
: To speed up search, you can use a hashmap to track if the files are
: open.

j*******s
Posts: 81
4
Is a queue or a stack better? The keywords in the first column are random, so FIFO versus LIFO shouldn't make a difference, right?

[In reply to g*****g's post:]
: Do something in between, let's say you keep a "file pool",
: you can open a maximum of 5000, and you keep the most recent 5000
: open. Put it in a queue, pop the head out and append the new one
: at the tail when it's over 5000. When you write a file and the file
: is already in the queue, remove it and append it to the tail.
: To speed up search, you can use a hashmap to track if the files are
: open.

g*****g
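Posts: 34805
5
If the keys are random, either works. Most real problems call for first in, first out, with a file moved back to the tail each time it is written; this is called least recently used.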

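[In reply to j*******s's post:]
: Is a queue or a stack better? The keywords in the first column are random, so FIFO versus LIFO shouldn't make a difference, right?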
s***e
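Posts: 122
6
Since you're working on Mac OS, wouldn't it be much simpler to sort the big file first? I'd even go
a step further and write the whole thing in a scripting language (shell, python, or perl); that would certainly be easier.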
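
For anyone staying in Java, here is a rough sketch of why sorting first simplifies things: once lines with the same key are adjacent, only one output file needs to be open at a time. It assumes the input was already sorted on the first column (for example with `sort -k1,1`), that columns are separated by whitespace, and that each key is usable as a file name; the class and method names are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: splits a file that is already sorted on its first column. */
final class SortedSplitter {
    /** First whitespace-delimited column of a non-blank line (assumed key format). */
    static String keyOf(String line) {
        return line.trim().split("\\s+", 2)[0];
    }

    /** Writes one file per key into outDir and returns how many files were opened. */
    static int split(Path sortedInput, Path outDir) throws IOException {
        int files = 0;
        String currentKey = null;
        BufferedWriter out = null;
        try (BufferedReader in = Files.newBufferedReader(sortedInput, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                String key = keyOf(line);
                if (!key.equals(currentKey)) {
                    // Key changed: the previous group is complete, so close its file.
                    if (out != null) {
                        out.close();
                    }
                    // Default options truncate, so unsorted input would overwrite
                    // an earlier group with the same key.
                    out = Files.newBufferedWriter(outDir.resolve(key + ".txt"),
                            StandardCharsets.UTF_8);
                    currentKey = key;
                    files++;
                }
                out.write(line);
                out.newLine();
            }
        } finally {
            if (out != null) {
                out.close();
            }
        }
        return files;
    }
}
```

If the returned count is larger than the number of distinct keys, the input was not actually sorted and some groups were overwritten.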

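[In reply to j*******s's post:]
: I have a question: I have a large file, a txt table, that I want to split into several files by the keyword in its first column.
: For example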

b******y
Posts: 9224
7

Yeah, I prefer the sort-first approach. It seems more orderly.

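[In reply to s***e's post:]
: Since you're working on Mac OS, wouldn't it be much simpler to sort the big file first? I'd even go
: a step further and write the whole thing in a scripting language (shell, python, or perl); that would certainly be easier.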

F****n
Posts: 3271
8
If memory can handle it, sorting first is definitely better. Sorting only costs O(N log N), which is much
faster than repeated I/O.

[In reply to b******y's post:]
:
: Yeah, I prefer the sort-first approach. It seems more orderly.

A**o
Posts: 1550
9
Or keep all the file names in memory,
and on each pass through the raw file write to only 10k of them.
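
One possible reading of this multi-pass idea, sketched in Java. The class name, the `batchSize` parameter (the "10k"), the whitespace-delimited key, and the `<key>.txt` naming are all assumptions for illustration. A first scan collects the distinct keys; each later pass re-reads the raw file and writes only the lines belonging to the current batch of keys.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: splits a file in several passes, batchSize output files per pass. */
final class MultiPassSplitter {
    /** First whitespace-delimited column of a non-blank line (assumed key format). */
    static String keyOf(String line) {
        return line.trim().split("\\s+", 2)[0];
    }

    /** Returns the number of writing passes made over the input. */
    static int split(Path input, Path outDir, int batchSize) throws IOException {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        // Initial scan: collect every distinct key, i.e. every output file name.
        Set<String> keys = new LinkedHashSet<>();
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    keys.add(keyOf(line));
                }
            }
        }
        List<String> all = new ArrayList<>(keys);
        int passes = 0;
        for (int from = 0; from < all.size(); from += batchSize) {
            List<String> batch = all.subList(from, Math.min(from + batchSize, all.size()));
            Map<String, BufferedWriter> writers = new HashMap<>();
            try {
                // Open only this batch's files; they stay open for the whole pass.
                for (String key : batch) {
                    writers.put(key, Files.newBufferedWriter(outDir.resolve(key + ".txt"),
                            StandardCharsets.UTF_8));
                }
                try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        if (line.trim().isEmpty()) {
                            continue;
                        }
                        BufferedWriter w = writers.get(keyOf(line));
                        if (w != null) { // lines for other batches are skipped this pass
                            w.write(line);
                            w.newLine();
                        }
                    }
                }
            } finally {
                for (BufferedWriter w : writers.values()) {
                    w.close();
                }
            }
            passes++;
        }
        return passes;
    }
}
```

The trade-off is that the raw file is read once per batch plus once for the initial scan, so this only pays off when the number of distinct keys is a small multiple of the batch size.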

[In reply to g*****g's post:]
: Do something in between, let's say you keep a "file pool",
: you can open a maximum of 5000, and you keep the most recent 5000
: open. Put it in a queue, pop the head out and append the new one
: at the tail when it's over 5000. When you write a file and the file
: is already in the queue, remove it and append it to the tail.
: To speed up search, you can use a hashmap to track if the files are
: open.
