I have a question for everyone. Thanks in advance.
You have a 200 GB text file and a Linux box with 8GB of RAM and 4 cores.
Write a program/script that outputs a file listing the frequency of all
words in the file (i.e. a TSV file with two columns). Note
that the set of words in the file may not fit in memory.
f**********t posts: 1001
2
mlock 8 GB as a buffer;
4 threads: the 1st processes the 0-2 GB part of the buffer, the 2nd 2-4 GB, the 3rd 4-6 GB, and so on,
each producing its own unordered_map.
mmap 8 GB of the file into memory each time.
Merge the unordered_maps.
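
A rough sketch of the per-thread counting and merge steps, for one chunk that is already in memory. This is only an illustration under assumptions, not a full solution: the names `Counts`, `count_slice`, and `count_words_parallel` are made up here, words are taken to be whitespace-separated, and the mmap/mlock step is left out. Slice boundaries are moved forward to the next whitespace so no word is split between two threads. In practice the buffer would have to be well under 8 GB because the maps need RAM too, a word that straddles two mapped chunks has to be carried over by the caller, and if the merged map outgrows memory the partial counts would need to be spilled to disk, which is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>
#include <string_view>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical alias: word -> number of occurrences.
using Counts = std::unordered_map<std::string, std::size_t>;

static bool is_space(char c) {
    return std::isspace(static_cast<unsigned char>(c)) != 0;
}

// Count whitespace-separated words in one slice of the buffer.
static void count_slice(std::string_view slice, Counts& out) {
    std::size_t i = 0;
    while (i < slice.size()) {
        while (i < slice.size() && is_space(slice[i])) ++i;   // skip separators
        const std::size_t start = i;
        while (i < slice.size() && !is_space(slice[i])) ++i;  // scan one word
        if (i > start) ++out[std::string(slice.substr(start, i - start))];
    }
}

// Split `buf` into `num_threads` slices, count each slice in its own thread
// with its own map, then merge the per-thread maps into one.
Counts count_words_parallel(std::string_view buf, unsigned num_threads = 4) {
    assert(num_threads > 0);

    // Choose cut points, pushing each forward to the next whitespace
    // so that no word straddles two slices.
    std::vector<std::size_t> cuts{0};
    for (unsigned t = 1; t < num_threads; ++t) {
        std::size_t pos = std::max(cuts.back(), buf.size() * t / num_threads);
        while (pos < buf.size() && !is_space(buf[pos])) ++pos;
        cuts.push_back(pos);
    }
    cuts.push_back(buf.size());

    // One map per thread, so the counting phase needs no locking.
    std::vector<Counts> partial(num_threads);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back(count_slice,
                             buf.substr(cuts[t], cuts[t + 1] - cuts[t]),
                             std::ref(partial[t]));
    }
    for (auto& w : workers) w.join();

    // Merge step: add every per-thread count into the total.
    Counts total;
    for (const auto& m : partial) {
        for (const auto& [word, n] : m) total[word] += n;
    }
    return total;
}
```

The caller would run this once per mapped chunk, add the result to a running total, and at the end write each `word<TAB>count` pair as one line of the TSV. It needs C++17 and, on Linux, linking with `-pthread`.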