d**s Posts: 920 | 1 I went to Adobe to interview for a Senior SW Engineer position.
Overall the interview went well, but the questions below stumped me, and I was told to go home and think them over.
Q1:
"We need to compare thousands of text files with each other. They are not big,
less than 100K each. They are in a directory tree with a few levels of
subdirectories. How can we speed up the comparison process?"
My answer: We can read all of these files into memory once, so that we
reduce the number of disk I/O operations.
[Feedback: That is a good approach.]
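A minimal sketch of the read-once idea in Python, for illustration only. The question does not say what kind of comparison is needed, so this assumes the goal is to find files with identical contents; `load_tree` and `group_identical` are hypothetical names, not anything from the interview.

```python
import os
from collections import defaultdict


def load_tree(root):
    """Read every file under root into memory exactly once."""
    contents = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                contents[path] = f.read()
    return contents


def group_identical(contents):
    """Group paths whose bytes are identical, without touching the disk again.

    Assumption: "compare" means an equality check. Using the bytes object
    itself as the dict key gives an exact comparison (hash, then equality).
    """
    groups = defaultdict(list)
    for path, data in contents.items():
        groups[data].append(path)
    # Keep only groups that actually contain duplicates.
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

After `load_tree` returns, every later comparison works on in-memory bytes, so the disk is visited once per file rather than once per pair of files.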
Q2: How to read these files into memory (on MS Wi | a**********s Posts: 588 | | s*******i Posts: 712 | 3 My thoughts on Q3:
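I think the key problem is that frequently reading small files from disk is not only slow in terms of I/O, it also wastes disk bandwidth. This bottleneck can be reduced by
improving the locality of data on disk and making full use of the disk bandwidth.
1. Use the disk bandwidth: divide the disk into regions of a suitable size (e.g. 128KB), called clusters. The cluster size
depends on the disk bandwidth, so that a single read takes full advantage of the throughput.
2. Provide a locality algorithm that places closely related small files in the same cluster whenever possible.
Files are then read from disk one cluster at a time. This makes full use of the disk bandwidth, and because of
locality, once a file has been processed in memory, the next file to process is very likely already in the cluster you just read. That reduces the number of I/O
operations.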
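The two steps above can be approximated at user level, as a rough sketch. The post describes a disk-level layout; here that is swapped for packing small files into bundle files of about one cluster each. The locality rule is left unspecified in the post, so this assumes that files in the same directory are related. The names `plan_clusters`, `write_cluster`, and `read_cluster` are hypothetical.

```python
import os

CLUSTER_SIZE = 128 * 1024  # example size from the post; tune to the disk


def plan_clusters(root, cluster_size=CLUSTER_SIZE):
    """Greedily assign files to clusters, walking the tree in directory order.

    Assumption: directory order stands in for the locality algorithm.
    A file larger than cluster_size ends up alone in its own cluster.
    """
    clusters, current, used = [], [], 0
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            if current and used + size > cluster_size:
                clusters.append(current)
                current, used = [], 0
            current.append((path, size))
            used += size
    if current:
        clusters.append(current)
    return clusters


def write_cluster(cluster, out_path):
    """Concatenate one cluster's files into out_path.

    Returns an index mapping each original path to (offset, length).
    """
    index, offset = {}, 0
    with open(out_path, "wb") as out:
        for path, _size in cluster:
            with open(path, "rb") as f:
                data = f.read()
            out.write(data)
            index[path] = (offset, len(data))
            offset += len(data)
    return index


def read_cluster(out_path, index):
    """One sequential read brings in every file of the cluster."""
    with open(out_path, "rb") as f:
        blob = f.read()
    return {path: blob[off:off + n] for path, (off, n) in index.items()}
```

The bundle files should be written outside the tree being scanned, otherwise a later `plan_clusters` run would pick them up as input.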
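Q1. This is probably a similar situation, though I don't know exactly how to do it. But reading all the files in at once, as you suggested, may not be
appropriate. Together these
files must add up to several hundred MB.
Q2. Does it have something to do with how files and directories are laid out on disk? Can someone explain how files and directories are placed on disk?
Is there any
pattern?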
| d**s Posts: 920 | 4 Thanks. Actually, they just want to read all the files into memory. In total
the files take up only a few hundred MB of memory, so that is not an issue.
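The "few hundred MB" figure can be checked before loading anything. A small sketch, assuming Python and a hypothetical helper `total_bytes`; the count of 5000 files is an assumed stand-in for "thousands", not a number from the interview.

```python
import os


def total_bytes(root):
    """Count files under root and sum their sizes without reading them."""
    count, total = 0, 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
            count += 1
    return count, total


# Back-of-the-envelope upper bound from the question's numbers:
# an assumed 5000 files at the stated maximum of 100 KB each.
worst_case_mb = 5000 * 100 * 1024 / (1024 * 1024)  # about 488 MB
```

If `total_bytes` reports a total well below the available RAM, loading everything at once is reasonable; otherwise the files would need to be processed in batches.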
| y*r Posts: 590 | 5 For Q2, it seems a tree would not be bad here?
I really have no idea about Q3. Does anyone have a good idea?
| s******8 Posts: 4192 | |
|