p*****o 发帖数: 1285 | 1 某家on-site的一道题。
Find the most frequent character in a string.
Data: strings are composed of Unicode charaters, stored in 10 sets of 5GB
files on 10 different servers each with 2GB of memory.
Constraints: network communication is expensive.
Question: find the best algorithm. | m*****g 发帖数: 226 | | Z*****Z 发帖数: 723 | 3 MapReduce的思路吧,10台机器,每个负责数一个区间内的数。
基本算法就是每个机器读自己的文件,然后发送每个字符到相应的机器上。
要是想save network communication cost,可以每台机器把要发送的字符cache起来,
这样每个机器生成10个文件,然后把其中的9个发送到相应的主机上去。
【在 p*****o 的大作中提到】 : 某家on-site的一道题。 : Find the most frequent character in a string. : Data: strings are composed of Unicode charaters, stored in 10 sets of 5GB : files on 10 different servers each with 2GB of memory. : Constraints: network communication is expensive. : Question: find the best algorithm.
|
|