m********a posts: 128 | 1 Given a big file, size much greater than memory,
in the file, each line contains
userId, pageIdVistedByUser
How do you generate the list of pageIds for each user?
must do it on one machine. not map reduce. | l*****a posts: 14598 | 2 This counts as an algorithm question?
[Quoting m********a] : Given a big file, size much greater than memory, : in the file, each line contains : userId, pageIdVistedByUser : How do you generate the list of pageIds for each user? : must do it on one machine. not map reduce.
| m********a posts: 128 | 3 Yes, that's what the interviewer said...
[Quoting l*****a] : This counts as an algorithm question?
| m********a posts: 128 | 4 Given a big file, size much greater than memory,
in the file, each line contains
userId, pageIdVistedByUser
it's not sorted, userId can occur multiple times in the file. How do you
generate the list of visitedPageIds for each user?
must do it on one machine. not map reduce. | l*****a posts: 14598 | 5 This counts as an algorithm question?
[Quoting m********a] : Given a big file, size much greater than memory, : in the file, each line contains : userId, pageIdVistedByUser : it's not sorted, userId can occur multiple times in the file. How do you : generate the list of visitedPageIds for each user? : must do it on one machine. not map reduce.
| m********a posts: 128 | 6 Yes, that's what the interviewer said...
[Quoting l*****a] : This counts as an algorithm question?
| m*****7 posts: 4 | 7 Use (userID % n) to split all the records into n files; that way records with the same userID are guaranteed to land in the same
file, and you can then process each file separately. | s*w posts: 729 | 8 This doesn't involve any algorithm.
Just read the file and, for each userId, append the pageID to a file named after that userId:
while read line
do
userID=$(echo "$line" | cut -d, -f1)
pageIDVistedByUser=$(echo "$line" | cut -d, -f2)
echo $pageIDVistedByUser >> userID
done < BigFile
[Quoting m********a] : Given a big file, size much greater than memory, : in the file, each line contains : userId, pageIdVistedByUser : it's not sorted, userId can occur multiple times in the file. How do you : generate the list of visitedPageIds for each user? : must do it on one machine. not map reduce.
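The hash-partition idea from post 7 can be sketched in shell with awk. This is a minimal sketch, assuming numeric userIDs and comma-separated lines; "BigFile", the sample contents, and the bucket count n are placeholders:

```shell
# Hypothetical tiny input standing in for the big file
printf '1,pageA\n17,pageB\n2,pageC\n' > BigFile

# Split records into n bucket files by userID % n, so every record
# for a given user lands in the same bucket file.
n=16
awk -F, -v n="$n" '{ print > ("bucket_" ($1 % n)) }' BigFile
```

For a suitably large n each bucket fits in memory and can be grouped by userID on its own, which is why the same-user records must all end up in one bucket.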
| y****i posts: 4109 | 9 What kind of position asks this in an interview? Also, the userID on the second-to-last line of the script above should be prefixed with a $. |
|