o**s 发帖数: 65 | 1 一个文件的内容行数未知, 要一次扫描, 任意输出一行.
要求输出每一行的概率相等. 内存很小, 最多只能放下一行的内容.
如果可以扫描两次, 第一次得到行数, 很简单.
如果只扫描一次呢? | a*******8 发帖数: 2299 | 2 一个counter记录行数,每次取0-1随机数,
如果小于1/counter就记录当前行,否则skip当前行。 | x*****p 发帖数: 1707 | 3 Read first line, keep it in memory;
Read second line, keep it in memory with 1/2 chance
...
Read nth line, keep it in memory with 1/n chance.
Therefore, the chance that the k-th line is kept in the memory is
(1 - 1/2)(1 - 1/3)(1 - 1/4)...(1 - 1/k) = 1/k | s*****n 发帖数: 5488 | 4 这个是正解,就是英文没说清楚让我想了半天。
应该是。
内存中留一行作为buffer.
从第二行开始,以1/n概率replace内存中那一行。
【在 x*****p 的大作中提到】 : Read first line, keep it in memory; : Read second line, keep it in memory with 1/2 chance : ... : Read nth line, keep it in memory with 1/n chance. : Therefore, the chance that the k-th line is kept in the memory is : (1 - 1/2)(1 - 1/3)(1 - 1/4)...(1 - 1/k) = 1/k
| x******3 发帖数: 245 | 5
the chance that the k-th line is kept in the memory is
(1/k)(1-1/(k+1))(1-1/(k+2))...(1-1/n) = 1/n
so any line is chosen with the chance of 1/n
【在 x*****p 的大作中提到】 : Read first line, keep it in memory; : Read second line, keep it in memory with 1/2 chance : ... : Read nth line, keep it in memory with 1/n chance. : Therefore, the chance that the k-th line is kept in the memory is : (1 - 1/2)(1 - 1/3)(1 - 1/4)...(1 - 1/k) = 1/k
| j**l 发帖数: 2911 | | x****k 发帖数: 2932 | | x****k 发帖数: 2932 | 8 方法是对的,但公式可能不对,应该是
1/k *(1-1/(k+1))*(1-1/(k+2))*...*(1-1/(n)) = 1/n
对任意the k-nth line,the probability are all 1/n
【在 x*****p 的大作中提到】 : Read first line, keep it in memory; : Read second line, keep it in memory with 1/2 chance : ... : Read nth line, keep it in memory with 1/n chance. : Therefore, the chance that the k-th line is kept in the memory is : (1 - 1/2)(1 - 1/3)(1 - 1/4)...(1 - 1/k) = 1/k
|
|
|