t*****e 发帖数: 53 | 1 I know the solution:
read the first k lines from the file,
then repeat the following steps:
- read one line from the file
- with probability X, keep the new line, and randomly drop a line from the
previous selected k lines.
- with probability (1-X), drop the new line.
till all the lines of the lines are read
My questions is:
- What should be the value of X?
- How to give a strict math proof that this method gives a randomly
uniformly distributed k lines.
thanks alot for your help. | l********8 发帖数: 83 | 2 CareerCup 150题里有道要求从一串数字流里随机挑一个数。这个应该也是用一样的方
法吧。 | a*******y 发帖数: 1040 | 3 wiki reservoir sampling
basically you have to have a random() function to generate from 1 to i where
i is the count of the line you have seen so far, if it fall between 1 and k
, you replace that one, otherwise, drop this one |
|