c****s 发帖数: 241 | 1 题目大概是这样:
有一个很长很长(长度不可知)的stream,现在需要从这个stream选取一个点,使每个
点被取的概率相同。问如何选取? | c**********e 发帖数: 2007 | 2 loop through the stream, for the n-th point (点),
exchange it with the selected one with probability 1/n. | s*****i 发帖数: 355 | 3 我记得wiki上有一个讲这个算法的,是从n个里面选k个。
谁还有link吗?谢谢!
【在 c**********e 的大作中提到】 : loop through the stream, for the n-th point (点), : exchange it with the selected one with probability 1/n.
| r****o 发帖数: 1950 | 4 只选取一个点?是不是若干个点?
【在 c****s 的大作中提到】 : 题目大概是这样: : 有一个很长很长(长度不可知)的stream,现在需要从这个stream选取一个点,使每个 : 点被取的概率相同。问如何选取?
| c**********e 发帖数: 2007 | 5 I do not remember the link. The approach is quite straight-forward.
Suppose we have a[1], a[2], ..., a[M], M is either unknown, or big,
or both. We need to select k of them with equal probability.
Select and store the first k record, store their subscript in an
interger array x[1]=1, x[2]=2, ..., x[k]=k.
Then for each record n=k+1, k+2, ..., we do the following to the x[]:
P(x[i]=n)=1/n for each i=1,..,k.
P(x[] no change) = 1-k/n.
In other words, with probability 1-k/n, we keep array x[]
【在 s*****i 的大作中提到】 : 我记得wiki上有一个讲这个算法的,是从n个里面选k个。 : 谁还有link吗?谢谢!
| c****s 发帖数: 241 | 6 我给你一个包子。:)
【在 c**********e 的大作中提到】 : I do not remember the link. The approach is quite straight-forward. : Suppose we have a[1], a[2], ..., a[M], M is either unknown, or big, : or both. We need to select k of them with equal probability. : Select and store the first k record, store their subscript in an : interger array x[1]=1, x[2]=2, ..., x[k]=k. : Then for each record n=k+1, k+2, ..., we do the following to the x[]: : P(x[i]=n)=1/n for each i=1,..,k. : P(x[] no change) = 1-k/n. : In other words, with probability 1-k/n, we keep array x[]
| h*******x 发帖数: 12808 | 7 赞,话说我当年在国内面试google,我就是挂在这个题目上。
【在 c**********e 的大作中提到】 : loop through the stream, for the n-th point (点), : exchange it with the selected one with probability 1/n.
|
|