D**f 发帖数: 439 | 1 3).一共有N个机器,每个机器上有N个数。每个机器最多存O(N)个数并对它们操作
。如何找到N^2个数的中数(median)?
原载于下面的链接:
http://www.mitbbs.com/mitbbs_article_t.php?board=JobHunting&gid
我的想法是找到每个机器上的中位数,但全体的中位数如何从中得出呢? |
N4 发帖数: 155 | 2 49匹马
【在 D**f 的大作中提到】 : 3).一共有N个机器,每个机器上有N个数。每个机器最多存O(N)个数并对它们操作 : 。如何找到N^2个数的中数(median)? : 原载于下面的链接: : http://www.mitbbs.com/mitbbs_article_t.php?board=JobHunting&gid : 我的想法是找到每个机器上的中位数,但全体的中位数如何从中得出呢?
|
N**n 发帖数: 832 | 3 median of median
【在 D**f 的大作中提到】 : 3).一共有N个机器,每个机器上有N个数。每个机器最多存O(N)个数并对它们操作 : 。如何找到N^2个数的中数(median)? : 原载于下面的链接: : http://www.mitbbs.com/mitbbs_article_t.php?board=JobHunting&gid : 我的想法是找到每个机器上的中位数,但全体的中位数如何从中得出呢?
|
D**f 发帖数: 439 | |
b*******d 发帖数: 750 | 5 two ways:
1. sort and N-way merge to get the N^2/2th number
2. sort locally, and use binary search to search whole integer value space.
with up to 32 times of steps, you can locate that number:
a. start from a = 2^31,
b. send a to each local machine.
c. each machine reports to master how many numbers (M_i) are less than a.
d. master sum up the Mi, and see if sum(M_i) is less than N^2/ 2 or larger,
and decide the next number to test.
e. go to a. with next testing number. |
N**n 发帖数: 832 | 6 找中位数从来没需要sort过,sort复杂度太高
google搜selection algorithm和median of median
.
,
【在 b*******d 的大作中提到】 : two ways: : 1. sort and N-way merge to get the N^2/2th number : 2. sort locally, and use binary search to search whole integer value space. : with up to 32 times of steps, you can locate that number: : a. start from a = 2^31, : b. send a to each local machine. : c. each machine reports to master how many numbers (M_i) are less than a. : d. master sum up the Mi, and see if sum(M_i) is less than N^2/ 2 or larger, : and decide the next number to test. : e. go to a. with next testing number.
|
b*******d 发帖数: 750 | 7
~~~~
sure, so you don't sort, just iterate each local file when testing.
totally it will be still linear time (with up to 32 iterations of files
though, big constant factor), and no touching on original data.
actually it's all integer numbers, i think even sorting could be done in
linear time (radix sort), and it will speed up testing a lot.
Not sure why median of median works:
let N = 5,
Machine1: 1 2 3 4 5 median 3
Machine2: 101 102 103 104 105 median 103
Machine3: 99 100 104 106 107 median 104
median of median: 103
1 2 3 4 5 99 100 101 102 < 103, totally 9 numbers.
104 104 105 106 107 > 103, totally 5 numbers.
so 103 is not median globally I guess?
【在 N**n 的大作中提到】 : 找中位数从来没需要sort过,sort复杂度太高 : google搜selection algorithm和median of median : : . : ,
|
i*********7 发帖数: 348 | 8 median of median. 把每台机器上的N个数的Median找出来。
这个时候你就会有n个median。然后再放到一台机器上找median。
CLRS上有一节是讲这个算法的。这属于stable O(n)的算法,只是这个n有点大。。。 |
i*********7 发帖数: 348 | 9
我印象里Median of median只适用于N * N的矩阵,就像这一题一样。
你可以找个例子试试
【在 b*******d 的大作中提到】 : : ~~~~ : sure, so you don't sort, just iterate each local file when testing. : totally it will be still linear time (with up to 32 iterations of files : though, big constant factor), and no touching on original data. : actually it's all integer numbers, i think even sorting could be done in : linear time (radix sort), and it will speed up testing a lot. : Not sure why median of median works: : let N = 5, : Machine1: 1 2 3 4 5 median 3
|
b*******d 发帖数: 750 | 10
hmmm...
what about:
machine 1: 1 2 3 4 5 median: 3
machine 2: 1 2 3 4 5 median: 3
machine 3: 1 2 3 4 5 median: 3
machine 4: 101 102 103 104 105 median: 103
machine 5: 101 102 103 104 105 median: 103
median of median: 3, but it's obviously not the global median, only 6
elements are less than "3"...
if you guys are talking about this kind of http://www.ics.uci.edu/~eppstein/161/960130.html
i guess there is a recursion step you guys missed.. ???
【在 i*********7 的大作中提到】 : : 我印象里Median of median只适用于N * N的矩阵,就像这一题一样。 : 你可以找个例子试试
|
|
|
f*****n 发帖数: 15 | 11 we can do something similar to find the median of an array by using quick
sort.
1. randomly pick up a number as a pivot from the N^2 numbers
2. split each array into two groups, smaller than the pivot and greater than
the pivot
3. if the count of numbers smaller than or equal to the pivot is N^2/2, the
pivot is the median
3.1 if the count is smaller than N^2/2, throw all numbers smaller than pivot
and try to find the (N^2/2 - count) smallest number in the left numbers
3.2 if the count is larger than N^2/2, throw all numbers larger than the
pivot and try to find the N^2/2 smallest number in the left numbers |
D**f 发帖数: 439 | 12 3).一共有N个机器,每个机器上有N个数。每个机器最多存O(N)个数并对它们操作
。如何找到N^2个数的中数(median)?
原载于下面的链接:
http://www.mitbbs.com/mitbbs_article_t.php?board=JobHunting&gid
我的想法是找到每个机器上的中位数,但全体的中位数如何从中得出呢? |
N4 发帖数: 155 | 13 49匹马
【在 D**f 的大作中提到】 : 3).一共有N个机器,每个机器上有N个数。每个机器最多存O(N)个数并对它们操作 : 。如何找到N^2个数的中数(median)? : 原载于下面的链接: : http://www.mitbbs.com/mitbbs_article_t.php?board=JobHunting&gid : 我的想法是找到每个机器上的中位数,但全体的中位数如何从中得出呢?
|
N**n 发帖数: 832 | 14 median of median
【在 D**f 的大作中提到】 : 3).一共有N个机器,每个机器上有N个数。每个机器最多存O(N)个数并对它们操作 : 。如何找到N^2个数的中数(median)? : 原载于下面的链接: : http://www.mitbbs.com/mitbbs_article_t.php?board=JobHunting&gid : 我的想法是找到每个机器上的中位数,但全体的中位数如何从中得出呢?
|
D**f 发帖数: 439 | |
b*******d 发帖数: 750 | 16 two ways:
1. sort and N-way merge to get the N^2/2th number
2. sort locally, and use binary search to search whole integer value space.
with up to 32 times of steps, you can locate that number:
a. start from a = 2^31,
b. send a to each local machine.
c. each machine reports to master how many numbers (M_i) are less than a.
d. master sum up the Mi, and see if sum(M_i) is less than N^2/ 2 or larger,
and decide the next number to test.
e. go to a. with next testing number. |
N**n 发帖数: 832 | 17 找中位数从来没需要sort过,sort复杂度太高
google搜selection algorithm和median of median
.
,
【在 b*******d 的大作中提到】 : two ways: : 1. sort and N-way merge to get the N^2/2th number : 2. sort locally, and use binary search to search whole integer value space. : with up to 32 times of steps, you can locate that number: : a. start from a = 2^31, : b. send a to each local machine. : c. each machine reports to master how many numbers (M_i) are less than a. : d. master sum up the Mi, and see if sum(M_i) is less than N^2/ 2 or larger, : and decide the next number to test. : e. go to a. with next testing number.
|
b*******d 发帖数: 750 | 18
~~~~
sure, so you don't sort, just iterate each local file when testing.
totally it will be still linear time (with up to 32 iterations of files
though, big constant factor), and no touching on original data.
actually it's all integer numbers, i think even sorting could be done in
linear time (radix sort), and it will speed up testing a lot.
Not sure why median of median works:
let N = 5,
Machine1: 1 2 3 4 5 median 3
Machine2: 101 102 103 104 105 median 103
Machine3: 99 100 104 106 107 median 104
median of median: 103
1 2 3 4 5 99 100 101 102 < 103, totally 9 numbers.
104 104 105 106 107 > 103, totally 5 numbers.
so 103 is not median globally I guess?
【在 N**n 的大作中提到】 : 找中位数从来没需要sort过,sort复杂度太高 : google搜selection algorithm和median of median : : . : ,
|
i*********7 发帖数: 348 | 19 median of median. 把每台机器上的N个数的Median找出来。
这个时候你就会有n个median。然后再放到一台机器上找median。
CLRS上有一节是讲这个算法的。这属于stable O(n)的算法,只是这个n有点大。。。 |
i*********7 发帖数: 348 | 20
我印象里Median of median只适用于N * N的矩阵,就像这一题一样。
你可以找个例子试试
【在 b*******d 的大作中提到】 : : ~~~~ : sure, so you don't sort, just iterate each local file when testing. : totally it will be still linear time (with up to 32 iterations of files : though, big constant factor), and no touching on original data. : actually it's all integer numbers, i think even sorting could be done in : linear time (radix sort), and it will speed up testing a lot. : Not sure why median of median works: : let N = 5, : Machine1: 1 2 3 4 5 median 3
|
|
|
b*******d 发帖数: 750 | 21
hmmm...
what about:
machine 1: 1 2 3 4 5 median: 3
machine 2: 1 2 3 4 5 median: 3
machine 3: 1 2 3 4 5 median: 3
machine 4: 101 102 103 104 105 median: 103
machine 5: 101 102 103 104 105 median: 103
median of median: 3, but it's obviously not the global median, only 6
elements are less than "3"...
if you guys are talking about this kind of http://www.ics.uci.edu/~eppstein/161/960130.html
i guess there is a recursion step you guys missed.. ???
【在 i*********7 的大作中提到】 : : 我印象里Median of median只适用于N * N的矩阵,就像这一题一样。 : 你可以找个例子试试
|
f*****n 发帖数: 15 | 22 we can do something similar to find the median of an array by using quick
sort.
1. randomly pick up a number as a pivot from the N^2 numbers
2. split each array into two groups, smaller than the pivot and greater than
the pivot
3. if the count of numbers smaller than or equal to the pivot is N^2/2, the
pivot is the median
3.1 if the count is smaller than N^2/2, throw all numbers smaller than pivot
and try to find the (N^2/2 - count) smallest number in the left numbers
3.2 if the count is larger than N^2/2, throw all numbers larger than the
pivot and try to find the N^2/2 smallest number in the left numbers |
a**********e 发帖数: 157 | 23
than
the
pivot
【在 f*****n 的大作中提到】 : we can do something similar to find the median of an array by using quick : sort. : 1. randomly pick up a number as a pivot from the N^2 numbers : 2. split each array into two groups, smaller than the pivot and greater than : the pivot : 3. if the count of numbers smaller than or equal to the pivot is N^2/2, the : pivot is the median : 3.1 if the count is smaller than N^2/2, throw all numbers smaller than pivot : and try to find the (N^2/2 - count) smallest number in the left numbers : 3.2 if the count is larger than N^2/2, throw all numbers larger than the
|
a**********e 发帖数: 157 | 24 这种方法比较合适
than
the
pivot
【在 f*****n 的大作中提到】 : we can do something similar to find the median of an array by using quick : sort. : 1. randomly pick up a number as a pivot from the N^2 numbers : 2. split each array into two groups, smaller than the pivot and greater than : the pivot : 3. if the count of numbers smaller than or equal to the pivot is N^2/2, the : pivot is the median : 3.1 if the count is smaller than N^2/2, throw all numbers smaller than pivot : and try to find the (N^2/2 - count) smallest number in the left numbers : 3.2 if the count is larger than N^2/2, throw all numbers larger than the
|
l*******b 发帖数: 2586 | 25 嗯,貌似和存在一台机器里没有太大区别呀
than
the
pivot
【在 f*****n 的大作中提到】 : we can do something similar to find the median of an array by using quick : sort. : 1. randomly pick up a number as a pivot from the N^2 numbers : 2. split each array into two groups, smaller than the pivot and greater than : the pivot : 3. if the count of numbers smaller than or equal to the pivot is N^2/2, the : pivot is the median : 3.1 if the count is smaller than N^2/2, throw all numbers smaller than pivot : and try to find the (N^2/2 - count) smallest number in the left numbers : 3.2 if the count is larger than N^2/2, throw all numbers larger than the
|
k****r 发帖数: 807 | 26 这种throw away实现起来怎么办?别说还要申请更多的空间存下你想要的,这样肯定不
是interviewer想要的答案
【在 a**********e 的大作中提到】 : 这种方法比较合适 : : than : the : pivot
|
l*******b 发帖数: 2586 | 27 swap下,比这个数大的放后面,比这个小的放前面,快速排序那样
【在 k****r 的大作中提到】 : 这种throw away实现起来怎么办?别说还要申请更多的空间存下你想要的,这样肯定不 : 是interviewer想要的答案
|
k****r 发帖数: 807 | 28 make sense,
是不是可以纪录一下total_min, total_max, then the next compared value can be
calculated.
【在 l*******b 的大作中提到】 : swap下,比这个数大的放后面,比这个小的放前面,快速排序那样
|
l*******b 发帖数: 2586 | 29 http://en.wikipedia.org/wiki/Selection_algorithm#Linear_general
和total min total max关系不太大吧,随机选取一个元素就行了。更复杂的办法貌似
号称可以总取到比较好的元素,没太看懂。。。
be
【在 k****r 的大作中提到】 : make sense, : 是不是可以纪录一下total_min, total_max, then the next compared value can be : calculated.
|