由买买提看人间百态

boards

本页内容为未名空间相应帖子的节选和存档,一周内的贴子最多显示50字,超过一周显示500字 访问原贴
JobHunting版 - 这个怎么解:找到N^2个数的中数
相关主题
问一个关于找中数得问题究竟什么定义了DP
找median有O(N)的算法吗?问个白痴问题,DP到底算不算递归?
在 1 billion 的数中找 median问道大数据的题
请问怎样写没有parent pointer的BST iterator?被Facebook的面试的一道题目难倒了
海量数据找中数似乎没啥好办法。一道小题
LC: 两个排序数组找中数继续贴几个题目
find median for k sorted arrays这题有解吗?
问一道google的题bloomberg刚店面晚。 悔阿
相关话题的讨论汇总
话题: median话题: numbers话题: pivot话题: 103话题: than
进入JobHunting版参与讨论
1 (共1页)
D**f
发帖数: 439
1
3).一共有N个机器,每个机器上有N个数。每个机器最多存O(N)个数并对它们操作
。如何找到N^2个数的中数(median)?
原载于下面的链接:
http://www.mitbbs.com/mitbbs_article_t.php?board=JobHunting&gid
我的想法是找到每个机器上的中位数,但全体的中位数如何从中得出呢?
N4
发帖数: 155
2
49匹马

【在 D**f 的大作中提到】
: 3).一共有N个机器,每个机器上有N个数。每个机器最多存O(N)个数并对它们操作
: 。如何找到N^2个数的中数(median)?
: 原载于下面的链接:
: http://www.mitbbs.com/mitbbs_article_t.php?board=JobHunting&gid
: 我的想法是找到每个机器上的中位数,但全体的中位数如何从中得出呢?

N**n
发帖数: 832
3
median of median

【在 D**f 的大作中提到】
: 3).一共有N个机器,每个机器上有N个数。每个机器最多存O(N)个数并对它们操作
: 。如何找到N^2个数的中数(median)?
: 原载于下面的链接:
: http://www.mitbbs.com/mitbbs_article_t.php?board=JobHunting&gid
: 我的想法是找到每个机器上的中位数,但全体的中位数如何从中得出呢?

D**f
发帖数: 439
4
都别打哑谜了,具体说说?
b*******d
发帖数: 750
5
two ways:
1. sort and N-way merge to get the N^2/2th number
2. sort locally, and use binary search to search whole integer value space.
with up to 32 times of steps, you can locate that number:
a. start from a = 2^31,
b. send a to each local machine.
c. each machine reports to master how many numbers (M_i) are less than a.
d. master sum up the Mi, and see if sum(M_i) is less than N^2/ 2 or larger,
and decide the next number to test.
e. go to a. with next testing number.
N**n
发帖数: 832
6
找中位数从来没需要sort过,sort复杂度太高
google搜selection algorithm和median of median

.
,

【在 b*******d 的大作中提到】
: two ways:
: 1. sort and N-way merge to get the N^2/2th number
: 2. sort locally, and use binary search to search whole integer value space.
: with up to 32 times of steps, you can locate that number:
: a. start from a = 2^31,
: b. send a to each local machine.
: c. each machine reports to master how many numbers (M_i) are less than a.
: d. master sum up the Mi, and see if sum(M_i) is less than N^2/ 2 or larger,
: and decide the next number to test.
: e. go to a. with next testing number.

b*******d
发帖数: 750
7

~~~~
sure, so you don't sort, just iterate each local file when testing.
totally it will be still linear time (with up to 32 iterations of files
though, big constant factor), and no touching on original data.
actually it's all integer numbers, i think even sorting could be done in
linear time (radix sort), and it will speed up testing a lot.
Not sure why median of median works:
let N = 5,
Machine1: 1 2 3 4 5 median 3
Machine2: 101 102 103 104 105 median 103
Machine3: 99 100 104 106 107 median 104
median of median: 103
1 2 3 4 5 99 100 101 102 < 103, totally 9 numbers.
104 104 105 106 107 > 103, totally 5 numbers.
so 103 is not median globally I guess?

【在 N**n 的大作中提到】
: 找中位数从来没需要sort过,sort复杂度太高
: google搜selection algorithm和median of median
:
: .
: ,

i*********7
发帖数: 348
8
median of median. 把每台机器上的N个数的Median找出来。
这个时候你就会有n个median。然后再放到一台机器上找median。
CLRS上有一节是讲这个算法的。这属于stable O(n)的算法,只是这个n有点大。。。
i*********7
发帖数: 348
9

我印象里Median of median只适用于N * N的矩阵,就像这一题一样。
你可以找个例子试试

【在 b*******d 的大作中提到】
:
: ~~~~
: sure, so you don't sort, just iterate each local file when testing.
: totally it will be still linear time (with up to 32 iterations of files
: though, big constant factor), and no touching on original data.
: actually it's all integer numbers, i think even sorting could be done in
: linear time (radix sort), and it will speed up testing a lot.
: Not sure why median of median works:
: let N = 5,
: Machine1: 1 2 3 4 5 median 3

b*******d
发帖数: 750
10

hmmm...
what about:
machine 1: 1 2 3 4 5 median: 3
machine 2: 1 2 3 4 5 median: 3
machine 3: 1 2 3 4 5 median: 3
machine 4: 101 102 103 104 105 median: 103
machine 5: 101 102 103 104 105 median: 103
median of median: 3, but it's obviously not the global median, only 6
elements are less than "3"...
if you guys are talking about this kind of http://www.ics.uci.edu/~eppstein/161/960130.html
i guess there is a recursion step you guys missed.. ???

【在 i*********7 的大作中提到】
:
: 我印象里Median of median只适用于N * N的矩阵,就像这一题一样。
: 你可以找个例子试试

相关主题
LC: 两个排序数组找中数究竟什么定义了DP
find median for k sorted arrays问个白痴问题,DP到底算不算递归?
问一道google的题问道大数据的题
进入JobHunting版参与讨论
f*****n
发帖数: 15
11
we can do something similar to find the median of an array by using quick
sort.
1. randomly pick up a number as a pivot from the N^2 numbers
2. split each array into two groups, smaller than the pivot and greater than
the pivot
3. if the count of numbers smaller than or equal to the pivot is N^2/2, the
pivot is the median
3.1 if the count is smaller than N^2/2, throw all numbers smaller than pivot
and try to find the (N^2/2 - count) smallest number in the left numbers
3.2 if the count is larger than N^2/2, throw all numbers larger than the
pivot and try to find the N^2/2 smallest number in the left numbers
D**f
发帖数: 439
12
3).一共有N个机器,每个机器上有N个数。每个机器最多存O(N)个数并对它们操作
。如何找到N^2个数的中数(median)?
原载于下面的链接:
http://www.mitbbs.com/mitbbs_article_t.php?board=JobHunting&gid
我的想法是找到每个机器上的中位数,但全体的中位数如何从中得出呢?
N4
发帖数: 155
13
49匹马

【在 D**f 的大作中提到】
: 3).一共有N个机器,每个机器上有N个数。每个机器最多存O(N)个数并对它们操作
: 。如何找到N^2个数的中数(median)?
: 原载于下面的链接:
: http://www.mitbbs.com/mitbbs_article_t.php?board=JobHunting&gid
: 我的想法是找到每个机器上的中位数,但全体的中位数如何从中得出呢?

N**n
发帖数: 832
14
median of median

【在 D**f 的大作中提到】
: 3).一共有N个机器,每个机器上有N个数。每个机器最多存O(N)个数并对它们操作
: 。如何找到N^2个数的中数(median)?
: 原载于下面的链接:
: http://www.mitbbs.com/mitbbs_article_t.php?board=JobHunting&gid
: 我的想法是找到每个机器上的中位数,但全体的中位数如何从中得出呢?

D**f
发帖数: 439
15
都别打哑谜了,具体说说?
b*******d
发帖数: 750
16
two ways:
1. sort and N-way merge to get the N^2/2th number
2. sort locally, and use binary search to search whole integer value space.
with up to 32 times of steps, you can locate that number:
a. start from a = 2^31,
b. send a to each local machine.
c. each machine reports to master how many numbers (M_i) are less than a.
d. master sum up the Mi, and see if sum(M_i) is less than N^2/ 2 or larger,
and decide the next number to test.
e. go to a. with next testing number.
N**n
发帖数: 832
17
找中位数从来没需要sort过,sort复杂度太高
google搜selection algorithm和median of median

.
,

【在 b*******d 的大作中提到】
: two ways:
: 1. sort and N-way merge to get the N^2/2th number
: 2. sort locally, and use binary search to search whole integer value space.
: with up to 32 times of steps, you can locate that number:
: a. start from a = 2^31,
: b. send a to each local machine.
: c. each machine reports to master how many numbers (M_i) are less than a.
: d. master sum up the Mi, and see if sum(M_i) is less than N^2/ 2 or larger,
: and decide the next number to test.
: e. go to a. with next testing number.

b*******d
发帖数: 750
18

~~~~
sure, so you don't sort, just iterate each local file when testing.
totally it will be still linear time (with up to 32 iterations of files
though, big constant factor), and no touching on original data.
actually it's all integer numbers, i think even sorting could be done in
linear time (radix sort), and it will speed up testing a lot.
Not sure why median of median works:
let N = 5,
Machine1: 1 2 3 4 5 median 3
Machine2: 101 102 103 104 105 median 103
Machine3: 99 100 104 106 107 median 104
median of median: 103
1 2 3 4 5 99 100 101 102 < 103, totally 9 numbers.
104 104 105 106 107 > 103, totally 5 numbers.
so 103 is not median globally I guess?

【在 N**n 的大作中提到】
: 找中位数从来没需要sort过,sort复杂度太高
: google搜selection algorithm和median of median
:
: .
: ,

i*********7
发帖数: 348
19
median of median. 把每台机器上的N个数的Median找出来。
这个时候你就会有n个median。然后再放到一台机器上找median。
CLRS上有一节是讲这个算法的。这属于stable O(n)的算法,只是这个n有点大。。。
i*********7
发帖数: 348
20

我印象里Median of median只适用于N * N的矩阵,就像这一题一样。
你可以找个例子试试

【在 b*******d 的大作中提到】
:
: ~~~~
: sure, so you don't sort, just iterate each local file when testing.
: totally it will be still linear time (with up to 32 iterations of files
: though, big constant factor), and no touching on original data.
: actually it's all integer numbers, i think even sorting could be done in
: linear time (radix sort), and it will speed up testing a lot.
: Not sure why median of median works:
: let N = 5,
: Machine1: 1 2 3 4 5 median 3

相关主题
被Facebook的面试的一道题目难倒了这题有解吗?
一道小题bloomberg刚店面晚。 悔阿
继续贴几个题目其实我很想知道, 多少软工能45分钟内把quicksort写下来
进入JobHunting版参与讨论
b*******d
发帖数: 750
21

hmmm...
what about:
machine 1: 1 2 3 4 5 median: 3
machine 2: 1 2 3 4 5 median: 3
machine 3: 1 2 3 4 5 median: 3
machine 4: 101 102 103 104 105 median: 103
machine 5: 101 102 103 104 105 median: 103
median of median: 3, but it's obviously not the global median, only 6
elements are less than "3"...
if you guys are talking about this kind of http://www.ics.uci.edu/~eppstein/161/960130.html
i guess there is a recursion step you guys missed.. ???

【在 i*********7 的大作中提到】
:
: 我印象里Median of median只适用于N * N的矩阵,就像这一题一样。
: 你可以找个例子试试

f*****n
发帖数: 15
22
we can do something similar to find the median of an array by using quick
sort.
1. randomly pick up a number as a pivot from the N^2 numbers
2. split each array into two groups, smaller than the pivot and greater than
the pivot
3. if the count of numbers smaller than or equal to the pivot is N^2/2, the
pivot is the median
3.1 if the count is smaller than N^2/2, throw all numbers smaller than pivot
and try to find the (N^2/2 - count) smallest number in the left numbers
3.2 if the count is larger than N^2/2, throw all numbers larger than the
pivot and try to find the N^2/2 smallest number in the left numbers
a**********e
发帖数: 157
23

than
the
pivot

【在 f*****n 的大作中提到】
: we can do something similar to find the median of an array by using quick
: sort.
: 1. randomly pick up a number as a pivot from the N^2 numbers
: 2. split each array into two groups, smaller than the pivot and greater than
: the pivot
: 3. if the count of numbers smaller than or equal to the pivot is N^2/2, the
: pivot is the median
: 3.1 if the count is smaller than N^2/2, throw all numbers smaller than pivot
: and try to find the (N^2/2 - count) smallest number in the left numbers
: 3.2 if the count is larger than N^2/2, throw all numbers larger than the

a**********e
发帖数: 157
24
这种方法比较合适

than
the
pivot

【在 f*****n 的大作中提到】
: we can do something similar to find the median of an array by using quick
: sort.
: 1. randomly pick up a number as a pivot from the N^2 numbers
: 2. split each array into two groups, smaller than the pivot and greater than
: the pivot
: 3. if the count of numbers smaller than or equal to the pivot is N^2/2, the
: pivot is the median
: 3.1 if the count is smaller than N^2/2, throw all numbers smaller than pivot
: and try to find the (N^2/2 - count) smallest number in the left numbers
: 3.2 if the count is larger than N^2/2, throw all numbers larger than the

l*******b
发帖数: 2586
25
嗯,貌似和存在一台机器里没有太大区别呀

than
the
pivot

【在 f*****n 的大作中提到】
: we can do something similar to find the median of an array by using quick
: sort.
: 1. randomly pick up a number as a pivot from the N^2 numbers
: 2. split each array into two groups, smaller than the pivot and greater than
: the pivot
: 3. if the count of numbers smaller than or equal to the pivot is N^2/2, the
: pivot is the median
: 3.1 if the count is smaller than N^2/2, throw all numbers smaller than pivot
: and try to find the (N^2/2 - count) smallest number in the left numbers
: 3.2 if the count is larger than N^2/2, throw all numbers larger than the

k****r
发帖数: 807
26
这种throw away实现起来怎么办?别说还要申请更多的空间存下你想要的,这样肯定不
是interviewer想要的答案

【在 a**********e 的大作中提到】
: 这种方法比较合适
:
: than
: the
: pivot

l*******b
发帖数: 2586
27
swap下,比这个数大的放后面,比这个小的放前面,快速排序那样

【在 k****r 的大作中提到】
: 这种throw away实现起来怎么办?别说还要申请更多的空间存下你想要的,这样肯定不
: 是interviewer想要的答案

k****r
发帖数: 807
28
make sense,
是不是可以纪录一下total_min, total_max, then the next compared value can be
calculated.

【在 l*******b 的大作中提到】
: swap下,比这个数大的放后面,比这个小的放前面,快速排序那样
l*******b
发帖数: 2586
29
http://en.wikipedia.org/wiki/Selection_algorithm#Linear_general
和total min total max关系不太大吧,随机选取一个元素就行了。更复杂的办法貌似
号称可以总取到比较好的元素,没太看懂。。。

be

【在 k****r 的大作中提到】
: make sense,
: 是不是可以纪录一下total_min, total_max, then the next compared value can be
: calculated.

1 (共1页)
进入JobHunting版参与讨论
相关主题
bloomberg刚店面晚。 悔阿海量数据找中数似乎没啥好办法。
其实我很想知道, 多少软工能45分钟内把quicksort写下来LC: 两个排序数组找中数
也问一个median的问题find median for k sorted arrays
Google电面问一道google的题
问一个关于找中数得问题究竟什么定义了DP
找median有O(N)的算法吗?问个白痴问题,DP到底算不算递归?
在 1 billion 的数中找 median问道大数据的题
请问怎样写没有parent pointer的BST iterator?被Facebook的面试的一道题目难倒了
相关话题的讨论汇总
话题: median话题: numbers话题: pivot话题: 103话题: than