关于median的讨论汇总 - 话题女王

j**l
发帖数: 2911

假设两个有序数组A和B,它们的长度都为n
1. n = 1
median是(A[0] + B[0]) / 2.0
2. n = 2
median是(max(A[0], B[0]) + min(A[1], B[1])) / 2.0
3. n > 2
假如median(A) == median(B), 则median(A+B)就是median(A)或median(B)
假如median(A) != median(B), 不妨设median(A) < median(B)，则
1). ｎ是奇数
假设median(A)和median(B)的单median的序号都是m，则问题化归为求
A[m..n-1]和B[0..m]的median
2). n是偶数
假设median(A)和median(B)的双median的序号都是m1, m2(m2 = m1 + 1), 则问题化归
为求
a. A[m1..n-1]和B[0..m2]的median
注意不可以化归为求
b. A[m2..n-1]和B[0..m1]的median
举例说明
A = {2, 4, 6, 10}
B = {1, 3, 9, 12}
n

s*****n
发帖数: 994

来自主题: JobHunting版 - 请帖个Median of Two Sorted Arrays的好解法？

class Solution {
public:
double median(int A[], int m){
if (m%2) return A[m/2];
else return 0.5*(A[m/2-1]+A[m/2]);
}
double median(int A[], int m, int val){//m>=2
if (m%2){
if (val else if (val==A[m/2]) return val;
else return 0.5*(min(A[m/2+1],val)+ A[m/2]);
}else{
if (val < A[m/2-1]) return A[m/2-1];
else if (val > A[m/2]) r... 阅读全帖

h**********l
发帖数: 6342

来自主题: JobHunting版 - An interview question of finding the median in a moving window.

除了第一次，以后应该都是 O（1）
第一次sort一下window里面n个数记下median
然后删一个数，跟median比较，你就知道左边还是右边，或者median被删，也是一样的
新加一个数，也知道median左边右边
如果在median两边， median不变，
不然就是median旁边一个是新的median，左边右边判断一下
window是偶数情况应该类似的

window
heap

d**********x
发帖数: 4083

来自主题: JobHunting版 - Haker Rank Median...

那题目貌似就没卡复杂度。。。枉我昨天下班后还捏着鼻子写了个treap。。。结果证
明是输出格式错了，只过了两三个case的可以考虑检查一下输出格式
思路大致就是用order statistics tree。每次查询都是O(logn)。手写最大问题在于没
法定制容器的行为。。。
好吧，然后想了一下，如果用两个heap加一个map的话也是可以的，因为只是查询
median，没有要求随机查询kth元素，所以还是写复杂了
附上代码，treap是个随机平衡树，可以拿去用。
#include
#include
#include
#include
using namespace std;
template
class Treap {
public:
class Node {
public:
Node(const T& value)
: value_(value),
left_(NULL),
right_(NULL),... 阅读全帖

x*****0
发帖数: 452

来自主题: JobHunting版 - median of two sorted arrays的时间复杂度(附上了过了oj的代码）

关于这道题的时间复杂度，有个疑问。
基本算法是参考
ref1:
http://www2.myoops.org/course_material/mit/NR/rdonlyres/Electri
和
ref2:
http://leetcode.com/2011/03/median-of-two-sorted-arrays.html#co
来做的。
大致介绍一下ref1中所描述的算法，基本思路是binary search.
假设数组A和B，长度分别为nA和nB。nA+nB=n。
（1）median 不是在A中就是在B中。（基本是一句废话，:-))
（2）选择数组A的median,假设其index为i=（l+r)/2（初始时l=0，r=nA-1）。比较A[i
]和B[j]，B[j+1]，j=n/2 - i - 1。j做如此选择，是因为如果A[i]是meidian of two
sorted arrays的话，A[i]必须大于B中的j个元素。
（3）如果B[j]<=A[i]<=B[j+1],那么A[i]必定是我们要找的结果。
（4）如果A[i]阅读全帖

v***o
发帖数: 287

来自主题: JobHunting版 - 问一道面试题，关于大数据如何高效找出median数

median of medians.每个机器 sort, 找到median, 然后继续从n的median 数组中招
median, recursive.但结果是proximity median.

m****r
发帖数: 141

来自主题: Quant版 - An interview question of finding the median in a moving win (转载)

【以下文字转载自 JobHunting 讨论区】
发信人: mitcar (mitcar), 信区: JobHunting
标题: An interview question of finding the median in a moving window.
发信站: BBS 未名空间站 (Thu Mar 22 09:53:07 2012, 美东)
This is an interview question. The interview has been done.
Given a sequence of data (it may have duplicates), a fixed-sized moving
window, move the window at each iteration from the start of the data
sequence, such that
(1) the oldest data element is removed from the window and a new data
element is pushed into the wi... 阅读全帖

x***y
发帖数: 633

来自主题: JobHunting版 - 一个算法题：Selecting median of three sorted arrays

we can reduce it to 2 arrays problem...First compare the median of 3 arrays, if A[n/2] C[0:n/2-1] together, we can find the median of these two in logn, compare
it witht the median of B, eliminate corresponding part of B and A, C
combination. To do it in A, C combination, we have to find the position of
median in both A and C, O(logn) time. This B and A,C combination is just a 2
array problem except to find median and do e

m****r
发帖数: 141

来自主题: JobHunting版 - An interview question of finding the median in a moving window.

This is an interview question. The interview has been done.
Given a sequence of data (it may have duplicates), a fixed-sized moving
window, move the window at each iteration from the start of the data
sequence, such that
(1) the oldest data element is removed from the window and a new data
element is pushed into the window
(2) find the median of the data inside the window at each moving.
My idea:
Use 2 heaps to hold median. In side the window, sort the data in the window
in the first iteration, ... 阅读全帖

b*****o
发帖数: 715

来自主题: JobHunting版 - find the median of an infinite data stream of integers

对于one pass算法,
（1）要求exact median, 可以证明内存需要至少是N/2。
（2）如果允许用随机算法，对于内存量是s，可以先用reservoir sampling采样出s个
sample,然后在对这s个数求median。你可以用概率公式算一下误差。
（3）如果允许approximate median,可以把内存量减少到(log(N))^2。大致想法是把
已经扫描过的数作类似quantization的压缩:对于长度为s的buffer，每个元素不但存一
个value(i),而且还存一个weight(i)。所有的weight加起来就是目前扫描过的数的总数
。算法的关键是要保证：在任何时刻，如果把value(i)重复weight(i)次,然后把所有这
样重复出来的数组成一个stream，用它来计算median，和用原数组计算median，能大致
相当
。细节可参看以下paper:
http://infolab.stanford.edu/~manku/papers/98sigmod-quantiles.pd

w**********8
发帖数: 121

来自主题: JobHunting版 - 一个算法题：Selecting median of three sorted arrays

I was trying to reduce the problem to 2 arrays but I did not succeed. In
your solution, even if you find median of A and C. How do you compare it
with B to find the median of A, B,C.
A:1 4 6 10 18
B:22 28 33 45 58
C:7 9 100 180 200
you find the median of A and C is 9 or 10. Tell me how do you use them to
find the median of 3 arrays.

arrays, if A[n/2] A[n/2:n-1] and
2

b*****e
发帖数: 474

来自主题: JobHunting版 - median of an array of ints, 请问这题的经典回答是什么？谢谢

there is a textbook classic of "median of medians of 5", which finds the
median in O(n) worst case time but have no practical use;
practical solution is to use quick-selection, based on the quicksort
partition idea and is average O(n), as mentioned upstairs. Worse case can
typically be avoided by some techniques of choosing the partition value,
such as the "median of three"
search trees, sorting, heaps - these are mostly O(n lg n) solutions and very
practical ones too.
For integers, don't forget... 阅读全帖

j********e
发帖数: 1192

来自主题: JobHunting版 - find the median of an infinite data stream of integers

有proximation算法，我大概记得的一个算法是：
每来一个数据D，跟当前的median比较，如果D大，那么就增加median，
否则就减少median。那这个增加和减少的step有多种选法，例如
a const small number，或者根据数据的variance，或者density estimation
等。我记得能保证这样的得到的数的expectation是median，而且可以
用于任何quantile （例如75-percentile).

d****n
发帖数: 397

来自主题: JobHunting版 - 请教一道题 median ii

用两个heap.一个max heap，一个min heap， maxheap size ＝ min heap size 或者
maxheap size ＝ minheapsize ＋ 1.
java implementation
public class Solution {
/**
* @param nums: A list of integers.
* @return: the median of numbers
*/
public int[] medianII(int[] nums) {
// write your code here
PriorityQueue pmin = new PriorityQueue ();
PriorityQueue pmax = new PriorityQueue (10, new
MaxComparator());
int i, l = nums.l... 阅读全帖

O******n
发帖数: 1505

来自主题: JobHunting版 - 求 1st quantile，已知一个可以返回median的方法

get median of array
scan all entries below median to form a new array
get median of new array
right?

P**********c
发帖数: 3417

来自主题: JobHunting版 - 用什么数据结构快速insert, get median

insert过程中，保证两个heap一样大，或者max heap多一。max heap里的所有值小于min heap里的所有值。每次insert只需要比较要insert的数跟两个heap的root决定进哪个heap, 如果insert后两个heap大小不满足要求则把其中一个的root插到另一个。 insert复杂度O(lgn).
找median时，如果两个heap一样大，则median为两个root的平均。
如果max heap多一，则为max heap的root
找median操作为O(1).

w*******s
发帖数: 96

来自主题: JobHunting版 - median in median selection algorithm?

一直不清楚median in median这个selection algorithm应该用在哪个场合。看到ALSR
书 version 3: Page 220有讲，但还是不明白。大牛出来指点一下。。

b*********3
发帖数: 748

来自主题: JobHunting版 - N台server，求median

今天碰到一道题，觉得对方提示有点不可思议。
给定N台server，每台记录了X个response time。每台的X不一定一样。然后给一台
central server，求出所有response time的median。
先想了每台回复median，但这样俨然不对。然后就想都传给central server，但是这样
太浪费流量。对方提示每台都只给central发一次数据就求出来。还说想想average怎么
求的。但是average和median完全两码事，没任何联系啊。
谢谢！

A*H
发帖数: 127

来自主题: JobHunting版 - 大文件找median

比如几百G的文件，找median有什么好办法么?
能想到的就是apply median of median linear selection and use external storage
for partition..

g****y
发帖数: 240

来自主题: JobHunting版 - median 到底是啥意思??

median 是2和3，一个叫lower median,一个是upper median.

m*******i
发帖数: 34

来自主题: JobHunting版 - 请问一个关于array median 的问题

求一个array的median的值，但是这个array有duplicate的元素， median该怎么算，
比如， array＝［3， 3， 5］或［3， 3， 3， 5］，那median是什么？
谢谢。

g****v
发帖数: 971

来自主题: JobHunting版 - 简单map reduce mean median，傻逼回答

那2个bucket已经有需要第几个是median的信息了，然后找到就行了。
比如第一轮mapreduce告诉你说第2000个element in [bucket1,bucket2]是整个的
median，你就把[bucket1,bucket2]在第二遍mapreduce中排序，找到第2000个element
which is the final median for the whole number array.

p*******g
发帖数: 809

来自主题: Living版 - 你们的median room有门吗？

前贴看有人说median room有门。但是为什么我们的没有门呢？和这个视频里的median
room几乎一摸一样（从7：36分看）：https://www.youtube.com/watch?v=WDWKoh53h1E
我们的房子是新房。这种median room是最近的潮流吗？还是蹩脚的设计？

p*******g
发帖数: 809

来自主题: Living版 - 你们的median room有门吗？

请大家看那个视频，从7:35开始，桌球右边的那个就是median room，视频里也没有显
示出门啊？而且我去model home里看的时候，也没有看到门。在zilow上看我们这富人
区卖到几十million 到几million的房子，median room好像也是这样的。所以我很奇怪
啊。如果现在的潮流是不装门的话，那我也就不管了。如果传统上median room都要装
门，那我趁现在赶紧要求builder造门好了。

n******e
发帖数: 1248

来自主题: Immigration版 - 请问自己的Median Impact Factor 怎么算？

看到一个解释： The median Impact Factor is the median value of all journal
Impact Factors in the subject category.
那估计自己的median IF，就是发过杂志种类的average IF

k*****a
发帖数: 1518

来自主题: Canada版 - 2011年香港的工资收入情况，中值(median)月收入为12,000港元 (转载)

【以下文字转载自 HongKong 讨论区】
发信人: kangjia (康佳--红太阳), 信区: HongKong
标题: 2011年香港的工资收入情况，中值(median)月收入为12,000港元
发信站: BBS 未名空间站 (Fri Feb 1 22:12:25 2013, 美东)
香港的工资还真没有想象中的高。2011年香港的中值(median)月收入为12,000港元。注
意这个中值是针对所有工作的人，有50%的工作人月收入低于12,000港元。
前面有人问月薪50,000港元是什么工资水平。从下面的统计可以看出，月薪5万港元在
香港是很高的收入，是最高的7%左右，93%的有工作的人的收入低于5万港元一个月。
HK's median monthly income is HK$12,000
9.8% earned more than HK$40,000/month
4.5% earned more than HK$60,000/month
1st column is monthly income range, in HK dollars
2nd column is... 阅读全帖

k*****a
发帖数: 1518

来自主题: Toronto版 - 2007年Ontario工程师中值(median)收入为$88,000 (转载)

【以下文字转载自 Canada 讨论区】
发信人: kangjia (康佳--红太阳), 信区: Canada
标题: 2007年Ontario工程师中值(median)收入为$88,000
发信站: BBS 未名空间站 (Mon Nov 19 16:46:26 2007)
如果你是engineer,居住在ontario,你2007年年收入为$88,000，你就处于中值状态，即
50% 的工程师收入高于你，亦即50%的工程师收入低于你。这个收入为雇佣收入，是指
你上班所得的工资收入，不包括下班打工，开公司或炒股投资等收入。。
As of June 1, 2007, the median base salary for an Ontario engineer is $84,
600, and median income is $88,000.
Source: 2007 Ontario Society of Professional Engineers (OSPE) Employer
Compensation Survey.

r****o
发帖数: 1950

来自主题: Mathematics版 - 请问K-Median Problem和P-Median Problem是一回事吗？

经常看到K-Median 问题的paper,也看到过一些P-Median 问题的paper，
这两个是一回事吗？我感觉就是一个问题，只是名字不一样，呵呵。
还有K-Mean和P-Mean是不是也是一回事?

a****y
发帖数: 99

来自主题: Quant版 - An interview question of finding the median in a moving win (转载)

my 2cents,use AVL tree, plus each node store the number of node of the sub-
tree.so that median finding costs log(m). insert, delete also costs log(m).
O(nlog(m) )
I google it and find some reference :
Comparison of algorithms for standard median filtering
Juhola, M. Katajainen, J. Raita, T.
Signal Processing, IEEE Transactions on
Jan. 1991, Volume: 39 , Issue: 1, page(s): 204 - 208
Abstract
On computation of the running median
Astola, J.T. Campbell, T.G.
Acoustics, Speech and Signal Proc... 阅读全帖

c**********e
发帖数: 2007

来自主题: Statistics版 - Why the output data set does not give median?

proc means data=temp mean std median max min;
var age;
class c1;
output out=summ;
run;
The above procedure gives median in list output, but not in
the output dataset summ. Could anybody tell why? And how to
put median to the summ dataset.
Thanks a lot!

s******n
发帖数: 6806

来自主题: Military版 - 中国的统计方法有没有用过median

median比较resistant,对outliers不敏感。
感觉还是median好。
用mean的话，直接就被少数富人的钱拉了上去。

w*********r
发帖数: 42116

来自主题: Military版 - 以后要看 median, 看人均有个鸟用啊

还是用MEAN好，任何一个老百姓都可以说自己不到平均数，用了MEDIAN，一半的老百姓
就没法抱怨了。当然，也可以所有老百姓都说不到MEDIAN来证明统计错误。

发帖数: 1

来自主题: Military版 - 还是德州好，这家公司median salary50万一年 (转载)

【以下文字转载自 JobHunting 讨论区】
发信人: lovesunny123 (lovesunny123), 信区: JobHunting
标题: 还是德州好，这家公司median salary50万一年
发信站: BBS 未名空间站 (Wed Aug 29 17:48:46 2018, 美东)
注意不是average哦
Lilis Energy Inc. (Nasdaq: LLEX) of San Antonio outranked all companies with
a median employee wage of $500,000. The oil and gas exploration company had
27 workers at the close of its most-recent fiscal year.
https://www.bizjournals.com/boston/news/2018/08/06/why-boston-has-so-many-of
-the-best-paying.html

q******s
发帖数: 7469

来自主题: USANews版 - 比较一下海龟和海不龟的中位数Median年收入 (转载)

【以下文字转载自 Returnee 讨论区】
发信人: hedge2007 (hedge2008), 信区: Returnee
标题: 比较一下海龟和海不龟的中位数Median年收入
发信站: BBS 未名空间站 (Thu Sep 17 21:31:30 2009, 美东)
我估计，海龟20万人民币/年，海不龟8万美元/年
说明：
0。中位数Median的定义：50%的人在此之上，50%的人在此之下。
1。指可持续收入，不包括一次性收入，比如海龟的安家费，启动资金，海不龟的Sign-
On Bonus，学费报销等。
2。为了使问题简单一些，只包括在美国的留学生，即以F1身份出来的人。
3。偷渡的，被搬运的，陪读的，家属移民过来的，外嫁/外娶的，不在讨论的样本之列。
4。海不龟包括全美国所有州，海龟包括中国大陆所有省份，城市。
我的估计的原因：
1。在美国的海不龟，按人数来说，Computer Science， EE， Chemistry /Biology，
Accounting应该是前四大类。Computer Science， EE， Chemistry /Biology的中

D*******o
发帖数: 3229

来自主题: USANews版 - 民主体现了AVERAGE或者MEDIAN的民众的意愿

关于群众智慧，有这么一个故事：
Galton was a keen observer. In 1906, visiting a livestock fair, he stumbled
upon an intriguing contest. An ox was on display, and the villagers were
invited to guess the animal's weight after it was slaughtered and dressed.
Nearly 800 participated, and Galton was able to study their individual
entries after the event. Galton stated that "the middlemost estimate
expresses the vox populi, every other estimate being condemned as too low or
too high by a majority of the voters",[36] ... 阅读全帖

x***y
发帖数: 633

来自主题: JobHunting版 - 一个算法题：Selecting median of three sorted arrays

A[n/2]=6, B[n/2]=33, C[n/2]=100, So A':6 10 18 C': 7 9
To find the median M' of A' C', O(log(max(|A'|, |C'|)))=O(logn).
if M' < B[n/2], B is left with B[22 28 33], and find data in A'less than M'
, find data in C' less than M', O(logn), and eleminate them.....
All problems of this kind is essentially 2 array problem, and the time
efficiency is O(logn) * time for finding the median and delete operation in each array. For sorted array, 2nd term is O(1), but for an partially sorted array A'+C',th

L****t
发帖数: 924

来自主题: JobHunting版 - 请教一个BST找Median的题目

应该可以把，如果用两个指针，一个从正常的preorder扫过去，
一个是从右子数开始的Preorder扫过去，两个相同的时候（假设奇数）就是Median，
不过递归中怎么控制这两个指针都同步扫呢。。
或者是先Travel一次算出一共多少个Node，然后再遍历到 n/2 就是Median，
不过算出多少Node的话也算用到static 或者global variable了吧。。

y**i
发帖数: 1112

来自主题: JobHunting版 - 求两个等长有序数组的median的细节

问一下，不明白，求median为什么变成了求mean？median不是中位数的意思么？

j**l
发帖数: 2911

来自主题: JobHunting版 - 求两个等长有序数组的median的细节

如果有序序列的长度为奇数，则median就是正中间的那个数
如果有序序列的长度为偶数，则median是中间的那两个数，有三种处理：
1. 返回median1和median2
2. 返回median1
3. 返回(median1 + median2) / 2.0

j**l
发帖数: 2911

来自主题: JobHunting版 - 请教一题，求两个不等长的有序数组的median

你这样的padding对么？
比如
A: 1
B: 3, 3, 3
直接求median是3
对A进行padding得到
A: 1, 1, 1
B: 3, 3, 3
求得median是2

r****o
发帖数: 1950

来自主题: JobHunting版 - 也问一个median的问题

大家看看我的想法对不?
可以模仿merge sort,先考虑N为偶数的情况
设这N列为c1,c2,...,cN
将c1和c2, c3和c4, ..., c_{N-1}和cN 两两 merge成一个长2N的列，设为d1,d2,...d_{N/2}
然后再将d1和d2,d3和d4,...,d{N/2-1}和d{N/2}两两merge成一个长4N的列，
如此反复merge,直到最后剩下两个长N*N/2的列，用binary search可找到median。
时间复杂度，O(N)+O(2N)+...+O(N*N/2)+O(2lgN)=O(N^2 lgN).
空间复杂度O(N^2).
当N为奇数时，可将最中间那列先空着，当左右两边都merge成了长(N-1)N/2的列后，再merge成一个长(N-1)N的大列，然后问题可归结为一个长(N-1)N的列和一个长N的列，都排好序，找median的问题。
时间复杂度和空间复杂度应该和偶数时一样。

d****e
发帖数: 251

来自主题: JobHunting版 - 也问一个median的问题

没必要merge阿，每列都是增序了。
这里的好处是NxN，直接比较N个median, 可能可以转换为比较两列的问题。
可能比facebook那道题还简单。

d_{N/2}
再merge成一个长(N-1)N的大列，然后问题可归结为一个长(N-1)N的列和一个长N的列，
都排好序，找median的问题。

b********h
发帖数: 119

来自主题: JobHunting版 - median of N^2 numbers across N machines

If you can use MapReduce, you could partition the numbers into several
buckets (Mapper) and count how many numbers are in each bucket. Then, you
know which bucket to look for the median. If the bucket is small enough, we
could put it into memory and find the median. Otherwise, do another
mapreduce job.

m********l
发帖数: 4394

来自主题: JobHunting版 - 求 1st quantile，已知一个可以返回median的方法

啊?
不就是median of median?

s******d
发帖数: 61

来自主题: JobHunting版 - 用什么数据结构快速insert, get median

我想着用balanced binary search tree: insert:O(logn), get median:O(logn)
或者linked list: 指针指中间, insert: O(1), get median: O(1)
不过感觉好像有问题

K*****k
发帖数: 430

来自主题: JobHunting版 - 求两个有序数组的median的平凡解法?

没看明白。
而且当总共的元素个数为偶数的时候，median的定义是中间两个数的平均数。
比如
int A[] = {1};
int B[] = {5, 6, 7, 8, 9};
合并后是{1, 5, 6, 7, 8, 9}, median = (6 + 7) / 2 = 6.5

K*****k
发帖数: 430

来自主题: JobHunting版 - 求两个有序数组的median的平凡解法?

那个log(m + n)的算法现场想到还是很难的，好像MIT某个课程的handout中有解答，但
对偶数个元素，也是求出第一个median, 而不是两个median的平均值。
ihasleetcode的网站有完整巧妙的几种解答方法，可惜我觉得对于面试的白板来说，可
用空间太小了，写不下。

t****y
发帖数: 27

来自主题: JobHunting版 - 一道求median的题

If the stream is infinite, there is no way to get the median with a 100%
accuracy.
I remember there is a paper discussing this problem, and it presents an
algorithm that using a certain amount of memory you can find the median
with a low bound probability. Sorry forgot the paper name though.

returns

w***y
发帖数: 6251

来自主题: JobHunting版 - 大文件找median

我遇到过一个题目，是给了条件，找int的median, 相当于知道了range. 设计一个
distributed的算法， master machine给一个x, 让其他的machine算比x大、比x小的数
各有多少－－－median就是两边个数一样；否则就不停缩小x的可搜索范围
如果不知道range，我不懂怎么算

j********e
发帖数: 1192

来自主题: JobHunting版 - 在 1 billion 的数中找 median

可以把quickselect和median of median搞成parallel的吧

best

#	版面	帖数(主题数)
-	全站	4871 (796)
1	Military	3777 (569)
2	Stock	341 (51)
3	Joke	117 (17)
4	History	116 (3)
5	Automobile	100 (9)
6	USANews	55 (9)
7	Midlife	45 (1)
8	Headline	41 (41)
9	Dreamer	33 (13)
10	FleaMarket	32 (20)
11	Living	30 (7)

topics

未名新帖统计// 7月16日

历史上的今天