A simple data-processing question - Joke board

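This page is an excerpt and archive of the corresponding thread on 未名空间 (MITBBS). Posts less than a week old show at most 50 characters; posts older than a week show 500 characters. Visit the original thread for the full text.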


s*****c
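Posts: 753

Asking the almighty academic board for help.
From 500 samples we determined mean = m and standard deviation = s.
Question: if we draw 9 samples at random and average them, what is the probability that the average is greater than m + s?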

H******7
Posts: 34403

0.5

s*****c
Posts: 753

Right. But a book's answer key tells me to use Student's t-test, and gives 1% as the answer.
I'm thoroughly confused.

[H******7 wrote:]
: 0.5

M*P
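Posts: 6456

What's written in a book isn't necessarily right, is it?
Which book?

[s*****c wrote:]
: Right. But a book's answer key tells me to use Student's t-test, and gives 1% as the answer.
: I'm thoroughly confused.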

l***o
Posts: 5337

How could that be?

[H******7 wrote:]
: 0.5

y**c
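Posts: 6307

It's just the probability of falling beyond 3 sigma, isn't it?

[s*****c wrote:]
: Asking the almighty academic board for help.
: From 500 samples we determined mean = m and standard deviation = s.
: Question: if we draw 9 samples at random and average them, what is the probability that the average is greater than m + s?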

G**Y
Posts: 33224

Like hell that's right.

[s*****c wrote:]
: Right. But a book's answer key tells me to use Student's t-test, and gives 1% as the answer.
: I'm thoroughly confused.

g**a
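Posts: 2129

You first have to say what distribution the population follows, e.g. N(0, 1).

[springc (阿泉) wrote:]
: Asking the almighty academic board for help.
: From 500 samples we determined mean = m and standard deviation = s.
: ...........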

s*****c
Posts: 753

So what's your answer?

[G**Y wrote:]
: Like hell that's right.

s*****c
Posts: 753

You can assume it is a normal distribution.
Actually, by the central limit theorem, the average of 9 samples will be very
close to normally distributed regardless.

[g**a wrote:]
: You first have to say what distribution the population follows, e.g. N(0, 1).
: [springc (阿泉) wrote:]
: : Asking the almighty academic board for help.
: : From 500 samples we determined mean = m and standard deviation = s.
: : ...........


g**a
Posts: 2129

Says who? 9 is not enough. Especially in a case like yours, where you are looking at the tail, 9 is too small.

[s*****c wrote:]
: You can assume it is a normal distribution.
: Actually, by the central limit theorem, the average of 9 samples will be very
: close to normally distributed regardless.

s*****c
Posts: 753

The original question says to assume the data from the 500 individuals are
normally distributed.

[g**a wrote:]
: Says who? 9 is not enough. Especially in a case like yours, where you are looking at the tail, 9 is too small.

h******e
Posts: 9616

Plenty of people at the casino lose nine dice rolls in a row, too.

[s*****c wrote:]
: You can assume it is a normal distribution.
: Actually, by the central limit theorem, the average of 9 samples will be very
: close to normally distributed regardless.

s*****c
Posts: 753

Well, that has nothing to do with what I said.
I am also not sure about your definition of "few". I can only assert that
losing nine times in a row will occur less frequently than otherwise.

[h******e wrote:]
: Plenty of people at the casino lose nine dice rolls in a row, too.

h******e
Posts: 9616

Survey 500 gamblers and, by my eyeball estimate, more than half will have lost 9+ rounds in a row.

[s*****c wrote:]
: Well, that has nothing to do with what I said.
: I am also not sure about your definition of "few". I can only assert that
: losing nine times in a row will occur less frequently than otherwise.

g**a
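Posts: 2129

Doesn't that bring you right back to what I said originally? You need an assumption about the population distribution. Your original question didn't say what the population distribution is. If you restrict the population to just 500, then you can simply simulate the distribution of the mean. choose(500, 9) is only about 5e18 points; uniformly sampling 1/10e10 of them is enough to see it.

[s*****c wrote:]
: The original question says to assume the data from the 500 individuals are
: normally distributed.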
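The finite-population idea above can be sketched roughly as follows, assuming the 500 measurements are themselves the whole population and the 9 are drawn without replacement. The thread gives no data, so the population here is synthetic (hypothetical heights with mean 170 and SD 10, chosen only for illustration), and `tail_fraction` is an illustrative helper name. Instead of enumerating every subset, the sketch samples random subsets.

```python
import math
import random
import statistics

# Number of distinct 9-element subsets of a 500-element population.
n_subsets = math.comb(500, 9)
print(f"C(500, 9) = {n_subsets:.3e}")

def tail_fraction(population, k=9, trials=100_000, seed=0):
    """Estimate P(mean of k draws without replacement > m + s)."""
    rng = random.Random(seed)
    m = statistics.fmean(population)
    # Assumption: the 500 values ARE the population, so use the population SD.
    s = statistics.pstdev(population)
    threshold = m + s
    hits = 0
    for _ in range(trials):
        subset = rng.sample(population, k)  # k distinct individuals
        if statistics.fmean(subset) > threshold:
            hits += 1
    return hits / trials

# Synthetic stand-in for the 500 measurements (hypothetical values).
data_rng = random.Random(42)
population = [data_rng.gauss(170.0, 10.0) for _ in range(500)]

estimate = tail_fraction(population)
print(f"Estimated P(mean of 9 > m + s) = {estimate:.5f}")
```

With a different population of 500 the estimate would shift, which is the bias from the particular 500 values that comes up later in the thread.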

s*****c
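Posts: 753

Good idea. If I had 500000 samples, I might apply for some funding to get a
good computer to run this simulation.

[g**a wrote:]
: Doesn't that bring you right back to what I said originally? You need an assumption about the population distribution. Your original question didn't say what the population distribution is. If you restrict the population to just 500, then you can simply simulate the distribution of the mean. choose(500, 9) is only about 5e18 points; uniformly sampling 1/10e10 of them is enough to see it.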

g**a
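Posts: 2129

Whoa, have you actually figured out the difference between a sample and a population? In the earlier question the population was 500 and the sample was 9; now the sample has suddenly become 500000? Then what is the population? If you have 500000 samples, why bother with simulation? Just use the CLT: mean = mu, var = sigma^2/n.
Don't tell me you're making all this noise without even having the basic statistical concepts of population and sample straight!

[s*****c wrote:]
: Good idea. If I had 500000 samples, I might apply for some funding to get a
: good computer to run this simulation.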
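A minimal sketch of the var = sigma^2/n statement, assuming independent draws from a normal population with mu = 0 and sigma = 1 (values picked only for illustration; the helper name `mean_of_draws` is also illustrative). It compares the empirical variance of many 9-draw means with sigma^2/9.

```python
import random
import statistics

def mean_of_draws(rng, mu, sigma, n):
    """Average of n independent normal draws."""
    return statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])

rng = random.Random(1)
mu, sigma, n = 0.0, 1.0, 9

# Repeat the "draw 9 and average" experiment many times.
means = [mean_of_draws(rng, mu, sigma, n) for _ in range(50_000)]

observed_var = statistics.pvariance(means)
expected_var = sigma**2 / n  # CLT scaling: variance of the mean shrinks by n

print(f"observed variance of the mean: {observed_var:.4f}")
print(f"sigma^2 / n:                   {expected_var:.4f}")
```

For n = 9 this means the standard deviation of the mean is sigma/3, so a threshold one sigma above the mean sits three standard deviations out for the averaged quantity.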

s*****c
Posts: 753

No, my population is really the population on earth. The assumption given
by the question is that the quantity I measure (say height) follows a normal
distribution. Now I measured 500 individuals (500 samples). From those 500
samples we computed the mean (m) and standard deviation (s). The
question is the probability that the average of 9 randomly sampled
individuals is greater than m+s.
The solution you provided is basically the Monte Carlo method. To get an accurate
answer, I have to run many iterations to reduce the variance. Actually, what
you described will contain bias from the 500 samples themselves. What you
really need to do is to get another 500 samples and simulate. I am just
pointing out that it is absurd to resort to this brute-force approach if we
can solve the problem analytically.

[g**a wrote:]
: Whoa, have you actually figured out the difference between a sample and a population? In the earlier question the population was 500 and the sample was 9; now the sample has suddenly become 500000? Then what is the population? If you have 500000 samples, why bother with simulation? Just use the CLT: mean = mu, var = sigma^2/n.
: Don't tell me you're making all this noise without even having the basic statistical concepts of population and sample straight!

g**a
Posts: 2129

9 samples out of the 500, or 9 samples out of the population?
If it is the second one, assuming your 500 samples are iid and random, you
should be able to use m and s to estimate the population mean and variance.
The distribution of the 9-sample mean should also be normal, with mean and
variance calculated from the estimated population mean and variance.

[s*****c wrote:]
: No, my population is really the population on earth. The assumption given
: by the question is that the quantity I measure (say height) follows a normal
: distribution. Now I measured 500 individuals (500 samples). From those 500
: samples we computed the mean (m) and standard deviation (s). The
: question is the probability that the average of 9 randomly sampled
: individuals is greater than m+s.
: The solution you provided is basically the Monte Carlo method. To get an accurate
: answer, I have to run many iterations to reduce the variance. Actually, what
: you described will contain bias from the 500 samples themselves. What you
: really need to do is to get another 500 samples and simulate. I am just


n******r
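Posts: 4455

The question asks for the probability that the sample mean > u + s, not that it is greater than u, so it is obviously not 0.5.

[s*****c wrote:]
: Right. But a book's answer key tells me to use Student's t-test, and gives 1% as the answer.
: I'm thoroughly confused.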

s*****c
Posts: 753

0.5%
Basically, the mean of 9 samples will have a standard deviation of s/3.
So 3 sigma gives you 99%. They only ask for the part that is greater, so
0.5%.
I don't think we need to use the t-distribution, because the m and s from 500
samples (of a normal distribution) should be good enough estimates of the u
and sigma of the underlying population. I would need to use the t-distribution
if I got m and s from the 9 samples.

[n******r wrote:]
: The question asks for the probability that the sample mean > u + s, not that it is greater than u, so it is obviously not 0.5.
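A quick check of the 3-sigma figures, as a sketch under the assumption that the 9-individual mean is exactly normal with mean m and standard deviation s/3 (that is, m and s are treated as the true population values). It uses the standard-library `statistics.NormalDist`; the variable names are illustrative. Under that assumption the two-sided 3-sigma coverage comes out near 99.73% rather than 99%, which puts the one-sided tail at roughly 0.135%.

```python
import math
from statistics import NormalDist

k = 9  # number of individuals averaged

# z = ((m + s) - m) / (s / sqrt(k)); m and s cancel, leaving sqrt(k).
z = math.sqrt(k)

std_normal = NormalDist()  # mean 0, standard deviation 1
within = std_normal.cdf(z) - std_normal.cdf(-z)  # P(|Z| <= z)
upper_tail = 1.0 - std_normal.cdf(z)             # P(Z > z)

print(f"z           = {z:.1f}")
print(f"P(|Z| <= z) = {within:.4%}")
print(f"P(Z > z)    = {upper_tail:.4%}")
```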

s*****c
Posts: 753

That's the part I am fuzzy about. I don't have that book with me. What's the
difference if it is randomly drawn from the 500 samples (I guess we can
assume that is our finite population)?

[g**a wrote:]
: 9 samples out of the 500, or 9 samples out of the population?
: If it is the second one, assuming your 500 samples are iid and random, you
: should be able to use m and s to estimate the population mean and variance.
: The distribution of the 9-sample mean should also be normal, with mean and
: variance calculated from the estimated population mean and variance.

n******r
Posts: 4455

Using the t-test for this problem is correct.
In practice it is rare to be able to assume the population distribution is known, apart from idealized cases where the problem states it explicitly.
If the population u and s are unknown and the sample size is small (under 30), you should use the t-test rather than the z-test.
Find a book on statistical inference, or google how to choose between the t-test and the z-test.

[s*****c wrote:]
: 0.5%
: Basically, the mean of 9 samples will have a standard deviation of s/3.
: So 3 sigma gives you 99%. They only ask for the part that is greater, so
: 0.5%.
: I don't think we need to use the t-distribution, because the m and s from 500
: samples (of a normal distribution) should be good enough estimates of the u
: and sigma of the underlying population. I would need to use the t-distribution
: if I got m and s from the 9 samples.
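For comparison, here is a sketch of what the t-table reading would give, assuming the statistic is taken as t = 3 with 9 - 1 = 8 degrees of freedom. That is a guess at how the book's answer was obtained, not something the thread confirms. The code uses only the standard library and integrates the Student t density with Simpson's rule; `t_pdf` and `t_upper_tail` are illustrative helper names.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=20_000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps is even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 0.5 - area_0_to_t  # the density is symmetric about 0

p_t8 = t_upper_tail(3.0, 8)
print(f"P(T_8 > 3) = {p_t8:.4%}")
```

Under this reading the tail lands a little under 1%, in the same neighbourhood as the 1% the book reportedly gives, and well above the normal-distribution tail at z = 3.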

s*****c
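Posts: 753

I know.
But the m and s are estimated from 500 samples!
My understanding is that if we obtain m and s from <30 samples, then we have
to use the t-test.

[n******r wrote:]
: Using the t-test for this problem is correct.
: In practice it is rare to be able to assume the population distribution is known, apart from idealized cases where the problem states it explicitly.
: If the population u and s are unknown and the sample size is small (under 30), you should use the t-test rather than the z-test.
: Find a book on statistical inference, or google how to choose between the t-test and the z-test.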

n******r
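Posts: 4455

Your understanding is wrong.
No matter whether your estimate comes from 500 or 50000 samples, you cannot treat it directly as the population estimate.
You can use the z-test only when the population distribution is assumed known.
Go read a book and you'll understand; that beats arguing fruitlessly here.

[s*****c wrote:]
: I know.
: But the m and s are estimated from 500 samples!
: My understanding is that if we obtain m and s from <30 samples, then we have
: to use the t-test.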

s*****c
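Posts: 753

But my sample size is 500.
Or do you think my population is the 500 cases, and my sample size is 9? If
so, my population's u and s are known.

[n******r wrote:]
: Your understanding is wrong.
: No matter whether your estimate comes from 500 or 50000 samples, you cannot treat it directly as the population estimate.
: You can use the z-test only when the population distribution is assumed known.
: Go read a book and you'll understand; that beats arguing fruitlessly here.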

s*****c
Posts: 753

I read those books. It's just that I interpret them differently from you. I
would really like to know where my mistakes are.
It seems you believe 9 is the sample size and that you have to look up the
t-table with 8 degrees of freedom, and that it doesn't matter how many
samples my estimate comes from. I really can't agree.
If I got my estimate from 5 billion samples, wouldn't the mean and standard
deviation be really close to the population's? And you would still tell me I
can't use the z-test but must use the t-test, and will get the same answer?

[n******r wrote:]
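One way to probe how much the size of the first sample matters is a small Monte Carlo sketch. It assumes one particular reading of the problem: m and s are estimated from n independent normal draws, and the 9 new individuals come from the same normal population. Rather than generating all n values per trial, it draws the sample mean and sample variance from their known sampling distributions for normal data (normal and scaled chi-square, which are independent). The function name `prob_mean_exceeds` and the choices n = 10, 500, 50000 are illustrative.

```python
import math
import random

def prob_mean_exceeds(n, k=9, trials=200_000, seed=0):
    """Monte Carlo estimate of P(mean of k new draws > m + s).

    m, s: sample mean and sample SD (n - 1 divisor) of n earlier draws.
    Without loss of generality the population is N(0, 1).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        m = rng.gauss(0.0, 1.0 / math.sqrt(n))       # sample mean ~ N(0, 1/n)
        chi2 = rng.gammavariate((n - 1) / 2.0, 2.0)  # chi-square, n - 1 df
        s = math.sqrt(chi2 / (n - 1))                # sample SD
        new_mean = rng.gauss(0.0, 1.0 / math.sqrt(k))
        if new_mean > m + s:
            hits += 1
    return hits / trials

results = {n: prob_mean_exceeds(n) for n in (10, 500, 50_000)}
for n, p in results.items():
    print(f"n = {n:>6}: estimated probability = {p:.5f}")
```

Under these assumptions the estimate for small n is noticeably larger, while for n = 500 and beyond it should sit close to the plain normal 3-sigma tail, up to simulation noise.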

s*****c
Posts: 753

So the 500 measurements are my population. The randomly selected 9 cases are
from that population.
Given that we know the sigma of the population, we should use the z-test, and
the answer is the 3 sigma.
I made a mistake earlier regarding the probability of 3 sigma. Should be
99.97%. So the answer should be 0.015%. Is that right?

[g**a wrote:]

T****t
Posts: 11162

There are some experts here after all.

: It's just the probability of falling beyond 3 sigma, isn't it?
: [springc (阿泉) wrote:]


k**********i
Posts: 8706

Just go straight to Monte Carlo.
Then no probability problem is scary, however many come!
