Question: the scope of application of the bootstrap - Statistics board


l***o
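Posts: 5337

I know that in many situations people use the bootstrap to estimate the variance of an estimate, but this clearly has limits. For example, suppose I draw only one sample from an infinite population to estimate the mean. This is an unbiased estimate. Then I bootstrap like crazy. But because there is only one sample, no amount of bootstrapping can find any variation.
Comments welcome. Thanks!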

l*********s
Posts: 5409

Surely there will be variation, unless the sample to begin with has zero
variance.
Its feasibility does not rely on normality or the CLT, so it is more general.
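
A minimal sketch of this point, assuming the statistic is the sample mean and using only Python's standard library (the function name and the data values are made up for illustration). Resampling a single observation always returns the same value, so the replicates have zero variance; a sample with several distinct values produces spread.

```python
import random
import statistics

def bootstrap_variance_of_mean(data, n_boot=2000, seed=0):
    """Resample with replacement, recompute the mean each time,
    and return the variance of those replicate means."""
    rng = random.Random(seed)
    n = len(data)
    replicates = [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_boot)]
    return statistics.pvariance(replicates)

single = [4.2]                        # hypothetical sample of size 1
several = [4.2, 5.1, 3.8, 6.0, 4.9]   # hypothetical sample of size 5

print(bootstrap_variance_of_mean(single))   # every resample is [4.2], so no spread
print(bootstrap_variance_of_mean(several))  # positive
```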

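[In reply to l***o's post:]

: I know that in many situations people use the bootstrap to estimate the variance of an estimate, but this clearly has limits. For example, suppose I draw only one sample from an infinite population to estimate the mean. This is an unbiased estimate. Then I bootstrap like crazy. But because there is only one sample, no amount of bootstrapping can find any variation.
: Comments welcome. Thanks!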

j*******y
Posts: 58

FT, apparently you don't know how to do the bootstrap.

[In reply to l***o's post:]

h***i
Posts: 3844

You need to read Efron's book first.

[In reply to l***o's post:]

P****D
Posts: 11146

You cannot draw ONE sample and then bootstrap on it!

l***o
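Posts: 5337

Of course I know that doesn't work. I was only giving an extreme example. What I am asking about is the conditions under which the bootstrap is feasible. Thanks!

[In reply to P****D's post:]

: You cannot draw ONE sample and then bootstrap on it!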

P****D
Posts: 11146

What do you mean by "the conditions under which it is feasible"? Do you want
to know the minimum/maximum sample size to which you can apply bootstrapping?
I never thought about that myself... I simply always use the original sample size.
If I were you, I would use a realistic example to explain what I meant.

[In reply to l***o's post:]

: Of course I know that doesn't work. I was only giving an extreme example. What I am asking about is the conditions under which the bootstrap is feasible. Thanks!

l***o
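Posts: 5337

The minimum sample size is the most direct condition I want to know, and I would also like to ask whether there are any other restrictions. One example is the most generic case mentioned in the original post: draw n observations to estimate the mean of an infinite population, where n is fairly small, and I want to know the variance of the estimate.
Thanks.

[In reply to P****D's post:]

: What do you mean by "the conditions under which it is feasible"? Do you want
: to know the minimum/maximum sample size to which you can apply bootstrapping?
: I never thought about that myself... I simply always use the original sample size.
: If I were you, I would use a realistic example to explain what I meant.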

D*********2
Posts: 535

Let me try to answer it this way.
The bootstrap idea can be illustrated as follows:
V_F(T_n) \approx V_{\hat F_n}(T_n) \approx v_{boot}
which can be broken into two parts:
1. V_F(T_n) \approx V_{\hat F_n}(T_n)
2. V_{\hat F_n}(T_n) \approx v_{boot}
For part 1 to hold, \hat F_n must approximate F, i.e. your original sample
X_1, X_2, ..., X_n should represent the truth. The bootstrap samples serve only
the second part: B bootstrap samples are used to mimic sampling from \hat F_n.
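
The two-part decomposition above can be sketched numerically in the one case where the middle term is known exactly: for the sample mean, V_{\hat F_n}(T_n) equals the plug-in variance of the data divided by n. The snippet below is an illustrative sketch and not part of the original post; the simulated normal data, the seeds and the function names are all assumptions.

```python
import random
import statistics

def ideal_bootstrap_variance_of_mean(data):
    """V_{F_hat_n}(mean): for the mean this is the plug-in variance divided by n."""
    return statistics.pvariance(data) / len(data)

def monte_carlo_bootstrap_variance(data, n_boot, seed=0):
    """v_boot: sample variance of the mean over n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    replicates = [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_boot)]
    return statistics.variance(replicates)

# Simulated data (an assumption), so that the true V_F(T_n) = sigma^2 / n is known.
rng = random.Random(1)
sigma, n = 2.0, 15
data = [rng.gauss(10.0, sigma) for _ in range(n)]

true_variance = sigma ** 2 / n
ideal = ideal_bootstrap_variance_of_mean(data)
print("V_F(T_n)      :", true_variance)
print("V_Fhat_n(T_n) :", ideal)
for n_boot in (20, 200, 20000):
    print("v_boot, B =", n_boot, ":", monte_carlo_bootstrap_variance(data, n_boot))
```

Increasing B only shrinks the gap in part 2. The gap in part 1 depends on how well the n original observations represent F, and no amount of resampling changes it.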

l***o
Posts: 5337

Thanks! But I still don't quite understand what "X_n should represent the truth" means. For example: (1) if my n is large enough (>30), the CLT takes effect and I can estimate the variance; (2) if my n is not large enough, the CLT cannot be used, but then it is obviously also hard to claim that this tiny sample can "represent the truth". In which of these cases is the bootstrap more suitable?
Is it that in case (1), by dropping the CLT, the bootstrap gives a smaller variance estimate?
Many thanks for your guidance!

[In reply to D*********2's post:]

: Let me try to answer it this way.
: The bootstrap idea can be illustrated as follows:
: V_F(T_n) \approx V_{\hat F_n}(T_n) \approx v_{boot}
: which can be broken into two parts:
: 1. V_F(T_n) \approx V_{\hat F_n}(T_n)
: 2. V_{\hat F_n}(T_n) \approx v_{boot}
: For part 1 to hold, \hat F_n must approximate F, i.e. your original sample
: X_1, X_2, ..., X_n should represent the truth. The bootstrap samples serve only
: the second part: B bootstrap samples are used to mimic sampling from \hat F_n.


h***i
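Posts: 3844

Edgeworth expansion. Something I played with a long time ago.

[In reply to l***o's post:]

: The minimum sample size is the most direct condition I want to know, and I would also like to ask whether there are any other restrictions. One example is the most generic case mentioned in the original post: draw n observations to estimate the mean of an infinite population, where n is fairly small, and I want to know the variance of the estimate.
: Thanks.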

r*****y
Posts: 199

I took a course on the bootstrap this semester, and the situation we
discussed most was the small-sample case. For large samples, you can always
rely on asymptotic theory.

P****D
Posts: 11146

How small is small?

[In reply to r*****y's post:]

: I took a course on the bootstrap this semester, and the situation we
: discussed most was the small-sample case. For large samples, you can always
: rely on asymptotic theory.

j*******y
Posts: 58

Nobody uses the bootstrap to estimate the mean; you already have the sample mean.
For the variance, n = 30 is enough; 20 may not be, because the precision is lower.

[In reply to l***o's post:]

r*****y
Posts: 199

You are right: for some statistics, such as the mean, the variance has a
closed form, so the bootstrap is not necessary. For a quantity such as the CV
(coefficient of variation), you may use the bootstrap method.
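
As a sketch of the CV case, the snippet below estimates a bootstrap standard error for the sample coefficient of variation using only Python's standard library. The measurements, the function names and the choice of 1000 resamples are assumptions made for illustration.

```python
import random
import statistics

def coefficient_of_variation(xs):
    """Sample CV: standard deviation divided by the mean."""
    return statistics.stdev(xs) / statistics.fmean(xs)

def bootstrap_standard_error(data, statistic, n_boot=1000, seed=0):
    """Standard deviation of the statistic over resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    replicates = [statistic(rng.choices(data, k=n)) for _ in range(n_boot)]
    return statistics.stdev(replicates)

# Hypothetical positive measurements, for illustration only.
measurements = [12.1, 9.8, 15.3, 11.0, 8.7, 13.9, 10.4, 17.2, 9.1, 12.8,
                14.6, 10.9, 11.7, 16.0, 9.5, 13.1, 12.4, 8.9, 15.8, 11.3]

cv_hat = coefficient_of_variation(measurements)
cv_se = bootstrap_standard_error(measurements, coefficient_of_variation)
print("CV estimate:", cv_hat)
print("bootstrap SE of CV:", cv_se)
```

The same `bootstrap_standard_error` helper accepts any function of the sample, which is the practical appeal when no closed-form variance is at hand.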

[In reply to j*******y's post:]

: Nobody uses the bootstrap to estimate the mean; you already have the sample mean.
: For the variance, n = 30 is enough; 20 may not be, because the precision is lower.

l***o
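Posts: 5337

Could some expert explain the difference between using the CLT and using the bootstrap to estimate a confidence interval, that is, for what kinds of samples and data structures each method is more suitable? Many thanks!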
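
One way to see the mechanical difference between the two intervals is to compute both on the same small, skewed sample. This is only a sketch under stated assumptions: the data are hypothetical, the CLT interval uses a normal quantile with the sample standard deviation, and the bootstrap interval is the plain percentile interval. Neither is guaranteed to reach its nominal coverage at this sample size.

```python
import math
import random
import statistics

def clt_interval(data, level=0.95):
    """Normal-approximation interval: mean +/- z * s / sqrt(n)."""
    n = len(data)
    mean = statistics.fmean(data)
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * statistics.stdev(data) / math.sqrt(n)
    return mean - half_width, mean + half_width

def bootstrap_percentile_interval(data, n_boot=2000, seed=0):
    """Approximate 95% percentile interval from bootstrap replicate means."""
    rng = random.Random(seed)
    n = len(data)
    replicates = [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_boot)]
    cuts = statistics.quantiles(replicates, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]

# Hypothetical right-skewed sample (one large value), for illustration only.
sample = [1.2, 0.8, 1.5, 0.9, 1.1, 2.3, 0.7, 1.0, 1.4, 9.6, 1.3, 0.6]

print("sample mean         :", statistics.fmean(sample))
print("CLT interval        :", clt_interval(sample))
print("bootstrap percentile:", bootstrap_percentile_interval(sample))
```

The CLT interval is symmetric about the sample mean by construction, while the percentile interval follows the shape of the resampled means and so can be asymmetric when the data are skewed.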

[In reply to j*******y's post:]

: Nobody uses the bootstrap to estimate the mean; you already have the sample mean.
: For the variance, n = 30 is enough; 20 may not be, because the precision is lower.

l***o
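Posts: 5337

[In reply to l***o's post:]

: Thanks! But I still don't quite understand what "X_n should represent the truth" means. For example: (1) if my n is large enough (>30), the CLT takes effect and I can estimate the variance; (2) if my n is not large enough, the CLT cannot be used, but then it is obviously also hard to claim that this tiny sample can "represent the truth". In which of these cases is the bootstrap more suitable?
: Is it that in case (1), by dropping the CLT, the bootstrap gives a smaller variance estimate?
: Many thanks for your guidance!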

p********r
Posts: 1465

For example, suppose we have 10 weight measurements. Estimating the population
mean weight and its confidence interval under a normality assumption would not
be very precise. For the bootstrap, we first draw a sample of size 10 with
replacement and compute the average weight. Then we repeat this procedure 100
times, which gives a new sample of 100 average weights. From these we can
obtain a more precise estimate and a better picture of the distribution of the
average weight.
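
The procedure described above can be sketched as follows, using only Python's standard library. The ten weights are hypothetical values and the function names are assumptions. The 100 resamples follow the post; for percentile intervals a larger number of resamples is commonly used.

```python
import random
import statistics

def bootstrap_means(data, n_boot=100, seed=0):
    """Draw n_boot resamples of the original size, with replacement,
    and return the average of each one."""
    rng = random.Random(seed)
    n = len(data)
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_boot)]

def percentile_interval(replicates, n_cuts=40):
    """Approximate 95% interval: the 2.5% and 97.5% cut points of the replicates."""
    cuts = statistics.quantiles(replicates, n=n_cuts)
    return cuts[0], cuts[-1]

# Ten hypothetical weights in kg, for illustration only.
weights = [61.2, 70.5, 58.9, 82.3, 66.0, 74.8, 59.7, 91.4, 68.1, 72.6]

means = bootstrap_means(weights)
low, high = percentile_interval(means)
print("sample mean:", statistics.fmean(weights))
print("bootstrap SE of the mean:", statistics.stdev(means))
print("approximate 95% percentile interval:", (low, high))
```

The average of the replicate means stays close to the original sample mean, so what the resampling adds is a picture of the sampling distribution (standard error, interval) and not a new point estimate.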

T*******I
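Posts: 5138

The OP's original question contains a vague concept: the sample. From his description we cannot tell how large his sample size actually is. Under the current conceptual system, his sample size could be 1, or it could be greater than 1. If it equals 1, bootstrapping yields zero variance, which is statistically absurd. If it is greater than 1, it still cannot be too small.
In my view, statistics is not a tool for handling samples of size 1. Although such a sample exists in theory as a mathematical assumption, it should not be something statistics needs to consider or handle, because discussing such samples in statistics is meaningless. It is simply a waste of time.
In my view, the philosophical foundations of the bootstrap deserve serious reflection.

[In reply to l***o's post:]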

l***o
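Posts: 5337

Thank you for weighing in, master! My one-sample example was only meant to raise the question; I was not actually considering doing that. May I ask: if you discard the bootstrap, I have no objection at all, but under your theory, how would one estimate the variance of an estimate?
Thanks!
P.S. Does your theory have any applications, or is it just for the fun of talking about it?

[In reply to T*******I's post:]

: The OP's original question contains a vague concept: the sample. From his description we cannot tell how large his sample size actually is. Under the current conceptual system, his sample size could be 1, or it could be greater than 1. If it equals 1, bootstrapping yields zero variance, which is statistically absurd. If it is greater than 1, it still cannot be too small.
: In my view, statistics is not a tool for handling samples of size 1. Although such a sample exists in theory as a mathematical assumption, it should not be something statistics needs to consider or handle, because discussing such samples in statistics is meaningless. It is simply a waste of time.
: In my view, the philosophical foundations of the bootstrap deserve serious reflection.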


r*****y
Posts: 199

I totally disagree with you. I think you have no idea about the theory and
application of the bootstrap.

[In reply to T*******I's post:]

T*******I
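Posts: 5138

The first time I heard of the bootstrap was in the summer of 2000, when I came to the US to attend the JSM. A Chinese girl in the same session as me talked about applying this method in her methodology. To be honest, my English was terrible at the time and I did not understand at all what she was saying.
I first seriously engaged with the bootstrap when doing methodological research on piecewise regression. After decades of failed attempts to find a way to estimate a confidence interval for the change point, people turned to this approach once the bootstrap was born. It is the only feasible approach under the current system. But this approach contains a major epistemological fallacy: it swaps out the "object of cognition", or the "object"!
Moreover, from the standpoint of sampling theory in statistics, pooling 1000 bootstrap samples means that every point in the original sample has been drawn 1000 times, or every individual in the original sample has been replicated 1000 times, or the size of the original sample has been enlarged 1000-fold by copying. Is that meaningful? Can this kind of bootstrap really reflect the sampling error of the population? My conceptual system has led me to a negative view.
You may raise two questions here: why would you pool those 1000 bootstrap samples? Can they be pooled? I already answered both questions in my 2009 paper. Here I leave it to you to answer them yourselves.
I do not mean to completely reject or abandon the bootstrap. Rather, in that paper I proposed a new direction of application: given a known large sample, bootstrap a few small samples, rather than thousands, to estimate the expectation and variance, so as to reduce the huge computational workload and produce results efficiently. Doing so does not change the object, it reflects the essence of statistics (random sampling and economy), and the error is measurable and controllable.

[In reply to l***o's post:]

: Thank you for weighing in, master! My one-sample example was only meant to raise the question; I was not actually considering doing that. May I ask: if you discard the bootstrap, I have no objection at all, but under your theory, how would one estimate the variance of an estimate?
: Thanks!
: P.S. Does your theory have any applications, or is it just for the fun of talking about it?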

T*******I
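Posts: 5138

In answering the OP's original question I offered many of my own views. That you totally disagree with me means that every one of my points is wrong in your eyes. I think you really should leave the field of statistics, because you may not understand even its most basic concepts, and your username may well describe you aptly.

[In reply to r*****y's post:]

: I totally disagree with you. I think you have no idea about the theory and
: application of the bootstrap.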

F****n
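Posts: 3271

Bootstrapping does not say that you must pool the resampled samples. The essence of bootstrapping is to use one sample to "simulate" the population and obtain a sampling distribution. Using it to increase the sample size is appropriate only in certain settings.

[In reply to T*******I's post:]

: The first time I heard of the bootstrap was in the summer of 2000, when I came to the US to attend the JSM. A Chinese girl in the same session as me talked about applying this method in her methodology. To be honest, my English was terrible at the time and I did not understand at all what she was saying.
: I first seriously engaged with the bootstrap when doing methodological research on piecewise regression. After decades of failed attempts to find a way to estimate a confidence interval for the change point, people turned to this approach once the bootstrap was born. It is the only feasible approach under the current system. But this approach contains a major epistemological fallacy: it swaps out the "object of cognition", or the "object"!
: Moreover, from the standpoint of sampling theory in statistics, pooling 1000 bootstrap samples means that every point in the original sample has been drawn 1000 times, or every individual in the original sample has been replicated 1000 times, or the size of the original sample has been enlarged 1000-fold by copying. Is that meaningful?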

F****n
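Posts: 3271

To understand the logic of bootstrapping, start from the significance test.
1. A significance test cannot decide Truth or Not; it only indicates whether the current sample can statistically support the hypothesis.
2. In other words, failing a significance test does not mean the hypothesis is false; it only shows that the current sample cannot demonstrate it, which often happens when the sample size is small.
3. The statistical remedy for this is to obtain enough data.
4. Bootstrapping offers an a priori line of thinking, namely: "although my sample is small, I have high confidence in its representativeness and precision, so it can be used to simulate the whole population."
5. This a priori reasoning is a crucial premise. Technically, bootstrapping itself is just a resampling trick; whether it has any practical value depends on whether this premise holds. If you are doing research, you have to devote a lot of space to justifying it.

[In reply to l***o's post:]

: Of course I know that doesn't work. I was only giving an extreme example. What I am asking about is the conditions under which the bootstrap is feasible. Thanks!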

l***o
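Posts: 5337

This explanation is very clear, thanks!
What I really want to know is how to measure "representativeness and precision"...

[In reply to F****n's post:]

: To understand the logic of bootstrapping, start from the significance test.
: 1. A significance test cannot decide Truth or Not; it only indicates whether the current sample can statistically support the hypothesis.
: 2. In other words, failing a significance test does not mean the hypothesis is false; it only shows that the current sample cannot demonstrate it, which often happens when the sample size is small.
: 3. The statistical remedy for this is to obtain enough data.
: 4. Bootstrapping offers an a priori line of thinking, namely: "although my sample is small, I have high confidence in its representativeness and precision, so it can be used to simulate the whole population."
: 5. This a priori reasoning is a crucial premise. Technically, bootstrapping itself is just a resampling trick; whether it has any practical value depends on whether this premise holds.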

T*******I
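Posts: 5138

My question is: can bootstrapped samples be pooled? If not, why not? If so, why? I already gave an affirmative answer to the latter question in my 2009 JSM paper: they can be pooled, because they come from the same population in the first place! Conversely, if they could not be pooled, the whole foundation of statistics would be shaken, and that possibility does not exist.

[In reply to F****n's post:]

: Bootstrapping does not say that you must pool the resampled samples. The essence of bootstrapping is to use one sample to "simulate" the population and obtain a sampling distribution. Using it to increase the sample size is appropriate only in certain settings.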
