Urgently seeking clues from the experts on a project (8.24.2013) - Statistics board

This page is an excerpt and archive of the corresponding thread on 未名空间.


b******3
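Posts: 4385

Background:
A certain quantity is measured by two kinds of methods: visual observation by a
human, and automatic reading by a computer. The goal of the project is to show
by statistical means that the computer's automatic reading is unbiased and more
efficient than human observation.
Available data:
(1) Readings by 500 different people on 500 different specimens
(2) The computer's readings on the same 500 specimens as in (1)
(3) The standard deviation of 500 people's readings of one and the same specimen
Initial idea:
Use data (3) to remove the human-induced variance from data (1), so that the
processed data (1) can be treated as the true value of each specimen. Once a
true value exists for each specimen, some kind of correlation analysis can be
run against data (2).
Problems:
Each specimen has only one sample, so it is hard to estimate the true value from
that single reading. I also do not know how to make effective use of data (3).
Any advice is welcome. I would be very grateful for any related ideas,
techniques, or references.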

g******2
Posts: 234

Not an expert, just my 2 cents:
1. Is it quantitative or categorical? If categorical, then a majority vote may
be a better way to obtain the truth. Otherwise, just use the mean of the 500
raters' evaluations and treat that as the truth.
2. Bias. Were those 500 samples randomly selected from some population? If so,
check the difference between the machine reading and the truth for each sample
and see whether the differences look normal with a mean close to 0, or run an
equivalent test.
3. Efficiency. Statistical efficiency is usually defined by a ratio of
variances, so (3) is useful. You could also show the time efficiency of the
machine.

b******3
Posts: 4385

Hello getdown2,
I appreciate your suggestions. Below I elaborate on my question.
All the data are quantitative. Each of the 500 readings is on a different
specimen (so technically each specimen is not one sample but one population,
since people can run repeated cycles of the test on that specific specimen).
That is the tricky part of the story.
Given this situation, getting the true value for each specimen is probably
impossible. But we have 500 specimens, and our final goal is to say how good
the computer's estimate is (unbiased vs. biased), so there should be some
statistical method that avoids addressing the problem specimen by specimen and
treats it in a more holistic manner instead.
Thanks again.


k*******a
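Posts: 772

This can be treated as a paired t-test: take the difference between the human
and computer readings for each specimen as y and do inference on whether its
mean is 0. That tells you whether the computer is biased.
(3) can be used to estimate the computer's variance. If that variance is smaller
than the human's, the computer is more efficient.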
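
A minimal sketch of the paired comparison described above, using only the Python standard library. The function name, the simulated data, and the error sizes are all hypothetical. The p-value uses a normal approximation to the t distribution, which should be reasonable at n = 500 but is an assumption, not an exact t-test.

```python
import math
import random
import statistics


def paired_t_statistic(human, computer):
    """Return (mean difference, t statistic, approximate two-sided p-value)."""
    diffs = [h - c for h, c in zip(human, computer)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    t = mean_diff / std_err
    # Normal approximation to the t distribution (assumption: n is large).
    p = 2.0 * (1.0 - statistics.NormalDist().cdf(abs(t)))
    return mean_diff, t, p


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data: 500 specimens with different true values.
    truth = [rng.uniform(10, 20) for _ in range(500)]
    human = [p + rng.gauss(0, 1.0) for p in truth]
    computer = [p + rng.gauss(0, 0.5) for p in truth]
    print(paired_t_statistic(human, computer))
```

Because each difference is taken within one specimen, the specimen's own true value drops out of y, so the test does not require the 500 specimens to share a common true value.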

b******3
Posts: 4385

Thanks, that sounds like the way out. But I do not know how to get the variance
of the computer readings from data (2).


k*******a
Posts: 772

var(computer - human) = var(human) + var(computer)
From this you can get the computer's variance.
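
The identity above can be rearranged to var(computer) = var(computer - human) - var(human). The sketch below illustrates this under two assumptions that the post leaves implicit: the human and computer errors on a specimen are independent, and the human variance from data (3) applies to every specimen. All names and numbers are hypothetical.

```python
import random
import statistics


def computer_variance(human, computer, human_var):
    """Estimate var(computer) as var(computer - human) - var(human).

    Assumes independent human and computer errors, and that human_var
    (taken from data (3)) holds for every specimen.
    """
    diffs = [c - h for h, c in zip(human, computer)]
    return statistics.variance(diffs) - human_var


if __name__ == "__main__":
    rng = random.Random(1)
    human_sd, computer_sd = 1.0, 0.5                     # hypothetical error SDs
    truth = [rng.uniform(10, 20) for _ in range(500)]    # hypothetical true values
    human = [p + rng.gauss(0, human_sd) for p in truth]
    computer = [p + rng.gauss(0, computer_sd) for p in truth]
    # The estimate fluctuates around computer_sd ** 2 and can even come out
    # negative in an unlucky sample, because it is a difference of two estimates.
    print(computer_variance(human, computer, human_sd ** 2))
```

The true value of each specimen cancels in computer - human, so the spread of the differences reflects only the two measurement errors.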


b******3
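Posts: 4385

I looked at it again. You mentioned the paired t-test, but I found a small
problem: that method should work for random readings of one sample, right? In my
problem the specimens are not simply a random sample; each one is a different
specimen. So the method you mentioned for computing the variance may also need
to be modified, because among these 500 data points, whether read by a human or
by the computer, none is simply a random sample. Each is one sample from one of
500 different populations.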


t*****w
Posts: 254

I still believe that var(computer - human) = var(human) + var(computer) is
valid.
Even if you have 500 samples from different populations, you can still assume
homogeneity of variance across all populations. Otherwise it is meaningless to
apply the variance from data (3) to data (1).
Compute the Pearson or Spearman correlation between data (1) and (2).
Run a nonparametric Wilcoxon comparison between data (1) and (2).
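
The three procedures named above can be sketched with the standard library alone. This is an illustration, not a substitute for a statistics package: the Wilcoxon signed-rank statistic uses the large-sample normal approximation with no tie or continuity correction, and all function names are hypothetical.

```python
import math
import statistics


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def wilcoxon_signed_rank(x, y):
    """Return (W+, z) for paired data, dropping zero differences."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4.0
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    return w_plus, (w_plus - mean_w) / sd_w
```

Note that a high correlation between (1) and (2) shows association, not absence of bias: a computer that always reads 10% high would still correlate almost perfectly with the human readings. The signed-rank comparison is the one that speaks to a systematic shift.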

b******3
Posts: 4385

Thanks for your suggestions.


b******3
Posts: 4385

I am just wondering: besides these classical tests that I can run conveniently,
are there other, more intuitive methods based on simulation using computational
statistics? Thanks.
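
One simulation-based option of the kind asked about here is a sign-flip randomization test on the paired differences. It was not proposed in the thread, so treat it as an illustrative sketch: it assumes that, if the computer is unbiased, each human-minus-computer difference is symmetric about 0, so its sign can be flipped at random. The function name and default settings are hypothetical.

```python
import random
import statistics


def sign_flip_p_value(diffs, n_resamples=2000, seed=0):
    """Estimate a two-sided p-value for 'mean difference is 0' by sign flipping."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(diffs))
    hits = 0
    for _ in range(n_resamples):
        # Under the null, each difference is equally likely to be + or -.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(statistics.fmean(flipped)) >= observed:
            hits += 1
    # The add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_resamples + 1)
```

Here diffs would be the 500 values of human reading minus computer reading. A small p-value suggests a systematic shift between the two methods; the test needs no normality assumption, only symmetry under the null.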


t*****w
Posts: 254

It is clear to me that you want to show a smaller variance for the computer-read
data than for the human-observed data.
I wonder whether the current design is the best way to achieve that objective.


b******3
Posts: 4385

I know it is not the best way to design the test, but this is all the data we
could get. I can only work with these data, and I am not sure whether there is
a sophisticated way to serve my planned objective. Maybe it is impossible.


w*******9
Posts: 1433

Once you define "unbiasedness" and "variance", it will be clear whether you can
answer the question or not. For example, let pi be the truth, hi the observed
value from the human, and mi the observed value from the machine. Show us how
you would define "unbiasedness" and "variance". Math is the best way to make
things clear.

b******3
Posts: 4385

There is an assumption involved here: that the human measurement is unbiased,
since this has been the traditional method of measuring this value for decades.
To put it more clearly: if you run this test on one specimen repeatedly, the
average converges to a value, and this value is taken as the true value.
As for variance, it is just a statistic that gauges how concentrated your random
readings are.
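
One possible way to write these verbal definitions in the pi, hi, mi notation is sketched below. The additive error terms, the bias b, and the computer variance u_i are hypothetical symbols introduced for illustration; the thread itself does not fix a model.

```latex
\begin{align*}
h_i &= p_i + \varepsilon_i, & \mathbb{E}[\varepsilon_i] &= 0, & \operatorname{Var}(\varepsilon_i) &= v_i
  && \text{(human: assumed unbiased)} \\
m_i &= p_i + b + \delta_i, & \mathbb{E}[\delta_i] &= 0, & \operatorname{Var}(\delta_i) &= u_i
  && \text{(computer: bias } b \text{)} \\
\mathbb{E}[h_i - m_i] &= -b, & \operatorname{Var}(h_i - m_i) &= v_i + u_i
  &&&& \text{(if } \varepsilon_i \text{ and } \delta_i \text{ are independent)}
\end{align*}
```

Under this sketch, "the computer is unbiased" means b = 0, and "the computer is more efficient" means u_i < v_i. The unknown p_i cancels in h_i - m_i, which is why the differences can be analyzed without knowing any specimen's true value.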


w*******9
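Posts: 1433

Sorry, you misunderstood me. I would never think that you do not know what
unbiasedness and variance mean. Imagine that some specimens are hard to read,
so the individual-level variance, vi, is larger; in that case you can tolerate a
larger hi-mi. Other specimens are easier to read, so the individual-level
variance is smaller and you cannot tolerate as large an hi-mi. So you should
apply the t-test to (hi-mi)/sqrt(vi). Different vi sequences lead to different
conclusions, and you know only one of the 500 vi, so you have to assume some
relationship among the 500 vi (for example, that they are equal), which is what
TankerW said.
BTW, even if the vi are not all equal, the t-test you get without dividing by
sqrt(vi) is still a valid level-alpha test (basically because you can treat
these obs as iid in a larger probability space), but its power is inferior to
that of the test that adjusts the obs by sqrt(vi).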
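
A minimal sketch of the adjusted test described above: compute a one-sample t statistic on (hi-mi)/sqrt(vi). The per-specimen variances are a hypothetical input here, since the thread notes that only one vi is actually known, and the simulated bias and error sizes are made up for the demo.

```python
import math
import random
import statistics


def t_statistic(values):
    """One-sample t statistic for the hypothesis that the mean is 0."""
    n = len(values)
    return statistics.fmean(values) / (statistics.stdev(values) / math.sqrt(n))


def standardized_t(human, computer, variances):
    """t statistic on (hi - mi) / sqrt(vi), given per-specimen variances vi."""
    scaled = [(h - m) / math.sqrt(v)
              for h, m, v in zip(human, computer, variances)]
    return t_statistic(scaled)


if __name__ == "__main__":
    rng = random.Random(2)
    n = 500
    variances = [rng.choice([0.25, 4.0]) for _ in range(n)]   # hypothetical vi
    truth = [rng.uniform(10, 20) for _ in range(n)]           # hypothetical pi
    human = [p + rng.gauss(0, math.sqrt(v)) for p, v in zip(truth, variances)]
    computer = [p + 0.1 + rng.gauss(0, 0.2) for p in truth]   # hypothetical bias
    raw = t_statistic([h - m for h, m in zip(human, computer)])
    print(raw, standardized_t(human, computer, variances))
```

If all vi are assumed equal, dividing by the common sqrt(vi) leaves the t statistic unchanged, because the statistic is scale-invariant. In that case the adjusted test reduces to the ordinary paired t-test.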


b******3
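Posts: 4385

Thank you for your suggestions; they are very helpful. By the way, do you know
of any more computational methods that reach the same goal without relying on
the traditional ones?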


b******3
Posts: 4385

Bump :)
