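q********n Posts: 355 | 1
A typical sample-size question that I'd like to discuss. We want to assess the health of a certain type of power equipment across the whole province.
Say there are 10,000 units province-wide, and historical data show a pass rate of 80%. To determine the minimum number of units that must be inspected to represent
the whole population, there is a classic formula: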
n = z^2 p(1-p) / (e^2 + z^2 p(1-p)/N)
  = (1.96^2 * 0.8 * 0.2) / (0.02^2 + (1.96^2 * 0.8 * 0.2)/10000)
  ≈ 1332
That is, inspecting 1332 units gives a 95% confidence interval of 80% ± 2% for the pass rate.
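The arithmetic above can be checked with a short Python sketch. The function name `sample_size` and its parameter names are made up for illustration; it simply evaluates the formula quoted in the post (sample size for a proportion with a finite population correction) and rounds up.

```python
import math

def sample_size(N, p, e, z=1.96):
    """Sample size for estimating a proportion p to within +/- e,
    with finite population correction for a population of N units.
    (Hypothetical helper; it just evaluates the formula in the post.)"""
    v = z**2 * p * (1 - p)               # z^2 * p * (1 - p)
    return math.ceil(v / (e**2 + v / N)) # round up to a whole unit

n = sample_size(N=10_000, p=0.8, e=0.02)
print(n)  # 1332
```

Note that the result is most sensitive to the margin e: because e enters squared, halving it roughly quadruples the required sample before the finite population correction takes effect.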
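The question now is whether it is necessary to sample separately within the northern and southern regions, and whether to sample separately within each
city. My own feeling is that we should first compute the pass rate of each region and each city from the historical data; if they match the overall pass rate,
separate sampling is unnecessary. But is there any statistical theory or method that covers this? Thanks!
g******2 Posts: 234 | 2
It's stratified sampling, a variance reduction technique. If the quantity being
estimated (here, the pass rate) is homogeneous across all regions, then there's
no need to do stratified sampling. If it is heterogeneous across regions but
homogeneous within sub-regions, you'll gain efficiency by stratified sampling.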
For example, suppose you have 2 regions with 5000 machines each, and the
defective rate is 0.1 in region 1 and 0.3 in region 2 (0.2 overall). Then
drawing n samples from each of region 1 and region 2 will be more efficient
than randomly drawing 2n samples from the whole population. The ratio of the
variances is ((0.9*0.1+0.7*0.3)/2) / (0.8*0.2) = 15/16, which means
you'll need 1/16 fewer samples for the same precision.
q********n Posts: 355 | 3
thanks
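The 15/16 ratio in g******2's example can be reproduced with a minimal Python sketch. The helper names and the per-region sample size `n = 500` are hypothetical, chosen only for illustration, and the finite population correction is ignored, as it is in the post.

```python
def srs_variance(p, n):
    """Variance of a sample proportion under simple random sampling."""
    return p * (1 - p) / n

def stratified_variance(weights, rates, sizes):
    """Variance of the stratified estimator sum(W_h * p_hat_h),
    where W_h is the population share of stratum h."""
    return sum(w**2 * p * (1 - p) / m
               for w, p, m in zip(weights, rates, sizes))

n = 500                      # hypothetical sample size per region
weights = [0.5, 0.5]         # 5000 of 10000 machines in each region
rates = [0.1, 0.3]           # defective rates from the example
p_overall = sum(w * p for w, p in zip(weights, rates))  # 0.2

ratio = (stratified_variance(weights, rates, [n, n])
         / srs_variance(p_overall, 2 * n))
print(round(ratio, 4))  # 0.9375, i.e. 15/16
```

If both regions had the same defective rate, the ratio would be 1, which matches the point that stratification gains nothing when the regions are homogeneous.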