Page 7 - Collected discussions on imputation - 话题女王

s******s
Posts: 13035

From thread: Biology board - A question about GWAS genotype imputation

Thanks

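Posts: 1

Hi everyone, newcomer here with a question.
I'm a third-year PhD student graduating soon. I've already published several
first-author papers in decent journals, and I'm now preparing to submit another one.
I started this paper on a whim last November. Because it was my first time working
with anything genetic, I asked a German postdoc in another group for help. I worked
hard on the analysis and on organizing the results, and met with the postdoc several
times along the way. Perhaps because he thought my work was fairly ordinary, he
wasn't very invested: he often postponed our appointments, and his advice each time
was limited and never given all at once, which made me redo the SNP filtering twice
for nothing. When I started writing the paper, I told him I would list him as second
author, and he tacitly agreed.
When it was finally done, I wrote the first draft and sent it to him for review. He
was supposed to return it within a week but dragged it out for three weeks, saying he
had more important things to do. The comments he eventually gave contained few
constructive suggestions. After I finished the revisions, I told him I was ready to
submit and happened to mention that this research direction is quite hot right now.
He then asked whether I could make him co-first author...
I felt pretty uncomfortable about it, but I'm only a PhD student and can't boss a
postdoc around, and his wife is a scientist in our group, so we run into each other
all the time. So I figured I'd let my boss turn him down. I said it seemed okay to me
but that it was up to my boss. Who knew that my...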

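Posts: 1

From thread: Biology board - Newcomer question: do I need to give co-first authorship in this situation?

Actually, I work in molecular epidemiology, so on the whole there is no bench
work... data analysis is the bulk of it. The German did the imputation on the
genotyping data for one of our large cohorts, and I used that cohort's data for my
own analysis. So strictly speaking, the genomic data is a shared resource, and anyone
who uses it ought to give this German a coauthorship. It's just that we had already
agreed second author was enough, and at the last minute he wanted more credit.
I do data analysis myself and don't look down on people who produce the data. I know
bioinformatics isn't that simple.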

k*******n
Posts: 2399

From thread: ChemEng board - Upstream job opportunity for chemical engineers

All the above discussions are so insightful, and many thanks for your
valuable input. Currently I am a postdoctoral researcher in a Chemical
Engineering department. Because of my project, I am working with some people
from petroleum engineering, focusing on enhanced oil recovery
using emulsion techniques. Before I started my postdoc one year ago, I
got my Ph.D. in physical chemistry from a U.S. university. Through my
cooperation with the PE guys, I have developed some interest in the reservio

M*****i
Posts: 103

From thread: Computation board - Looking for CS tutoring

Hi all, I am a biostatistics student trying to write a script (in either
C or C++; I have basic knowledge, but it is apparently not enough) for
genotype imputation. I have run into a bunch of obstacles in coding the
algorithm I want to implement. I think my computing proficiency just cannot keep
pace with what I want to do in statistics. So here I sincerely turn to you,
nice and smart CS guys, for help.
The tutoring I am looking for is result-oriented; that is, I am
expecting a satisfying script. We

x******m
Posts: 96

From thread: Computation board - HELP!!!

Does any talented IT person know how to estimate building square footage based
on given addresses? I have a list of addresses and facility names,
and those addresses can be found in Google Maps. There is also a free online
resource: https://www.freemaptools.com/area-calculator.htm
By entering an address, you can find the building's square footage, but you need to
input the addresses one by one. Is it possible to call the website
directly?
If anyone knows how to resolve this, please call me at 832-794-791...

m********d
Posts: 400

From thread: Economics board - any bbs for international econ?

I am writing a paper on international econ and want to find some good BBS
to post some questions. Does anyone know of that kind of BBS?
Is anyone interested in international econ, especially using US imports of
merchandise to regress the transport cost? We can communicate.
Sorry for not using Chinese, but my desktop in the office can only input
English...
Thank you.

i*******e
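Posts: 349

From thread: Economics board - Wang Yang says the closure of more than 50,000 companies in Guangdong was all due to outdated production capacity.

I don't study labour economics, but institutions and European labour-market problems
have already been studied, for example in this 1998 JPE paper by Ljungqvist and
Sargent, which already has more than 400 citations. At least the few labour
economists I've met generally think the labour institutions of some European
countries are not conducive to expanding employment.
The European Unemployment Dilemma
Lars Ljungqvist Thomas J. Sargent
University of Chicago and Hoover Institution
Post-World War II European welfare states experienced several decades of
relatively low unemployment, followed by a plague of persistently high
unemployment since the 1980s. We impute the higher unemployment to welfare
states' diminished ability to cope with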

a*****g
Posts: 19398

From thread: Education board - zz -- Public Education 2015: The Year in Review [Satire]

The year 2015 was one of exciting developments in the area of K-12 public
education in America. Unfortunately, most of the excitement was of the type
felt while witnessing a multi-car pile-up on the freeway. [See website]
Several of the worst crashes are detailed below.
January
Record-breaking snowfall in New England prompts school officials to order
snow shoes for all elementary school pupils so as not to miss a single day
of learning, which would render the children unready for career and coll...

a****l
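Posts: 8211

From thread: EE board - Question: why is there no delay when calling China? Even at the speed of light it should take 0.05 seconds

Actually, when IP telephony first started, latency was indeed very severe; call
quality only got good in the last few years as network speeds improved. With faster
networks and no packet loss, processing is quick, and there is no longer any need
for imputation (I suspect that should be interpolation).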

g*****l
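Posts: 424

From thread: Mathematics board - The Jianghu Weapons Ranking of Data Science (repost)

[The following text is reposted from the DataSciences board]
Posted by: greatel (灵致), Board: DataSciences
Subject: The Jianghu Weapons Ranking of Data Science
Posted at: BBS 未名空间站 (Sun Oct 9 16:57:10 2016, US Eastern)
[Note] Originally published on the WeChat public account: data_wisdom
In the data jianghu, winds rise and clouds surge; heroes from every quarter vie for
supremacy.
This is the best of times for data science, and the most chaotic of times for the
data jianghu.
So, roaming in such a peculiar jianghu, which weapons deserve our attention? This
article lists some commonly used methods (the swords and sabers) and does not cover
tools or platforms. Let's first take a look at this weapons ranking, whose entries
are listed in no particular order.
Data scientist Dr. Vincent Granville published a blog post listing 45 techniques
commonly used by data scientists. It is a list well suited for beginners to work
through one by one. Of course, it does not represent all of data science
(statistics). Although he did not provide a detailed summary of his own, students
intent on learning data science may as well get a first impression and look further
into any unfamiliar topics. I have also added, further down, some areas that I think
are worth learning; many people will use them in everyday data practice. Learning
more is good for body and mind.
First, it should be noted that these techniques...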

m*******d
Posts: 17

From thread: MedicalCareer board - Is visiting student experience useful for the match?

Thanks for the input!
Has anyone successfully applied for a visiting student position? Could you share
your experience? Thanks.

z***u
Posts: 81

From thread: MedicalCareer board - A very Naive Question: Fellowship after the Family Medicine

I would like to ask about the possible fellowships after Family Medicine. Is
it true that people who graduate from FM can do pretty much the same fellowships
as Internal Med, though with a lot more difficulty, of course? I am pretty interested
in hematology/oncology. Thanks for all input.

p****u
Posts: 10

From thread: MedicalCareer board - Questions about the day of the CS exam

I would like to ask: if you type the patient note instead of handwriting it,
is all of your information (name, candidate #, patient #, USMLE #) already entered in
the computer outside the room of the patient you just visited? Or do you have to
input it yourself every time? I am sometimes not confident in remembering my
USMLE #. Thanks.

A*****s
Posts: 813

From thread: Pharmaceutical board - Dear readers, spare a little something!

nice input tho...hehe
bravo, dude

z****l
Posts: 97

From thread: Pharmaceutical board - What is the difference between Sustained-Release and Controlled-Release?

Can't input Chinese.
Sustained release is just slow release.
Controlled release means the drug release rate is constant. It has a
specific, regulated rate for a prolonged period.

A*********u
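Posts: 8976

From thread: Pharmaceutical board - BLQ in semi-log graph

When plotting a semi-log concentration vs. time PK graph,
how should BLQ values be handled?
When plotting an individual patient, a BLQ value counted as 0 cannot be drawn, which
amounts to excluding it from the semi-log plot. I think that if the lower limit of
quantification is known, using that value to impute BLQ is also an option.
Does anyone do this?
When plotting the mean, time points that are entirely BLQ (pre-dose, or 48 hours) are
excluded, and at other time points BLQ is counted as 0 (otherwise the mean would
differ from the one in the table). Is there a problem with treating them differently
like this?
What is the usual practice?
Many thanks!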

e******e
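Posts: 410

From thread: Psychology board - Question about confirmatory factor analysis (repost)

1. CFI seems to be reported less often nowadays; RMSEA is the most important.
2. It seems you are doing CFA the SEM way. If so, you can link those three
cross-loaded indicators together, which usually improves the fit.
3. If the missing data is handled listwise, try EM imputation.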

A********a
Posts: 133

From thread: Quant board - How to backfill blank data?

You can use Expectation-Maximization, for example ecmmvnrmle in the MATLAB
Financial Toolbox; see the Regression with Missing Data section for details.
There are other imputation methods besides the EM algorithm; I would like to know

p********a
Posts: 5352

From thread: Statistics board - [Collection] how many ways can we deal with missing values?

☆─────────────────────────────────────☆
retrome (me) wrote on (Thu Mar 6 11:41:54 2008):
replace with mean
and else?
thanks
☆─────────────────────────────────────☆
shac (本赛季战况，0：3) wrote on (Thu Mar 6 11:43:58 2008):
Monte Carlo simulation

☆─────────────────────────────────────☆
yueryue (yueryue) wrote on (Thu Mar 6 12:19:32 2008):
1) imputation
2) weighting
☆─────────────────────────────────────☆
liuyali (dilly) wrote on (Thu Mar 13 21:08:47 2008):
mean replacement is the most inefficie

h***m
Posts: 85

From thread: Statistics board - about Proc MI

When I use proc MI and specify the maximum and minimum values for some
variables, it always produces this error: "ERROR: An imputed variable value is not in
the specified range after 100 tries."
Have you had this experience? How did you solve the problem?
Thanks!

y****1
Posts: 400

From thread: Statistics board - Help: SAS data set output

Isn't this a SAS issue?
The log says:
NOTE: 7704 records were written to the file 'C:\Documents and Settings\User\My Documents\Imputation paper\Chr21.ped'.
The minimum record length was 3.
The maximum record length was 256.
NOTE: There were 24 observations read from the data set IMP.PEDFILE.
So 24 rows were read in, but 7704 records were written out.

y****1
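Posts: 400

From thread: Statistics board - Help: SAS data set output

My new attempt is as follows:
proc export data=Imp.pedfile outfile='C:\Documents and Settings\User\My Documents\Imputation paper\Chr21ped.txt' dbms=csv;
run;
This works, but what really puzzles me is that I clearly have 5+16179=16184 columns.
In each of the 16179 columns from column 6 through the last, there are 2 digits
(1 or 2). After the export, the txt file still has 24 rows, but when I paste the
first row into Word and run a word count, it has only 16383 characters??? Why???
I am really, really confused now; I have no idea where the number 16383 comes from.
Is there a limit on the line length in a txt file?
Finally, thanks again to everyone for your attention!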

o******6
Posts: 538

From thread: Statistics board - [Collection] Missing data

☆─────────────────────────────────────☆
bluesky11790 (天蓝) wrote on (Sat May 17 17:40:07 2008):
If missing data will not be imputed, then I can simply delete the missing
data, am I correct?
Thanks!!
☆─────────────────────────────────────☆
try2009 (try) wrote on (Sat May 17 19:50:34 2008):
I think if the missing data doesn't take up too much of the whole dataset
(say, <1-5%), you may do this. It would be fine.
But if it takes up too much of the whole, you need to take this seriously.
Some other

o******6
Posts: 538

From thread: Statistics board - How to handle missing values in proc logistic

I thought you wanted to throw those data away. You mean you want to impute the
missing data?

o******6
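Posts: 538

From thread: Statistics board - How to handle missing values in proc logistic

Briefly: you first have to determine the missing data pattern, whether it is MCAR,
MAR, or NI. Of course GENDER is definitely MCAR. As an example (not specific to your
case), you can use a logistic regression model to determine whether the missingness
depends on X1, X2...
In SAS, PROC MI can impute missing data.
Another approach is to use only the complete data and analyze it with bootstrapping.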

t*********e
Posts: 313

From thread: Statistics board - How to get summary statistics from multiple imputed data sets

By sample size, I mean the total count by subgroups in a two-way table.
Thanks!

t*********e
Posts: 313

From thread: Statistics board - How to get summary statistics from multiple imputed data sets

I actually ended up randomly selecting one data set for the univariate
table. Any better idea is very welcome.

s*******y
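Posts: 2977

From thread: Statistics board - I have a bootstrap question I'd like to discuss.

You can do it that way, but how reliable can such a CI be? You are still using a
very small pool, just drawing from it over and over, which amounts to assuming your
population resembles this small sample and then doing imputation.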

s*******r
Posts: 181

From thread: Statistics board - How to plot a heatmap of a huge data set

Gene-gene interaction of imputed genome-wide data?
I have used R for heatmaps, but it is definitely not good for 2M x 2M. Splitting by
chromosome x chromosome may work.

s*******r
Posts: 181

From thread: Statistics board - How to properly work with missing data in SAS

If missing data mainly come from one or two not so important variables, then
delete those variables.
If all variables have to be kept, use imputation?
Otherwise, depending on the analysis, google the right approach.

z**********i
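Posts: 12276

From thread: Statistics board - Asking about the purpose of the iteration data produced after imputation

Bumping to help; I also want to do this in SAS and haven't figured out where to start.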

k****4
Posts: 86

From thread: Statistics board - Asking about the purpose of the iteration data produced after imputation

It is almost the same as proc MI in SAS. Refer to the SAS documentation for
that procedure and you will get the idea.

h*******d
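Posts: 272

From thread: Statistics board - Asking about the purpose of the iteration data produced after imputation

Thank you for your reply.
But... I'm still confused.
For example, I run BEST SUBSET or STEPWISE regression on each of the 5 newly
generated data sets plus the original data, which gives 6 sets of results.
Then I look at which variables are selected most often across these 6 sets of
results and take those as my SUBSET PREDICTORS. Then I want to build a
REGRESSION MODEL.
But then what? Although I now know the SUBSET, I still have to deal with 6 data sets.
How do I finally get my regression model?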

a***r
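Posts: 420

From thread: Statistics board - [Question] A sentence from the literature, about MCMC

I'm in my first year and doing rotations, so you can see that my help requests cover
quite a variety of topics...
But this rotation really is about genotype imputation and haplotype phasing :)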

f**l
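Posts: 4396

From thread: Statistics board - A missing data question about time series data

If I want to run multiple simulations but the time series have different lengths,
that is, different start times, what should I do? Discarding the extra data loses
information; if I impute the missing values, do I need to assume a distribution, and
is that necessary?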

s******h
Posts: 539

From thread: Statistics board - A logistic regression question.

Use Proc MI + Proc logistic (w/ by imputation option) + proc mianalyze
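
The last step of that chain combines the per-imputation fits into one result. A minimal Python sketch of the pooling for a single coefficient, assuming Rubin's rules (pooled estimate = mean of the per-imputation estimates; total variance = within-imputation variance + (1 + 1/m) x between-imputation variance). The function name pool_rubin and all numbers are hypothetical and not from the thread, and this is an illustration rather than SAS's implementation:

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed data sets (Rubin's rules)."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average squared standard error
    between = variance(estimates)  # sample variance, (m - 1) denominator
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical coefficients and squared standard errors from 5 imputed data sets
est = [0.52, 0.48, 0.55, 0.50, 0.45]
var = [0.010, 0.012, 0.011, 0.009, 0.013]
pooled, total_var = pool_rubin(est, var)
print(pooled, total_var)
```

The pooled standard error is the square root of the total variance; the between-imputation term is what makes it larger than the standard error from any single completed data set.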

d*******o
Posts: 493

From thread: Statistics board - A statistics question about marketing

(1) Logistic regression: I bet the dataset is heterogeneous (categorical +
numeric) and many values are missing. You have to do a lot of variable
aggregation, transformation and imputation. Amazingly, many people in
direct marketing are still using it. I simply don't know why.
(2) Decision tree: CART/CHAID/C5.0/boosting and bagging
(3) SVM: way too slow. Running 50K+50K would kill most machines.
(4) Neural network: hard to explain.
Conclusion: the final predictive model would be an ensemble of multiple
decis

d********t
Posts: 837

From thread: Statistics board - What methods are generally used to analyze longitudinal data?

There are many more. The most popular choice for continuous endpoints
is based on fully specified likelihood and covariance modeling,
possibly with random effects. GEE is often used for categorical
data, but is almost never seen with continuous data in practice (maybe
gamma family?). As the usual interest in a clinical trial is in the
treatment effect at a certain timepoint, longitudinal models are mostly
used to account for missing data over time (the only exception I've
seen is to estimate coeff...

d******g
Posts: 130

From thread: Statistics board - How is missing data in longitudinal data generally handled?

You may want to try last-value-carried-forward or perform multiple
imputation using SAS Proc MI
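
For the first suggestion, carrying the last value forward can be sketched in a few lines of Python. This is only an illustration: the function name locf and the visit values are hypothetical, missing values are assumed to be stored as None, and a series that starts with a missing value keeps it because there is nothing to carry forward.

```python
def locf(values):
    """Replace each None with the most recent observed value."""
    filled = []
    last = None  # nothing observed yet
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


# Hypothetical repeated measurements for one subject
visits = [5.1, None, 4.8, None, None, 4.2]
print(locf(visits))  # [5.1, 5.1, 4.8, 4.8, 4.8, 4.2]
```

Unlike multiple imputation, this fills each gap with a single value and does not reflect any uncertainty about it.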

a****m
Posts: 693

From thread: Statistics board - A SAS question

proc mi data=FitMiss noprint out=outmi seed=37851;
MIANALYZE procedure do imputation using non missing observation.,

c**********5
Posts: 653

From thread: Statistics board - imputation question?thanks

ding

c**********5
Posts: 653

From thread: Statistics board - imputation question?thanks

Hi, thanks a lot.
I haven't used R for a while, but maybe I can pick it up again. I will study your code
tonight. I am not authorized to install R on my workstation.
I know how to write the SAS code using Proc MI (2 steps).
I am struggling with the questions above.

d******g
Posts: 130

From thread: Statistics board - imputation question?thanks

Not sure if you have read the good post on UCLA's SAS page on this topic.
Here is the link:
http://www.ats.ucla.edu/stat/sas/seminars/missing_data/part1.htm
Hope this helps.

d******g
Posts: 130

From thread: Statistics board - Which is faster: R, MATLAB, or SAS?

Are you referring to the "memory" syntax for allocating the upper limit for
the memory? Any idea? I found it a big issue when I was doing some
imputation for large dimensional data and EM.

A*******s
Posts: 3942

From thread: Statistics board - How to handle this kind of missing value?

You need to first specify your research purpose (prediction/design/hypothesis
test?) and detect the missing pattern (MCAR/MAR/MNAR), and then decide on an
imputation method.

d******o
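Posts: 59

From thread: Statistics board - How to handle this kind of missing value?

We all use imputation methods now.
You could do research in this area.
Using the mean is a fairly simple method.
What we prefer is to use a prediction model.
For example, take the records with known waiting times, with waiting time as the
dependent variable; the independent variables can be whatever information your data
has: gender, age, race, income, consumer behavior variables and so on.
Use this model to predict the records whose waiting time is unknown but whose other
information is known.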
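
The prediction-model approach described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: a single predictor (age) and ordinary least squares stand in for whatever model and covariates are actually used, missing waiting times are stored as None, and the function names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def impute_by_regression(rows):
    """rows: list of (age, wait_time or None). Fill each None with a prediction."""
    observed = [(x, y) for x, y in rows if y is not None]
    intercept, slope = fit_line([x for x, _ in observed],
                                [y for _, y in observed])
    return [(x, y if y is not None else intercept + slope * x)
            for x, y in rows]


# Hypothetical records: (age, waiting time); the last waiting time is unknown
rows = [(20, 10.0), (30, 15.0), (40, 20.0), (50, None)]
completed = impute_by_regression(rows)
print(completed[-1])
```

The model is fitted only on records with a known waiting time and then applied to the rest, so observed values are never overwritten.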
