c**********5 Posts: 653 | 1 Hi everyone,
I am new to this topic. Can anybody help me out?
The pilot study has a sample size of around 100, and almost half of the
records have missing values.
I would like to use multiple imputation to deal with the missing data
problem.
The current model is:
Outcome1 (post measurement1 - pre measurement1) = pre measurement1 + group
Outcome2 (post measurement2 - pre measurement2) = pre measurement2 + group
...
There are many outcomes we are interested in.
I have the following questions:
1. How should I build the imputation model? Which variables should I include
in the imputation model in my case (dependent variables, independent
variables, and others)?
Note: Data are missing not only in the outcomes but also in the
independent variables (a very small portion).
2. How many imputations do you recommend? (Usually 5-10; however, if the
proportion of missing values is large, maybe we need more imputations,
e.g. 50?)
Thanks. | w******a Posts: 25 | 3 Here is an R example that imputes one missing value per record. Half of the code only builds the sample data, so you probably need only the second half, but including all of it here helps you understand what is going on:
The data will look like this (x = observed, blank = missing):
col1 col2
x
x    x
x
x    x
x    x
...
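# library(Rlab) is not needed: rnorm() and rbinom() are in base R (stats)
# Note: for each shift delta, every missing Y2 is replaced by its regression
# prediction plus delta. This is a delta-adjusted single imputation used as a
# sensitivity analysis (no random draws, one filled-in dataset per delta).
alp = 1                  # slope of Y2 on Y1 in the simulated data
Prob_R1 = 0.5            # probability that Y2 is observed
Prob_R0 = 1 - Prob_R1    # probability that Y2 is missing (not used below)
len_Y1 = 200             # number of records
K_delta = 2              # largest shift, in units of the residual SD
Y1 = rnorm(len_Y1,mean=0,sd=1)
R1 = rbinom(n=len_Y1, size=1, prob=Prob_R1)   # response indicator: 1=observed; 0=missing
Y2 = rnorm(n=len_Y1, mean=alp*Y1, sd=1)
Y2[R1==0] = NA
data = data.frame(cbind(Y1,Y2))
reg = glm(Y2~Y1,family=gaussian,data)   # fitted on the records with Y2 observed
sigma = sd(reg$residuals)
delta_grid = K_delta * (-2:2/2)   # grid from -K_delta to K_delta
delta = sigma * delta_grid        # shifts from -K*sigma to K*sigma
E_Y2 = NULL
for(i in 1:length(delta))
{
  Y2[R1==0] = NA
  Y2.pred = delta[i] + predict(reg,newdata=data)   # shifted prediction for every record
  Y2[R1==0] = 0                                    # placeholder so that Y2*R1 is not NA
  Y2.hat = Y2*R1 + Y2.pred*(1-R1)                  # observed value if available, else shifted prediction
  par(mfrow=c(1,2))
  plot(Y1[R1==1],Y2[R1==1])                        # observed pairs
  points(Y1[R1==0],Y2.hat[R1==0],pch="+")          # imputed values
  hist(Y2.hat, xlim=c(-4,4))
  E_Y2[i] = mean(Y2.hat)                           # mean of Y2 after filling in with this delta
}
par(mfrow=c(1,1))
plot(delta,E_Y2)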
# lm(formula = E_Y2 ~ delta) gave E_Y2 = 0.06531 + 0.54500*delta (the slope equals the proportion of records with Y2 missing) | w******a Posts: 25 | 4 Here is an R example that imputes one or two missing values per record:
The data will look like this (x = observed, blank = missing):
col1 col2 col3
x
x    x    x
x    x
x    x
x    x    x
x
x    x    x
...
# library(Rlab) is not needed: rnorm() and rbinom() are in base R (stats)
alp = 1
K_delta = 2
len_Y1 = 200
#Sample setting:
#Measurements   N_patient   Percent
# 1             12          0.18
# 1 2           4           0.05
# 1 2 3         22          0.78
#Convert the above info into missing rates:
#N_measurement  1                    2               3
#Occupy_rate    0.78+0.05+0.18       0.78+0.05       0.78
#Missing_rate   1-(0.78+0.05+0.18)   1-(0.78+0.05)   1-0.78
#missing rate for each measurement at time points 1,2,3
#(the Percent column sums to 1.01, so the rate at time point 1 is set to 0)
Prob_R1 = 0
Prob_R2 = 1-0.78-0.05
Prob_R3 = 1-0.78
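#measurements at time points 1,2,3
Y1 = rnorm(n=len_Y1, mean=0,sd=1)
Y2 = rnorm(n=len_Y1, mean=alp*Y1, sd=1)
# in one run: mean(Y2)=-0.03, sum(Y1)/200=0.024
Y3 = rnorm(n=len_Y1, mean=alp*Y1, sd=1)
#R:response indicator 1=observed;0=missing
R1 = rep(1,len_Y1)
R2 = rbinom(n=len_Y1, size=1, prob=1-Prob_R2)
# Y3 can be observed only when Y2 is, so R3 is drawn conditionally on R2==1;
# dividing by (1-Prob_R2) makes the overall missing rate of Y3 equal Prob_R3
R3 = rbinom(n=len_Y1, size=1, prob=1-(Prob_R3-Prob_R2)/(1-Prob_R2))
Y2[R2==0] = NA
R3[R2==0] = 0   # monotone pattern: no Y2 implies no Y3
Y3[R3==0] = NA
data = data.frame(cbind(Y1,Y2,Y3,R1,R2,R3))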
#Estimate Y2
reg = glm(Y2~Y1,family=gaussian,data)
sigma = sd(reg$residuals)
delta_grid = K_delta * (-2:2/2)   # grid from -K_delta to K_delta
delta = sigma * delta_grid        # shifts from -K*sigma to K*sigma
E_Y2 = NULL
par(mfrow=c(4,3))
for(i in 1:length(delta))
{
Y2[R2==0] = NA
Y2.pred = delta[i] + predict(reg,newdata=data)
Y2[R2==0] = 0
Y2.hat = Y2*R2 + Y2.pred*(1-R2)
#par(mfrow=c(1,2))
plot(Y1[R2==1],Y2[R2==1])
points(Y1[R2==0],Y2.hat[R2==0],pch="+",col="red")
hist(Y2.hat, xlim=c(-4,4))
E_Y2[i] = mean(Y2.hat)
}
par(mfrow=c(1,1))
plot(delta,E_Y2)
#Estimate Y3
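# Y2.hat is the vector left by the last pass of the loop above (largest delta);
# it is stored in data so that glm() and predict() use the same column
data$Y2.hat = Y2.hat
reg2 = glm(Y3~Y1+Y2.hat,family=gaussian,data)
sigma2 = sd(reg2$residuals)
delta_grid = K_delta * (-2:2/2)   # grid from -K_delta to K_delta
delta = sigma2 * delta_grid       # shifts from -K*sigma2 to K*sigma2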
E_Y3 = NULL
par(mfrow=c(4,5))
for(i in 1:length(delta))
{
Y3[R3==0] = NA
Y3.pred = delta[i] + predict(reg2,newdata=data)
Y3[R3==0] = 0
Y3.hat = Y3*R3 + Y3.pred*(1-R3)
plot(Y1[R3==1],Y3[R3==1])
points(Y1[R3==0],Y3.hat[R3==0],pch="+",col="red")
}
for(i in 1:length(delta))
{
Y3[R3==0] = NA
Y3.pred = delta[i] + predict(reg2,newdata=data)
Y3[R3==0] = 0
Y3.hat = Y3*R3 + Y3.pred*(1-R3)
plot(Y2.hat[R3==1],Y3[R3==1])
points(Y2.hat[R3==0],Y3.hat[R3==0],pch="+",col="red")
}
for(i in 1:length(delta))
{
Y3[R3==0] = NA
Y3.pred = delta[i] + predict(reg2,newdata=data)
Y3[R3==0] = 0
Y3.hat = Y3*R3 + Y3.pred*(1-R3)
hist(Y3.hat, xlim=c(-4,4))
E_Y3[i] = mean(Y3.hat)
} | c**********5 Posts: 653 | 5 Hi, thanks a lot.
I have not used R for a while, but I can probably pick it up again. I will study your code
tonight. I am not authorized to install R on my workstation.
I know how to write the SAS code using PROC MI (2 steps).
I am still struggling with the questions above. | d******g Posts: 130 | 6 Not sure if you have read the good post on UCLA's SAS page on this topic.
Here is the link:
http://www.ats.ucla.edu/stat/sas/seminars/missing_data/part1.htm
Hope this helps.
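| c**********5 Posts: 653 | 7 Hi,
Thanks. I have read it, and it is my favorite website. Thank you very much all the same.
I have never used this method before. After reading some material, my impression is that, with an arbitrary missing pattern,
when we build the imputation model we can put all the variables we are interested in into the model,
whether they are dependent or independent variables. I am not sure whether my understanding is correct. Thanks. | H******r Posts: 2879 | 8 Almost all existing imputation methods are based on the MAR (missing at random) assumption; think
about whether this assumption holds in your problem.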
The imputation model can be a "big" model, which includes all "useful"
predictors and some "useless" predictors. 10 multiply-imputed datasets
should be enough.
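As a rough check on the number of imputations: Rubin's approximation says that an estimate based on m imputations has efficiency of about 1/(1 + lambda/m) relative to infinitely many imputations, where lambda is the fraction of missing information. The base R sketch below only tabulates that formula. Taking lambda = 0.5 is an assumption loosely based on "almost half" of the records having missing values (the fraction of missing information is not the same thing as the fraction of incomplete records), and `rel_eff` is a hypothetical helper name.

```r
# Relative efficiency of m imputations versus infinitely many (Rubin's approximation).
# lambda = assumed fraction of missing information (an assumption, not estimated here).
rel_eff <- function(m, lambda) 1 / (1 + lambda / m)

m <- c(3, 5, 10, 20, 50)
eff <- data.frame(m = m, efficiency = round(rel_eff(m, lambda = 0.5), 3))
print(eff)
```

With lambda = 0.5 this gives roughly 0.91 for m = 5, 0.95 for m = 10 and 0.99 for m = 50. Efficiency of the point estimate is only one criterion; a larger m may still be worth it if you want the standard errors and p-values to change little between repeated runs.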
You may want to check IVEware for MI: it works for non-normal models, and you can
specify bounds as well.
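For anyone who wants to see the whole multiple-imputation cycle (impute m times, analyse each completed dataset, pool) in base R, here is a minimal sketch under stated assumptions, not a recommended production workflow. The data are simulated, the variable names `pre`, `post`, `group` and the helper `impute_once` are hypothetical, only `post` has missing values, and missingness depends on `pre` alone (MAR). The imputation step is a Bayesian normal linear regression that includes the analysis-model covariates, and the analysis model is the change-score model from the first post.

```r
set.seed(1)

# --- Simulated example data (all values are made up for illustration) ---
n     <- 100
group <- rbinom(n, 1, 0.5)                       # treatment indicator
pre   <- rnorm(n)                                # pre measurement
post  <- pre + 0.5 * group + rnorm(n)            # post measurement
miss  <- rbinom(n, 1, plogis(-0.2 + pre)) == 1   # missingness depends on pre only (MAR)
post[miss] <- NA
dat <- data.frame(pre, group, post)

# --- One imputation: regression of post on pre + group with parameter draws ---
impute_once <- function(d) {
  obs  <- !is.na(d$post)
  fit  <- lm(post ~ pre + group, data = d[obs, ])
  X    <- model.matrix(~ pre + group, data = d)
  s2   <- sum(residuals(fit)^2) / rchisq(1, fit$df.residual)  # draw residual variance
  V    <- summary(fit)$cov.unscaled                           # (X'X)^-1 from observed rows
  z    <- rnorm(length(coef(fit)))
  beta <- coef(fit) + sqrt(s2) * drop(t(chol(V)) %*% z)       # draw coefficients
  # fill in missing post values: drawn regression line plus residual noise
  d$post[!obs] <- drop(X[!obs, , drop = FALSE] %*% beta) +
    rnorm(sum(!obs), sd = sqrt(s2))
  d
}

# --- Analyse each completed dataset with the change-score model ---
m <- 10
est <- var_within <- numeric(m)
for (k in seq_len(m)) {
  comp <- impute_once(dat)
  comp$change <- comp$post - comp$pre
  fit_k <- lm(change ~ pre + group, data = comp)
  est[k]        <- coef(fit_k)["group"]
  var_within[k] <- vcov(fit_k)["group", "group"]
}

# --- Pool the group effect with Rubin's rules ---
q_bar   <- mean(est)                      # pooled estimate
w_bar   <- mean(var_within)               # within-imputation variance
b       <- var(est)                       # between-imputation variance
t_var   <- w_bar + (1 + 1 / m) * b        # total variance
df_pool <- (m - 1) * (1 + w_bar / ((1 + 1 / m) * b))^2
pooled  <- c(estimate = q_bar, se = sqrt(t_var), df = df_pool)
print(round(pooled, 3))
```

The imputation model here contains every variable in the analysis model, in line with the "big model" advice above. Missing values in the independent variables are not handled in this sketch; that needs a multivariate method. In SAS, the imputation and pooling steps correspond to PROC MI and PROC MIANALYZE.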