Regression prediction question - Statistics board - 未名空间 (MITBBS) archive

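l******n (posts: 9344) #1:
In my regression training data set there is a categorical variable with only 3 levels. The data to be predicted contain one observation whose value for this categorical variable is not among those 3 levels. How should I make the prediction? Thanks.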
h***x (posts: 586) #2, in reply to l******n:
Two ways:
1) The safest method is not to use this categorical variable. :-)
2) The other way is to build the model on the training dataset as it is; if the variable (indicator) is significant, include it. When you apply the model to the new observation you mentioned, the indicator is 0 and will not affect the predicted result.
Just my 2 cents.
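
A minimal sketch of what option 2 does at prediction time, assuming treatment (dummy) coding with the first level as the reference. The level names `A`, `B`, `C`, `D` and all coefficients are made up for illustration; none of them come from the thread.

```python
# Hypothetical training levels; "A" is the reference level under dummy coding.
TRAIN_LEVELS = ["A", "B", "C"]

def dummy_encode(level, levels=TRAIN_LEVELS):
    """Return one indicator per non-reference level.

    A level that was never seen in training matches none of them,
    so every indicator is 0.
    """
    return [1 if level == lv else 0 for lv in levels[1:]]

# Made-up fitted coefficients: intercept, slope for x, effects of B and C.
INTERCEPT = 1.0
SLOPE_X = 2.0
LEVEL_EFFECTS = [0.5, -0.3]

def predict(x, level):
    """Linear prediction from the made-up coefficients above."""
    indicators = dummy_encode(level)
    level_term = sum(b * d for b, d in zip(LEVEL_EFFECTS, indicators))
    return INTERCEPT + SLOPE_X * x + level_term

print(dummy_encode("D"))                      # indicators for the unseen level
print(predict(3.0, "D"), predict(3.0, "A"))   # unseen level vs. reference level
```

Under this coding the unseen level gets exactly the same prediction as the reference level, so whether the indicator "does not affect" the result depends on which level was chosen as the reference.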
l******n (posts: 9344) #3, in reply to h***x:
Thanks, huxxx. Both methods make sense.
A*******s (posts: 3942) #4, in reply to h***x:
Is there a parameterization problem in the second one? I think it treats the unknown category as the reference category. I am not sure it is valid if there is no intercept in the model.
I am not sure whether this would work: as a first step, fit the model with that categorical variable and the other covariates; as a second step, fit the model without the categorical variable, holding the coefficients of the other covariates fixed, in order to get the intercept estimate.
Or, treat the categorical variable as a random effect. The two methods should give very close results if the sample size in each category is large.
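
The two-step idea above can be sketched roughly as follows. This is only an illustration under strong assumptions: a single continuous covariate `x`, ordinary least squares, and made-up data. The function names are hypothetical, and the poster was explicitly unsure whether the approach is valid.

```python
from statistics import mean

# Made-up training rows: (x, level, y).
data = [
    (1.0, "A", 3.1), (2.0, "A", 5.0), (3.0, "A", 6.9),
    (1.0, "B", 4.0), (2.0, "B", 6.1), (3.0, "B", 8.0),
    (1.0, "C", 2.0), (2.0, "C", 4.1), (3.0, "C", 5.9),
]

def fit_slope_with_level_intercepts(rows):
    """Step 1: OLS of y on x with a separate intercept per level.

    With one covariate, the common slope is the ratio of pooled
    within-level cross-products to within-level sums of squares.
    """
    by_level = {}
    for x, level, y in rows:
        by_level.setdefault(level, []).append((x, y))
    sxy = sxx = 0.0
    for points in by_level.values():
        xbar = mean(p[0] for p in points)
        ybar = mean(p[1] for p in points)
        sxy += sum((x - xbar) * (y - ybar) for x, y in points)
        sxx += sum((x - xbar) ** 2 for x, _ in points)
    return sxy / sxx

def refit_intercept(rows, slope):
    """Step 2: drop the categorical variable, hold the slope fixed,
    and estimate one intercept (the mean of y - slope * x)."""
    return mean(y - slope * x for x, _, y in rows)

slope = fit_slope_with_level_intercepts(data)
intercept = refit_intercept(data, slope)

def predict_unseen(x):
    """Prediction for an observation whose level was not in training."""
    return intercept + slope * x

print(slope, intercept, predict_unseen(2.5))
```

In this sketch the refitted intercept is a size-weighted average of the per-level intercepts, so the unseen level is scored as an "average" level rather than as the reference level.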
h***x (posts: 586) #5, in reply to A*******s:
Actually, I think the best approach is to first check the distribution of this variable in the two datasets and see whether the mismatch comes from a coding error. If the levels really are different, then this variable should not be used, and there is no need to test the difference between including and excluding it.
As for the second method, it is the optimal solution based on the training dataset. At model deployment we do not know what the distribution of the data will be, but our assumption is that the data to be predicted and the training data have similar distributions. In this specific example, the model scores will change a little but the score ranking will hardly change, so the final scoring results should be the same.
I think you are right, we can treat the categorical variable as a random effect ...
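
The check suggested above (compare the variable's levels and distribution across the two datasets before deciding anything) might look like this minimal sketch. The helper name `compare_levels` and the sample values are hypothetical.

```python
from collections import Counter

def compare_levels(train_values, new_values):
    """Return the levels that appear only in the new data, plus the
    relative frequency of every level in (training, new) order."""
    train_counts = Counter(train_values)
    new_counts = Counter(new_values)
    unseen = sorted(set(new_counts) - set(train_counts))
    report = {}
    for level in sorted(set(train_counts) | set(new_counts)):
        report[level] = (
            train_counts[level] / len(train_values),
            new_counts[level] / len(new_values),
        )
    return unseen, report

# Made-up example: "b" looks like a coding error for "B"; "D" is genuinely new.
train = ["A", "B", "C", "A", "B", "A"]
new = ["A", "b", "C", "D"]

unseen, report = compare_levels(train, new)
print(unseen)
for level, (p_train, p_new) in report.items():
    print(level, p_train, p_new)
```

A level such as `b` that differs from a training level only by case or whitespace is a likely coding error and can be recoded; a level such as `D` is the genuinely new case the thread is about.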
