s******e Posts: 841 | 1 I have a dataset with 1 response variable and 20 predictor variables (
continuous and categorical). The sample size is around 3000. The results
of multiple regression are poor (R^2 below 0.2). I have also tried
a regression tree, but I cannot even grow a tree on this dataset (the
fitted tree has only one terminal node).
Is there any other method I can try to get a good fit?
Maybe I could transform some of the predictors, but
how can I find the b
| s*****n Posts: 2174 | 2 Some of your basic concepts are a bit mixed up.
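A well-fitting model and prediction accuracy are not directly related.
A model is only fitted to the observed data, and its quality is then judged by some criterion (least squares, for example).
Prediction, on the other hand, depends heavily on the assumptions you make when predicting and on the nature of your data.
The noise term in your data is large, so you may never be able to predict accurately, no matter which model you pick.
A very small R^2 does not necessarily mean that the model is bad or that a better model exists.
| s******e Posts: 841 | 3 Thank you for replying.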
I am not a statistics major; this is an engineering problem. My first priority
is to reduce the prediction error as much as possible. That is why I
tried the regression tree method, but it failed. My question is: can I find a
method that gives a small prediction error? It does not matter if
the result is hard to interpret.
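| s*****n Posts: 2174 | 4 I don't think there is any universal procedure that will reduce your prediction error.
All you can do is try different variable selections and different transformations.
If your response is approximately normal, try to transform all of your predictors toward normality as well. If the response is very skewed, first transform the response to be approximately normal, or at least reasonably symmetric.
One more thing: I don't know how you select models and evaluate predictions. If you are not using cross-validation, you should use it as the criterion. AIC works as well; in theory, AIC tries to minimize prediction error. Looking only at things like R^2 will not get you a predictive model.
| s******e Posts: 841 | 5 It's too broad. Can you specify one?
| s*********e Posts: 1051 | 6 With neural networks you can get a perfect fit, though probably an over-fit. ^_^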
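For readers who want something concrete: the cross-validation and AIC suggestion from post 4 (which would also expose the over-fitting mentioned in post 6) can be sketched roughly as below. This is only an illustration on synthetic data, not the original poster's dataset. The helper names `design`, `kfold_mse`, and `gaussian_aic`, the choice of 5 folds, and the simulated coefficients and noise level are all assumptions made for the example. The noise is set so that even the correct model can only reach an R^2 of about 0.2, which also illustrates the point in post 2.

```python
import numpy as np


def design(X):
    """Prepend an intercept column to the predictor matrix."""
    return np.column_stack([np.ones(X.shape[0]), X])


def kfold_mse(X, y, k=5, seed=0):
    """Estimate the out-of-sample mean squared error of OLS by k-fold CV."""
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    sq_err = 0.0
    for test in np.array_split(idx, k):
        train = np.setdiff1d(idx, test)  # every row not in the held-out fold
        beta, *_ = np.linalg.lstsq(design(X[train]), y[train], rcond=None)
        sq_err += np.sum((y[test] - design(X[test]) @ beta) ** 2)
    return sq_err / n


def gaussian_aic(X, y):
    """AIC of a Gaussian linear model, up to an additive constant."""
    n = len(y)
    A = design(X)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = np.sum((y - A @ beta) ** 2)
    n_params = A.shape[1] + 1  # regression coefficients plus the noise variance
    return n * np.log(rss / n) + 2 * n_params


# Hypothetical stand-in data: 3000 rows, 20 predictors, only the first 3
# carry signal (variance 2.25) and the noise variance is 9, so the best
# achievable R^2 is about 2.25 / 11.25 = 0.2.
rng = np.random.default_rng(42)
n, p, sigma = 3000, 20, 3.0
X = rng.normal(size=(n, p))
y = 1.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=sigma, size=n)

candidates = {
    "signal (x0..x2)": X[:, :3],
    "irrelevant (x3..x19)": X[:, 3:],
    "all 20": X,
}
results = {}
for name, Xc in candidates.items():
    mse = kfold_mse(Xc, y)
    results[name] = {
        "cv_mse": mse,
        "cv_r2": 1.0 - mse / np.var(y),
        "aic": gaussian_aic(Xc, y),
    }
    print(f"{name:22s} CV MSE={mse:6.2f}  CV R^2={results[name]['cv_r2']:5.2f}  "
          f"AIC={results[name]['aic']:8.1f}")
```

Lower CV MSE and lower AIC both indicate the better candidate. Different variable subsets or transformed predictors can be compared the same way by adding entries to `candidates`. If the CV R^2 stays low for every candidate, the limit is probably the noise in the data rather than the choice of model.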