Rejected after an on-site interview: let's discuss the one fatal question - Statistics board

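This page is an excerpt and archive of the corresponding thread on MITBBS (未名空间). Posts less than a week old show at most 50 characters, older posts at most 500; see the original thread for the full text.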


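b****w
Posts: 71

I flew out of state for an on-site interview a couple of days ago and came back empty-handed.
The rejection letter arrived today with no reason given, and I'm quite disappointed. Clearly my
fundamentals aren't solid enough, which is embarrassing, and I need to keep working on them.
I'd like to ask about one question from the on-site.
It was a modeling question. The DV is loan status: 1 = performing loan, 0 = non-performing
loan. There is one IV, income. Basic enough, so I said I would build a logistic regression
model. The interviewer kept asking what else could be done besides putting income into the
model directly. I had no idea; I have always just put it in as is. I'd sincerely appreciate
any advice.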

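k****i
Posts: 3282

Make categories based on income? Is that what he meant?

[Quoting b****w's post]

: I flew out of state for an on-site interview a couple of days ago and came back empty-handed.
: The rejection letter arrived today with no reason given, and I'm quite disappointed. Clearly my
: fundamentals aren't solid enough, which is embarrassing, and I need to keep working on them.
: I'd like to ask about one question from the on-site.
: It was a modeling question. The DV is loan status: 1 = performing loan, 0 = non-performing
: loan. There is one IV, income. Basic enough, so I said I would build a logistic regression
: model. The interviewer kept asking what else could be done besides putting income into the
: model directly. I had no idea; I have always just put it in as is. I'd sincerely appreciate
: any advice.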

k*******a
Posts: 772

A transformation such as log?

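b****w
Posts: 71

Yes, I suggested that approach. I said I would bin income into, say, 5 categories, which gives
4 dummy variables. Then he asked what the difference is between that and using income
directly... I thought for a long time and couldn't answer. I said that with categories you can
see how each category relates to the DV and then decide how to handle it based on those
relationships. That clearly wasn't the answer he wanted. After I struggled for a while, he
hinted that I should think about the differences in terms of distribution, assumptions and so
on. Any ideas?

[Quoting k****i's post]

: Make categories based on income? Is that what he meant?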

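b****w
Posts: 71

Yes, I mentioned that too, although what I proposed was income^2. But I couldn't explain why
one would do that or what the benefit is. Could you elaborate? Thanks a lot.

[Quoting k*******a's post]

: A transformation such as log?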

m******o
Posts: 23

Split income into different classes.

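t*******t
Posts: 633

If you use income directly, the effect is linear (on the log-odds scale).
Using four dummies avoids imposing a linear trend, and you can go further and fit a spline.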
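
To make the contrast concrete, here is a minimal sketch in Python (standard library only). The cut points in `EDGES` and the toy data are invented for illustration, not taken from the interview. It relies on a standard fact: when the only predictors are an intercept plus dummies for one categorical variable, the fitted probability in each category equals that category's observed rate, so the per-bin log-odds can be read off without an optimizer.

```python
import math
from bisect import bisect_right

def bin_index(income, edges):
    """Return the 0-based bin of income given sorted interior cut points."""
    return bisect_right(edges, income)

def dummy_row(income, edges):
    """Encode income as len(edges) dummies; bin 0 is the reference level."""
    k = bin_index(income, edges)
    return [1 if k == j else 0 for j in range(1, len(edges) + 1)]

def empirical_log_odds(data, edges):
    """Per-bin log-odds of a performing loan (1) versus a non-performing one (0)."""
    n_bins = len(edges) + 1
    good = [0] * n_bins
    total = [0] * n_bins
    for income, outcome in data:
        k = bin_index(income, edges)
        good[k] += outcome
        total[k] += 1
    result = []
    for g, t in zip(good, total):
        if g == 0 or g == t:
            result.append(None)  # undefined: empty bin or only one class present
        else:
            result.append(math.log(g / (t - g)))
    return result

# Hypothetical cut points: 4 edges -> 5 bins -> 4 dummy variables.
EDGES = [30_000, 50_000, 80_000, 120_000]

# Toy (income, outcome) pairs, three loans per bin.
DATA = [
    (20_000, 1), (22_000, 0), (25_000, 0),
    (40_000, 1), (42_000, 0), (45_000, 1),
    (60_000, 1), (65_000, 1), (70_000, 0),
    (90_000, 1), (95_000, 1), (100_000, 0),
    (150_000, 1), (160_000, 1), (200_000, 0),
]

log_odds = empirical_log_odds(DATA, EDGES)
for k, value in enumerate(log_odds):
    print(f"bin {k}: log-odds = {value:.3f}")
```

In this toy data the log-odds jump between the lowest bin and the next one and are flat afterwards, a shape that a single linear income term cannot reproduce but four dummies can.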

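[Quoting b****w's post]

: Yes, I suggested that approach. I said I would bin income into, say, 5 categories, which gives
: 4 dummy variables. Then he asked what the difference is between that and using income
: directly... I thought for a long time and couldn't answer. I said that with categories you can
: see how each category relates to the DV and then decide how to handle it based on those
: relationships. That clearly wasn't the answer he wanted. After I struggled for a while, he
: hinted that I should think about the differences in terms of distribution, assumptions and so
: on. Any ideas?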

w********m
Posts: 1137

Situation: build a logistic regression on income
Action: log transformation of income
Result: reduce heteroskedasticity (so that the OLS assumptions hold)

s*******e
Posts: 1385

For this you first need a univariate plot, and only then decide on the cap, the floor and the
transformation, as well as how to impute missing values.
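
A rough sketch of the cap, floor and imputation steps in Python (standard library only). The percentile cutoffs and the choice of median imputation are assumptions for illustration; in practice they would be chosen after looking at the univariate plot.

```python
import statistics

def percentile(sorted_vals, q):
    """Linear-interpolation percentile of a sorted, non-empty list; q is in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def clean_income(values, lower_q=0.01, upper_q=0.99):
    """Impute missing values (None) with the median, then floor and cap."""
    observed = sorted(v for v in values if v is not None)
    median = statistics.median(observed)
    floor = percentile(observed, lower_q)
    cap = percentile(observed, upper_q)
    return [min(max(median if v is None else v, floor), cap) for v in values]

# Toy incomes with one missing value and one extreme outlier.
raw = [30_000, None, 45_000, 52_000, 61_000, 5_000_000]
cleaned = clean_income(raw, lower_q=0.25, upper_q=0.75)
print(cleaned)
```

The quartile cutoffs here are deliberately aggressive so that the effect is visible on six values; 1% and 99% are more typical starting points.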

[Quoting b****w's post]

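s*******e
Posts: 1385

Right. Using income directly assumes linearity; splitting it into 4 dummy variables drops the
linearity assumption, but you also lose 3 DF.

[Quoting t*******t's post]

: If you use income directly, the effect is linear.
: Using four dummies avoids imposing a linear trend, and you can go further and fit a spline.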


s*********e
Posts: 1051

Google the "weight of evidence" transformation.
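
For readers who have not met it, a minimal sketch of weight of evidence (WoE) in Python. The bin counts are made up, and the sign convention (goods over bads) is an assumption; some references use the reverse. WoE for a bin is the log of the bin's share of all good loans divided by its share of all bad loans, which equals the bin's log-odds minus the overall log-odds.

```python
import math

def weight_of_evidence(good_counts, bad_counts):
    """WoE per bin: ln(share of all goods in the bin / share of all bads in the bin)."""
    total_good = sum(good_counts)
    total_bad = sum(bad_counts)
    woe = []
    for g, b in zip(good_counts, bad_counts):
        if g == 0 or b == 0:
            raise ValueError("every bin needs at least one good and one bad loan")
        woe.append(math.log((g / total_good) / (b / total_bad)))
    return woe

# Hypothetical counts of performing (good) and non-performing (bad) loans
# in five income bins, lowest income first.
good = [50, 120, 200, 180, 150]
bad = [50, 60, 40, 20, 10]

woe = weight_of_evidence(good, bad)
for k, w in enumerate(woe):
    print(f"bin {k}: WoE = {w:+.3f}")
```

Each loan's income is then replaced by the WoE of its bin, and that single recoded variable goes into the logistic regression.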

[Quoting b****w's post]

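z******n
Posts: 397

I disagree slightly with what walkrandom said. Heteroskedasticity is an OLS concept. After
diagnosing it, you can reduce it to some extent with a suitable transformation of y or x, for
example a Box-Cox transformation of y, or by adding higher-order terms in x. In logistic
regression the analogous problem is called over-dispersion, and it is handled differently from
OLS. The basic cause is that the variance of y within each x-group departs from the model
assumption var(y) = ni*pi*(1-pi). Correcting by a factor, var(y) = a*ni*pi*(1-pi), can
sometimes improve this. Theory shows that this correction requires ni > 1, so once income is
grouped so that every group has sample size ni > 1, the correction has a chance of working. If
income is left continuous, it may fail. This correction is implemented in standard software.
Other corrections (such as random-effects models) may avoid discretizing x, but few standard
packages offer them as options.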
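
One common way to estimate the factor a, sketched in Python under stated assumptions: take the Pearson chi-square of the grouped data and divide it by the residual degrees of freedom (number of groups minus number of fitted parameters). The group counts are invented, and the fitted probabilities come from an intercept-only model only to keep the sketch self-contained; normally `p_hat` would come from the fitted logistic regression.

```python
def dispersion_factor(n, y, p_hat, n_params):
    """Pearson chi-square divided by residual degrees of freedom."""
    if len(n) <= n_params:
        raise ValueError("need more groups than fitted parameters")
    chi2 = sum(
        (yi - ni * pi) ** 2 / (ni * pi * (1 - pi))
        for ni, yi, pi in zip(n, y, p_hat)
    )
    return chi2 / (len(n) - n_params)

# Hypothetical grouped data: n[i] loans in income group i, y[i] of them performing.
n = [100, 100, 100, 100]
y = [50, 60, 70, 80]

# Intercept-only fit: every group gets the pooled performing rate (1 parameter).
pooled = sum(y) / sum(n)
p_hat = [pooled] * len(n)

a_hat = dispersion_factor(n, y, p_hat, n_params=1)
print(f"estimated dispersion factor: {a_hat:.3f}")
```

A value near 1 is consistent with the binomial variance. A value well above 1 suggests over-dispersion or, as in this intercept-only toy fit, a mean model that is simply missing the income effect. Under the quasi-likelihood correction, standard errors are multiplied by the square root of the estimate.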

[Quoting b****w's post]

z******n
Posts: 397

Don't you mean 3 more df?

[Quoting s*******e's post]

: Right. Using income directly assumes linearity; splitting it into 4 dummy variables drops the
: linearity assumption, but you also lose 3 DF.

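d*****s
Posts: 1407

Degrees of freedom? You are all overthinking it.
One issue is linearity.
Another is outliers.
Also, income is rarely put into the model directly; people mostly use the debt-to-income ratio.
Then there is the purpose of the model: if it is a scorecard, you basically always bin.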

k****n
Posts: 165

Doing so means that, on average, every additional $1 of income increases the odds of a
performing loan by XXX.
However, this will never be the case. It seems straightforward, at least to me, that the
higher the income, the less effect a further increase in income has.
Discretizing the income variable is an option, and so is a log transformation.
Both loosen the strong parametric assumption in my first point. Or you could simply go the
non-parametric route and fit, for example, a smooth coefficient model with regard to income.
On the logit scale the model looks like:
logit(Pr(performing loan)) = X \beta + \beta_q * f(income)
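
A small numeric sketch of that point in Python. The coefficient 0.5 on log(income) and the incomes are hypothetical. With income entered linearly, a fixed raise multiplies the odds by the same factor at every income level; with log(income), the same raise matters less as income grows.

```python
import math

def odds_ratio_linear(beta, delta):
    """Odds multiplier for an income increase of delta when income enters linearly."""
    return math.exp(beta * delta)

def odds_ratio_log(beta, income, delta):
    """Odds multiplier for the same increase when log(income) enters instead."""
    return math.exp(beta * (math.log(income + delta) - math.log(income)))

# Hypothetical coefficient on log(income); compare a $10,000 raise at two baselines.
BETA_LOG = 0.5
low = odds_ratio_log(BETA_LOG, income=20_000, delta=10_000)
high = odds_ratio_log(BETA_LOG, income=200_000, delta=10_000)
print(f"odds multiplier at $20k baseline:  {low:.4f}")
print(f"odds multiplier at $200k baseline: {high:.4f}")
```

`odds_ratio_linear` takes no baseline income at all, which is exactly the strong assumption being criticized here.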

[Quoting b****w's post]

h******n
Posts: 190

Okay, I think the question is really: how do you treat a continuous variable in your model?
First of all, you need a plot to see the distribution of this variable, and another plot to
see its relationship with the outcome. For the second plot, you may want to try logit(Y) on
the Y-axis in addition to the binary Y.
Then, based on your plots, you can decide how to use it. You can use it as a linear term; in
that case it may be better to center it, or to divide it by a unit such as 10, 100 or 1000
(since the dollar is too small a unit), or to transform it (log, square root, squared),
depending on its relationship with Y. This is the most efficient way to use a continuous
variable.
You can also dichotomize it and use it as a binary variable. Be cautious with this method: it
not only hurts precision but also carries the risk that the underlying assumption is wrong.
You can also categorize it. In that case you might first try percentiles or quartiles to see
what the relationships look like.
You can also try a linear spline if you believe the relationship is not linear, or go further
and try a squared term (i.e., b1*X + b2*X^2), or a linear spline plus a squared term.
There are many ways to use a continuous variable in a model, and you need to justify the way
you use it. Once you know the material well enough to talk it through to this level, I
believe you should get the offer!
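
Two of the options above, centering and scaling and the linear spline, can be sketched in a few lines of Python. The center, unit and knot locations are assumptions for illustration. The spline uses the truncated-line basis: the raw term plus one hinge term max(x - knot, 0) per knot, so the slope on the log-odds scale is allowed to change at each knot.

```python
def scale_income(income, center=50_000, unit=1_000):
    """Center income and express it in multiples of `unit` dollars."""
    return (income - center) / unit

def linear_spline_basis(x, knots):
    """Return [x, (x - k1)+, (x - k2)+, ...] for the given knots."""
    return [x] + [max(x - k, 0.0) for k in knots]

# Hypothetical knots at $50k and $100k, i.e. 0 and 50 on the scaled axis.
KNOTS = [0.0, 50.0]

for income in (30_000, 70_000, 120_000):
    x = scale_income(income)
    print(income, linear_spline_basis(x, KNOTS))
```

Each column of the basis gets its own coefficient in the logistic regression; the coefficient on a hinge term is the change in slope at that knot.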
