[Collection] A question about logistic regression parameters - Statistics board

p********a
Post count: 5352

☆─────────────────────────────────────☆
cici (full house) wrote on (Mon Nov 7 08:33:47 2011, US Eastern):
For logistic regression,
log(pi/(1-pi)) = b0 + b1*x1 + b2*x2
I already know the independent variables and the response variable {log(pi/(1-pi))}.
What should I do to fit the parameters b0, b1, b2? Many thanks.
☆─────────────────────────────────────☆
sleephare (I+don't+know.) wrote on (Mon Nov 7 14:16:38 2011, US Eastern):
SAS or R?

☆─────────────────────────────────────☆
cici (full house) wrote on (Mon Nov 7 16:19:05 2011, US Eastern):
R, thanks
☆─────────────────────────────────────☆
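zblgaoxin (又是半年) wrote on (Mon Nov 7 16:56:52 2011, US Eastern):
Are you a frequentist or a Bayesian?
If frequentist, compute the MLE; the inverse of the Fisher information matrix
then gives the variance of the estimates.
If Bayesian, use an MCMC sampler to draw from the posterior distribution of the
parameters, and take the sample mean as the estimate of each parameter.
If you need more details, send me a few baozi and I'll send you a PDF, haha.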
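
For reference, the frequentist recipe above can be written out for the binomial model. This is only a sketch in notation the thread does not use: y_i successes out of n_i trials, x_i = (1, x_{i1}, x_{i2}) and \beta = (b_0, b_1, b_2) are assumed names.

```latex
\ell(\beta) = \sum_i \left[ y_i\, x_i^\top \beta - n_i \log\!\left(1 + e^{x_i^\top \beta}\right) \right] + \text{const},
\qquad
\pi_i = \frac{e^{x_i^\top \beta}}{1 + e^{x_i^\top \beta}}

\frac{\partial \ell}{\partial \beta} = \sum_i \left( y_i - n_i \pi_i \right) x_i = 0

I(\beta) = \sum_i n_i\, \pi_i (1 - \pi_i)\, x_i x_i^\top,
\qquad
\widehat{\mathrm{Var}}(\hat\beta) = I(\hat\beta)^{-1}
```

The score equation has no closed-form solution, so the MLE is found numerically; the inverse of I evaluated at the MLE is the variance estimate mentioned above.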
☆─────────────────────────────────────☆
zblgaoxin (又是半年) wrote on (Mon Nov 7 17:00:42 2011, US Eastern):
R has a function that computes the result directly:
glm(y ~ x1 + x2 + x3 + ..., family = binomial(link = "logit"))
For more settings, type ?glm in R to see the details of the glm function.
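
A minimal sketch of that call on simulated 0/1 data may help. The sample size, the seed, and the "true" coefficients are made up for illustration and are not from the thread.

```r
# Simulated data; the true coefficients below are arbitrary illustrative values.
set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
p  <- plogis(-0.5 + 1.0 * x1 - 0.8 * x2)   # inverse logit of the linear predictor
y  <- rbinom(n, size = 1, prob = p)        # observed 0/1 outcomes

fit <- glm(y ~ x1 + x2, family = binomial(link = "logit"))

coef(fit)               # estimates of b0, b1, b2
sqrt(diag(vcov(fit)))   # standard errors (from the inverse Fisher information)
```

Note that glm() here needs the raw outcomes y (or success/failure counts), not the already-computed log-odds.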
☆─────────────────────────────────────☆
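cici (full house) wrote on (Mon Nov 7 23:09:46 2011, US Eastern):
Thanks to the poster above; it seems I didn't explain clearly. My problem is
that I already have log(pi/(1-pi)), and it is a continuous variable that, judging
from the plot, looks rather like a gamma distribution. Given log(pi/(1-pi)) and
known x1 and x2, how should I estimate the parameters b0, b1, b2? Do I just use
linear regression? If so, should the family in glm() be set to Gamma? Thanks.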
☆─────────────────────────────────────☆
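zblgaoxin (又是半年) wrote on (Mon Nov 7 23:25:31 2011, US Eastern):
If you already know log(pi/(1-pi)), just run a linear regression directly.
That has nothing to do with the glm function I gave you.
If it looks gamma-distributed, plain linear regression certainly won't be reliable.
Try adding x1^2 and x2^2 terms and then run the linear regression.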
☆─────────────────────────────────────☆
Actuaries (striving) wrote on (Mon Nov 7 23:55:04 2011, US Eastern):
Looking only at the distribution of Y is useless; we only care about the
conditional distribution of Y.
GLM can handle this. If your p is # of events / # of trials, this is still the
most basic logistic (Bernoulli/binomial) regression. If your p is a
rate/proportion, you can also use beta regression.
☆─────────────────────────────────────☆
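cici (full house) wrote on (Tue Nov 8 00:18:23 2011, US Eastern):
Thanks everyone above. pi is the probability of success.
The usual logistic regression is log(pi/(1-pi)) = b0 + b1*x1 + b2*x2,
so pi = exp(b0 + b1*x1 + b2*x2) / (1 + exp(b0 + b1*x1 + b2*x2)).
But this model requires counting the # of successes or failures under each
combination, and I don't have those counts.
What I have is the pi that someone else fitted with the model above, and I can
determine x1 and x2. I want to find out what b0, b1, and b2 are in the model
they used.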
☆─────────────────────────────────────☆
Ballack5 (T-MAC) wrote on (Tue Nov 8 01:16:21 2011, US Eastern):
Then why not just use ordinary linear regression?
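
A sketch of this suggestion, under a strong assumption that the thread does not confirm: the supplied pi are exact fitted values from a logit model containing only x1 and x2. The coefficients in b are hypothetical stand-ins for the unknown ones.

```r
set.seed(2)
n  <- 200
x1 <- runif(n)
x2 <- runif(n)
b  <- c(-1.0, 2.0, 0.5)                          # hypothetical "unknown" coefficients
pi_hat <- plogis(b[1] + b[2] * x1 + b[3] * x2)   # fitted probabilities handed to us

logit_pi <- qlogis(pi_hat)                       # log(pi / (1 - pi))
rec <- lm(logit_pi ~ x1 + x2)                    # ordinary least squares on the logit scale

coef(rec)                                        # agrees with b up to rounding error

# R-square computed by hand; essentially 1 because the relation is exactly linear
r2 <- 1 - sum(resid(rec)^2) / sum((logit_pi - mean(logit_pi))^2)
r2
```

Under that assumption the fit is exact. An R-square well below 1 would then hint that the supplied pi came from something other than a logit model in exactly these two covariates (extra terms, a different link, rounding); that reading is a guess, not something established in the thread.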
☆─────────────────────────────────────☆
cici (full house) wrote on (Tue Nov 8 10:35:34 2011, US Eastern):
That's what I'm doing now, but the fitted R-square is under 70%. I wonder
whether it's because the response variable doesn't follow a normal
distribution. Should I transform the response variable first?
☆─────────────────────────────────────☆
zblgaoxin (又是半年) wrote on (Tue Nov 8 11:06:46 2011, US Eastern):
Send me a few big baozi and I'll tell you what to do next, haha.
☆─────────────────────────────────────☆
Actuaries (striving) wrote on (Tue Nov 8 11:21:14 2011, US Eastern):
i bet u will see a diamond-shaped residual plot
☆─────────────────────────────────────☆
DaShagen (Unbearable lightness) wrote on (Tue Nov 8 11:31:23 2011, US Eastern):
0.7 is pretty good, isn't it.
☆─────────────────────────────────────☆
cici (full house) wrote on (Tue Nov 8 13:22:07 2011, US Eastern):
Gurus, the response vs residual plot looks like a check mark:
. .
.. ....
... .......
.
So what's causing this? Is it a problem with the distribution?
☆─────────────────────────────────────☆
zblgaoxin (又是半年) wrote on (Tue Nov 8 13:42:39 2011, US Eastern):
That's a quadratic form.
Send me ten baozi and I'll tell you in full what to do next.
☆─────────────────────────────────────☆
Actuaries (striving) wrote on (Tue Nov 8 15:48:13 2011, US Eastern):
no, i think it's about non-constant variance/heteroscedasticity. Typical
pattern for a bounded outcome; no transformation would help.
0 < Y < 1
=>
-Y_hat < e < 1 - Y_hat
u will observe near-zero residual variance close to the boundaries and the
maximum at Y_hat = 0.5
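
The bound above can be checked on simulated data. The sketch below assumes a beta-distributed outcome with made-up mean and precision values; none of these numbers come from the thread.

```r
set.seed(3)
n   <- 2000
x   <- runif(n)
mu  <- plogis(-2 + 4 * x)                   # conditional mean, roughly 0.12 to 0.88
phi <- 10                                   # precision (illustrative value)
y   <- rbeta(n, mu * phi, (1 - mu) * phi)   # outcome inside (0, 1)

fit  <- lm(y ~ x)                           # plain OLS on the bounded outcome
yhat <- fitted(fit)
e    <- resid(fit)

# Bounds implied by 0 < Y < 1 and e = Y - Y_hat
bounds_ok <- all(e >= -yhat & e <= 1 - yhat)
bounds_ok

# Residual spread within five equal-count bins of the fitted values
bins   <- cut(yhat, breaks = quantile(yhat, 0:5 / 5), include.lowest = TRUE)
spread <- tapply(e, bins, sd)
spread
```

Comparing the entries of spread across bins is one way to see whether the residual variance shrinks toward the ends of the fitted range, which is the pattern described in the post.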
☆─────────────────────────────────────☆
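zblgaoxin (又是半年) wrote on (Tue Nov 8 16:03:53 2011, US Eastern):
Haha, "heteroscedasticity" is such a freakish word.
This isn't only a non-constant variance problem; there is an obvious quadratic
form. You need to add quadratic terms to X, run the linear regression, then
look at residual vs y again and check the QQ plot, and you're basically done.
If problems remain after adding the quadratic terms, you can do a log
transformation. You can always get a good fit if you want one, but then the
model can no longer be interpreted.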
☆─────────────────────────────────────☆
Actuaries (striving) wrote on (Tue Nov 8 16:24:31 2011, US Eastern):
google "bounded outcome regression". As far as i know, a latent variable model
or beta regression can be applied.
i don't think it is about lack of curvature. I bet LZ (the original poster)
would see the same pattern even after he adds quadratic terms.
it is ok to use OLS if only the conditional mean of Y is of interest and the
variance is a nuisance. Whenever u would like to draw inference from the
distribution of Y (significance test, confidence interval), OLS would fail
since it gives a wrong estimate of VAR[Y|X].
☆─────────────────────────────────────☆
zblgaoxin (又是半年) wrote on (Tue Nov 8 16:52:36 2011, US Eastern):
His Y is log(p/(1-p)), which is obviously not bounded...
OLS can be used; the variance can be estimated with the sandwich estimator.
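
For readers unfamiliar with the sandwich estimator, here is a minimal base-R sketch of the simplest (HC0) variant. The heteroscedastic data are simulated with arbitrary values, and the thread does not say which variant was meant.

```r
set.seed(4)
n  <- 300
x1 <- rnorm(n)
x2 <- rnorm(n)
# Illustrative heteroscedastic data: the noise scale grows with |x1|
y  <- 1 + 2 * x1 - x2 + rnorm(n, sd = 0.5 + abs(x1))

fit <- lm(y ~ x1 + x2)
X   <- model.matrix(fit)                # n x 3 design matrix
e   <- resid(fit)

bread <- solve(crossprod(X))            # (X'X)^{-1}
meat  <- crossprod(X * e)               # X' diag(e^2) X (each row of X scaled by e_i)
V_hc0 <- bread %*% meat %*% bread       # HC0 sandwich covariance of the coefficients

cbind(ols_se = sqrt(diag(vcov(fit))), robust_se = sqrt(diag(V_hc0)))
```

If the sandwich package is installed, vcovHC(fit, type = "HC0") should return the same matrix; that package is not used above so the sketch stays self-contained.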
☆─────────────────────────────────────☆
Actuaries (striving) wrote on (Tue Nov 8 17:14:24 2011, US Eastern):
it is still bounded... say the min and max p are 0.01 and 0.99 respectively
in the data; after the logit transformation they would be log(1/99) and log(99).
if the data are dense near the boundaries u will still see the angle-like
distribution of residuals.
u can check out this example I worked on before:
http://archive.ics.uci.edu/ml/datasets/Communities+and+Crime
the outcome is standardized and scaled between 0 and 1.
no matter what type of transformation i used, the residual plots would show
non-constant variance.
and yes u can use some method to correct the misspecification. but why not
just use alternatives with more reasonable assumptions?
☆─────────────────────────────────────☆
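zblgaoxin (又是半年) wrote on (Tue Nov 8 17:56:49 2011, US Eastern):
Good grief, then almost every problem is bounded. On what basis do you assume
the smallest p is 0.01... If it were 0.001, how much lower would the lower
bound go?
What's the point...
And you insist on treating it as bounded... did you study statistics for nothing?
Yours is bounded in 0 to 1, which is of course different.
This is just an ordinary problem; adding quadratic terms works very well, and
no transformation is needed at all.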
☆─────────────────────────────────────☆
Ballack5 (T-MAC) wrote on (Tue Nov 8 17:59:26 2011, US Eastern):
A Gunners fan who studies statistics? o(∩∩)o... haha
☆─────────────────────────────────────☆
zblgaoxin (又是半年) wrote on (Tue Nov 8 18:01:43 2011, US Eastern):
Haha, is it so strange for a statistics student to be a Gunners fan?
☆─────────────────────────────────────☆
Actuaries (striving) wrote on (Tue Nov 8 18:10:28 2011, US Eastern):
hey, look at my post:
if data are DENSE near boundaries...
which suggests some truncated/censored nature behind what u see. that is the
origin of latent variable models for bounded outcomes.
if u don't buy it, try my example after u delete all the 0 and 1 Y, and
print out the residual plot.
or u can just simply google beta regression. Here is some explanation about
the need for beta regression:
How should one perform a regression analysis in which the dependent variable
(or response variable), y, assumes values in the standard unit interval
(0, 1)? The usual practice used to be to transform the data so that the
transformed response, say ~y, assumes values in the real line and then apply
a standard linear regression analysis. A commonly used transformation is the
logit, ~y = log(y/(1-y)). This approach, nonetheless, has shortcomings. First,
the regression parameters are interpretable in terms of the mean of ~y, and
not in terms of the mean of y (given Jensen's inequality). Second, regressions
involving data from the unit interval such as rates and proportions are
typically heteroskedastic: they display more variation around the mean and
less variation as we approach the lower and upper limits of the standard
unit interval. Finally, the distributions of rates and proportions are
typically asymmetric, and thus Gaussian-based approximations for interval
estimation and hypothesis testing can be quite inaccurate in small samples.
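
The quoted passage motivates beta regression without writing the model down. A commonly used mean-precision form is sketched here; the symbols \mu_i and \phi are notation assumed for this note, not taken from the thread.

```latex
y_i \sim \mathrm{Beta}\big(\mu_i \phi,\ (1 - \mu_i)\phi\big),
\qquad
\log\frac{\mu_i}{1 - \mu_i} = b_0 + b_1 x_{i1} + b_2 x_{i2}

\mathrm{E}[y_i] = \mu_i,
\qquad
\mathrm{Var}[y_i] = \frac{\mu_i (1 - \mu_i)}{1 + \phi}
```

In this form the variance is largest at \mu_i = 0.5 and shrinks toward the limits of the unit interval, which is the heteroskedasticity the passage describes.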
☆─────────────────────────────────────☆
Ballack5 (T-MAC) wrote on (Tue Nov 8 18:14:22 2011, US Eastern):
Not at all, I've found a kindred spirit, hehe.
☆─────────────────────────────────────☆
Actuaries (striving) wrote on (Tue Nov 8 18:15:29 2011, US Eastern):
BTW u need to read LZ's post again. The residual plot is "response vs
residual" rather than "residual vs response". Please turn ur head 90 degrees
before u suggest adding a quadratic term.
☆─────────────────────────────────────☆
drburnie (专门爆料) wrote on (Tue Nov 8 18:23:35 2011, US Eastern):
Haha, love that last sentence.
☆─────────────────────────────────────☆
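zblgaoxin (又是半年) wrote on (Tue Nov 8 19:07:03 2011, US Eastern):
Arguing with you is really a waste of time.
I know his plot is response vs residual. That is simply his mistake; he should
look at residual vs y, residual vs x1, and residual vs x2. He has no idea how
to go about it. He asks questions without giving baozi, so I'll say no more.
You go ahead and answer for him.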
☆─────────────────────────────────────☆
zblgaoxin (又是半年) wrote on (Tue Nov 8 19:09:09 2011, US Eastern):
One more thing: no baozi, no reply.
Oh, the miserable poor...
☆─────────────────────────────────────☆
cici (full house) wrote on (Tue Nov 8 20:50:55 2011, US Eastern):
You're all gurus; I (the OP) am asking precisely because I'm a newbie.
PS, how do I send baozi?
☆─────────────────────────────────────☆
Ballack5 (T-MAC) wrote on (Tue Nov 8 20:53:10 2011, US Eastern):
Looks like the OP isn't much richer than you...
☆─────────────────────────────────────☆
zblgaoxin (又是半年) wrote on (Tue Nov 8 20:53:23 2011, US Eastern):
Sending baozi means transferring pseudo-coins (伪币) in your Financial Center;
10 pseudo-coins make one baozi. Thanks.
☆─────────────────────────────────────☆
jsdagre (na) wrote on (Tue Nov 8 21:14:11 2011, US Eastern):
Suppose Y~Binomial(n,p), logit(p)=b0+b1x1+b2x2.
Parameters b0,b1,b2 are estimated by likelihood maximization, which is
essentially solving a weighted least squares equation. Here, the weight
depends on b0,b1,b2 and also on the binomial size n. Because the weight
depends on b0-b2, b0-b2 have to be solved iteratively (no closed-form
expression). This is known as the iteratively reweighted least squares method.
Without knowing n, the estimation of b0-b2 cannot be reproduced. Also, R-square
can no longer be used to measure the goodness of the model, since this is a
generalized linear model.
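
The iteration described here can be sketched in a few lines of base R and compared with glm(). The data, the binomial sizes, and the stopping rule are illustrative assumptions, not taken from the thread.

```r
set.seed(5)
m  <- 100                                    # number of covariate patterns
x1 <- rnorm(m)
x2 <- rnorm(m)
n  <- sample(5:30, m, replace = TRUE)        # binomial sizes (illustrative)
y  <- rbinom(m, size = n, prob = plogis(-0.5 + x1 - 0.8 * x2))

X    <- cbind(1, x1, x2)
beta <- rep(0, 3)                            # starting values
for (iter in 1:25) {
  eta <- drop(X %*% beta)                    # linear predictor
  mu  <- plogis(eta)                         # current fitted probabilities
  w   <- n * mu * (1 - mu)                   # weights depend on beta and on n
  z   <- eta + (y / n - mu) / (mu * (1 - mu))    # working response
  # Weighted least squares step: solve (X'WX) beta = X'Wz
  beta_new <- drop(solve(crossprod(X, w * X), crossprod(X, w * z)))
  converged <- max(abs(beta_new - beta)) < 1e-10
  beta <- beta_new
  if (converged) break
}

fit <- glm(cbind(y, n - y) ~ x1 + x2, family = binomial)
rbind(irls = beta, glm = coef(fit))
```

The weight vector w shows explicitly where the binomial size n enters each weighted least squares step.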
☆─────────────────────────────────────☆
cici (full house) wrote on (Tue Nov 8 21:46:37 2011, US Eastern):
A discerning eye! The OP is poor!!!
☆─────────────────────────────────────☆
cici (full house) wrote on (Tue Nov 8 21:56:37 2011, US Eastern):
Yes, you're right, the data are indeed concentrated on one side.
☆─────────────────────────────────────☆
cici (full house) wrote on (Tue Nov 8 22:03:20 2011, US Eastern):
Thanks for the reply!
☆─────────────────────────────────────☆
Actuaries (striving) wrote on (Tue Nov 8 22:33:42 2011, US Eastern):
now u know why there is an ice bag on my neck...
☆─────────────────────────────────────☆
Actuaries (striving) wrote on (Tue Nov 8 22:37:23 2011, US Eastern):
yes that's why ur residual plot has only a half diamond... ur data should
have very few points near 1 but a lot near 0.
☆─────────────────────────────────────☆
cici (full house) wrote on (Tue Nov 8 23:45:57 2011, US Eastern):
Yes, that's exactly the situation.
☆─────────────────────────────────────☆
prior (prior) wrote on (Sat Nov 12 23:15:10 2011, US Eastern):
You can't recover the result by fitting unless you know how many data points
you obtained at each logit value.