c**i posts: 234 | 1 For logistic regression,
log(pi/(1-pi)) = b0 + b1*x1 + b2*x2
I already have the independent variables and the response variable {log(pi/(1-pi))}.
How do I fit the parameters b0, b1, b2? Many thanks |
s*******e posts: 1385 | 2 SAS, R?
|
c**i posts: 234 | |
z*******n posts: 15481 | 4 Are you a frequentist or a Bayesian?
Frequentist: compute the MLE; the inverse of the Fisher information matrix is the
variance of the estimate.
Bayesian: use an MCMC sampler to sample the posterior distribution of those parameters, then
take the sample mean as the parameter estimate.
If you need more details, send a few baozi and I'll send you a PDF, heh |
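A minimal sketch of the frequentist recipe above, in Python for illustration (the thread's own snippets use R); all data and names here are made up: fit b0, b1, b2 by Newton-Raphson on the logistic log-likelihood, then read the standard errors off the inverse Fisher information.

```python
import numpy as np

# Simulated data: b_true, X, y are hypothetical stand-ins for real data.
rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
b_true = np.array([-0.5, 1.0, 2.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ b_true)))

b = np.zeros(3)
for _ in range(25):                       # Newton-Raphson on the log-likelihood
    mu = 1.0 / (1.0 + np.exp(-X @ b))     # fitted probabilities
    grad = X.T @ (y - mu)                 # score vector
    info = X.T @ ((mu * (1 - mu))[:, None] * X)  # Fisher information
    step = np.linalg.solve(info, grad)
    b += step
    if np.max(np.abs(step)) < 1e-10:
        break

se = np.sqrt(np.diag(np.linalg.inv(info)))  # inverse Fisher info -> std. errors
print(b)
print(se)
```

This is the same computation glm() performs internally; the estimates land close to b_true with standard errors of a few percent at this sample size.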
z*******n posts: 15481 | 5 R has a function that computes the result directly:
glm(y~x1+x2+x3+...,family=binomial(link="logit"))
For more settings, type ?glm inside R to see the details of the glm function. |
c**i posts: 234 | 6 Thanks, everyone. It seems I wasn't clear. My situation is that I already have log(pi/(1-pi)),
and it is a continuously distributed variable whose plot looks rather like a gamma distribution. Given log(pi/(1-pi))
and knowing x1 and x2, how should I estimate the parameters b0, b1, b2? Do I just use linear
regression? And if so, should the family in glm() be set to gamma? Thanks |
z*******n posts: 15481 | 7 If you already know log(pi/(1-pi)), then just run a plain linear regression;
the glm function I gave you is irrelevant here.
If it really looks like a gamma distribution, a plain linear fit definitely won't hold up.
Try adding x1^2 and x2^2 terms and then run the linear regression.
|
A*******s posts: 3942 | 8 Looking at the marginal distribution of Y alone is useless; we only care about the conditional dist of Y.
GLM can handle this. If your p is # of events / # of trials, this is still the most basic
logistic (Bernoulli/binomial) regression. If p is a rate/proportion, you can
also use beta regression.
|
c**i posts: 234 | 9 Thanks, everyone. pi is a probability of success.
The usual logistic regression is log(pi/(1-pi)) = b0 + b1*x1 + b2*x2,
so pi = exp(b0+b1x1+b2x2) / (1 + exp(b0+b1x1+b2x2)).
But that model requires counting the # of successes and failures under each covariate combination, and I don't have those counts.
What I have are the pi values someone else obtained by fitting the model above, and I can determine x1 and x2. What I
want is to see what b0, b1, and b2 were in the model they used. |
B******5 posts: 4676 | 10 Then just use an ordinary linear regression, no?
|
c**i posts: 234 | 11 That's what I'm doing now, but the R-square of the fit is under 70%. I don't know whether it's because the response
variable doesn't follow a normal distribution. Should I transform the response
variable first? |
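For concreteness, the plain-OLS baseline described above can be sketched in Python (simulated data; every name here is a hypothetical stand-in for the OP's real variables): regress the observed logit on x1 and x2, then compute R-square by hand.

```python
import numpy as np

# Simulated stand-in for the OP's data: a noisy logit and two predictors.
rng = np.random.default_rng(1)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
logit = 0.5 + 1.2 * x1 - 0.8 * x2 + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, logit, rcond=None)  # OLS fit of b0, b1, b2
resid = logit - X @ beta
ss_res = resid @ resid
ss_tot = ((logit - logit.mean()) ** 2).sum()
r2 = 1.0 - ss_res / ss_tot                        # R-square of the fit
print(beta, r2)
```

Whether this is the right model is exactly what the rest of the thread argues about; this only reproduces the step whose R-square came out below 70% for the OP.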
z*******n posts: 15481 | 12 Send a few big baozi and then I'll tell you what to do next, heh
|
A*******s 发帖数: 3942 | 13 i bet u will see diamond shape residual plot
|
D******n posts: 2836 | 14 0.7 is pretty decent.
|
c**i posts: 234 | 15 Gurus, response vs residual looks like a check mark:
. .
.. ....
... .......
.
So what's the cause? Is it a problem with the distribution?
|
z*******n posts: 15481 | 16 That's a quadratic pattern.
Send me ten baozi and I'll tell you exactly what to do next.
|
A*******s posts: 3942 | 17 no, i think it's about non-constant variance/heteroscedasticity. Typical
pattern for a bounded outcome. No transformation whatsoever will help.
0 < Y < 1
=>
-Y_hat < e < 1-Y_hat
u will observe near-zero variance of the residuals close to the boundary and the maximum
at Y_hat=0.5 |
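The bound argument above is easy to check by simulation. The Python sketch below (all names hypothetical) fits OLS to an outcome squeezed into (0,1) and compares the residual spread near the middle of the fitted range with the spread near the boundaries.

```python
import numpy as np

# Simulate a bounded outcome: Y = sigmoid(eta) lies strictly in (0, 1).
rng = np.random.default_rng(2)
n = 20000
x = rng.normal(size=n)
y = 1.0 / (1.0 + np.exp(-(1.5 * x + rng.normal(size=n))))

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # plain OLS fit
yhat = X @ beta
e = y - yhat

mid = np.std(e[np.abs(yhat - 0.5) < 0.1])        # residual spread near Y_hat=0.5
edge = np.std(e[(yhat < 0.15) | (yhat > 0.85)])  # spread near the boundaries
print(mid, edge)
```

The middle-of-range spread comes out clearly larger than the boundary spread, matching the near-zero-variance-at-the-boundary claim in the post.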
z*******n posts: 15481 | 18 Heh, "heteroscedasticity" is a brutal word.
This isn't just a non-constant variance problem; there is a clear quadratic pattern. Add
quadratic terms to X, run the linear regression again, then look at residual vs y and check the QQ plot, and you're
basically done. If there is still a problem after adding the quadratic terms, you can do a log transformation. If all you want is a good fit, that
can always be achieved, but then the model loses its interpretability.
|
A*******s posts: 3942 | 19 google "bounded outcome regression". As far as i know, latent variable models
or beta regression can be applied.
i don't think it is about lack of curvature. I bet LZ would see the same
pattern even after he adds quadratic terms.
it is ok to use OLS in this case if only the conditional mean of Y is of
interest and the variance is a nuisance. Whenever u would like to draw inferences
from the distribution of Y (significance tests, confidence intervals), OLS
would fail since it gives a wrong estimate of VAR[Y|X]. |
|
z*******n posts: 15481 | 20 His Y is log(p/(1-p)), which is obviously not bounded......
OLS can be used, and the variance can be estimated with the sandwich estimator.
|
A*******s posts: 3942 | 21 it is still bounded... say the min and max p are 0.01 and 0.99 respectively
in the data; after the logit transformation they would be log(1/99) and log(99).
if the data are dense near the boundaries u will still see the angle-like
distribution of residuals.
u can check out this example I worked on before:
http://archive.ics.uci.edu/ml/datasets/Communities+and+Crime
the outcome is standardized and scaled between 0 and 1.
no matter what type of transformation i used, residual plots would present
non-constant variance.
and yes u can use some method to correct the misspecification. but why not
just use alternatives with more reasonable assumptions?
|
z*******n posts: 15481 | 22 Good grief. By that logic almost every problem is bounded. On what basis do you assume the smallest p is 0.01? If it's 0.001, how much lower does the bound drop?
What's the point...
And you insist on treating it as bounded... did you learn statistics for nothing?
Your example is bounded in [0,1]; of course that's different.
This is just an ordinary problem; adding quadratic terms will be quite effective, and no
transformation is even needed.
|
B******5 posts: 4676 | 23 A statistician who's an Arsenal fan? o(∩∩)o... haha
|
z*******n posts: 15481 | 24 Haha, is it strange for a statistician to be a Gooner?
|
A*******s posts: 3942 | 25 hey, look at my post:
if the data are DENSE near the boundaries...
which suggests some truncated/censored nature behind what u see. that is the
origin of latent variable models for bounded outcomes.
if u don't buy it, try my example after u delete all the 0 and 1 Y, and
print out the residual plot.
or u can just simply google beta regression. Here is some explanation of
the need for beta regression:
How should one perform a regression analysis in which the dependent variable
(or response variable), y, assumes values in the standard unit interval (0, 1)? The usual
practice used to be to transform the data so that the transformed response,
say ~y, assumes values on the real line and then apply a standard linear
regression analysis. A commonly used transformation is the logit, ~y = log(y/(1-y)). This approach, nonetheless, has shortcomings. First, the
regression parameters are interpretable in terms of the mean of ~y, and not
in terms of the mean of y (given Jensen's inequality). Second, regressions
involving data from the unit interval such as rates and proportions are
typically heteroskedastic: they display more variation around the mean and
less variation as we approach the lower and upper limits of the standard
unit interval. Finally, the distributions of rates and proportions are
typically asymmetric, and thus Gaussian-based approximations for interval
estimation and hypothesis testing can be quite inaccurate in small samples.
|
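For what it's worth, beta regression itself is easy to hand-roll. The sketch below (Python with scipy; all data simulated, names hypothetical) maximizes the Beta(mu*phi, (1-mu)*phi) likelihood with a logit link for mu, which is the parameterization the quoted passage refers to.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, gammaln

# Simulated proportions in (0,1): logit(mu) = b0 + b1*x, precision phi = 20.
rng = np.random.default_rng(3)
n = 2000
x = rng.normal(size=n)
mu = expit(-0.5 + 1.0 * x)
y = rng.beta(mu * 20.0, (1.0 - mu) * 20.0)
X = np.column_stack([np.ones(n), x])

def negloglik(theta):
    b, phi = theta[:2], np.exp(theta[2])   # exp keeps phi positive
    m = expit(X @ b)
    a, c = m * phi, (1.0 - m) * phi
    # negative Beta(a, c) log-likelihood of the observed proportions
    return -np.sum(gammaln(a + c) - gammaln(a) - gammaln(c)
                   + (a - 1.0) * np.log(y) + (c - 1.0) * np.log(1.0 - y))

fit = minimize(negloglik, x0=np.zeros(3), method="BFGS")
b_hat, phi_hat = fit.x[:2], np.exp(fit.x[2])
print(b_hat, phi_hat)
```

Unlike OLS on the logit, this keeps the parameters interpretable on the mean of y itself and builds the boundary-shrinking variance directly into the model.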
B******5 posts: 4676 | 26 Not at all, just glad to find a fellow fan, heh
|
A*******s posts: 3942 | 27 BTW u need to read LZ's post again. The residual plot is "response vs
residual" rather than "residual vs response". Please turn ur head 90 degrees
before u suggest adding a quadratic term.
|
d******e posts: 7844 | 28 Haha, love that last line
|
z*******n posts: 15481 | 29 Arguing with you is a waste of time.
I know it's response vs residual; that is precisely his mistake. He should look at residual vs y,
residual vs x1, and residual vs x2. He has no idea how to go about this, and he asks questions without sending baozi, so I'm
saying nothing more. You go ahead and answer for him.
|
z*******n posts: 15481 | 30 One more line: no baozi, no reply.
Ah, the miserable poor....
|
c**i posts: 234 | 31 You're all gurus; the OP is just a clueless newbie asking around.
ps, how does one send baozi? |
B******5 posts: 4676 | 32 It seems the OP isn't much richer than you are...
|
z*******n posts: 15481 | 33 To send baozi, transfer site currency in your finance center; 10 coins make one baozi. Thanks
|
j*****e posts: 182 | 34 Suppose Y~Binomial(n,p), logit(p)=b0+b1x1+b2x2.
The parameters b0, b1, b2 are estimated by likelihood maximization, which is
essentially solving a weighted least squares equation. Here, the weight
depends on b0, b1, b2 and also on the binomial size n. Because the weight
depends on b0-b2, b0-b2 have to be solved for iteratively (no closed-form
expression). This is known as the iteratively reweighted least squares method.
Without knowing n, the estimates of b0-b2 cannot be reproduced. Also, R-square
can no longer be used to measure the goodness of the model, since this is a
generalized linear model. |
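The IRLS recipe in the post above can be spelled out in a few lines of Python (simulated grouped data; all names hypothetical). Note how the weight m[i]*mu[i]*(1-mu[i]) involves both the current coefficients and the binomial sizes, which is exactly why the fit cannot be reproduced without knowing the trial counts.

```python
import numpy as np

# Grouped binomial data: m[i] trials and yk[i] successes per covariate row.
rng = np.random.default_rng(4)
g = 200
X = np.column_stack([np.ones(g), rng.normal(size=g), rng.normal(size=g)])
m = rng.integers(20, 60, size=g)
b_true = np.array([0.3, -1.0, 0.7])
yk = rng.binomial(m, 1.0 / (1.0 + np.exp(-X @ b_true)))
phat = yk / m

b = np.zeros(3)
for _ in range(50):                       # iteratively reweighted least squares
    mu = 1.0 / (1.0 + np.exp(-X @ b))
    W = m * mu * (1.0 - mu)               # weights depend on b AND on m
    z = X @ b + (phat - mu) / (mu * (1.0 - mu))   # working response
    # weighted least squares step: solve (X' W X) b = X' W z
    b_new = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
    if np.max(np.abs(b_new - b)) < 1e-10:
        b = b_new
        break
    b = b_new
print(b)
```

Each pass is one weighted least squares solve; the loop exists only because the weights change with b, as the post says.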
c**i posts: 234 | 35 You see right through it! The OP is poor!!!
|
c**i posts: 234 | 36 Yes, you're right, the data are indeed concentrated toward one side
|
c**i posts: 234 | 37 Thanks for the reply!
|
A*******s 发帖数: 3942 | 38 now u know why there is an ice bag on my neck...
|
A*******s posts: 3942 | 39 yes, that's why ur residual plot has only half a diamond... ur data should
have very few points near 1 but a lot near 0.
|
c**i posts: 234 | 40 Yep, that's exactly the situation.
|
p***r posts: 920 | 41 You can't fit to find the result unless you know how many data points you
obtained at each logit ratio.
★ Sent from iPhone App: ChineseWeb
|