c**i posts: 234 | 1 For logistic regression,
log(pi/(1-pi)) = b0 + b1*x1 + b2*x2
I already have the independent variables and the response variable {log(pi/(1-pi))}.
How do I fit the parameters b0, b1, b2? Many thanks |
s*******e posts: 1385 | 2 SAS, R?
|
c**i posts: 234 | |
z*******n posts: 15481 | 4 Are you a frequentist or a Bayesian?
Frequentist: compute the MLE; the inverse of the Fisher information matrix is the
variance of the estimate.
Bayesian: use an MCMC sampler to sample the posterior distribution of those parameters, then
take the sample mean as the parameter estimate.
If you need more details, send a few baozi and I'll send you a PDF, heh |
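A minimal sketch of the frequentist recipe above, in Python for illustration (the thread's own snippets use R); all data and names here are made up: fit b0, b1, b2 by Newton-Raphson on the logistic log-likelihood, then read the standard errors off the inverse Fisher information.

```python
import numpy as np

# Simulated data: b_true, X, y are hypothetical stand-ins for real data.
rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
b_true = np.array([-0.5, 1.0, 2.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ b_true)))

b = np.zeros(3)
for _ in range(25):                       # Newton-Raphson on the log-likelihood
    mu = 1.0 / (1.0 + np.exp(-X @ b))     # fitted probabilities
    grad = X.T @ (y - mu)                 # score vector
    info = X.T @ ((mu * (1 - mu))[:, None] * X)  # Fisher information
    step = np.linalg.solve(info, grad)
    b += step
    if np.max(np.abs(step)) < 1e-10:
        break

se = np.sqrt(np.diag(np.linalg.inv(info)))  # inverse Fisher info -> std. errors
print(b)
print(se)
```

This is the same computation glm() performs internally; the estimates land close to b_true with standard errors of a few percent at this sample size.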
z*******n posts: 15481 | 5 R has a function that computes the result directly:
glm(y~x1+x2+x3+...,family=binomial(link="logit"))
For more settings, type ?glm inside R to see the details of the glm function. |
c**i posts: 234 | 6 Thanks, everyone. It seems I wasn't clear. My situation is that I already have log(pi/(1-pi)),
and it is a continuously distributed variable whose plot looks rather like a gamma distribution. Given log(pi/(1-pi))
and knowing x1 and x2, how should I estimate the parameters b0, b1, b2? Do I just use linear
regression? And if so, should the family in glm() be set to gamma? Thanks |
z*******n posts: 15481 | 7 If you already know log(pi/(1-pi)), then just run a plain linear regression;
the glm function I gave you is irrelevant here.
If it really looks like a gamma distribution, a plain linear fit definitely won't hold up.
Try adding x1^2 and x2^2 terms and then run the linear regression.
|
A*******s posts: 3942 | 8 Looking at the marginal distribution of Y alone is useless; we only care about the conditional dist of Y.
GLM can handle this. If your p is # of events / # of trials, this is still the most basic
logistic (Bernoulli/binomial) regression. If p is a rate/proportion, you can
also use beta regression.
|
c**i posts: 234 | 9 Thanks, everyone. pi is a probability of success.
The usual logistic regression is log(pi/(1-pi)) = b0 + b1*x1 + b2*x2,
so pi = exp(b0+b1x1+b2x2) / (1 + exp(b0+b1x1+b2x2)).
But that model requires counting the # of successes and failures under each covariate combination, and I don't have those counts.
What I have are the pi values someone else obtained by fitting the model above, and I can determine x1 and x2. What I
want is to see what b0, b1, and b2 were in the model they used. |
B******5 posts: 4676 | 10 Then just use an ordinary linear regression, no?
|
c**i posts: 234 | 11 That's what I'm doing now, but the R-square of the fit is under 70%. I don't know whether it's because the response
variable doesn't follow a normal distribution. Should I transform the response
variable first? |
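For concreteness, the plain-OLS baseline described above can be sketched in Python (simulated data; every name here is a hypothetical stand-in for the OP's real variables): regress the observed logit on x1 and x2, then compute R-square by hand.

```python
import numpy as np

# Simulated stand-in for the OP's data: a noisy logit and two predictors.
rng = np.random.default_rng(1)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
logit = 0.5 + 1.2 * x1 - 0.8 * x2 + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, logit, rcond=None)  # OLS fit of b0, b1, b2
resid = logit - X @ beta
ss_res = resid @ resid
ss_tot = ((logit - logit.mean()) ** 2).sum()
r2 = 1.0 - ss_res / ss_tot                        # R-square of the fit
print(beta, r2)
```

Whether this is the right model is exactly what the rest of the thread argues about; this only reproduces the step whose R-square came out below 70% for the OP.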
z*******n posts: 15481 | 12 Send a few big baozi and then I'll tell you what to do next, heh
|
A*******s 发帖数: 3942 | 13 i bet u will see diamond shape residual plot
|
D******n posts: 2836 | 14 0.7 is pretty decent.
|
c**i posts: 234 | 15 Gurus, response vs residual looks like a check mark:
. .
.. ....
... .......
.
So what's the cause? Is it a problem with the distribution?
|
z*******n posts: 15481 | 16 That's a quadratic pattern.
Send me ten baozi and I'll tell you exactly what to do next.
|
A*******s posts: 3942 | 17 no, i think it's about non-constant variance/heteroscedasticity. Typical
pattern for a bounded outcome. No transformation whatsoever will help.
0 < Y < 1
=>
-Y_hat < e < 1-Y_hat
u will observe near-zero variance of the residuals close to the boundary and the maximum
at Y_hat=0.5 |
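The bound argument above is easy to check by simulation. The Python sketch below (all names hypothetical) fits OLS to an outcome squeezed into (0,1) and compares the residual spread near the middle of the fitted range with the spread near the boundaries.

```python
import numpy as np

# Simulate a bounded outcome: Y = sigmoid(eta) lies strictly in (0, 1).
rng = np.random.default_rng(2)
n = 20000
x = rng.normal(size=n)
y = 1.0 / (1.0 + np.exp(-(1.5 * x + rng.normal(size=n))))

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # plain OLS fit
yhat = X @ beta
e = y - yhat

mid = np.std(e[np.abs(yhat - 0.5) < 0.1])        # residual spread near Y_hat=0.5
edge = np.std(e[(yhat < 0.15) | (yhat > 0.85)])  # spread near the boundaries
print(mid, edge)
```

The middle-of-range spread comes out clearly larger than the boundary spread, matching the near-zero-variance-at-the-boundary claim in the post.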
z*******n posts: 15481 | 18 Heh, "heteroscedasticity" is a brutal word.
This isn't just a non-constant variance problem; there is a clear quadratic pattern. Add
quadratic terms to X, run the linear regression again, then look at residual vs y and check the QQ plot, and you're
basically done. If there is still a problem after adding the quadratic terms, you can do a log transformation. If all you want is a good fit, that
can always be achieved, but then the model loses its interpretability.
|
A*******s posts: 3942 | 19 google "bounded outcome regression". As far as i know, latent variable models
or beta regression can be applied.
i don't think it is about lack of curvature. I bet LZ would see the same
pattern even after he adds quadratic terms.
it is ok to use OLS in this case if only the conditional mean of Y is of
interest and the variance is a nuisance. Whenever u would like to draw inferences
from the distribution of Y (significance tests, confidence intervals), OLS
would fail since it gives a wrong estimate of VAR[Y|X]. |
|
z*******n posts: 15481 | 20 His Y is log(p/(1-p)), which is obviously not bounded......
OLS can be used, and the variance can be estimated with the sandwich estimator.
|
A*******s posts: 3942 | 21 it is still bounded... say the min and max p are 0.01 and 0.99 respectively
in the data; after the logit transformation they would be log(1/99) and log(99).
if the data are dense near the boundaries u will still see the angle-like
distribution of residuals.
u can check out this example I worked on before:
http://archive.ics.uci.edu/ml/datasets/Communities+and+Crime
the outcome is standardized and scaled between 0 and 1.
no matter what type of transformation i used, residual plots would present
non-constant variance.
and yes u can use some method to correct the misspecification. but why not
just use alternatives with more reasonable assumptions?
|
z*******n posts: 15481 | 22 Good grief. By that logic almost every problem is bounded. On what basis do you assume the smallest p is 0.01? If it's 0.001, how much lower does the bound drop?
What's the point...
And you insist on treating it as bounded... did you learn statistics for nothing?
Your example is bounded in [0,1]; of course that's different.
This is just an ordinary problem; adding quadratic terms will be quite effective, and no
transformation is even needed.
|
B******5 posts: 4676 | 23 A statistician who's an Arsenal fan? o(∩∩)o... haha
|
z*******n posts: 15481 | 24 Haha, is it strange for a statistician to be a Gooner?
|
A*******s posts: 3942 | 25 hey, look at my post:
if the data are DENSE near the boundaries...
which suggests some truncated/censored nature behind what u see. that is the
origin of latent variable models for bounded outcomes.
if u don't buy it, try my example after u delete all the 0 and 1 Y, and
print out the residual plot.
or u can just simply google beta regression. Here is some explanation of
the need for beta regression:
How should one perform a regression analysis in which the dependent variable
(or response variable), y, assumes values in the standard unit interval (0, 1)? The usual
practice used to be to transform the data so that the transformed response,
say ~y, assumes values on the real line and then apply a standard linear
regression analysis. A commonly used transformation is the logit, ~y = log(y/(1-y)). This approach, nonetheless, has shortcomings. First, the
regression parameters are interpretable in terms of the mean of ~y, and not
in terms of the mean of y (given Jensen's inequality). Second, regressions
involving data from the unit interval such as rates and proportions are
typically heteroskedastic: they display more variation around the mean and
less variation as we approach the lower and upper limits of the standard
unit interval. Finally, the distributions of rates and proportions are
typically asymmetric, and thus Gaussian-based approximations for interval
estimation and hypothesis testing can be quite inaccurate in small samples.
|
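For what it's worth, beta regression itself is easy to hand-roll. The sketch below (Python with scipy; all data simulated, names hypothetical) maximizes the Beta(mu*phi, (1-mu)*phi) likelihood with a logit link for mu, which is the parameterization the quoted passage refers to.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, gammaln

# Simulated proportions in (0,1): logit(mu) = b0 + b1*x, precision phi = 20.
rng = np.random.default_rng(3)
n = 2000
x = rng.normal(size=n)
mu = expit(-0.5 + 1.0 * x)
y = rng.beta(mu * 20.0, (1.0 - mu) * 20.0)
X = np.column_stack([np.ones(n), x])

def negloglik(theta):
    b, phi = theta[:2], np.exp(theta[2])   # exp keeps phi positive
    m = expit(X @ b)
    a, c = m * phi, (1.0 - m) * phi
    # negative Beta(a, c) log-likelihood of the observed proportions
    return -np.sum(gammaln(a + c) - gammaln(a) - gammaln(c)
                   + (a - 1.0) * np.log(y) + (c - 1.0) * np.log(1.0 - y))

fit = minimize(negloglik, x0=np.zeros(3), method="BFGS")
b_hat, phi_hat = fit.x[:2], np.exp(fit.x[2])
print(b_hat, phi_hat)
```

Unlike OLS on the logit, this keeps the parameters interpretable on the mean of y itself and builds the boundary-shrinking variance directly into the model.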
B******5 posts: 4676 | 26 Not at all, just glad to find a fellow fan, heh
|
A*******s posts: 3942 | 27 BTW u need to read LZ's post again. The residual plot is "response vs
residual" rather than "residual vs response". Please turn ur head 90 degrees
before u suggest adding a quadratic term.
|
d******e posts: 7844 | 28 Haha, love that last line
|
z*******n posts: 15481 | 29 Arguing with you is a waste of time.
I know it's response vs residual; that is precisely his mistake. He should look at residual vs y,
residual vs x1, and residual vs x2. He has no idea how to go about this, and he asks questions without sending baozi, so I'm
saying nothing more. You go ahead and answer for him.
|
z*******n posts: 15481 | 30 One more line: no baozi, no reply.
Ah, the miserable poor....
|
c**i posts: 234 | 31 You're all gurus; the OP is just a clueless newbie asking around.
ps, how does one send baozi? |
B******5 posts: 4676 | 32 It seems the OP isn't much richer than you are...
|
z*******n posts: 15481 | 33 To send baozi, transfer site currency in your finance center; 10 coins make one baozi. Thanks
|
j*****e posts: 182 | 34 Suppose Y~Binomial(n,p), logit(p)=b0+b1x1+b2x2.
The parameters b0, b1, b2 are estimated by likelihood maximization, which is
essentially solving a weighted least squares equation. Here, the weight
depends on b0, b1, b2 and also on the binomial size n. Because the weight
depends on b0-b2, b0-b2 have to be solved for iteratively (no closed-form
expression). This is known as the iteratively reweighted least squares method.
Without knowing n, the estimates of b0-b2 cannot be reproduced. Also, R-square
can no longer be used to measure the goodness of the model, since this is a
generalized linear model. |
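The IRLS recipe in the post above can be spelled out in a few lines of Python (simulated grouped data; all names hypothetical). Note how the weight m[i]*mu[i]*(1-mu[i]) involves both the current coefficients and the binomial sizes, which is exactly why the fit cannot be reproduced without knowing the trial counts.

```python
import numpy as np

# Grouped binomial data: m[i] trials and yk[i] successes per covariate row.
rng = np.random.default_rng(4)
g = 200
X = np.column_stack([np.ones(g), rng.normal(size=g), rng.normal(size=g)])
m = rng.integers(20, 60, size=g)
b_true = np.array([0.3, -1.0, 0.7])
yk = rng.binomial(m, 1.0 / (1.0 + np.exp(-X @ b_true)))
phat = yk / m

b = np.zeros(3)
for _ in range(50):                       # iteratively reweighted least squares
    mu = 1.0 / (1.0 + np.exp(-X @ b))
    W = m * mu * (1.0 - mu)               # weights depend on b AND on m
    z = X @ b + (phat - mu) / (mu * (1.0 - mu))   # working response
    # weighted least squares step: solve (X' W X) b = X' W z
    b_new = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
    if np.max(np.abs(b_new - b)) < 1e-10:
        b = b_new
        break
    b = b_new
print(b)
```

Each pass is one weighted least squares solve; the loop exists only because the weights change with b, as the post says.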
c**i posts: 234 | 35 You see right through it! The OP is poor!!!
|
c**i posts: 234 | 36 Yes, you're right, the data are indeed concentrated toward one side
|
c**i posts: 234 | 37 Thanks for the reply!
|
A*******s 发帖数: 3942 | 38 now u know why there is an ice bag on my neck...
|
A*******s posts: 3942 | 39 yes, that's why ur residual plot has only half a diamond... ur data should
have very few points near 1 but a lot near 0.
|
c**i posts: 234 | 40 Yep, that's exactly the situation.
|
p***r posts: 920 | 41 You can't fit to find the result unless you know how many data points you
obtained at each logit ratio.
★ Sent from iPhone App: ChineseWeb
|