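a****m posts: 693 | 1 [The following text is reposted from the Biology board]
From: affymm (no), Board: Biology
Subject: How do I use R to extract the values of the regressor variables in a microarray analysis?
Posted at: BBS 未名空间站 (Mon Nov 2 11:14:36 2009, US Eastern)
We use lmFit to run the regression. In principle, every probe follows the linear model
Signal = u + Line + Treat + e. We only observe the signal; the values of the other
variables can only be obtained through this linear model. The question is how, in R, to
extract the cell-line (Line) and treatment (Treat) effects from the fitted linear model.
I only know how to get the residuals, like fit=lmFit(mymean,design),
Res=residuals(fit,mymean). I also don't know whether
lmFit is suitable for a model with an interaction term: Signal=u+Line+Treat+Line*Treat+e (mixed model)?
Thanks |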
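A minimal sketch of what "extracting the Line and Treat effects" means, written in plain Python rather than R so that it is self-contained. The 2x2 design, the treatment-contrast coding, the made-up signal values, and the helper names `solve` and `ols` are all assumptions for illustration. In limma itself, the object returned by lmFit stores the estimated effects in `fit$coefficients`, one column per column of the design matrix. Adding Line*Treat only adds one more design column; in the usual terminology a "mixed model" is one with random effects, which is a different thing.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(design, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Columns: intercept (u), indicator for cell line B, indicator for "treated".
design = [
    [1, 0, 0],  # line A, untreated
    [1, 0, 1],  # line A, treated
    [1, 1, 0],  # line B, untreated
    [1, 1, 1],  # line B, treated
]
signal = [5.0, 7.0, 8.0, 10.0]  # made-up signals for one probe, exactly additive

u, line_effect, treat_effect = ols(design, signal)
fitted = [u + line_effect * row[1] + treat_effect * row[2] for row in design]
residuals = [s - f for s, f in zip(signal, fitted)]
```

With this coding, `line_effect` is the shift of line B relative to line A and `treat_effect` is the shift of treated relative to untreated; a different design coding would change what each coefficient means.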
|
|
t****g posts: 715 | 3 Do whatever is needed (add/drop regressors, drop obs., scale/transform
regressors, etc.) to get a "good" regression outcome, as long as the referees can
be convinced. Isn't this what many empirical researchers are doing? |
|
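f*********y posts: 376 | 4 Suppose I run a logit regression on the training data and find that the pseudo R2 is
very small, while the LR test shows the regressors are jointly significant and the z stats
show that each regressor matters. Is it possible that when I use the model to predict
actual data, its sensitivity turns out to be poor, maybe only about 10%, even though its
specificity is high? For this model, high sensitivity is what matters.
If this is possible, can we still say the model is good? |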
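A small sketch of the situation described above, with made-up labels and predictions (the counts are an assumption chosen to give roughly 10% sensitivity). It illustrates that high accuracy and high specificity can coexist with very low sensitivity when positives are rare; significance tests on the coefficients say nothing about classification performance at a given cutoff.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for 0/1 labels and 0/1 predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical test set: 10 positives (only 1 detected), 90 negatives (88 correctly rejected).
y_true = [1] * 10 + [0] * 90
y_pred = [1] + [0] * 9 + [1] * 2 + [0] * 88

sens, spec = sensitivity_specificity(y_true, y_pred)
accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
```

Lowering the classification cutoff on the predicted probability usually raises sensitivity at the cost of specificity, so the cutoff is worth tuning before judging the model.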
|
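L****8 posts: 3938 | 5 [The following text is reposted from the Programming board]
From: Liber8 (Space-Time continuum), Board: Programming
Subject: 100 weibi (forum coins) bounty: CNNs fundamentally cannot handle deformation
Posted at: BBS 未名空间站 (Wed Nov 22 00:14:51 2017, US Eastern)
Large deformations of an object can only be handled through data augmentation, i.e. brute-force drilling to memorize every variation.
Object deformation is itself a continuous change: the Lagrangian view.
But once the object becomes an image and is described by pixels, that is the Eulerian view, and all the basic distance metrics
break down in pixel space.
For example, on the MNIST dataset,
many algorithms based on deformable models achieve very high accuracy with very few training samples and
no data augmentation at all.
With a DNN, you have to use data augmentation to win.
100 weibi bounty ---------------------------------------------
The image is a triangular waterfall flowing from top to bottom.
Two tasks:
1) Design a multi-input, single-output DNN regre... Read full post |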
|
c*******y posts: 1630 | 6 OK, now I am pretty sure it's not useful.
By the time you find your "good" regressors,
the market has changed. |
|
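L****8 posts: 3938 | 7 Large deformations of an object can only be handled through data augmentation, i.e. brute-force drilling to memorize every variation.
Object deformation is itself a continuous change: the Lagrangian view.
But once the object becomes an image and is described by pixels, that is the Eulerian view, and all the basic distance metrics
break down in pixel space.
For example, on the MNIST dataset,
many algorithms based on deformable models achieve very high accuracy with very few training samples and
no data augmentation at all.
With a DNN, you have to use data augmentation to win.
100 weibi (forum coins) bounty ---------------------------------------------
The image is a triangular waterfall flowing from top to bottom.
Two tasks:
1) Design a multi-input, single-output DNN regressor that computes the position of the waterfall's lower tip (which determines the whole shape) from the image.
2) Design a generative DNN that generates the waterfall image from the lower-tip position.
Train on the first 64 images and test on the last 64.
No data augmentation, no transfer learning.
Whoever pulls this off gets 100 weibi from me, and Google would probably give you 1 million US dollars. |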
|
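i*****o posts: 42 | 8 When running a panel data regression, I added a highly significant independent
variable, but R^2 changed only slightly. Why?
The estimates and t-stats of the other independent variables did not change much either,
so it shouldn't be a multicollinearity problem, right?
Is it because mine is a fixed effect model with many time-series
dummies and cross-sectional dummies, so that with this many regressors, adding one more
variable changes R^2 little even if the variable itself is very significant?
Please advise, many thanks! |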
|
r**d posts: 21 | 9 It shouldn't be multicollinearity, because if that were the case, R-sq should
increase a lot.
How much R-sq increases after adding a new regressor depends on the partial
correlation coefficient. If I remember correctly, the squared partial correlation coeff. is
t^2/(t^2+d.o.f.). If the degrees of freedom are too high relative to t^2,
R-sq will not increase much even if t is very large. Greene's book has
several sections on this topic. |
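The relationship mentioned above can be put into numbers with a short sketch. It uses the standard identity that the gain in R^2 from one added regressor equals (1 - old R^2) times its squared partial correlation, with the squared partial correlation computed as t^2/(t^2 + residual d.o.f.). The function names and the example values (t = 5, 10,000 residual degrees of freedom, old R^2 = 0.6) are made up for illustration.

```python
def squared_partial_correlation(t_stat, resid_dof):
    """Squared partial correlation of an added regressor, from its t statistic
    and the residual degrees of freedom of the larger model."""
    return t_stat ** 2 / (t_stat ** 2 + resid_dof)


def r2_gain(r2_old, t_stat, resid_dof):
    """Increase in R^2 from adding the regressor: (1 - R^2_old) * partial r^2."""
    return (1.0 - r2_old) * squared_partial_correlation(t_stat, resid_dof)


# A "very significant" regressor (t = 5) in a model with many observations.
partial_r2 = squared_partial_correlation(5.0, 10_000)
gain = r2_gain(0.6, 5.0, 10_000)
```

Here the regressor is highly significant, yet R^2 rises by only about 0.001, because t^2 = 25 is tiny next to 10,000 degrees of freedom.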
|
t****g posts: 715 | 10 I think he means expansion around parameters. Though second order may appear,
he considers the second order term as a new regressor, thus gets over "
nonlinearity". You can still consider this as "nonlinearity" of course, it
is just an issue of terminology.
The problem of his original argument might be that it only works around
certain countable points with enough observations. In order to regress
around all the possible points, he would have to do Taylor expansion around
all these points, which |
|
t****g posts: 715 | 11 I guess you can have whatever time series model you like. For the format you
want, why not construct a linear model as follows, where the variance of
shocks depends on regressors:
y_t= x_t*theta1 + x_t*u_t, where u_t ~ NID(0, sigma^2), var(x_t)=sigma_x^2.
Then this becomes a simple model with volatility in shocks. |
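To make the "variance of shocks depends on regressors" point explicit, here is a short derivation for the model above, under the added assumption that u_t is independent of x_t (the post does not state this):

```latex
\begin{aligned}
y_t &= x_t\theta_1 + x_t u_t, \qquad u_t \sim \mathrm{NID}(0,\sigma^2),\\
\mathrm{E}[y_t \mid x_t] &= x_t\theta_1,\\
\mathrm{Var}(y_t \mid x_t) &= x_t^2\,\sigma^2,\\
\frac{y_t}{x_t} &= \theta_1 + u_t \quad (x_t \neq 0).
\end{aligned}
```

The last line shows that dividing through by x_t gives a homoskedastic equation, which is the weighted-least-squares view of the same model.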
|
f*******r posts: 257 | 12 When you have significant results, even though you have highly correlated
regressors, most likely you have a large data set. Although you have
significant results, the question you should ask yourself is: given x1 and
x2 have a correlation of 0.95, do they really both need to be in the model?
If your theory says yes, then it's fine. But if x1 and x2 have a
correlation of .95, they are very likely to be measuring the same construct. |
|
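a**********0 posts: 422 | 13 That is a good IV. A bad one would be, say, some parameter of some dinosaur fossil: it is
unrelated to any other regressor, but it is not suitable as a substitute either. |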
|
s*********o posts: 13 | 14 Yes, I think you are right. Maybe several regressors are strongly
correlated. This is probably an imperfect multicollinearity issue.
Thanks again |
|
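b*****d posts: 7166 | 15 [The following text is reposted from the Statistics board]
From: biokold (kold), Board: Statistics
Subject: A regression question: how to handle bad data
Posted at: BBS 未名空间站 (Fri May 16 02:40:37 2014, US Eastern)
I need to do a linear regression analysis. The data are stock prices recorded every 5 minutes over 10 years.
My questions:
1. How do I tell whether a data point is wrong (e.g. wildly off, negative, etc.)? Is there a general method?
2. How should wrong data be handled? Just drop them? Since this is a regression where, for example, the regressors are the past day's values,
the points cannot simply be dropped. Should the wrong data then be replaced by a guessed value?
3. Is there a general way to introduce weights so that recent data get more weight? For example, an exponential or a
polynomial function: which is more reasonable?
Thanks! |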
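One possible way to approach the three questions above, sketched with the standard library only. The rolling median/MAD rule (a Hampel-style filter), forward-filling, and half-life weights are common choices rather than the only reasonable ones, and the window size, threshold `k`, half-life, and example prices are all made-up assumptions.

```python
import statistics


def hampel_flags(prices, half_window=3, k=5.0):
    """Flag prices that are non-positive or far from the rolling median,
    measured in robust standard deviations (1.4826 * MAD).
    Note: in a perfectly flat window (MAD = 0) any deviation is flagged."""
    flags = []
    for i, p in enumerate(prices):
        lo, hi = max(0, i - half_window), min(len(prices), i + half_window + 1)
        window = prices[lo:hi]
        med = statistics.median(window)
        mad = statistics.median([abs(v - med) for v in window])
        flags.append(p <= 0 or abs(p - med) > k * 1.4826 * mad)
    return flags


def forward_fill(prices, flags):
    """Replace flagged prices with the last good price so lagged regressors stay aligned."""
    cleaned, last_good = [], None
    for p, bad in zip(prices, flags):
        if not bad:
            last_good = p
        cleaned.append(last_good)
    return cleaned


def exponential_weights(n, half_life):
    """Weights for n observations (oldest first); weight halves every half_life steps of age."""
    return [0.5 ** ((n - 1 - i) / half_life) for i in range(n)]


prices = [100.0, 100.2, 100.1, 1000.0, 100.3, -1.0, 100.4, 100.5]  # hypothetical ticks
flags = hampel_flags(prices)
cleaned = forward_fill(prices, flags)
weights = exponential_weights(5, half_life=2)
```

The weights can then be passed to any weighted least squares routine. Exponential decay has a single interpretable parameter (the half-life), which is one reason it is often preferred over polynomial schemes, but that choice is a judgment call.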
|
j****x posts: 13 | 16 Why can LASSO be used for the second question? If the regressors are highly correlated, the
assumptions behind the oracle properties certainly do not hold. |
|
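d**********l posts: 183 | 17 If the covariates include a time-series regressor and the response is also a time series,
is linear regression somewhat complicated in that case? My current understanding is that if x and y are both
stationary processes, or if x and y are cointegrated, you can run the linear
regression directly, except that the error pattern is stochastic.
Is my understanding correct?
If y is stationary and x is I(1), should I difference x once and then regress y on diff(x)?
Thanks for your answers. |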
|
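s*****n posts: 2174 | 18 From thread: Statistics board - A question about a concept. They are the same thing; both are the "x" in the model.
People with a statistics background like to say covariate and response.
People with a social science background like to say independent variable and dependent variable.
People with a computer science background like to say feature and outcome.
Some people also like to call x an explanatory variable, predictor, regressor, and so on.
It is all the same thing. |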
|
S******y posts: 1123 | 19 Variable selection is an art!
Stepwise is not error-proof.
If you have 50+ random noise regressors, stepwise will find something
significant! |
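A back-of-the-envelope check of the claim above, assuming the 50 noise regressors are tested independently at the 5% level (real stepwise procedures test repeatedly and are not independent, so this is only a rough guide; the function name is made up):

```python
def prob_any_false_positive(n_regressors, alpha=0.05):
    """Chance that at least one of n independent pure-noise regressors
    looks significant at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_regressors


p_any = prob_any_false_positive(50)
```

Under these assumptions the chance of at least one spurious "significant" regressor is above 90%.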
|
f******k posts: 297 | 20 Found out it is just an underflow problem. Didn't expect it at all, since the
regressors are only about 10^4. |
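The post does not say where the underflow occurred. As an illustration only, here is a sketch assuming it arose from exponentiating a linear predictor of magnitude around 10^4, as happens in a naive logistic log-likelihood. The function names are hypothetical.

```python
import math


def naive_log_sigmoid(z):
    """log(sigmoid(z)) computed directly; exp(z) underflows to 0.0 for very negative z."""
    return math.log(math.exp(z) / (1.0 + math.exp(z)))


def stable_log_sigmoid(z):
    """log(sigmoid(z)) rearranged so that only exp of a non-positive number is taken."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


underflowed = math.exp(-1e4)  # silently becomes 0.0 in double precision

try:
    naive_log_sigmoid(-1e4)
    naive_failed = False
except ValueError:  # math.log(0.0) raises "math domain error"
    naive_failed = True

stable_value = stable_log_sigmoid(-1e4)
```

Rescaling the regressors (for example, dividing by 10^4) is another common remedy and changes only the scale of the coefficients.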
|
f**n posts: 401 | 21 Consider multiple regression with independent variables x1...xn and
dependent variable y.
Suppose I have 12,000 observations. I randomly split the data into training
(70%) and testing(30%).
N, the total number of candidate variables is around 50. That is, I can in
the worst case fit my model to be:
model: y = x1, ... xn
In this case my adjusted R-square is around 60%
Based on business rules, I can segment the data into smaller pieces, e.g.,
each segment has 500, 200 or even 100 observations. If |
|
d*******o posts: 493 | 22 How about plotting a lift curve and having a look? |
|
f**n posts: 401 | 23 I do not know how a lift curve can be done in my case: my problem is a
multiple regression, not a logistic one.
If I understand you correctly, I should really use hold-out validation data
to measure how the model works. |
|
l***a posts: 12410 | 24 I think first a power analysis needs to be done to decide the minimum sample
size; I am sure you know it :) Then, I think if you pay real attention to
multicollinearity and to the number of selected predictors, it
will give you a very good chance of avoiding overfitting. But remember there
is a rule of thumb that on average each predictor should have at least 10 obs.
Although I don't always follow this rule in practice, it's still good
to keep it in mind. |
|
l*********s posts: 5409 | 25 Ridge regression, IMHO, is a facilitating method: reducing the variance
allows you to say something about the effects in cases of high
dimension/high multicollinearity.
PCA can achieve variable reduction/selection, but you have to live
with the new set of composite regressors. |
|
G*****m posts: 222 | 26 1. ols=mle?
Normality (?). It is sometimes additionally assumed that the errors have a
normal distribution conditional on the regressors.
see:
http://en.wikipedia.org/wiki/Ordinary_least_squares
2. If people are unwilling to accept their "highly depressed" solution:
run OLS and get the error term. Plot the errors against score, regress the errors on score, and check whether
they are correlated. As for a fix, I am not sure either... it depends on your X and the literature. But bootstrapping
seems to work for everything? |
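For point 1 above, the standard argument that OLS coincides with maximum likelihood under normal errors can be written out as follows (notation assumed here: n observations, regressor vector x_i, errors i.i.d. N(0, σ²) conditional on the regressors):

```latex
\begin{aligned}
\ln L(\beta,\sigma^2) &= -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - x_i'\beta)^2,\\
\hat\beta_{\mathrm{ML}} &= \arg\min_\beta \sum_{i=1}^{n}(y_i - x_i'\beta)^2 = \hat\beta_{\mathrm{OLS}},\\
\hat\sigma^2_{\mathrm{ML}} &= \frac{1}{n}\sum_{i=1}^{n}(y_i - x_i'\hat\beta)^2 .
\end{aligned}
```

The coefficient estimates are identical; only the variance estimate differs, since ML divides by n while the usual unbiased estimator divides by n minus the number of regressors.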
|
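f***c posts: 301 | 27 Take the simplest case, Yi= a + bXi + ei.
In such a linear model, if we assume Xi is also a random variable, normally distributed with unknown mean
and variance, how do we estimate a, b, and the mean and variance of X? Can ML be used?
I looked through the related literature; it seems some people treat this as a measurement error model, but they do not explain how to
estimate it. I hope someone can give me some pointers. Thanks |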
|
|
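f***c posts: 301 | 29 Thank you very much for the reply!!
I also asked my econometrics teacher. X is in fact a random term too, but we can observe the value of X.
Can it be understood this way: under the usual assumptions, if the observed x deviates from the true value, e.g. the true value is x+
error, then this error is already accounted for by e? |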
|
D******n posts: 2836 | 30 Ya, everything is random. We usually treat X as accurate and assume the error in y
comes from measurement error in Y or from unaccounted factors. Of course
there are always errors when measuring X. But as you say, the error in measuring X
can be thought of as transferred to e.
But actually, if you think about how you are going to use this model, you
don't need to worry about X. Yes, there is measurement error
in X, but it is likely that when you use this model you will still have the
measurement error on ... Read full post |
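A short derivation of what happens when the measurement error in X is "transferred to e", using the classical errors-in-variables setup. The symbols X_i^* (true value) and v_i (measurement error, assumed independent of X_i^* and e_i) are introduced here for illustration and are not from the posts:

```latex
\begin{aligned}
Y_i &= a + bX_i^{*} + e_i, \qquad X_i = X_i^{*} + v_i,\\
Y_i &= a + bX_i + (e_i - b\,v_i),\\
\mathrm{Cov}(X_i,\; e_i - b\,v_i) &= -b\,\sigma_v^2 \neq 0,\\
\operatorname{plim}\hat b_{\mathrm{OLS}} &= b\,\frac{\sigma_{X^*}^2}{\sigma_{X^*}^2 + \sigma_v^2}.
\end{aligned}
```

So the error does end up in the composite disturbance, but that disturbance is correlated with the observed X and the slope is attenuated toward zero. If the goal is only to predict Y from an X measured with the same kind of error, the fitted line is still the best linear predictor given the observed X, which is consistent with the point made in the post above.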
|
l*********s posts: 5409 | 31 Say a regressor X enters the regression model in the quadratic form
a*(X-b)^2.
Is there some measure, similar to R-squared in linear models, that reflects the contribution of X to the explained variance of the dependent variable? Thanks a bunch. |
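One common candidate, sketched here as a suggestion rather than the definitive answer, is the partial R-squared of X. Expanding the quadratic shows the term is linear in (1, X, X²), so the model can be fit by ordinary least squares with and without the X and X² columns, and the two error sums of squares compared:

```latex
\begin{aligned}
a(X-b)^2 &= ab^2 - 2ab\,X + a\,X^2,\\
R^2_{\text{partial}}(X) &= \frac{\mathrm{SSE}_{\text{without } X} - \mathrm{SSE}_{\text{with } X}}{\mathrm{SSE}_{\text{without } X}} .
\end{aligned}
```

Here "without X" means the model with both the X and X² terms dropped, so the ratio is the share of the otherwise unexplained variation that X accounts for.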
|
o********p posts: 127 | 32 my 2 cents:
1) use stepwise selection to select variables, etc.
2) also consider other variable selection methods, such as PCA and, in
particular, some regularization method (to address
multicollinearity among regressors). This is easily done in R;
SAS should have similar procedures (lots of experts here on this
board...)
3) If you are doing classification (y is categorical), you may (and should,
actually) consider the ROC curve, which is quite practical and most common... Read full post |
|
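f**********0 posts: 399 | 33 I want to use X1 and X2 to predict Y, but X1 and X2 are unobserved and need to be
jointly estimated from existing data with ML. The issue is that the estimated X1 and X2 cannot be used directly as regressors for Y;
instead, all three must be jointly estimated. Why? Is it because of the error in the predicted X1 and X2? If the three are
jointly estimated, how do I know the effect of X1 and X2 on Y?
Thanks. |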
|
p********2 posts: 9939 | 34 I want to run a regression allowing for error correlation within certain clusters,
for example, year and firm.
One option of proc genmod is repeated subject. It looks like this is used to specify a
cluster where errors are correlated within this cluster. But I need to specify two
clusters. It wants me to write year*firm. What does that mean? Why the *? Does it denote an interaction?
If yes, what kind of interaction? What if there are three clusters?
One more question: my model does not converge.
WARNING: The negative of the Hessian is not positive definite. The
convergence is questionable.
WARNING: The procedure is continuing but the validity of the model fit i... Read full post |
|
c***z posts: 6348 | 35 ## build data frame
work <- c(12, 14, 4, 16, 12, 20, 25, 8, 24, 28, 4, 15)
edu <- c(6, 3, 8, 8, 4, 4, 1, 3, 12, 9, 11, 4)
income <- c(34.7, 17.9, 22.7, 63.1, 33.0, 41.4, 20.7, 14.6, 97.3, 72.1, 49.1, 52.0)
studay.df <- data.frame(work, edu, income)
## linear model
model_3 <- lm(income ~ ., data = studay.df) # OLS of income on work and edu
## coefficient table: estimate, standard error, t value, p value
summary_table <- data.frame(summary(model_3)$coefficients)
colnames(summary_table) <- c("coef", "std.error", "t_value", "p_value")
summary_table$regressor <- row.names(summary_table)
s... Read full post |
|
c***z posts: 6348 | 36 A side question: how does k-means decide the distance if some regressors are
binary? |
|
c***z posts: 6348 | 37 I know.
This is the first time I have seen a time/date variable used as a regressor. Are you after the
annual increase in salary? |
|
c***z posts: 6348 | 38 maybe you can email the authors of the package
if I understood it right, the party package splits over the response
variable, while cart algorithm (e.g. rpart) splits over the regressors, this
might be related to your problem...
or, if you trust me, you can email your data and problem to me |
|
f****s posts: 3078 | 39 Say I have 200 features and 10000 observations. How can I quickly decide which features to use
as regressors?
Thanks |
|
f****s posts: 3078 | 40 Ridge can't select regressors, right?
Lasso can, though it has no grouping effect.
Elastic net is good; I didn't think of it at the time. |
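For reference, the three penalties compared above can be written as one objective. This is one common parameterization (a mixing weight α between the L1 and squared L2 penalties); other texts scale the terms differently:

```latex
\hat\beta = \arg\min_{\beta}\; \frac{1}{2n}\,\lVert y - X\beta\rVert_2^2
  + \lambda\Big[\alpha\,\lVert\beta\rVert_1 + \frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2\Big]
```

Setting α = 1 gives the lasso (some coefficients are set exactly to zero, so regressors are selected), α = 0 gives ridge (coefficients shrink but stay nonzero), and values in between give the elastic net, whose L2 part tends to keep groups of correlated regressors together.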
|
m******u posts: 11 | 41 Just follow the notation on this page. Aren't beta and mu coefficients, i.e. constants?
Doesn't fixed/random refer to the regressors (X and Z)? |
|
c***z posts: 6348 | 42 “…the crucial distinction between fixed and
random effects is whether the unobserved
individual effect embodies elements that are
correlated with the regressors in the
model, not whether these effects are
stochastic or not” [Greene, 2008, p.183] |
|
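b*****d posts: 7166 | 43 I need to do a linear regression analysis. The data are stock prices recorded every 5 minutes over 10 years.
My questions:
1. How do I tell whether a data point is wrong (e.g. wildly off, negative, etc.)? Is there a general method?
2. How should wrong data be handled? Just drop them? Since this is a regression where, for example, the regressors are the past day's values,
the points cannot simply be dropped. Should the wrong data then be replaced by a guessed value?
3. Is there a general way to introduce weights so that recent data get more weight? For example, an exponential or a
polynomial function: which is more reasonable?
Thanks! |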
|
|
f*******r posts: 257 | 45 Since no one has asked an econometrics question yet, I'll raise one:
Why, in SUR, if the regressors are the same in every equation, are the coefficient estimates and
their standard error estimates the same as the OLS estimates? I know this
is in the standard textbooks, but why, intuitively? Shouldn't the correlation
between equations change something? |
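The algebra behind the textbook result may help with the intuition. The notation is assumed here: M equations, T observations, a common regressor matrix X, and cross-equation error covariance Σ. With identical regressors the stacked GLS estimator collapses to equation-by-equation OLS:

```latex
\begin{aligned}
y &= (I_M \otimes X)\beta + \varepsilon, \qquad \mathrm{Var}(\varepsilon) = \Sigma \otimes I_T,\\
\hat\beta_{\mathrm{GLS}} &= \big[(I_M \otimes X)'(\Sigma^{-1} \otimes I_T)(I_M \otimes X)\big]^{-1}(I_M \otimes X)'(\Sigma^{-1} \otimes I_T)\,y\\
&= \big[\Sigma^{-1} \otimes X'X\big]^{-1}(\Sigma^{-1} \otimes X')\,y\\
&= \big[\Sigma \otimes (X'X)^{-1}\big](\Sigma^{-1} \otimes X')\,y\\
&= \big[I_M \otimes (X'X)^{-1}X'\big]\,y,\\
\mathrm{Var}(\hat\beta_{\mathrm{GLS}}) &= \Sigma \otimes (X'X)^{-1}.
\end{aligned}
```

Σ cancels out of the estimator, and the diagonal blocks of the variance, σ_ii (X'X)^{-1}, are the single-equation OLS variances. Intuitively, GLS gains efficiency by using the other equations' residuals as extra information, but when every equation has the same X those residuals are already orthogonal to X, so they carry nothing new about the coefficients. The cross-equation correlation still matters for joint tests across equations, through the off-diagonal blocks.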
|