Discussion roundup: regressor - 话题女王 (Topic Queen)


a****m
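Posts: 693

From thread: Statistics board - How do I use R to extract the values of the regressor variables in microarray data? (reposted)

[The following text is reposted from the Biology board]
Poster: affymm (no), Board: Biology
Title: How do I use R to extract the values of the regressor variables in microarray data?
Posted at: BBS 未名空间站 (Mon Nov 2 11:14:36 2009, US Eastern)
We use lmFit for the regression. In principle every point follows the linear model Signal = u + Line + Treat + e.
We only know the value of Signal; the values of the other variables can only be obtained through this linear model.
The question is how to extract the cell line (Line) and treatment (Treat) effects from the fitted linear model in R. I only
know how to get the residuals, e.g. fit=lmFit(mymean,design),
Res=residuals(fit,mymean). I also don't know
whether lmFit is appropriate for a mixed model such as Signal=u+Line+Treat+Line*Treat+e (mixed model)?
Thanks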
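
A minimal sketch of what the question asks, assuming a plain additive model with dummy (treatment) coding: the Line and Treat effects are the fitted coefficients themselves, not something recovered from the residuals. In limma, lmFit stores these estimates in `fit$coefficients`, one column per column of the design matrix. The Python below only illustrates the least-squares logic; the function name and the toy data are hypothetical.

```python
import numpy as np

def fit_additive_effects(signal, line, treat):
    """Least-squares fit of Signal = u + Line + Treat + e with dummy (treatment) coding.

    The first (sorted) level of each factor is the reference, so each returned
    Line/Treat effect is a difference from that reference level.
    """
    y = np.asarray(signal, dtype=float)
    columns, names = [np.ones(len(y))], ["u"]            # intercept u
    for label, factor in (("Line", line), ("Treat", treat)):
        for level in sorted(set(factor))[1:]:            # skip the reference level
            columns.append(np.array([1.0 if v == level else 0.0 for v in factor]))
            names.append(f"{label}[{level}]")
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    return dict(zip(names, coef)), residuals

# Hypothetical data: 2 cell lines x 2 treatments, no noise, so the effects are exact
line = ["A", "A", "B", "B", "A", "B"]
treat = ["ctrl", "drug", "ctrl", "drug", "drug", "ctrl"]
signal = [5.0 + (2.0 if l == "B" else 0.0) + (3.0 if t == "drug" else 0.0)
          for l, t in zip(line, treat)]
effects, res = fit_additive_effects(signal, line, treat)
print(effects)   # u = 5, Line[B] = 2, Treat[drug] = 3 (up to rounding)
```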

Posts: 1

From thread: Joke board - Asking all the experts for help with a modeling problem, many thanks!

Sorry... that previous tensorflow link was a tutorial on a classifier (i.e. predicting
which category an element belongs to)....
This one below is on a regressor (i.e. predicting the next value), which suits
your need more directly...
https://www.tensorflow.org/get_started/input_fn#a_neural_network_model_for_boston_house_values (tensorflow neural network regressor)
That matlab link is also on future value prediction...
https://www.youtube.com/watch?v=4tP1JA6aH5o (matlab neural network
regresso...

t****g
Posts: 715

From thread: Economics board - How do you usually handle the following situations in OLS?

Do whatever is needed (add/drop regressors, drop obs., scale/transform
regressors, etc.) to get a "good" regression outcome, as long as the referees can
be convinced. Isn't this what many empirical people are doing?


f*********y
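Posts: 376

From thread: Statistics board - goodness of fit of logit model

I ran a logit regression on the training data. Suppose the pseudo R2 turns out to be very small, while the LR test says
the regressors are jointly significant and the z stats also say each regressor matters. But
is it possible that when I use it to predict on real data, the model's sensitivity is very poor, maybe
only around 10%? The specificity is very high, but for this model high sensitivity is what matters.
If this is possible, can we say the model is good?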
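
One way the situation described can arise, sketched with hypothetical numbers: when positives are rare, fitted probabilities seldom exceed the default 0.5 cutoff, so sensitivity can be low even if every regressor is significant. Sensitivity and specificity depend on the chosen cutoff, not only on the fitted model; lowering the cutoff trades specificity for sensitivity.

```python
def sensitivity_specificity(y_true, p_hat, threshold=0.5):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, p_hat):
        pred = 1 if p >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif pred == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical rare-event data: 10 positives, 90 negatives. Positives get higher
# fitted probabilities than negatives, but mostly still below 0.5.
y = [1] * 10 + [0] * 90
p = [0.45] * 9 + [0.6] + [0.1] * 80 + [0.3] * 10
for t in (0.5, 0.25):
    print(t, sensitivity_specificity(y, p, t))
```

With these toy numbers the 0.5 cutoff gives sensitivity 0.1 and specificity 1.0, while a 0.25 cutoff gives sensitivity 1.0 and specificity 80/90.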

L****8
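Posts: 3938

From thread: Military board - 100 weibi bounty: CNNs fundamentally cannot handle deformation (reposted)

[The following text is reposted from the Programming board]
Poster: Liber8 (Space-Time continuum), Board: Programming
Title: 100 weibi (forum currency) bounty: CNNs fundamentally cannot handle deformation
Posted at: BBS 未名空间站 (Wed Nov 22 00:14:51 2017, US Eastern)
Large deformations of an object can only be handled by data augmentation, i.e. grinding through a sea of examples to memorize every variation.
Deformation of an object is itself a continuous change: a Lagrangian view.
But once it becomes an image, describing the object by pixels is an Eulerian view, and all the basic distance metrics
break down completely in pixel space.
For example, on the MNIST dataset,
many algorithms based on deformable models achieve very high accuracy with very few training samples and without
any data augmentation.
With a DNN, you must use data augmentation to win.
100 weibi bounty ---------------------------------------------
The image is a triangular waterfall flowing from top to bottom.
Two tasks:
1) Please design a multi-input single-output DNN regre...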

c*******y
Posts: 1630

From thread: Stock board - The company has long wanted to develop software that can predict the stock market

OK, now I am pretty sure it's not useful:
by the time you find your "good" regressors,
the market has changed.


L****8
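Posts: 3938

From thread: Programming board - 100 weibi bounty: CNNs fundamentally cannot handle deformation

Large deformations of an object can only be handled by data augmentation, i.e. grinding through a sea of examples to memorize every variation.
Deformation of an object is itself a continuous change: a Lagrangian view.
But once it becomes an image, describing the object by pixels is an Eulerian view, and all the basic distance metrics
break down completely in pixel space.
For example, on the MNIST dataset,
many algorithms based on deformable models achieve very high accuracy with very few training samples and without
any data augmentation.
With a DNN, you must use data augmentation to win.
100 weibi bounty ---------------------------------------------
The image is a triangular waterfall flowing from top to bottom.
Two tasks:
1) Please design a multi-input single-output DNN regressor that computes the position of the waterfall's lower tip (which determines the whole shape) from the image.
2) Please design a generative DNN that generates the waterfall image from the lower tip position.
Please train on the first 64 images and test on the last 64.
No data augmentation, no transfer learning.
Whoever can do it gets 100 weibi from me; Google would probably give you 1 million dollars.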

i*****o
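Posts: 42

From thread: Economics board - Question about Panel data regression

When running a panel data regression I added a highly significant independent
variable, but R^2 changed only slightly. Why?
The estimates and t-stats of the other independent variables did not change much either,
so it shouldn't be a multicollinearity problem, right?
Is it because mine is a fixed effect model that uses many time series
dummies and cross-sectional dummies, so with this many regressors, adding one more variable,
even a very significant one, does not change R^2 much?
Please advise, many thanks!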

r**d
Posts: 21

From thread: Economics board - Question about Panel data regression

It shouldn't be multicollinearity, because if that were the case, R-sq should
increase a lot.
How much R-sq increases after adding a new regressor depends on the partial
correlation coefficient. If I remember correctly, the squared partial correlation coefficient is
t^2/(t^2 + d.o.f.). If the degrees of freedom are large relative to the t-value,
R-sq will not increase much even if t is very large. Greene's book has
several sections on this topic.
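
The relationship in the reply can be checked numerically. With one added regressor, t^2 equals the F statistic for adding it, which gives partial R^2 = t^2/(t^2 + dof) and an R^2 increase of t^2 (1 - R^2_full)/dof. The simulated data below are hypothetical. The practical point: with a large residual d.o.f. (as in a fixed effect model fit to many observations), even a large t implies a small change in R^2; with dof = 10,000, t = 10 gives a partial R^2 of only 100/10,100, about 1%.

```python
import numpy as np

def ols(X, y):
    """OLS with the intercept included in X: coefficients, t-stats, R^2, residual d.o.f."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = n - k
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    t = beta / np.sqrt(np.diag(cov))
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return beta, t, r2, dof

rng = np.random.default_rng(0)
n = 200
Z = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])   # existing regressors
x_new = rng.normal(size=n)                                   # the regressor being added
y = Z @ np.array([1.0, 2.0, -1.0, 0.5]) + 0.3 * x_new + rng.normal(size=n)

_, _, r2_reduced, _ = ols(Z, y)
_, t, r2_full, dof = ols(np.column_stack([Z, x_new]), y)
t_new = t[-1]

partial_r2 = (r2_full - r2_reduced) / (1 - r2_reduced)
print(partial_r2, t_new**2 / (t_new**2 + dof))                 # equal up to rounding
print(r2_full - r2_reduced, t_new**2 * (1 - r2_full) / dof)     # equal up to rounding
```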

t****g
Posts: 715

From thread: Economics board - Why make nonlinear econometrics so complicated..

I think he means an expansion around the parameters. Though second-order terms may appear,
he treats the second-order term as a new regressor, and thus gets around the
"nonlinearity". You can of course still regard this as "nonlinearity"; it
is just an issue of terminology.
The problem with his original argument might be that it only works around
certain countable points with enough observations. In order to regress
around all the possible points, he would have to do a Taylor expansion around
all these points, which

t****g
Posts: 715

From thread: Economics board - Help with time-series models

I guess you can have whatever time series model you like. For the format you
want, why not construct a linear model as follows, where the variance of the
shocks depends on the regressors:
y_t = x_t*theta1 + x_t*u_t, where u_t ~ NID(0, sigma^2), var(x_t) = sigma_x^2.
Then this becomes a simple model with volatility in the shocks.
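
A small simulation of the model proposed above, with hypothetical parameter values (theta1 = 2, sigma = 0.5, x_t drawn with mean 3 and sigma_x = 1). Since y_t / x_t = theta1 + u_t, weighted least squares with weights 1/x_t^2 reduces to averaging y_t / x_t, while plain OLS is still consistent but ignores the heteroskedasticity.

```python
import numpy as np

rng = np.random.default_rng(1)
n, theta1, sigma = 5000, 2.0, 0.5
x = rng.normal(loc=3.0, scale=1.0, size=n)   # hypothetical regressor with sigma_x = 1
u = rng.normal(scale=sigma, size=n)
y = x * theta1 + x * u                        # shock scaled by the regressor

# OLS through the origin: still consistent, but ignores the heteroskedasticity
theta_ols = (x @ y) / (x @ x)
# WLS with weights 1/x^2 is OLS of y/x on a constant, i.e. the mean of y/x
theta_wls = np.mean(y / x)
# var(y | x) = x^2 * sigma^2, so sigma^2 can be estimated from the scaled residuals
sigma2_hat = np.var(y / x - theta_wls, ddof=1)
print(theta_ols, theta_wls, sigma2_hat)
```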


f*******r
Posts: 257

From thread: Economics board - A question about multicollinearity

When you have significant results even though you have highly correlated
regressors, most likely you have a large data set. Although you have
significant results, the question you should ask yourself is: given that x1 and
x2 have a correlation of 0.95, do they really both need to be in the model?
If your theory says yes, then it's fine. But if x1 and x2 have a
correlation of 0.95, they are very likely measuring the same construct.
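
To put a number on the 0.95 correlation: with two regressors, the variance inflation factor for each is 1/(1 - r^2). A quick calculation in plain Python, with no data assumed:

```python
import math

def vif_two_regressors(r):
    """VIF = 1 / (1 - R_j^2); with only two regressors, R_j^2 is their squared correlation."""
    return 1.0 / (1.0 - r ** 2)

v = vif_two_regressors(0.95)
print(v)             # about 10.26: coefficient variances roughly 10x those with uncorrelated regressors
print(math.sqrt(v))  # about 3.2: standard errors roughly 3.2x larger
```

Since standard errors shrink roughly like 1/sqrt(n), offsetting a 3.2x inflation takes about 10x as many observations, which fits the point that significance despite r = 0.95 usually means a large data set.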

a**********0
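Posts: 422

From thread: Economics board - Can someone explain in plain words what an instrumental variable is?

That's a good IV. A bad one would be, for example, some parameter of a dinosaur fossil: it has no relationship whatsoever
with any other regressor, but it is also useless as a substitute.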
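
A simulation, with made-up coefficients, of why relevance matters: the simple IV estimator cov(z, y)/cov(z, x) recovers the true slope when z actually moves x, whereas an instrument like the "fossil" parameter has essentially zero correlation with x, so the denominator is near zero and the estimate carries no information. All variable names and numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
z_good = rng.normal(size=n)                  # hypothetical relevant instrument
z_fossil = rng.normal(size=n)                # hypothetical irrelevant "fossil" instrument
v = rng.normal(size=n)                       # unobserved confounder
x = 1.0 * z_good + v + rng.normal(size=n)    # endogenous regressor
y = 2.0 * x + 3.0 * v + rng.normal(size=n)   # true slope is 2

def iv_slope(z, x, y):
    """Simple IV (Wald) estimator with one instrument: cov(z, y) / cov(z, x)."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

b_ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
print(b_ols)                             # biased upward (about 3 here) because x is correlated with v
print(iv_slope(z_good, x, y))            # close to 2
print(np.corrcoef(z_fossil, x)[0, 1])    # first-stage correlation near 0: no relevance
```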

s*********o
Posts: 13

From thread: Quant board - one statistic interview question

Yes, I think you are right. Maybe several regressors are strongly
correlated. This is probably an imperfect multicollinearity issue.
Thanks again

b*****d
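Posts: 7166

From thread: Quant board - Regression question: how to handle bad data (reposted)

[The following text is reposted from the Statistics board]
Poster: biokold (kold), Board: Statistics
Title: Regression question: how to handle bad data
Posted at: BBS 未名空间站 (Fri May 16 02:40:37 2014, US Eastern)
I need to do a linear regression analysis. The data are stock prices recorded every 5 minutes, covering 10 years.
My questions:
1. How do I tell whether a data point is wrong (e.g. wildly off, negative, etc.)? Is there a general method?
2. How should wrong data be handled: just throw them away? Because I'm running a regression, e.g. with the regressor chosen as the values over the past day,
I can't simply throw them away. Should I then replace the bad values with a guessed number?
3. Is there a general way to introduce weights so that recent data get more weight? For example, an exponential function or a polynomial
function: which is more reasonable?
Thanks!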
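
A sketch of two common answers, under stated assumptions: (1) flag non-positive prices and points far from a centered rolling median (a Hampel filter; the window of 5 and the 5-MAD cutoff are arbitrary choices), which could then be replaced by the rolling median instead of being dropped, so lagged regressors stay aligned; (2) exponentially decaying weights with a chosen half-life in weighted least squares. The function names and parameters are hypothetical, not from any particular library.

```python
import numpy as np

def hampel_flags(prices, window=5, n_mads=5.0):
    """Flag non-positive prices and points more than n_mads scaled MADs from the
    median of a centered window (a Hampel filter)."""
    p = np.asarray(prices, dtype=float)
    flags = p <= 0
    for i in range(len(p)):
        seg = p[max(0, i - window): i + window + 1]
        med = np.median(seg)
        mad = 1.4826 * np.median(np.abs(seg - med))   # MAD scaled to a normal sd
        if mad > 0 and abs(p[i] - med) > n_mads * mad:
            flags[i] = True
    return flags

def exp_weighted_ols(X, y, half_life):
    """Weighted least squares where an observation's weight halves every
    `half_life` steps back in time (last row = most recent)."""
    age = np.arange(len(y))[::-1]                      # 0 for the most recent row
    sw = np.sqrt(0.5 ** (age / half_life))             # square roots of the weights
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

# Hypothetical 5-minute prices with one bad spike and one negative print
prices = [10.0, 10.1, 10.2, 10.1, 10.0, 1000.0, 10.1, 10.2, -1.0, 10.3, 10.2]
print(np.flatnonzero(hampel_flags(prices)))            # indices 5 and 8
```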

j****x
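Posts: 13

From thread: Quant board - Just interviewed at a prop shop; my fundamentals were too weak and I deserved to get hammered

For question 2, why can LASSO be used? If the regressors are highly correlated, that definitely violates the
assumptions behind the oracle properties.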

d**********l
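Posts: 183

From thread: Statistics board - Asking the experts about linear regression with time series

If the covariates include a time series regressor and the response is also a time series, is linear
regression somewhat complicated in this case? My current understanding is that if x and y are both
stationary processes, or if x and y are cointegrated, you can run the linear
regression directly; it's just that the error pattern is stochastic.
Experts, is my understanding correct?
If y is stationary and x is I(1), should I difference x once and then regress y on diff(x)?
Thanks for your answers.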
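
A minimal sketch of the second question, assuming the data-generating process is y_t = a + b * diff(x)_t + e_t with x a random walk (all numbers hypothetical): difference x once, drop the first observation, which has no lag, and regress y on the differenced series.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
dx = rng.normal(size=n)
x = np.cumsum(dx)                          # hypothetical I(1) regressor (random walk)
y = 0.5 + 1.5 * dx + rng.normal(size=n)    # hypothetical stationary response driven by the change in x

# np.diff(x)[t - 1] = x[t] - x[t - 1], so it lines up with y[1:]
d = np.diff(x)
X = np.column_stack([np.ones(n - 1), d])
beta, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
print(beta)                                # roughly [0.5, 1.5]
```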

s*****n
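Posts: 2174

From thread: Statistics board - A question about terminology.

They're the same; they're all the "x" in the model.
People with a statistics background like to say covariate and response.
People with a social science background like to say independent variable and dependent variable.
People with a computer science background like to say feature and outcome.
Some people also like to call x an explanatory variable, predictor or regressor.
They're all the same thing.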

S******y
Posts: 1123

From thread: Statistics board - Question about LOGISTIC REGRESSION FORWARD VS BACKWARD

Variable selection is an art!
Stepwise is not error-proof...
If you have 50+ random noise regressors, stepwise will find something
significant!
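
A quick simulation of the last claim (all data are pure noise; the sizes are hypothetical): the first step of forward selection picks the regressor with the largest marginal |t|, and with 50 independent noise regressors the chance that at least one exceeds 1.96 is about 1 - 0.95^50, roughly 0.92.

```python
import numpy as np

def marginal_t_stats(X, y):
    """t-statistic of each column's slope in a simple regression y ~ 1 + x_j."""
    n = len(y)
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return r * np.sqrt((n - 2) / (1 - r**2))

rng = np.random.default_rng(4)
n_obs, n_noise, n_reps = 100, 50, 200
hits = 0
for _ in range(n_reps):
    X = rng.normal(size=(n_obs, n_noise))   # 50 pure-noise regressors
    y = rng.normal(size=n_obs)              # response unrelated to all of them
    if np.max(np.abs(marginal_t_stats(X, y))) > 1.96:
        hits += 1                           # a first forward step would "find" something
print(hits / n_reps)                        # around 0.9
```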

f******k
Posts: 297

From thread: Statistics board - Question on SAS estimation result

Found out it is just an underflow problem. I didn't expect it at all, since the
regressors are only about 10^4.

f**n
Posts: 401

From thread: Statistics board - sample size vs. number of regressors

Consider a multiple regression with independent variables x1...xn and
dependent variable y.
Suppose I have 12,000 observations. I randomly split the data into training
(70%) and testing (30%).
n, the total number of candidate variables, is around 50. That is, in
the worst case I can fit my model as:
model: y = x1, ... xn
In this case my adjusted R-square is around 60%.
Based on business rules, I can segment the data into smaller pieces, e.g.,
each segment has 500, 200 or even 100 observations. If

d*******o
Posts: 493

From thread: Statistics board - sample size vs. number of regressors

How about do a lift curve and have a look?

f**n
Posts: 401

From thread: Statistics board - sample size vs. number of regressors

I don't know how a lift curve can be done in my case: my problem is a
multiple regression, not a logistic one.
If I understand you correctly, I should really use hold-out validation data
to measure how the model works.
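
A sketch of hold-out validation for a multiple regression, assuming a random 70/30 split as in the original question. The data are simulated so that only 2 of 50 candidate regressors matter, with 100 observations, i.e. the size of a small segment and far below 10 observations per predictor. A large gap between training and held-out R^2 is the overfitting signal to look for.

```python
import numpy as np

def r_squared(y, y_hat):
    return 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def holdout_r2(X, y, train_frac=0.7, seed=0):
    """Fit OLS (with intercept) on a random train_frac split; return (train R^2, hold-out R^2)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_train = int(train_frac * len(y))
    tr, te = idx[:n_train], idx[n_train:]
    Xc = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xc[tr], y[tr], rcond=None)
    return r_squared(y[tr], Xc[tr] @ beta), r_squared(y[te], Xc[te] @ beta)

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 50))                 # hypothetical segment: 100 obs, 50 candidates
y = X[:, 0] - X[:, 1] + rng.normal(size=100)   # only the first two regressors matter
train_r2, test_r2 = holdout_r2(X, y)
print(train_r2, test_r2)                       # expect training R^2 well above hold-out R^2
```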

l***a
Posts: 12410

From thread: Statistics board - sample size vs. number of regressors

I think a power analysis should first be done to decide the minimum sample
size; I am sure you know it :) Then, I think if you pay real attention to
handling the multicollinearity and the number of selected predictors, it
will give you a very good chance of avoiding overfitting. But remember there
is a rule of thumb that on average each predictor should have at least 10 obs.
Although I don't always follow this rule in practice, it's still good
to keep it in mind.


l*********s
Posts: 5409

From thread: Statistics board - Job search summary [part 2]

Ridge regression, IMHO, is a facilitating method: reducing the variance
allows you to say something about the effects in cases of high
dimension/high multicollinearity.
PCA can achieve variable reduction/selection, but you have to live
with the new set of composite regressors.
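
A minimal sketch of the variance reduction mentioned above, on simulated data with two nearly collinear regressors (the data and the penalty values are arbitrary): lam = 0 is plain OLS, where the split between the two coefficients is poorly determined, and a modest penalty pulls the estimates toward each other and stabilizes them.

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge estimate (Xc'Xc + lam*I)^(-1) Xc'yc on centered data; the intercept is
    not penalized and is recovered from the means afterwards."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return y_mean - x_mean @ beta, beta

# Hypothetical, nearly collinear pair of regressors (correlation close to 1)
rng = np.random.default_rng(6)
n = 50
z = rng.normal(size=n)
X = np.column_stack([z + 0.01 * rng.normal(size=n), z + 0.01 * rng.normal(size=n)])
y = X[:, 0] + X[:, 1] + rng.normal(size=n)
_, b_ols = ridge(X, y, 0.0)      # lam = 0 is ordinary least squares: unstable split
_, b_ridge = ridge(X, y, 1.0)    # lam = 1 pulls the two coefficients toward each other
print(b_ols, b_ridge)
```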


G*****m
Posts: 222

From thread: Statistics board - If the dependent variable is severely skewed, how do I do ordinal regression?

1. OLS = MLE?
Normality (?). It is sometimes additionally assumed that the errors have a
normal distribution conditional on the regressors.
See:
http://en.wikipedia.org/wiki/Ordinary_least_squares
2. If people are unwilling to accept their "highly depressed" solution:
Run OLS and get the error term. Plot the error against the score, regress the error on the score, and check whether
they are correlated. As for a fix, I'm not sure either... it depends on your X and the literature. But bootstrapping
seems to work for everything?


f***c
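Posts: 301

From thread: Statistics board - Experts, how do you estimate when the regressors/covariates are random?

The simplest case, e.g. Yi = a + bXi + ei.
In this linear model, if Xi is also assumed to be a random variable following a normal distribution with unknown mean
and variance, how do I estimate a, b and the mean and variance of X? Can ML be used?
I searched the related literature; it seems some people treat this as a measurement error model, but they don't explain how to
estimate it. I hope someone can give me some pointers, thanks.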

D******n
Posts: 2836

From thread: Statistics board - Experts, how do you estimate when the regressors/covariates are random?

Usually the distribution of the Xi's does not matter for estimating a and b.
http://en.wikipedia.org/wiki/Regression_dilution#The_case_of_a_
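
A simulation, with hypothetical numbers, of both situations in this thread. With a random but exactly observed X, OLS still estimates a and b, and the normal-model ML estimates of X's mean and variance are just the sample mean and the 1/n sample variance. If X is observed with independent error, the error does not simply disappear into e: the slope is attenuated by var(X)/(var(X) + var(error)), which is the regression dilution in the linked article.

```python
import numpy as np

rng = np.random.default_rng(7)
n, a, b = 100_000, 1.0, 2.0
x = rng.normal(loc=5.0, scale=2.0, size=n)      # random regressor, var(x) = 4
y = a + b * x + rng.normal(size=n)

# OLS slope/intercept: unaffected by X being random, as long as x is observed exactly
b_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
a_hat = y.mean() - b_hat * x.mean()
# ML estimates of X's own mean and variance under normality: sample mean and 1/n variance
mu_x_hat, var_x_hat = x.mean(), np.var(x)

# If x is instead observed with error w = x + u, var(u) = 1, the slope is attenuated
w = x + rng.normal(scale=1.0, size=n)
b_dilute = np.cov(w, y, ddof=1)[0, 1] / np.var(w, ddof=1)
print(a_hat, b_hat, mu_x_hat, var_x_hat)
print(b_dilute, b * 4.0 / (4.0 + 1.0))          # attenuation factor var(x) / (var(x) + var(u))
```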

f***c
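Posts: 301

From thread: Statistics board - Experts, how do you estimate when the regressors/covariates are random?

Thanks so much for the reply!!
I also asked an econometrics teacher: X is actually also a random term, but we can observe the values of X.
Can I understand it this way: under the usual assumptions, if the observed x deviates from the actual value, e.g. the actual value is x +
error, then this error is already accounted for by e?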

D******n
Posts: 2836

From thread: Statistics board - Experts, how do you estimate when the regressors/covariates are random?

Ya, everything is random. We usually think X is accurate and the error of y
comes from measurement error of Y or unaccounted factors. Of course there
are always errors when measuring X. But as you say, the error in measuring X
can be thought of as transferred to e.
But actually, if you think about how you are going to use this model, you
don't need to bother worrying about X. Yes, there is measurement error on X,
but it is likely that when you use this model, you will still have the
measurement error on ...

l*********s
Posts: 5409

From thread: Statistics board - contribution to explained variance in polynomial regression

Say a regressor X enters the regression model in the quadratic form
a*(X-b)^2.
Is there some measure, similar to R-square in linear models, that reflects the contribution of X to the explained variance of the dependent variable? Thanks a bunch.
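
One straightforward answer, sketched with simulated data (a = 1.5, b = 2 and the extra regressor z are hypothetical): expand a(X - b)^2 = aX^2 - 2abX + ab^2, so the model is linear in X and X^2, and X's contribution can be measured by the drop in R^2 when both terms are removed together, or as a partial R^2. Note that this increment depends on which other regressors are in the model.

```python
import numpy as np

def r_squared(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(8)
n = 500
x = rng.uniform(0, 4, size=n)
z = rng.normal(size=n)                                   # hypothetical other regressor
y = 1.5 * (x - 2.0) ** 2 + 0.8 * z + rng.normal(size=n)  # a = 1.5, b = 2

ones = np.ones(n)
full = np.column_stack([ones, z, x, x**2])   # a(x-b)^2 = a*x^2 - 2ab*x + a*b^2 is linear in (x, x^2)
reduced = np.column_stack([ones, z])         # drop both X terms together
r2_full, r2_reduced = r_squared(full, y), r_squared(reduced, y)
delta_r2 = r2_full - r2_reduced                 # explained variance attributable to X
partial_r2 = delta_r2 / (1 - r2_reduced)        # share of what z leaves unexplained
print(r2_full, delta_r2, partial_r2)
```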

o********p
Posts: 127

From thread: Statistics board - A simple question: a data set has 100 variables (columns), each with 1000 observations; how do I find the outliers in each variable?

my 2 cents:
1) Use stepwise selection to select variables, etc.
2) You can also consider other variable selection methods, such as PCA and, in
particular, some regularization methods (to address the
multicollinearity issues among regressors). This can easily be done in R;
SAS should have similar procedures (there are lots of experts on this
board...)
3) If you are doing classification (y is categorical), you may (and actually should)
consider the ROC curve, which is quite practical and most common...

f**********0
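Posts: 399

From thread: Statistics board - Please help with a modeling question

I want to use X1 and X2 to predict a Y, but X1 and X2 are unobservable and have to be
jointly estimated from the existing data with ML methods. The problem is that the predicted X1, X2 cannot be used directly as regressors for Y;
instead all three have to be jointly estimated. Why? Is it because of the error in the predicted values of X1 and X2? If all three
are jointly estimated, how do I know the effect of X1 and X2 on Y?
Thanks.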

p********2
Posts: 9939

From thread: Statistics board - Question about proc genmod in SAS

I want to run a regression allowing for error correlation within certain clusters,
for example year and firm.
One option of proc genmod is repeated subject. From what I read, it is used to specify a
cluster within which the errors are correlated. But I need to specify two
clusters, and it wants me to write year*firm. What does this mean? Why the *? Does it denote an interaction?
If yes, how does the interaction work? What if there are three clusters?
Another question: my model does not converge:
WARNING: The negative of the Hessian is not positive definite. The
convergence is questionable.
WARNING: The procedure is continuing but the validity of the model fit i...

c***z
Posts: 6348

From thread: Statistics board - Which handsome guy will help me with a statistics problem?

## build data frame
work <- c(12, 14, 4, 16, 12, 20, 25, 8, 24, 28, 4, 15)
edu <- c(6, 3, 8, 8, 4, 4, 1, 3, 12, 9, 11, 4)
income <- c(34.7, 17.9, 22.7, 63.1, 33.0, 41.4, 20.7, 14.6, 97.3, 72.1, 49.1,
            52.0)
studay.df <- data.frame(work, edu, income)
## linear model: regress income on all other columns by OLS
model_3 <- lm(income ~ ., data = studay.df)
## coefficient table with readable column names, one row per regressor
summary_table <- data.frame(summary(model_3)$coefficients)
colnames(summary_table) <- c("coef", "std.error", "t_value", "p_value")
summary_table$regressor <- row.names(summary_table)
# s...

c***z
Posts: 6348

From thread: Statistics board - Sample size for clustering analysis

A side question: how does K-means decide the distance if some regressors are
binary?

c***z
Posts: 6348

From thread: Statistics board - Correlated independent variable

I know.
This is the first time I have seen a time/date variable used as a regressor. Do you want
the annual increase in salary?

c***z
Posts: 6348

From thread: Statistics board - conditional tree questions??

Maybe you can email the authors of the package.
If I understood it right, the party package splits over the response
variable, while the CART algorithm (e.g. rpart) splits over the regressors; this
might be related to your problem...
Or, if you trust me, you can email your data and problem to me.

f****s
Posts: 3078

From thread: Statistics board - How do I choose feature sets for OLS?

For example, if I have 200 features and 10000 observations, how can I quickly decide which features to use
as regressors?
Thanks

f****s
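Posts: 3078

From thread: Statistics board - How do I choose feature sets for OLS?

Ridge can't select regressors, right?
Lasso can, but it doesn't have the grouping effect.
Elastic net is good; I didn't think of it at the time.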
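
A textbook-style sketch of the penalties discussed above, using cyclic coordinate descent with soft-thresholding on centered data (simulated data; this is not the implementation of any particular package such as glmnet). The L1 part sets coefficients exactly to zero, which is why lasso can select regressors while pure ridge only shrinks them; mixing in the ridge part (alpha < 1) is what gives elastic net its grouping effect for correlated regressors.

```python
import numpy as np

def soft_threshold(z, gamma):
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def elastic_net(X, y, lam, alpha=1.0, n_iter=500):
    """Cyclic coordinate descent for
        (1/2n)||y - Xb||^2 + lam * (alpha*||b||_1 + (1 - alpha)/2 * ||b||^2).
    alpha = 1 is the lasso; alpha < 1 adds the ridge part behind the grouping effect.
    X and y are centered, so no intercept is fitted."""
    X = X - X.mean(axis=0)
    r = y - y.mean()                       # residual for beta = 0
    n, p = X.shape
    col_sq = (X ** 2).sum(axis=0) / n
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * beta[j]         # partial residual without feature j
            rho = X[:, j] @ r / n
            beta[j] = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            r -= X[:, j] * beta[j]
    return beta

# Hypothetical data: 200 observations, 10 candidate regressors, only the first two matter
rng = np.random.default_rng(10)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=200)
print(np.round(elastic_net(X, y, lam=0.2), 3))             # lasso: most entries exactly 0
print(np.round(elastic_net(X, y, lam=0.2, alpha=0.5), 3))  # elastic net
```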

m******u
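Posts: 11

From thread: Statistics board - Naive question: what exactly is a fixed/random effect model?

Going by how this page writes it, aren't beta and mu coefficients, i.e. constants?
Doesn't fixed/random refer to the regressors (X and Z)?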

c***z
Posts: 6348

From thread: Statistics board - Naive question: what exactly is a fixed/random effect model?

“…the crucial distinction between fixed and random effects is whether the unobserved
individual effect embodies elements that are correlated with the regressors in the
model, not whether these effects are stochastic or not” [Greene, 2008, p.183]

w******e
Posts: 1621

From thread: DataSciences board - Asking for advice on feature selection methods

regressor

f*******r
Posts: 257

From thread: _Applied_Econometrics board - First question

Since no one has asked any econometric questions yet, I'll raise one:
Why, in SUR, if the regressors are the same, are the coefficient estimates and
their standard error estimates the same as the OLS estimates? I know this is
in standard textbooks, but why, intuitively? Shouldn't the correlation
between equations change something?
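
A numerical check of the textbook result, with simulated data. Stacking the two equations, the GLS normal equations involve inv(Σ) ⊗ X'X and inv(Σ) ⊗ X'; when every equation has the same X, (inv(Σ) ⊗ X'X)^(-1) (inv(Σ) ⊗ X') = I ⊗ (X'X)^(-1) X', so Σ cancels and GLS reduces to equation-by-equation OLS. Intuitively, the other equation's errors only help if its regressors carry information this equation lacks; with identical regressors there is nothing extra to borrow. The covariance block for each equation is likewise σ_ii (X'X)^(-1), the OLS formula, given the same error-variance estimate.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])              # same regressors in both equations
E = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=n)   # correlated errors across equations
Y = np.column_stack([X @ [1.0, 2.0, -1.0], X @ [0.5, -0.5, 3.0]]) + E

# Equation-by-equation OLS
B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)   # 3 x 2, one column per equation
resid = Y - X @ B_ols
Sigma = resid.T @ resid / n                     # estimated cross-equation error covariance

# SUR / GLS on the stacked system: vec(Y) = (I_2 kron X) vec(B) + vec(E), Cov = Sigma kron I_n
Z = np.kron(np.eye(2), X)
Omega_inv = np.kron(np.linalg.inv(Sigma), np.eye(n))
A = Z.T @ Omega_inv @ Z
b_sur = np.linalg.solve(A, Z.T @ Omega_inv @ Y.T.reshape(-1))
cov_sur = np.linalg.inv(A)
print(np.allclose(b_sur, B_ols.T.reshape(-1)))                                  # True: GLS = OLS
print(np.allclose(cov_sur[:3, :3], Sigma[0, 0] * np.linalg.inv(X.T @ X)))       # True: same covariance
```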
