Page 3 - Roundup of discussions about univariate - 话题女王

a****t
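Posts: 1007

A simple program, but I can never get the interaction effects between the factors out of it. I don't know why. Any pointers
would be appreciated. Thanks.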
libname mysas 'D:\mysas';
data mysas.beet;
do harvest=1 to 2;
do rep=1 to 6;
do variety=1 to 6;
input col prod @@;
output;
end;
end;
end;
cards;
3 19.1 6 18.3 5 19.6 1 18.6 2 18.2 4 18.5
6 18.1 2 19.5 4 17.6 3 18.7 1 18.7 5 19.9
1 18.1 5 20.2 6 18.5 4 20.1 3 19.6 2 19.2
2 19.1 3 18.8 1 18.7 5 20.2 4 18.6 6 18.5
4 17.5 1 18.1 2 18.7 6 18.2 5 20.4 3 18.5
5 17.7 4 1...

P****D
Posts: 11146

From topic: Statistics board - What do on-site SAS hands-on tests usually cover?

What position? If it's SAS programmer, then it's probably just proc means, freq, univariate,
gplot...

g******6
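Posts: 772

From topic: Statistics board - Urgent: entry-level SAS programmer on-site tomorrow, what should I watch out for?

After a long wait I finally got an on-site; a friend went to great lengths to help me get this chance.
I want to make the most of it, and I have been reading interview questions on SAS programmer websites.
Is there anything else I need to understand really well? If they ask me to code on the spot, what is usually tested? MERGE DATA,
PROC FREQ, PROC UNIVARIATE, PROC GRAPH and the like?
Will they also ask about theory, such as the PDV?
Also, the company is a CRO and I have no clinical data analysis experience, which I said plainly in the phone interview.
My friend says the manager is easygoing, but the boss is very picky, especially with male candidates.
Please help!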

c****s
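Posts: 63

From topic: Statistics board - A question about pre-screening

In this model:
Y= death
X= X1, X2......X700
I need to do pre-screening before building the model. What methods do people use? My thinking:
1. proc univariate, keep p-value < 0.25
2. proc surveyselect (not sure about this one)
What do you all use? I'm a complete beginner, so any guidance is much appreciated.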

k*******y
Posts: 132

From topic: Statistics board - first offer

1. rp is important, no rp no reference...
2. be easy-going and confident
3. be familiar with base SAS(little sas book is not enough)
frequently asked questions:
proc univariate/means/transpose/report/tabulate/
set/in=/merge/array/retain/substr/translate
ods system/
index/sql
questions about large dataset
4. be familiar with the models you write on your resume

b*****r
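Posts: 359

From topic: Statistics board - A question about processing SAS results

There are 360 tables (samples), each with 5 variables to process.
Each variable is fitted with 4 models, and each fit gets three one-sided tests (t-test, sign test, rank
test).
Each test is judged as reject or fail to reject at two significance levels.
Finally I need a report table counting the number of reject and fail to reject outcomes.
Assume the 360 tables are named sample1, sample2, ..., sample360;
the 5 variables are v1, v2, v3, v4, v5;
the 4 models are mod1, mod2, mod3, mod4.
Using a macro and proc univariate, I ran 360 x 5 x 4 x 3 = 21600 tests.
Question 1: how can I quickly and efficiently turn these 21600 test results into a table like the one shown below?
Question 2: how can SAS automatically decide between reject and fail to reject?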
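
For question 2, the decision itself is just a comparison of each p-value with the significance level, and the report is a count per group. Here is a minimal sketch in Python rather than SAS; the record layout, the p-values, and the levels 0.05 and 0.01 are made-up assumptions, since the post does not state them.

```python
from collections import Counter

# Hypothetical collected results: (sample, variable, model, test, p_value).
results = [
    ("sample1", "v1", "mod1", "t", 0.030),
    ("sample1", "v1", "mod1", "sign", 0.200),
    ("sample1", "v1", "mod1", "rank", 0.004),
    ("sample2", "v1", "mod1", "t", 0.008),
]

ALPHAS = (0.05, 0.01)  # assumed significance levels


def decide(p_value, alpha):
    """Return the test decision at the given significance level."""
    return "reject" if p_value < alpha else "fail to reject"


# Count decisions per (test, alpha) across all samples, variables and models.
counts = Counter()
for sample, variable, model, test, p_value in results:
    for alpha in ALPHAS:
        counts[(test, alpha, decide(p_value, alpha))] += 1

for key in sorted(counts):
    print(key, counts[key])
```

In SAS the analogous route would probably be to stack the test results into one data set, apply the same comparison in a data step, and tabulate the resulting flag.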

R******d
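Posts: 1436

From topic: Statistics board - SAS histogram + fitting a power law

I want to draw a histogram in SAS and fit a power-law distribution to it. How should I do that?
Previously, with a normal distribution, proc univariate produced it directly.
Many thanks.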

w*********8
Posts: 70

From topic: Statistics board - How do you tell whether a distribution is normal?

proc univariate

h***i
Posts: 3844

From topic: Statistics board - Transitivity of the bivariate normal distribution

......
A random vector X = (X1, …, Xk)′ is said to have the multivariate normal
distribution if it satisfies the following equivalent conditions
[1]:
* Every linear combination of its components Y = a1X1 + … + akXk is
normally distributed. That is, for any constant vector a ∈ R^k, the random
variable Y = a′X has a univariate normal distribution.

d******e
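Posts: 7844

From topic: Statistics board - Transitivity of the bivariate normal distribution

The covariance matrix is rank deficient.
A simple example: let x follow N(0,1) and y = x; then (x,y) is normally distributed, but its covariance
matrix has rank only 1. It cannot be inverted, so naturally the density cannot be computed either. Rank-deficient cases are actually common;
the usual assumption behind PCA is that the covariance is low rank.
The definition of the normal distribution is
A random vector is said to be multivariate normally distributed if every
linear combination of its components has a univariate normal distribution.
It is not defined through the density function.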
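
The example in this post can be checked by hand. The short Python sketch below writes out the covariance matrix of (x, y) with y = x and shows that its determinant is zero, while the variance of any linear combination is still well defined; the function name is only an illustration.

```python
# Covariance matrix of (x, y) when x ~ N(0, 1) and y = x:
# Var(x) = Var(y) = Cov(x, y) = 1.
cov = [[1.0, 1.0],
       [1.0, 1.0]]

# A 2x2 matrix is invertible only if its determinant is nonzero.
det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
print(det)  # 0.0, so the matrix is singular and the usual density formula fails


def variance_of_combination(a, b):
    """Variance of a*x + b*y computed from the covariance matrix."""
    return a * a * cov[0][0] + 2 * a * b * cov[0][1] + b * b * cov[1][1]


# Every combination a*x + b*y equals (a + b)*x, which is normal
# with variance (a + b)^2 (a point mass at 0 when a + b = 0).
print(variance_of_combination(2.0, 3.0))  # 25.0
```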

k*****u
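Posts: 1688

From topic: Statistics board - A simple question: a data set has 100 variables (columns), each with 1000 observations; how do I find the outliers in each variable?

With boxplots, proc univariate makes me look at the variables one at a time. There are too many.
Is there a proc that can look at all the variables at once?
Also, in model y=x; if there are 100 x's, all with different names, how do I put them after model? Do I have to type
the names in one by one?
Thanks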

o****o
Posts: 8077

From topic: Statistics board - How should loop statements be used in R?

for this particular question, SAS actually is pretty handy:
*******************************;
data test;
do id=1 to 20000;
x=rannor(0);
output;
end;
run;
ods select none;
proc surveyselect data=test out=samp method=srs
seed=93759437 sampsize=1000 rep=100 ;
run;
ods select all;
proc univariate data=samp noprint;
by replicate;
histogram x/outhistogram=hist outkernel=kernel noplot;
run;
proc sgplot data=hist;
series x=_MIDPT_ y=_OBSPCT_ /group=replica...

l*********s
Posts: 5409

From topic: Statistics board - A silly question: how do I compute the correlation of three variables?

Down to the bottom, pairwise correlation corresponds to the univariate
regression model.

b********i
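Posts: 1252

From topic: Statistics board - A data transformation question for the experts

The problem is this:
I have a data set of a few hundred values,
highly left skewed,
all above 0.
I tried fitting various distributions to it
and none of them fit.
After a log transform it is still left skewed,
and when I fit various distributions again,
none fits because the p-values are all very small.
What other transformations could I try
so that this highly left-skewed data
can be described by a normal, gamma, or some other distribution?
I am using SAS proc univariate with histogram / normal weibull gamma...
Many thanks!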

j******n
Posts: 2206

From topic: Statistics board - sas help on creating histogram

I used proc univariate.
the log reads:
error: device was not specified;
error: unable to load graphics.
I'm running sas in unix.
thanks.

j******n
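Posts: 2206

From topic: Statistics board - The lst output from proc univariate prints out ugly; what can I do?

A single line doesn't even fit. Often a very short line gets split into two, which looks terrible.
How do I adjust this in SAS? And I can't change the font.
Many other lst files are similarly unreadable when printed. How do you all handle this?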

w*******9
Posts: 1433

From topic: Statistics board - need help for repeated measures ANOVA

interesting question! Usually we only test whether the treatment mean or the
variance are the same. But here we need to compare different devices which
necessarily involves both bias (kind of mean) and variance. Consider the
simple case where there are two univariate data sets, with the first one
being N(a1,s1) and the second one N(a2,s2). How do we test a1^2+s1=a2^2+s2?

k*******a
Posts: 772

From topic: Statistics board - The amazing proc means

You could try proc univariate

D******n
Posts: 2836

From topic: Statistics board - The amazing proc means

means or univariate?
how to write the step?

i****e
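Posts: 46

From topic: Statistics board - A lament: what school teaches feels so thin in real work; two practical questions that have puzzled me for a long time

1. I haven't done the first one, but if I had this concern I would run an outlier analysis on impression
and drop the extremely small sizes.
2. Do a missing value check: drop variables with too many missing values and impute
the rest. Some variables need a transform, or converting a continuous variable to a
categorical/dummy variable. You can do variable clustering. Then univariate
analysis, then stepwise logistic regression.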

G*******s
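Posts: 10605

From topic: Statistics board - A SAS question: how do I split the obs evenly into several groups from smallest to largest?

Use the percentile values output by proc univariate as bin boundaries. Most of the time we split into 4 groups
(quartiles) for analysis; with a small change you can split into N groups.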
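
The percentile-binning idea in this reply can be sketched outside SAS as follows. This is an illustrative Python version, not the poster's code: the cut points come from `statistics.quantiles`, the sample values are invented, and `assign_groups` is a hypothetical helper name.

```python
from bisect import bisect_left
from statistics import quantiles


def assign_groups(values, n_groups):
    """Assign each value a group 1..n_groups using quantile cut points."""
    # n_groups - 1 cut points that split the sorted data into equal-sized parts.
    cuts = quantiles(values, n=n_groups, method="inclusive")
    # A value equal to a cut point falls into the lower group.
    return [bisect_left(cuts, v) + 1 for v in values]


# Made-up data: 12 distinct values, so 4 groups of 3 are expected.
values = [5, 1, 9, 3, 7, 2, 8, 4, 6, 10, 12, 11]
groups = assign_groups(values, 4)
for v, g in sorted(zip(values, groups)):
    print(v, g)
```

With heavy ties the groups can end up unequal in size, whatever tool computes the cut points. In SAS, PROC RANK with the GROUPS= option is another common way to get such groups without computing the percentiles first.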

m****r
Posts: 237

From topic: Statistics board - Can anyone recommend some time series books? Thanks

Time Series Analysis - Univariate and Multivariate Methods second edition by
William W.S. Wei
This book has a lot of examples, plus, it is not that difficult, from my
point of view.

p***l
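Posts: 1775

From topic: Statistics board - A question about drawing box plots

In proc univariate you can use
var A;
by B C;
to draw box plots for every combination of levels of the 2 factors B and C,
but the resulting plots are really crude.
Does anyone know how to do this in gplot or the boxplot procedure?
The gplot and boxplot examples I have seen only have plot A*B, and I don't know how to add factor C.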

R******o
Posts: 83

From topic: Statistics board - Latent profile analysis in SAS or R ??????

OK. I guess the problem is Proc FMM is OK when the response variable is a
mixture of several univariate distributions. What if the response variable
is a mixture of several multivariate distributions ?
I have Y=[X1 X2 X3] (a vector).
:(

s*******e
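Posts: 1385

From topic: Statistics board - Rejected after on-site; let's discuss the one fatal question

For this you first make a univariate plot, and then decide on the cap, floor and transformation, as well as
the missing-value imputation.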

z*******4
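Posts: 8

From topic: Statistics board - How do I do a distribution fit in SAS?

I want to do a distribution fit, and I found that the fits available in proc univariate are all
fairly standard distributions. I want to fit a Dagum distribution and don't know whether there is a way to do it.
Thank you all.
Best wishes.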

s*******r
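Posts: 592

From topic: Statistics board - So angry. Does this TA have a grudge against me?

My code all runs, with absolutely no problems. For example, one question this time asked us to use PROC UNIVARIATE
to run a pairwise t-test, and she would only be satisfied if I used PROC TTEST. She wrote back that my results were indeed there,
but she didn't understand why I hadn't used TTEST, and took off 10 points. I nearly fainted (it was clearly a
trick the professor taught in class, and I'm sure this is what the professor wanted to test). I just asked someone who copied my code, and he
scored higher than me by copying it directly... What is this? We're both girls and say hello when we meet, so I don't feel I can
confront her openly. I'm furious!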

i****f
Posts: 979

From topic: Statistics board - Which option in univariate gives a 95% confidence interval for the variance?

As in the title.
Thanks in advance

a****g
Posts: 8131

From topic: Statistics board - Which option in univariate gives a 95% confidence interval for the variance?

don't you get it by default?

i****f
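Posts: 979

From topic: Statistics board - Which option in univariate gives a 95% confidence interval for the variance?

Sorry, I wrote it wrong; what I want is the 95% confidence interval for the variance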

s***2
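Posts: 240

From topic: Statistics board - Help: how do I find the cause (variables) behind different outcomes?

I can analyze how the customers of A and the customers of B are distributed on different variables and find the variables whose distributions differ significantly, as
tulipdream said.
But that is a univariate analysis, and I would like to analyze several variables at the same time.
I googled and found people using a logistic model for uplift modeling, which seems to be what I want, but there were no
details
on how to do it.
If anyone has done this or knows of relevant literature to share, that would be great.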

f******u
Posts: 250

From topic: Statistics board - What code outputs a stem and leaf plot in SAS?

http://www.stattutorials.com/SAS/TUTORIAL-PROC-UNIVARIATE.htm

s********1
Posts: 54

From topic: Statistics board - Filling in missing observations in SAS

data hh1;
input ID year;
cards;
1 1998
1 2001
2 1994
2 1995
2 1999
2 2001
3 1997
3 1999
3 2000
;
run;
proc univariate data=hh1;
var year;
by ID;
output out=dout max=dmax min=dmin;
run;
proc iml;
use dout;
read all into X;
do i = 1 to 3;
n=X[i,2]-X[i,3]+1;
xx_temp=repeat(i,n,1)||(X[i,3]:X[i,2])`;
xx=xx//xx_temp;
end;
print xx;
create hh1fix from xx[colname={ID year}];
append from xx;
quit;
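
For readers without SAS/IML, the logic of the program above (take each ID's minimum and maximum year, then emit every year in between) can be sketched in Python. This is an analogue written for illustration, not a translation the poster supplied; `fill_years` is a hypothetical name.

```python
# Same toy data as the post: (ID, year) pairs with gaps.
rows = [(1, 1998), (1, 2001), (2, 1994), (2, 1995), (2, 1999),
        (2, 2001), (3, 1997), (3, 1999), (3, 2000)]


def fill_years(rows):
    """Return every (ID, year) from each ID's first to last observed year."""
    span = {}
    for id_, year in rows:
        lo, hi = span.get(id_, (year, year))
        span[id_] = (min(lo, year), max(hi, year))
    return [(id_, y) for id_ in sorted(span)
            for y in range(span[id_][0], span[id_][1] + 1)]


filled = fill_years(rows)
print(len(filled))  # 16 rows: 4 for ID 1, 8 for ID 2, 4 for ID 3
```

Unlike the IML loop, this version reads the ID from the data instead of assuming the IDs are 1, 2, 3 in order.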

J*****n
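Posts: 4859

From topic: Statistics board - A dumb question about R^2

I have two linear univariate models, A and B. A is regressed on sample a, and B on sample
b.
Setting everything else aside, if I compare which of the two fits looks more like a straight line, I would use R^2.
Suppose samples a and b have different sizes. Does R^2 need to be adjusted for the comparison, or is it
unrelated to sample size?
Thanks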

s******h
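Posts: 539

From topic: Statistics board - A question about testing

It depends on whether your sample is large. If it is large, use a two-sample z-test. If it is small, you can first run a Shapiro-Wilk
test (univariate) or draw a QQ plot; if it looks roughly acceptable, use a two-sample t-test.
A rank test is also worth considering: more robust, but with somewhat less power. You could also consider the
bootstrap.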

h******s
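Posts: 3420

From topic: Statistics board - Is there any statistics job as frustrating as this?

If the manager could tell good from bad I wouldn't be frustrated. People with an MBA background only accept averages and percentages,
and find even the plots from proc univariate too complicated. There is also the time issue: defining and testing
variables the traditional way gets one or two done per week, while backward selection handles two hundred in one go, which looks
far too efficient to people who don't know statistics.
Now I can only turn every analysis into percentages. Wrong statistics are worse than none; with so many violations,
I don't dare trust what comes out.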

r*****g
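Posts: 99

From topic: Statistics board - Help with simple SAS code to identify outliers

How can I write SAS code that directly identifies outliers outside mean +/- 2 standard deviations?
My clumsy method is to get the mean and SD from proc univariate and then write out the formula, but there are too many
variables I want to look at, so this approach is clumsy. Hence the question.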
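
The rule described here (flag anything outside mean +/- 2 standard deviations, for many variables at once) amounts to a loop over columns. Below is a minimal sketch in Python rather than SAS, using the sample standard deviation; the variable names and values are invented for illustration.

```python
from statistics import mean, stdev


def flag_outliers(columns, k=2.0):
    """Map each variable name to the values outside mean +/- k sample SDs."""
    flagged = {}
    for name, values in columns.items():
        m, s = mean(values), stdev(values)
        flagged[name] = [v for v in values if abs(v - m) > k * s]
    return flagged


# Hypothetical data: two variables, one with an obvious extreme value.
data = {
    "x1": [10, 11, 9, 10, 12, 10, 11, 9, 10, 50],
    "x2": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
}
print(flag_outliers(data))
```

Note that an extreme value inflates the mean and SD it is judged against, so this rule can miss outliers in small samples; that is a property of the rule, not of the code.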

z**********i
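Posts: 12276

From topic: Statistics board - Preparing for an actuarial exam

Over the past 2 days I did the first 10 or so problems and basically solved them on my own. Today I looked at a few problems from the later part; for univariate
and multivariate probability I need to look at the answers.
Thanks for the replies!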

c**d
Posts: 104

From topic: Statistics board - SAS experts, please help

For question 1:
/* step 1: get a data object of a probplot */
ods graphics on;
proc univariate data=xxx noprint;
ods output probplot = y_probplot_data;
probplot y;
run;
ods graphics off;
/* step 2: merge with the outlier data and */
/* create a new column as your data label*/
/* step 3: re-create probplot using sgplot procedure with datalabel option
to display outliers */
If you know graph template language, in step 1, beside output data object,
you can also output the predefined probplot tem...

w*******n
Posts: 469

From topic: Statistics board - How do I go about finding risk factors?

use univariate to pick up variables first

y*****w
Posts: 1350

From topic: Statistics board - Gamma distribution for power analysis, advice from the experts please

Because as I mentioned, SAS PROC POWER with the TWOSAMPLEWILCOXON statement
, which I used for the sample size calculation, has no NEGB option.
Actually I found I don't need PROC GENMOD in this context, because I found
that lognormal distribution better fits my data than Gamma distribution,
whereas PROC GENMOD does not have the lognormal option as a distribution. I
found the following SAS example using PROC UNIVARIATE very useful in that it
provides simple and straightforward statistics and gra...

d******e
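Posts: 551

From topic: Statistics board - Question: with big data, how do I know which predictors should get an interaction term?

With this many variables, first do a univariate analysis and then cut down the degrees of freedom a bit?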

w*******9
Posts: 1433

From topic: Statistics board - Help! How to get two CDFs on the same plot in SAS

proc univariate data=ds;
var pcs;
class female;
cdfplot pcs / overlay;
run;

z**o
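Posts: 149

From topic: Statistics board - Help: a small SAS program

I have one column of data and want to know the percentile of each number within the whole. I couldn't find a procedure in SAS that does this directly.
With a macro it would be: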
data a1; input x;
datalines;
2
6
3
2
2
1
3
8
6
10
5
9
1
6
2
5
10
;
proc univariate data=a1 noprint;
var x;
output out=a2 pctlpre=p_ pctlpts= 1 to 100 by 1;
run;

proc sql;
create table a3 as
select *
from a1, a2 ...

D********r
Posts: 4

From topic: Statistics board - Help: a small SAS program

After the code:
proc univariate data=a1 noprint;
var x;
output out=a2 pctlpre=p_ pctlpts= 1 to 100 by 1;
run;
Add
proc transpose data=a2 out=a3;
run;
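
What the thread is after is, for each value, the share of observations at or below it. Here is a small Python sketch of that definition using the same 17 numbers. It is only one of several percentile conventions, so it may not match what the truncated SQL join in the original post computes.

```python
from bisect import bisect_right

# The 17 values from the original post.
x = [2, 6, 3, 2, 2, 1, 3, 8, 6, 10, 5, 9, 1, 6, 2, 5, 10]


def percentile_ranks(values):
    """Percent of observations less than or equal to each value."""
    ordered = sorted(values)
    n = len(ordered)
    # bisect_right counts how many sorted values are <= v.
    return [100.0 * bisect_right(ordered, v) / n for v in values]


ranks = percentile_ranks(x)
for value, rank in zip(x, ranks):
    print(value, round(rank, 1))
```

Tied values get the same rank under this definition, and the largest value is always at 100.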

h*********n
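Posts: 278

From topic: Statistics board - Should AIC usually get lower as predictors are added?

At least that is what I had seen before. Recently, in one model, AIC actually dropped after I excluded a predictor.
Why? From the univariate view it still shows a clear pattern, and excluding it also
lowers the gain a lot. If I cap the target at a fairly low level things seem to return to normal, but in practice
it can't be capped that low. What should I do?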

c********h
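Posts: 330

From topic: Statistics board - Should AIC usually get lower as predictors are added?

AIC is a trade-off between fit to the data and model complexity. Adding a predictor
never increases RSS, which is exactly why the number of predictors in the model has to be penalized.
Roughly speaking, if AIC goes down when you exclude a variable, the additional
information that variable brings cannot compensate for 1 more parameter.
If the variable looks very significant on its own in a univariate analysis, your predictors may be collinear.
I may be wrong; corrections welcome.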
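
The trade-off described in this reply can be made concrete with the least-squares form of AIC, n*ln(RSS/n) + 2k up to an additive constant. The Python sketch below uses invented RSS values purely to show the mechanism; it is not the poster's model.

```python
import math


def gaussian_aic(rss, n, k):
    """AIC of a least-squares fit, up to an additive constant: n*ln(RSS/n) + 2k."""
    return n * math.log(rss / n) + 2 * k


n = 100
# Hypothetical fits: the extra predictor lowers RSS only slightly (50.0 -> 49.8).
aic_small = gaussian_aic(rss=50.0, n=n, k=3)
aic_big = gaussian_aic(rss=49.8, n=n, k=4)
print(aic_small < aic_big)  # True: the tiny gain in fit does not pay for the penalty

# A predictor that lowers RSS substantially (50.0 -> 45.0) does lower AIC.
aic_useful = gaussian_aic(rss=45.0, n=n, k=4)
print(aic_useful < aic_small)  # True
```

So a lower AIC after dropping a predictor only says that the predictor's marginal contribution, given the others already in the model, was smaller than the penalty for one more parameter.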

c********h
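Posts: 330

From topic: Statistics board - logistic reg: how to look at variable correlation

vif is for looking at the relationships among the covariates.
If you care about the relationship between y and a single x, you can draw a boxplot, or plot separately the x values of the obs in class1 and
of the obs in class2 and see whether there is a linear relationship, for example whether most x values in class1 are smaller than those in
class2.
Correlation is not completely useless either; if you don't mind extra work, you can use a permutation test.
You can also run a univariate logistic regression and check significance and so on.
I'm not sure about ready-made commands.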

c****0
Posts: 14490

From topic: Statistics board - Two questions about statistical tests

proc univariate data=data_name normal; var var_name;run;
it will provide Tests for Normality
Kolmogorov-Smirnov D
Cramer-von Mises W-Sq
Anderson-Darling A-Sq
it may work...
