c**********e posts: 2007 | 1 The following code creates two CDFs in two separate graphs.
How can I put them in one graph?
proc capability data=data1;
var var1 var2;
cdfplot;
run;
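One workaround that does not depend on any PROC CAPABILITY option: stack var1 and var2 into one long data set, compute each empirical CDF by hand, and overlay the two step curves with PROC SGPLOT using GROUP=. This is a sketch under assumptions; the sample data1 below is made up, so replace it with your own.

```sas
/* Made-up stand-in for data1 with two numeric variables */
data data1;
  call streaminit(2024);
  do i = 1 to 50;
    var1 = rand('normal');
    var2 = rand('normal', 0.5, 1);
    output;
  end;
  drop i;
run;

/* Stack: one row per (variable name, value) */
data long;
  set data1;
  length varname $8;
  varname = 'var1'; value = var1; output;
  varname = 'var2'; value = var2; output;
  keep varname value;
run;

/* Add the group size n and sort by value within each variable */
proc sql;
  create table long_n as
  select varname, value, count(value) as n
  from long
  where value is not missing
  group by varname
  order by varname, value;
quit;

/* Empirical CDF = running count / n within each variable */
data cdf;
  set long_n;
  by varname;
  if first.varname then k = 0;
  k + 1;
  ecdf = k / n;
run;

/* Both step curves in one graph */
proc sgplot data=cdf;
  step x=value y=ecdf / group=varname;
  xaxis label='Value';
  yaxis label='Cumulative proportion' min=0 max=1;
run;
```

The same stacking trick works for any number of variables: add one `varname = ...; value = ...; output;` line per variable.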
|
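s*****n posts: 2174 | 2 It isn't necessarily that simple. Sometimes this kind of thing is just a thin sheet of window paper: once someone pokes through it with a hint, it's easy.
Like the time someone asked how to generate data1, ..., data9 in a loop.
Without a hint, figuring it out on your own could take quite a while.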
|
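f*********8 posts: 165 | 3 I have two datasets:
data1 has two variables: location, Pvalue (900 obs)
data2 has two variables: start, end (10 obs)
I want to check whether each location lies between start and end. If it does, the records should be merged together; if several
locations fall in the same start-end interval, their Pvalues should be used to compute a mean Pvalue. Any
location that is not in any start-end interval should be deleted.
How should I code this? Many thanks.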
|
l*******l posts: 204 | 4 proc sql;
create table xxx as
select * from data1 as a
inner join data2 as b
/* a (data1) holds location; b (data2) holds the start-end interval */
on a.location between b.start and b.end;
quit;
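To cover the rest of the question in post 3 (drop locations outside every interval and average the Pvalues inside each interval), a GROUP BY on top of the join should be enough. A sketch assuming data1 has location and Pvalue and data2 has start and end, as described; the sample rows are invented. If intervals overlap, a location counts toward every interval that contains it.

```sas
/* Made-up rows in the shape described in the question */
data data1;
  input location Pvalue;
  datalines;
5 0.10
12 0.20
18 0.40
30 0.90
;
run;

data data2;
  input start end;
  datalines;
10 20
25 28
;
run;

/* The inner join drops locations outside every interval (and    */
/* intervals with no location); GROUP BY then averages Pvalue.   */
/* END is renamed to stop only to keep the SQL unambiguous.      */
proc sql;
  create table mean_p as
  select b.start, b.stop,
         count(*)       as n_loc,
         mean(a.Pvalue) as mean_pvalue
  from data1 as a
       inner join data2(rename=(end=stop)) as b
       on a.location between b.start and b.stop
  group by b.start, b.stop;
quit;
```

With this sample, only the interval 10-20 survives, with two locations (12 and 18) and a mean Pvalue of 0.3.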
|
p**********r posts: 10 | 5 I have the following dataset data1:
var num1 num2 count
a 1 11 2
a 1 11 2
b 2 21 1
c 3 45 3
c 3 45 3
c 3 45 3
count is actually the # of obs in each group (var num1 num2).
My question is: I want to pick out the group with the largest count, record its num1 and num2, and then use these two ...
|
g*******y posts: 380 | 6 Then try this:
proc sql;
create table new as
select a.*
from data2 as a
right join (select distinct num1, num2
from (select * , max(count) as max_count
from data1
having count=calculated max_count)) as b
on a.num1=b.num1 and a.num2=b.num2
;
quit;
|
m*********n posts: 413 | 7 My way to do this, for numeric variables that have a format:
proc sql;
select sex, sex as sex1 format=10.4
from data1
;
quit;
Then you get a list of formatted values in sex and the original values in
sex1.
Of course, the best way to check a format is to use PROC FORMAT.
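For the PROC FORMAT route, the FMTLIB option prints each range and its label, and CNTLOUT= writes the same definition to a data set. A minimal sketch; the format name sexfmt and its values are hypothetical stand-ins for whatever format is attached to sex (PROC CONTENTS shows which format a variable carries).

```sas
/* Hypothetical numeric format, standing in for the one attached to sex */
proc format;
  value sexfmt 1 = 'Male'
               2 = 'Female';
run;

/* FMTLIB prints the definition of sexfmt from the WORK catalog */
proc format library=work fmtlib;
  select sexfmt;
run;

/* CNTLOUT= writes the same definition to a data set */
proc format library=work cntlout=fmt_defs;
  select sexfmt;
run;

proc print data=fmt_defs noobs;
  var fmtname start end label;
run;
```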
|
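t**g posts: 1164 | 8 I have 3 datasets (data1, data2, data3),
all containing the same variables (loan number, credit score, balance).
I need to write a macro
that combines these 3 datasets into one dataset
and prints out the average balance by credit score.
How should I write the code?
Thanks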
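A minimal sketch of such a macro, assuming the variables are named loan_number, credit_score and balance (hypothetical names; the question does not give them). SET stacks the data sets, and PROC MEANS with CLASS computes the average per credit score. The macro name combine_avg and its parameters are illustrative only.

```sas
/* Hypothetical macro: stack the listed data sets, then print */
/* the mean balance for each credit score.                    */
%macro combine_avg(dslist=, out=combined);
  data &out;
    set &dslist;
  run;

  proc means data=&out noprint nway;
    class credit_score;
    var balance;
    output out=avg_balance(drop=_type_ _freq_) mean=mean_balance;
  run;

  proc print data=avg_balance noobs;
    title 'Average balance by credit score';
  run;
  title;
%mend combine_avg;

/* Made-up inputs with the assumed variable names */
data data1;
  input loan_number credit_score balance;
  datalines;
101 700 1000
102 650 3000
;
run;

data data2;
  input loan_number credit_score balance;
  datalines;
201 700 2000
;
run;

data data3;
  input loan_number credit_score balance;
  datalines;
301 650 5000
302 700 3000
;
run;

%combine_avg(dslist=data1 data2 data3)
```

Because dslist is just a space-separated list, the same macro works for any number of input data sets.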
|
d*******1 posts: 854 | 9 proc sql noprint;
create table data3 as
select a.id, a.var_1,b.var_2 from data1 a, data2 b
where a.id=b.id order by a.id,a.var_1,b.var_2;
quit;
|
m********a posts: 194 | 10 I have a data file with 10000 observations
and want to split it into 100 small files, each containing 100 observations:
observation 1 to 100 go to data1.txt
observation 101 to 200 go to data2.txt
...
observation 9901 to 10000 go to data100.txt
In other words, each consecutive block of 100 obs goes into its own .txt (or .dat) file.
Should I write a macro or something else?
Many thanks
|
m********a posts: 194 | 11 Thanks.
I tried it, but I don't know how to handle writing to different files, since these 100 small files need different
names, data1.dat through data100.dat.
|
y****n posts: 46 | 12 data temp;
do i=1 to 10000;
id=i;
xx=i/i**2;
output;
end;
drop i ;
run;
%macro out;
data %do x=1 %to 100; data&x %end;;
set temp ;
%do j=1 %to 99;
%if &j=1 %then %do;
if 1<=_n_<=100 then output data1;
%end;
%let k=%eval(&j+1);
if %eval(&j*100+1)<=_n_<=(%eval(&j+1)*100) then output data&k;
%end;
run;
%mend out;
%out
|
m********a posts: 194 | 13 Thanks to the poster above!
It seems to have only created 100 SAS data sets; it still doesn't solve the problem of creating data1.dat, data2.dat, ...,
data100.dat...
|
y****n posts: 46 | 14 data temp;
do i=1 to 10000;
id=i;
xx=i/i**2;
output;
end;
drop i ;
run;
filename out "e:\temp";
%macro out;
data _null_;
set temp ;
%do j=1 %to 99;
%if &j=1 %then %do;
if 1<=_n_<=100 then do;
file out(data1.txt);
put id xx;
end;
%end;
%let k=%eval(&j+1);
if %eval(&j*100+1)<=_n_<=(%eval(&j+1)*100) then do;
file out(data&k..txt);
put id xx;
end;
%end;
run;
%mend out;
%out
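Another approach, sketched under assumptions: a single DATA _NULL_ step using the FILE statement's FILEVAR= option, which closes the current output file and opens a new one whenever a character variable holding the path changes, so no macro loop is needed. The output folder is WORK here so the sketch runs as-is; set outdir to e:\temp (or any folder) for real use, and change .txt to .dat if needed.

```sas
/* Same kind of test data as in the posts above */
data temp;
  do id = 1 to 10000;
    xx = id / id**2;
    output;
  end;
run;

/* Output folder: WORK so the sketch runs anywhere; */
/* set outdir to e:\temp for the original question. */
%let outdir = %sysfunc(pathname(work));

data _null_;
  set temp;
  length fname $260;
  /* obs 1-100 -> 1, 101-200 -> 2, ..., 9901-10000 -> 100 */
  grp = ceil(_n_ / 100);
  fname = cats("&outdir/data", grp, ".txt");
  /* FILEVAR= switches to a new file whenever fname changes; */
  /* "dummy" is just a placeholder fileref name              */
  file dummy filevar=fname;
  put id xx;
run;
```

Changing 100 in `ceil(_n_ / 100)` changes the block size; the number of files follows automatically.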
|
x***n posts: 103 | 16 data **;
format Building $36.;
merge data1 data2;
|
p********a posts: 5352 | 17 ☆─────────────────────────────────────☆
missshinla (missshinla) wrote on (Thu Mar 18 03:40:55 2010, US Eastern):
I have a data file with 10000 observations
and want to split it into 100 small files, each containing 100 observations:
observation 1 to 100 go to data1.txt
observation 101 to 200 go to data2.txt
...
observation 9901 to 10000 go to data100.txt
In other words, each consecutive block of 100 obs goes into its own .txt (or .dat) file.
Should I write a macro or something else?
Many thanks
☆─────────────────────────────────────☆
tosi (夏虫语冰) wrote on (Thu Mar 18 09:49:24 2010, US Eastern):
In Data Step, use Do Loop and
|
S*x posts: 705 | 18 select *
from data1 as a
left join data2 as b on (a.var1=b.var1)
where b.var1 is null
VAR2,
|
s********l posts: 245 | 19 I ran into a problem at work.
I have 100 data sets that I need to read into SAS, such as:
data1
1 2 3 4
data2
2 3 4 8
......
data100
2 4 8 9
I want to read these 100 data sets into SAS and merge them into one data set. What
kind of macro should I write? Many thanks.
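If each data set is a text file named data1.txt ... data100.txt in one folder (an assumption; the post doesn't say), no macro is strictly needed: the INFILE statement's FILEVAR= option can read the files one after another in a single DATA step and stack the records. The setup step below writes three made-up files into WORK so the sketch runs as-is; set indir and nfiles for real use. This stacks the files (one row per record); a side-by-side MERGE would instead need a common BY variable.

```sas
/* Hypothetical setup: write three sample files data1.txt-data3.txt */
%let indir  = %sysfunc(pathname(work));
%let nfiles = 3;   /* 100 in the question */

data _null_;
  input k x1-x4;
  length fname $260;
  fname = cats("&indir/data", k, ".txt");
  file dummy filevar=fname;
  put x1-x4;
  datalines;
1 1 2 3 4
2 2 3 4 8
3 2 4 8 9
;
run;

/* One row per file name */
data filelist;
  length fname $260;
  do file_no = 1 to &nfiles;
    fname = cats("&indir/data", file_no, ".txt");
    output;
  end;
run;

/* For each file name, FILEVAR= opens that file; read it to the end */
data all;
  set filelist;
  infile dummy filevar=fname end=eof;
  do while (not eof);
    input x1-x4;
    output;
  end;
run;
```

file_no is kept in the result, so each row still records which file it came from.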
|
x**m posts: 941 | 20 I think your code failed to create the transit dataset, which is the
combination of data1 through data100 in every loop.
BTW, where is the "end;"? Am I misunderstanding something here?
|
S********a posts: 359 | 21 data datain;
merge data1 (in=aa)
data2 (in=bb);
by x;
run;
What does the in= in parentheses do? Will pay double baozi (board points) as thanks!
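In short, IN= creates a temporary 0/1 flag that is 1 when the current BY value has an observation in that data set. The flags are not written to the output, so copy them if you want to keep them. A minimal sketch with made-up data, showing the flags and the common "keep only matches" idiom.

```sas
/* Made-up inputs sharing the BY variable x */
data data1;
  input x a;
  datalines;
1 10
2 20
;
run;

data data2;
  input x b;
  datalines;
2 200
3 300
;
run;

data datain;
  merge data1 (in=aa)
        data2 (in=bb);
  by x;
  /* aa/bb are temporary flags; copy them to keep them */
  from1 = aa;   /* 1 if this x exists in data1 */
  from2 = bb;   /* 1 if this x exists in data2 */
run;
/* datain: x=1 (from1=1, from2=0), x=2 (1, 1), x=3 (0, 1) */

/* Typical use: keep only x values present in both data sets */
data both;
  merge data1 (in=aa) data2 (in=bb);
  by x;
  if aa and bb;
run;
```

Likewise, `if aa and not bb;` keeps the rows found only in data1.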
|
C******y posts: 2007 | 22 PROC IMPORT OUT= data1
DATATABLE= "xxx"
DBMS=ACCESS2000 REPLACE;
DATABASE="c:\1\2.mdb";
RUN;
The code may vary depending on the version of Access and SAS.
It gives an error.
|
s***r posts: 1121 | 23 How can I merge 3 datasets using PROC SQL?
Dataset 1:
Plant Unique ID | date1 (YYYYMMDD) | Variable1
001 20060914 .....
001 20080801 .....
001 20080822
001 20100101
002 20011119
002 20020101
002 20030808
003 20091212
005 20000816
005 20001225
005 20010205
005 20030203
005 20030501 ....
...
...
....
Dataset 2:
Plant Unique ID | date2 (YYYYMMDD) | Variable2
001 20050314 ...
|
e****t posts: 766 | 24 A lazy way for Q1 (you can try it and see if it works):
for data2, each X vector is a record; set all its y values to missing (.),
stack it onto data1 (SET), and fit the model. Since every y from data2 is missing, those rows won't
participate in the estimation, but when you request 'pred', the missing y values will
be predicted.
For Q2, if it were me, I would either delete X or only select X with CA, OR,
WA from data2 for the calculation.
|
o****o posts: 8077 | 25 For Q2, you should consult the business and data sides to see why those 2
states appear in your data2 but not in data1. Have there been any business rule
changes, market strategy changes, or other business /
organizational changes for whatever reason?
Until all these business/data issues, and hence the data-generating
process, are settled, it is inappropriate even to talk about building a
model; it is just wrong.
|
c*****1 posts: 131 | 26 I think oloolo gives the best answer for Q2.
However, in actual business this can happen from time to time, and we cannot always go back to the business and data sides. There should be a way to handle this kind of situation.
My 2 cents:
Check the distribution of variable x in data1 and data2. If the frequencies of x in ('TX' and 'NV') are small, and there are other predictors in the model besides x, you can still fit the model if you create indicators like this (it will take 'TX' and 'NV' into acco...
|
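k*******a posts: 772 | 27 Not necessarily; it could be 12.
For example, data1 has 12 obs, all equal to 1,
and data2 has 5 obs, also all equal to 1.
With the merge used there, the result is twelve 1s,
...than 5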
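The 12-row result can be reproduced with made-up data: in a DATA step MERGE, observations within a BY group are paired one to one, and once the shorter data set runs out its last values are carried down. A PROC SQL join on the same key pairs every row with every row instead. A sketch; the variable names key, row1 and row2 are hypothetical.

```sas
/* Made-up data: 12 rows and 5 rows, every key equal to 1 */
data data1;
  do row1 = 1 to 12;
    key = 1;
    output;
  end;
run;

data data2;
  do row2 = 1 to 5;
    key = 1;
    output;
  end;
run;

/* MERGE pairs rows within the BY group; after data2 runs out, */
/* its last values (row2 = 5) are carried down. The log notes  */
/* that more than one data set has repeats of BY values.       */
data merged;
  merge data1 data2;
  by key;
run;   /* 12 rows */

/* A PROC SQL join forms every pairing instead: 12 x 5 = 60 rows */
proc sql;
  create table joined as
  select a.key, a.row1, b.row2
  from data1 as a, data2 as b
  where a.key = b.key;
quit;
```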
|
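m********9 posts: 1985 | 28 A percentile is the number sitting at a particular position in your sorted data; you can't get it by averaging.
10th percentile of DATA 1:
the value at position N1*0.1 + 0.5, truncated to an integer
10th percentile of DATA 2:
the value at position N2*0.1 + 0.5, truncated to an integer
10th percentile of DATA1 + DATA2:
(N1+N2)*0.1 + 0.5 = (N1*0.1 + 0.5) + (N2*0.1 + 0.5) - 0.5
You only know the two percentile values, not where they sit in the pooled data, so how could you compute it?
Also, I have calculated many percentiles by hand; the percentile function in Excel is wrong.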
|
b**********i posts: 1059 | 29 ..... Do I really have to tell you everything I know? .....
match files file = "directory\data1.sav"
/rename(id=id2)
/table= "directory\data2.sav"
/in= from_data2
/by=id2.
select if (from_data2 NE 1).
execute.
|
L*****k posts: 327 | 30 This is a transfer learning problem, hehe~~ there are many ways to do it.
Hmm, the most straightforward way is to do unsupervised learning on data1 + data2
together (explanatory data only), e.g. dimension reduction.
|
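p***r posts: 920 | 31 Or you can do it another, more brute-force way:
*WIDE TO LONG (data1 must be sorted by var_id);
*With no VAR statement, every numeric variable not listed in BY is transposed;
PROC TRANSPOSE DATA=data1 OUT=data2;
BY var_id;
RUN;
data data3;
set data2;
log_var=log(col1);
run;
*LONG TO WIDE: _NAME_ holds the original variable names;
PROC TRANSPOSE DATA=data3 OUT=data4;
BY var_ID;
ID _NAME_;
VAR log_var;
RUN;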
|
g*****0 posts: 14 | 32 I have an R script, my_script.R. It loads a large input file "data.txt" and
then outputs a large matrix "out".
Now I have 30 different input files "data1.txt", "data2.txt" ... and
"data30.txt" and want to generate and save 30 output (matrix) files
separately. How can I achieve this by running the R script on a Linux
cluster?
I looked at R CMD BATCH but am not sure whether and how it works...
Thanks!
|
p***e posts: 118 | 33 I have DATA1:
Num Product
1 Lipitor
1 Crestor
1 Zetia
1 Zocor
and want to turn it into DATA2:
Num Product
1 Lipitor
1 Crestor
2 Lipitor
2 Zetia
3 Lipitor
3 Zocor
4 Crestor
4 Zetia
5 Crestor
5 Zocor
Basically, I want all pairwise combinations of the product names in column 2. Does any expert here know an easy way to do this?
Thanks a lot!
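One way, sketched under assumptions: loop over every pair of row numbers i < j in DATA1 and read the two products with SET ... POINT=, numbering each pair. It keeps the original row order, so pairs 1 to 5 match the listing above; a sixth pair (Zetia, Zocor) also comes out because all C(4,2) = 6 pairs are generated. A PROC SQL self-join with `a.Product < b.Product` would also work, but orders the pairs alphabetically.

```sas
data data1;
  input Num Product :$10.;
  datalines;
1 Lipitor
1 Crestor
1 Zetia
1 Zocor
;
run;

/* Every unordered pair (i < j) of rows, in the original row order. */
/* Each pair gets its own Num and is written as two rows.           */
data data2;
  keep Num Product;
  Num = 0;
  do i = 1 to nobs - 1;
    do j = i + 1 to nobs;
      Num + 1;
      p = i;
      set data1(keep=Product) point=p nobs=nobs;
      output;
      p = j;
      set data1(keep=Product) point=p;
      output;
    end;
  end;
  stop;   /* required with POINT=, otherwise the step never ends */
run;
```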
|
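l*******a posts: 107 | 34 First, thanks to this board for giving me a lot of confidence and encouragement, and thanks to everyone who has helped me!!!
Now I have a headache of a problem that has puzzled me for a long time, so I'm bringing it here to ask everyone. I have two data sets
and want output like this: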
this is the first sas data
FundingAmount 0-15000 0.217
ObjectAmount 15000-30000 0.318
FundingAmount 30000-60000 0.519
quit;
Next is the second sas data
Ashley Liu 17
brandon green 30
susan Chen 28
this is the end of the sas code
The data are as follows:
data data1;
input variableName $14. variableLength : $12. price;
datalines;
FundingAmount 0-15000 0.217
ObjectAmount 15000-...
|
k*******a posts: 772 | 35 ID.keep <- names(which(tapply(data$V2, data$ID, function(x) all(x >= 0 & x <= 5))))
data1 <- data[data$ID %in% ID.keep, ]
|
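V****e posts: 858 | 36 This is for work.
How can I pick out the student IDs in the first file according to the IDs in the second file? Doing it manually is too slow;
there are 400+ students. Experts, please help!
Data1 looks like this: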
7838 Lawal
7838 Lawal
7838 Lawal
7838 Lawal
7838 Lawal
7838 Lawal
7838 Lawal
7838 Lawal
7838 Lawal
7838 Lawal
7838 Lawal
7838 Lawal
7838 Lawal
7838 Lawal
7838 Lawal
7838 Lawal
10003 Dickens
10003 Dickens
49657 Syas
49657 Syas
49657 Syas
49657 Syas
49657 Syas
Data2
270 7838 Male
314 10003 Female
479 57036 Male ...
|
c***z posts: 6348 | 37 merge(data1, data2, by = "ID")
|
V****e posts: 858 | 38 Does this still work if data1 and data2 are in different files?
Thanks to everyone above. Mine are Excel files.
|
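v*******e posts: 11604 | 39 In R you can use a list (an array whose elements are lists):
x <- array(list(), dim = c(10, 10, 5))
x[[1, 1, 1]] <- data1   # [[ ]] stores the whole data frame in one cell
x[[1, 1, 2]] <- data2
...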
|
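w*******9 posts: 1433 | 40 Assuming both tables are sorted and every row of table1 (data1 in the code) maps to exactly one row in table2. If that is
not the case, you can modify the do loop.
data match ( drop = continue );
retain matchid 1;
set data1;
continue = 1;
do while (continue);
  /* read the current candidate interval (x1, x2) from table2 */
  set table2 point = matchid nobs = n2;
  if x >= x2 then do;
    /* x is past this interval: advance, unless already at the last row */
    if matchid < n2 then matchid + 1;
    else continue = 0;
  end;
  else if x >= x1 and x < x2 then do;
    output;
    continue = 0;
  end;
  /* x < x1: x falls in a gap, no match; stop to avoid an endless loop */
  else continue = 0;
end;
run;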
|
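n*****5 posts: 984 | 41 I have a 10 * 100K matrix: 10 is the 10 features and 100K is the data points; each point contains some
or all of the features.
F1, F2, ... F10
The data look like: data1 1, 0, 0, 1 ...
data2 1, 0, 1, 1 ...
Now I want to find the relationships among these 10 features, e.g. map each feature to
a point (x,y) and then plot the 10 features.
What method should I use?
Thanks a lot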
|
n*****3 posts: 1584 | 43 mget the data frames (names matched by a regular expression) into a list,
then Reduce this list with rbind.
|
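H*H posts: 472 | 44 Both Reduce and do.call work, but if your data are large, both are fairly slow.
dplyr's rbind_all is a C++ version of do.call(rbind, list);
dplyr::rbind_all can easily give a tenfold speedup on large data (later dplyr versions replace rbind_all with bind_rows).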
|
w*******9 posts: 1433 | 45 Most intuitive way (similar to kirklanda's solution):
a <- paste("data", 1:1000, sep = "", collapse = ",")   # "data1,data2,...,data1000"
eval(parse(text = paste0("rbind(", a, ")")))
|
t*********u posts: 26311 | 47 data3 = FOREACH data2 GENERATE group, SUM(data1.cpc) as cost:int;
data4 = ORDER data3 BY cost ASC;
If I run
DESCRIBE data4;
there is no error,
but if I switch to DUMP data4;
it fails:
2014-06-11 17:22:26,525 ERROR org.apache.pig.tools.pigstats.SimplePigStats: ERROR: java.lang.RuntimeException: java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer
2014-06-11 17:22:26,525 ERROR org.apache.pig.tools.pigstats.PigStatsUtil: 1 map reduce job(s) failed!
2014-06-11 17:22:26,573 ERROR org.apache.pig.tools.grunt.Gru...
|
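n*****5 posts: 984 | 48 I have a 10 * 100K matrix: 10 is the 10 features and 100K is the data points; each point contains some
or all of the features.
F1, F2, ... FN
The data look like: data1 1, 0, 0, 1 ...
data2 1, 0, 1, 1 ...
Now I want to find the relationships among these 10 features, e.g. map each feature to
a point (x,y) and then plot the 10 features.
What method should I use?
Thanks a lot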
|
o**o posts: 58 | 49 rbindlist (from data.table)
or do.call(rbind, list())
|
n*****3 posts: 1584 | 50 If there are patterns in the names, or you want to rbind all the datasets in
the namespace,
you can get/mget them into a list,
then Reduce(rbind, ...) over that list.
|