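t********m Posts: 939 | 1 [The following text is reposted from the DataSciences board]
From: tulipdream (xiaohuaidan), Board: DataSciences
Subject: An R question: how do I rbind a series of data sets such as data1, data2, ..., data1000?
Posted at: BBS 未名空间站 (Thu Nov 13 17:35:36 2014, US Eastern)
Is there a simpler way, something like rbind(data1-data1000)? Any advice would be much appreciated, thanks! |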
|
k*******a Posts: 772 | 2 If data1 through data1000 are in a list, just use the do.call function.
Otherwise you can try:
mycall = c("rbind", paste0("data", 1:1000))
mycall = lapply(mycall, as.name)
result = eval(as.call(mycall)) |
|
d********t Posts: 837 | 3 Reduce(rbind, list(data1,data2,data3,data4))
Example:
Reduce(rbind, list(data.frame(x1=c(1,2,3),x2=c(2,3,4)),
                   data.frame(x1=c(5,6,7),x2=c(7,5,4)),
                   data.frame(x1=c(5,4,3),x2=c(7,6,5)))) |
|
t**********r Posts: 182 | 5 Many thanks for your hint! I got it working. Here is the code.
579 proc sql;
580 create table data3 as
581 select data1.*, data2.rating, day1-day2 as diff
582 from data1, data2
583 where data1.var1=data2.var1 and data1.var2=data2.var2
584 and date1-date2>0
585 group by data1.var1, data1.var2, data1.date1
586 having diff=min(diff);
NOTE: The query requires remerging summary statistics back with the original
data.
NOTE: Table WORK.data3 created, with 48144 rows and 9 |
|
t**********r Posts: 182 | 6 Figured it out. Thanks.
===============
579 proc sql;
580 create table data3 as
581 select data1.*, data2.rating, day1-day2 as diff
582 from data1, data2
583 where data1.var1=data2.var1 and data1.var2=data2.var2
584 and date1-date2>0
585 group by data1.var1, data1.var2, data1.date1
586 having diff=min(diff);
NOTE: The query requires remerging summary statistics back with the original
data.
NOTE: Table WORK.data3 created, with 48144 rows and 9 columns.
587 quit;
NOTE: PROCEDURE SQL used (Tota |
|
t**********r Posts: 182 | 7 I want to merge two data sets using proc sql:
Data1:
var1 var2 date1
Data2:
var1 var2 date2 rating
(Note: var1 and var2 are the same variables in both data sets.)
Question:
For each row of data1, select the rating from data2 that meets the following criteria:
1. date1 - date2 > 0
2. date1 - date2 is the smallest such value.
I wrote the following code, but it doesn't work:
proc sql;
create table data3 as
select data1.*, data2.rating, date1-date1 as diff
from data1, data2
where data1.var1=data2.var1 and data1.var2=data2.var2 |
|
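The selection rule in the question above (for each data1 row, the data2 rating with the smallest positive date1 - date2) can be sketched with a correlated subquery. This is only a minimal sketch in Python's built-in sqlite3, not SAS proc sql; the sample rows, the integer day numbers standing in for dates, and the aliases d1, d2, d3 are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data1 (var1 TEXT, var2 TEXT, date1 INTEGER);
CREATE TABLE data2 (var1 TEXT, var2 TEXT, date2 INTEGER, rating TEXT);
-- Hypothetical sample rows; dates are plain day numbers.
INSERT INTO data1 VALUES ('a', 'x', 10), ('a', 'x', 20), ('b', 'y', 5);
INSERT INTO data2 VALUES ('a', 'x', 3, 'AA'), ('a', 'x', 8, 'A'),
                         ('a', 'x', 15, 'BBB'), ('b', 'y', 9, 'C');
""")

# Join on the keys, keep only earlier data2 dates, then keep the row whose
# gap equals the minimum gap for that particular data1 row.
query = """
SELECT d1.var1, d1.var2, d1.date1, d2.rating, d1.date1 - d2.date2 AS diff
FROM data1 AS d1
JOIN data2 AS d2
  ON d1.var1 = d2.var1 AND d1.var2 = d2.var2 AND d1.date1 > d2.date2
WHERE d1.date1 - d2.date2 = (
    SELECT MIN(d1.date1 - d3.date2)
    FROM data2 AS d3
    WHERE d3.var1 = d1.var1 AND d3.var2 = d1.var2 AND d3.date2 < d1.date1
)
ORDER BY d1.var1, d1.var2, d1.date1
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

In this sample, the ('b', 'y', 5) row of data1 has no earlier data2 date, so it does not appear in the result.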
p******p Posts: 13 | 8 Tested and working. If you want to overwrite data1, just change new_data1 at the end to data1, although overwriting the source dataset is a bad habit.
data data1;
input id t1 t2;
datalines;
1 2 3
2 4 5
3 4 5
;
run;
data data2;
input id;
datalines;
1
2
;
run;
proc sql noprint;
create table new_data1 as
select data1.*,coalesce(flag,0) as flag from
data1 left join (select data2.*,1 as flag from data2)
on data1.id=data2.id
;
quit; |
|
y******0 Posts: 401 | 10 proc sql;
create table data3 as
select data1.var1,data1.var2, data2.rating, min(date1-date2) as diff
from data1, data2
where data1.var1=data2.var1
and data1.var2=data2.var2
and date1>date2
group by 1,2,3;
quit; |
|
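s*****n Posts: 2174 | 11 1. names(data)[1] <- "newname" is enough. If you don't like numeric indices, you can also write
names(data)[names(data)=="var1"] <- "newname" or
names(data) <- gsub("var1", "newname", names(data)); either works.
2. What you describe has a precondition: the BY variables must be the same. Consider merging data1, data2, and
data3, where data1 and data2 are matched on var1 and var2, while data1 and
data3 are matched on var3. For this kind of more complex merge, where the
BY variables differ between each pair of data sets, it is hard to define one function that handles multiple data sets unless the function itself takes
a great many parameters.
3. Besides SAS, does any other language have the notion of a "most recent data set" that you describe?
Is it the one most recently assigned (written) or the one most recently accessed (read)? For example: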
data3 <- merge(data1, data2)
print(data2 |
|
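k*****u Posts: 1688 | 12 That's the right answer.
I just read about this online yesterday.
After a union, if data1 has more variables than data2, all the variable names come from data1.
If data2 has more variables than data1, the leading ones use data1's variable names and the remaining ones use data2's. |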
|
y*****w Posts: 1350 | 13 It seems that survreg() in R and PROC LIFEREG in SAS run the same type of
survival analysis. However, when I ran both on the same survival data set, I
got different results. Both used the exponential distribution, and the data are
right-censored. See below. Could anybody tell me why the results are
different? Did I miss an important parameter in R? Thanks!
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The R code:
survFit <- survreg(Surv(time, event, type="right") ~... |
|
s*****n Posts: 2174 | 14 R was originally designed for running statistics, with convenience as the main consideration, so much of its object
handling is flexible to the point of sacrificing performance. Take the simplest example, the data
frame: it is actually a very slow structure. For data manipulation, using a matrix is often much
faster.
data1 <- matrix(rep(0, 1e6), ncol = 1000, nrow = 1000)
data2 <- data.frame(data1)
object.size(data1)
object.size(data2) # almost same size
# 0.42 seconds
for (i in 1:1000){
for (j in 1:1000){
data1[i,j]
}
}
# 33.2 seconds
for (i in 1:1000){
for (j in 1:1000){
data2[i,j]
}
}
That is roughly an 80x speed difference (33.2 s vs. 0.42 s), just from data frame vs. matrix. Both examples contain a double
for loop. In fact fo... |
|
s*******s Posts: 1031 | 15 A follow-up to my interview write-up.
http://www.mitbbs.com/article_t/JobHunting/32517841.html
I've written up my solutions to a few of the problems to share. Corrections and criticism are welcome.
Thanks!
1. Write a program that finds the last 1000 digits (counting from the lowest digit) of 5^1234566789893943.
I used recursion plus array-based big-number multiplication.
// Calculate (m^n)%(10^k). Keep the k digits in an array.
// Note: the digits are stored in reverse order in the array
// Assume: m>0, n>0, k>0
// Need to check validity outside of this function.
// call calculate(5, 1234566789893943, 1000) to get result.
// Time complexity: O((log n) * k * k)
// Space complexity: O((log n) * k)
ve... |
|
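The computation described in the comments above, (m^n) % (10^k), can be sketched with square-and-multiply. This is a minimal Python sketch rather than the poster's C++ array-based code (which is truncated in this listing); Python's built-in big integers stand in for the digit array, and the function name last_digits is hypothetical.

```python
def last_digits(m: int, n: int, k: int) -> str:
    """Return the last k decimal digits of m**n, zero-padded on the left."""
    modulus = 10 ** k
    result = 1
    base = m % modulus
    # Square-and-multiply: O(log n) modular multiplications.
    while n > 0:
        if n & 1:
            result = (result * base) % modulus
        base = (base * base) % modulus
        n >>= 1
    return str(result).zfill(k)


digits = last_digits(5, 1234566789893943, 1000)
print(len(digits))
```

Python's built-in three-argument pow(5, 1234566789893943, 10**1000) computes the same residue and is a convenient cross-check.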
y***n Posts: 1594 | 17 The original problem is here: http://www.mitbbs.com/article_t/JobHunting/32517841.html but I don't think it is quite right.
I really have no idea how to approach problems like these.
// Calculate (m^n)%(10^k). Keep the k digits in an array.
// Note: the digits are stored in reverse order in the array
// Assume: m>0, n>0, k>0
// Need to check validity outside of this function.
// call calculate(5, 1234566789893943, 1000) to get result.
// Time complexity: O((log n) * k * k)
// Space complexity: O((log n) * k)
vector calculate(unsigned long m, unsigned long n, int k) {
if(... |
|
m**********2 Posts: 2252 | 18 I have a table like this and want to select the sums of Data1, Data2, ... for each distinct ID directly. How do I do that?
Original:
ID Data1 Data2 Data3
1 3341 1926 336
1 972 562 130
1 1580 760 213
1 742 345 87
1 7776 4237 889
2 27129 15347 3002
2 1428 825 139
2 3211 1918 361
2 1114 598 114
2 1219 602 110
3 1238 684 127
After the select:
ID Data1 Data2 Data3
1 14411 7830 1655
2 34101 19290 3726
3 1238 684 127
Thanks!!! |
|
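The per-ID totals asked for above are what a GROUP BY with SUM produces. Here is a minimal sketch using Python's built-in sqlite3 with the rows from the post; the table name t is an assumption, since the post does not name the table or the database system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "t" is a hypothetical table name; the columns follow the post.
conn.execute("CREATE TABLE t (ID INTEGER, Data1 INTEGER, Data2 INTEGER, Data3 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, 3341, 1926, 336), (1, 972, 562, 130), (1, 1580, 760, 213),
        (1, 742, 345, 87), (1, 7776, 4237, 889),
        (2, 27129, 15347, 3002), (2, 1428, 825, 139), (2, 3211, 1918, 361),
        (2, 1114, 598, 114), (2, 1219, 602, 110),
        (3, 1238, 684, 127),
    ],
)

# One output row per ID, each column summed within the group.
totals = conn.execute(
    "SELECT ID, SUM(Data1), SUM(Data2), SUM(Data3) FROM t GROUP BY ID ORDER BY ID"
).fetchall()
for row in totals:
    print(*row)
```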
o*****l Posts: 539 | 19 A question for the experts here, thanks!
I created a Linux EC2 instance with /dev/sda, /dev/sdb, and /dev/sdc.
How can I tell whether /dev/sdb and /dev/sdc are mounted?
If they are not, how do I mount them?
I tried creating /mnt/data1 and then running mount, but it failed.
$ sudo mount -t ext4 /dev/xvdb /mnt/data1
mount: wrong fs type, bad option, bad superblock on /dev/xvdb,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so
[ec2-user@ip-172-31-3-14 data1]$ dmesg | tail
EXT... |
|
z****u Posts: 23 | 20 In MATLAB, how do I run a script that is not in the current working directory?
For example, say the working directory is "C:\Program\Files\MATLAB\R2007b\work", and under
\work there are some folders, such as \work\data1 and \work\data2, each containing some
scripts, for example \work\data1\code1 and \work\data2\code2. Can I write a script in \work
that runs \work\data1\code1 and \work\data2\code2?
Many thanks! |
|
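s******e Posts: 2181 | 21 Thanks for clearing that up. My problem is as follows; this is a simplified C program that runs in the MATLAB environment.
I want to test whether my input data is read in correctly. One way is to assign the input argument to the output address
and look at the result, which appears to be correct. The other is to print the input data directly with printf, but the
number printed is always 0, whether I use printf("input=%d\n", data2[0]); or
printf("input=%d\n", *data2);. It is baffling. So what exactly is stored at the address data2?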
#include "mex.h"
#include "gpu/mxGPUArray.h"
#include "cuda.h"
void mexFunction(int nlhs, mxArray *plhs[],
int nrhs, mxArray const *prhs[])
{
double *data1, *data2;
int m,n;
m=mxGetM(prhs[0]);
n=mxGetN(prhs[0]);
plhs[0]=mxCreateDoubl... |
|
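One hedged observation on the printf question above: data2 is declared as a double pointer, so *data2 is a double, and in C passing a double to printf with %d is undefined behavior (%f is the matching conversion). The sketch below is only an illustration in Python, not the poster's MEX code. It shows one plausible reason a misread double can look like 0: for small whole-number values, the low 32 bits of the IEEE 754 representation are all zero. The value 3.0 is a hypothetical input.

```python
import struct

value = 3.0  # hypothetical input value, not taken from the post

# Pack as a little-endian IEEE 754 double, then reinterpret the same
# 8 bytes as two 32-bit signed integers (low word first).
raw = struct.pack("<d", value)
low_word, high_word = struct.unpack("<ii", raw)

print("low 32 bits as int: ", low_word)
print("high 32 bits as int:", high_word)
print("read back as double:", struct.unpack("<d", raw)[0])
```

What a real C program prints in this situation depends on the platform and calling convention, so this is a possible explanation rather than a diagnosis.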
c**********e Posts: 2007 | 22 Suppose that I have a data set data1 with numerical variables
x, y, z. I would like to run the regression y=x if a macro
variable is "a" and the regression z=x y if the macro variable is not
"a". The following does not work well. How should I do it?
%macro regre(var);
%if "&var."="a" %then %do;
proc reg data=data1;
model y=x;
run;
%end;
%else %do;
proc reg data=data1;
model z=x y;
run;
%end;
%mend;
%regre(a);
%regre(b); |
|
s*****n Posts: 2174 | 23 1. is.element handles this easily.
data1[!is.element(paste(data1$key1, data1$key2), paste(data2$key1, data2$key2)), ]
2. Just use tapply directly; there is no need for any new variables or indices.
tapply(data$z, list(data$x, data$y), sum) |
|
y****2 Posts: 34 | 24 data <- matrix(c(1 ,1 ,2, 2, 1, 3, 4, 2,1, 5, 6, 3,2, 7, 8, 3,2, 9, 10, 4),
ncol=4, byrow=T)
colnames(data) <- c("id", "x1", "x2", "e")
### step1:
data[,2:3] <- data[,2:3]*data[,4]
data1 <- data[,1:3]
### step2:
data2 <- aggregate(data1[,2:3], list(id=data1[,1]), sum)
### step3:
data3 <- split(data2[,2:3], f=list(data2[,1]))
data3 <- lapply(data3, as.vector, "numeric")
mprod <- function(x){x %*% t(x)}
data4 <- lapply(data3, mprod)
### step4:
data5 <- 0
for(i in 1:length(data4)){
data5 <- data5 + |
|
c*****t Posts: 1712 | 25 data x;
input x $ y $ z;
datalines;
a a 1
a a 2
b a 2
a c 1
;
run;
proc sort data=x; by x y;run;
proc means data=x noprint;
var z;
by x y;
output out=data1(drop=_type_ _freq_) sum=;
run;
proc sort data=data1; by y z;run;
data data2;set data1;by y; if last.y;run; |
|
z***9 Posts: 1052 | 26 I have a data set data1:
ID Description
1 aadd
2 adsd
3 asdd
....
Now I also have a data set data2:
ID Description
1 aaddd
2 adsdq
3 asddg
4 fdsfg
....
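I want to replace the Description in data1 with the Description from data2 wherever the IDs match.
The clumsy approach I thought of is to do a left join and create a new table. Is there a fancier way to directly
update the Description in data1 to the Description in data2? |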
|
t**c Posts: 539 | 27 Use the MODIFY statement.
Let data1 be the master data set and data2 be the transaction-data-set:
DATA data1;
Modify data1 data2;
By ID;
RUN; |
|
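For comparison with the SAS MODIFY approach, the same in-place update can be written in SQL as an UPDATE with a correlated subquery. This is a minimal sketch using Python's built-in sqlite3 and the sample rows from the question; it is not SAS code, and the alias d2 is an assumption for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data1 (ID INTEGER PRIMARY KEY, Description TEXT);
CREATE TABLE data2 (ID INTEGER PRIMARY KEY, Description TEXT);
INSERT INTO data1 VALUES (1, 'aadd'), (2, 'adsd'), (3, 'asdd');
INSERT INTO data2 VALUES (1, 'aaddd'), (2, 'adsdq'), (3, 'asddg'), (4, 'fdsfg');
""")

# Update data1 in place; the EXISTS guard leaves rows without a match
# in data2 untouched instead of setting them to NULL.
conn.execute("""
UPDATE data1
SET Description = (SELECT d2.Description FROM data2 AS d2 WHERE d2.ID = data1.ID)
WHERE EXISTS (SELECT 1 FROM data2 AS d2 WHERE d2.ID = data1.ID)
""")

updated = conn.execute("SELECT ID, Description FROM data1 ORDER BY ID").fetchall()
print(updated)
```

ID 4 exists only in data2, so it is not added to data1 by this statement.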
t*****w Posts: 254 | 28 In my job interviews, they always tested my SAS skills. However, I use R
all the time. To help you prepare, read my R code and see how much of it you
can understand.
%in%
?keyword
a<-matrix(0,nrow=3,ncol=3,byrow=T)
a1 <- a1/(t(a1)%*%spooled%*%a1)^.5 #standardization in discrim
a1<- a>=2; a[a1]
abline(h = -1:5, v = -2:3, col = "lightgray", lty=3)
abline(h=0, v=0, col = "gray60")
abs(r2[i])>r0
aggregate(iris[,1:4], list(iris$Species), mean)
AND: &; OR: |; NOT: !
anova(lm(data1[,3]~data1[,1... |
|
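c**y Posts: 419 | 30 MultiCharts can reference data for two stocks, so it can implement simple long/short trading of two stocks.
If the second instrument is a broad-market index ETF or a sector ETF, this becomes alpha/beta separation trading.
To reference them, define an indicator formula spread = close of data1 - beta * close of data2; the
spread can then be monitored graphically.
This relies on the following approximation: (1+x)/(1+y) ≈ 1+(x-y),
where x and y are the period returns (%) of stock 1 and stock 2 respectively.
So when beta is 1, we usually just use spread = close of data1 / close of data2, which is
more convenient.
The figure below shows the long/short pair ratio (red in panel 3) of AAPL (panel 1) against RSP (an equal-weight S&P 500 index ETF, panel 2),
with a 2% trailing stop (yellow in panel 3). |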
|
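The approximation used above, (1+x)/(1+y) ≈ 1+(x-y), holds to first order in small returns. The short Python check below illustrates the size of the error; the return values 2% and 1% are hypothetical, and the closed-form error y(y-x)/(1+y) follows from simple algebra rather than from the post.

```python
def ratio_and_approx(x: float, y: float) -> tuple[float, float]:
    """Return the exact ratio (1+x)/(1+y) and its first-order approximation 1+(x-y)."""
    exact = (1 + x) / (1 + y)
    approx = 1 + (x - y)
    return exact, approx


# Hypothetical period returns: 2% for stock 1, 1% for stock 2.
x, y = 0.02, 0.01
exact, approx = ratio_and_approx(x, y)
error = exact - approx

print(f"exact={exact:.6f} approx={approx:.6f} error={error:.2e}")
# Algebraically, the error equals y*(y-x)/(1+y), which is second order in the returns.
print(f"closed-form error={y * (y - x) / (1 + y):.2e}")
```

For returns of a few percent the error is on the order of 1e-4, which is why the ratio and the difference track each other closely over short periods.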
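l*******c Posts: 523 | 31 Suppose there is a function of t:
V(t) = sqrt { (Va)^2 + [(Vb)^2 - (Va)^2]*[1-e^(kt/20)]/(1-e^T)};
Va is a known value;
Vb is also a known value;
k and T are constants.
Then, with k running from 1 to 20, is there a simple way to reduce the amount of computation inside the for loop? Here is the code: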
if (Vb > Va)
deltaDAC = (Vb * Vb - Va * Va)/(1-exp(T));
else
deltaDAC = (Va * Va - Vb * Vb)/(1-exp(T));
if (Vb > Va)
{
for (k=1; k <= 20; k++)
{
Data1 = Va * Va;
Data2 = deltaDAC * (1-exp(k/20));
DAC_Set_Value = sqrt(Data1 + Data2);
DAC_Outp... |
|
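z***e Posts: 5393 | 32 The goal is to have a debug flag in production so that every detail can be logged.
The problem is that "every detail" is a lot, spread across many steps.
The simple approach would be:
void debug(boolean debugFlag, String message) {
if (debugFlag) {
log(message);
}
}
The callers would then contain many calls like these:
debug(flag, "Data1: "+ ...+....+...);
debug(flag, "Data2: "+ ...+....+...);
debug(flag, "Data3: "+ ...+....+...);
But if the debug flag is not set, I don't know whether string concatenations like "Data1: "+ ...+....+...
still get executed. In C/C++ you could just inline this, but Java has no inline.
I changed it to:
void debug(boolean debugFlag, String... message) {
...
}
and then concatenate the strings inside it. Is that any better? |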
|
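On the question of whether the concatenation runs when the flag is off: in Java, as in Python, argument expressions are evaluated before the call, so an eagerly built message is constructed and then discarded. A common way around this is to pass something that builds the message on demand. The sketch below shows the idea in Python, not Java; the names describe, debug_eager, and debug_lazy are hypothetical, and in Java the corresponding construct would be a java.util.function.Supplier<String>.

```python
log = []
evaluations = []  # records every time a message is actually built


def describe(name: str) -> str:
    """Stand-in for an expensive message builder."""
    evaluations.append(name)
    return name + ": details"


def debug_eager(flag: bool, message: str) -> None:
    # The caller has already built `message` by the time we get here.
    if flag:
        log.append(message)


def debug_lazy(flag: bool, make_message) -> None:
    # `make_message` is only called when the flag is set.
    if flag:
        log.append(make_message())


debug_eager(False, describe("Data1"))         # built, then discarded
debug_lazy(False, lambda: describe("Data2"))  # never built
debug_lazy(True, lambda: describe("Data3"))   # built and logged

print(evaluations)
print(log)
```

The varargs version in the post defers the concatenation in a similar spirit, though each argument expression is still evaluated at the call site.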
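V********n Posts: 3061 | 34 Then merge the following three steps into one. That removes two rounds of creating an intermediate variable and addressing it, which helps
reduce the time:
Data1 = Va * Va;
Data2 = deltaDAC * (1-exp(k/20));
DAC_Set_Value = sqrt(Data1 + Data2);
Change it to:
DAC_Set_Value = sqrt(Va * Va + deltaDAC * (1-exp(k/20)));
Honestly, in today's programming you rarely need to think about differences this subtle. If you really are counting time down to the
nanosecond, doing this might help a little. |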
|
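Beyond merging the three statements, two other reductions are possible in a loop of this shape: Va * Va does not depend on k, so it can be computed once before the loop, and exp(k/20) equals exp(1/20) raised to the k-th power, so a single exp call plus one multiplication per step can replace 20 exp calls. The sketch below shows this in Python rather than the thread's C; it covers only the Vb > Va branch shown in the post, the function name dac_ramp and the sample values are hypothetical, and T > 0 is assumed so that 1 - exp(T) is nonzero. Separately, if k is an integer type in the C code, k/20 truncates to 0 for k < 20, and k/20.0 would avoid that.

```python
import math


def dac_ramp(va: float, vb: float, t_const: float, steps: int = 20) -> list[float]:
    """sqrt(Va^2 + deltaDAC * (1 - exp(k/steps))) for k = 1..steps (Vb > Va branch)."""
    va_sq = va * va                                        # loop-invariant
    delta_dac = (vb * vb - va_sq) / (1.0 - math.exp(t_const))  # loop-invariant
    step_factor = math.exp(1.0 / steps)                    # the only exp call for the loop
    growth = 1.0
    values = []
    for _ in range(steps):
        growth *= step_factor                              # growth == exp(k/steps)
        values.append(math.sqrt(va_sq + delta_dac * (1.0 - growth)))
    return values


# Hypothetical values: Va = 1, Vb = 2, T = 1.
ramp = dac_ramp(va=1.0, vb=2.0, t_const=1.0)
print(ramp[0], ramp[-1])
```

Since k only takes 20 fixed values, precomputing the 20 factors (1 - exp(k/20)) into a constant table is another option if Va and Vb change between runs.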
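l*******c Posts: 523 | 35 Suppose there is a function of t:
V(t) = sqrt { (Va)^2 + [(Vb)^2 - (Va)^2]*[1-e^(kt/20)]/(1-e^T)};
Va is a known value;
Vb is also a known value;
T is a constant.
Then, with k running from 1 to 20, is there a simple way to reduce the amount of computation inside the for loop? Here is the code: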
if (Vb > Va)
deltaDAC = (Vb * Vb - Va * Va)/(1-exp(T));
else
deltaDAC = (Va * Va - Vb * Vb)/(1-exp(T));
if (Vb > Va)
{
for (k=1; k <= 20; k++)
{
Data1 = Va * Va;
Data2 = deltaDAC * (1-exp(k/20));
DAC_Set_Value = sqrt(Data1 + Data2);
DAC_Output(... |
|
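q**j Posts: 10612 | 36 1. The issue is not that it can't be done, but that it is inconvenient. names(data.frame) has to be changed all at once. What if there are 20
variables? Most people would find that too much trouble. If there were something like names(data.frame$var1) = "newname", that would be
great.
2. SAS has the in= option, which solves a lot of problems. in= is 0 or 1, giving exactly 2^n combinations. SAS is also fairly humble: it adopted SQL as well, which
is very convenient for users. Why doesn't R consider SQL compatibility?
3. R could perfectly well do this: if you attach(data1), data1 is the default. If there is no attach(), attach the most recently used data set
by default. It is quite a simple thing.
Also, may I ask whether tapply can analyze several columns at the same time?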
state <- c("tas", "sa", "qld", "nsw", "nsw", "nt", "wa", "wa",
"qld", "vic", "nsw", "vic", "qld", "qld", "sa", "tas",
"sa", "nt", "wa", "vic", "qld", "nsw", "nsw", "wa",
"sa", "act", "ns |
|
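g*******y Posts: 380 | 37 Honestly, I don't understand what you are saying.
First, what do you mean by the retained values? Do you mean the num1 and num2 corresponding to the group with the largest count
that you extract from data1?
Second, I don't see why you need to think up such complicated methods. From your sample, each group is simply
repeated according to count: whatever count is, that is how many repeated observations the group has. If you only want to use
the num1 and num2 from the largest-count group to pick out observations in data2, I think the simplest way is:
proc sort data=data1; by descending count;
Then use the values from the largest-count group, say x and y, to drop the observations you don't want.
data temp; set data2; if num1=x and num2=y;
If this is not what you want, then perhaps I did not understand your question. |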
|
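a****m Posts: 693 | 38 Thanks. The main issue is that there are too many rows. The first 62 numbers (row-wise) form one group, and transposing the matrix would mix up all the
rows. Also, when the txt file was converted to csv, the blanks were turned into 0, and genelowvalfilter does not seem able to
remove them, so I had to handle it like this:
Thanks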
[n,p]=size(data);
for i=1:2000
gene(i,:)=reshape(data(((i-1)*16+1):16*i,1:4),1,64);
end
data1=gene(:,1:47);
data2=gene(:,49:63);
data=[data1,data2];
This filter does not work:
[mask,genes] = genelowvalfilter(gene,'absval',1); |
|
i**f Posts: 1195 | 39 No need to read the text, just look at the data.
Convert data1 into the data2 format, adding ADDL and II (the number of additional doses and the dosing interval).
data1
D Date Time Event Glucose
101 01Jan2000 7:30 Sampling .
101 01Jan2000 8:00 Dosing .
101 01Jan2000 8:30 Sampling 100
101 01Jan2000 9:30 Sampling 200
101 01Jan2000 12:30 Sampling 170
101 01Jan2000 18:30 Sampling 140
101 02Jan2000 7:30 Sampling 90
101 02Jan2000 8:00 Dosing .
101 03Jan2000 8:00 Dosing .
101 04Jan2000 8:00 Dosing .
101 05Jan2000 8:00 Dosing .
data2 |
|
l******1 Posts: 86 | 40 I want to use SAS to solve a cubic equation, but I want the root to lie between 0 and 1. How can I do that?
The code is below.
data data1;
input p_ta;
datalines;
0.975
;
proc model data=DATA1;
eq.a=(1-ta)**3+3*ta*(1-ta)**5-p_ta;
solve ta /onepass solveprint;
run; |
|
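One general way to confine a root search to an interval is bisection, which only needs a sign change between the endpoints. The sketch below applies it in Python to the expression as posted, with p_ta = 0.975; it is an illustration of the technique, not a replacement for the SAS proc model call, and the function names f and bisect are hypothetical. At ta = 0 the expression is 0.025 and at ta = 1 it is -0.975, so a root in (0, 1) exists.

```python
def f(ta: float, p_ta: float = 0.975) -> float:
    """The expression from the post, written as f(ta) = 0."""
    return (1 - ta) ** 3 + 3 * ta * (1 - ta) ** 5 - p_ta


def bisect(func, lo: float, hi: float, tol: float = 1e-12) -> float:
    """Find a root of func in [lo, hi]; requires a sign change on the interval."""
    f_lo = func(lo)
    if f_lo * func(hi) > 0:
        raise ValueError("no sign change on the interval")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = func(mid)
        if f_lo * f_mid <= 0:
            hi = mid            # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid  # root lies in [mid, hi]
    return (lo + hi) / 2


root = bisect(f, 0.0, 1.0)
print(root, f(root))
```

Because the search never leaves [0, 1], the returned root is guaranteed to lie in that interval.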
s*****n Posts: 2174 | 41 It is really just a loop with a conditional inside. In R the implementation takes only a dozen or so lines.
data <- read.table(...)
result <- data.frame(try = 1:1000, output = NA, case = NA)
for (i in 1:1000){
data1 <- data[sample(100000, 10000), ]
data2 <- data[sample(100000, 10000), ]
if (mean(data1$var1) > 0){
fit1 <- lm(...)
result$output[i] <- functionA(data2, fit1$parameter_a)
result$case[i] <- "A"
} else {
fit2 <- glm(...)
result$output[i] <- functionB(data2, fit2$parameter_b)
result$case[i] <- "B"
}
}
hist(result$output[ |
|
w********5 Posts: 72 | 42 This is my answer. My code is always very long and inefficient. Please
help simplify it.
data data1;
input var1;
cards;
5
6
;
run;
data data2;
input var2;
cards;
5
6
;
run;
data new;
infile datalines dlm=" ";
input name $ var $ ;
datalines;
data1 var1
data2 var2
data2 var2
data4 var4
;
run;
proc sql;
select name into:name1-:name&SYSMAXLONG
from new;
select var into:col1-:col&&SYSMAXLONG
from new;
quit;
%put _user_;
option mprint mlogic;
%macro mutiple;
%do i=1 %to &sqlobs;
proc so |
|
s********l Posts: 245 | 43 The code I wrote is:
%macro data(num);
%do i=0 %to &num;
data est&num;
infile "path\data&num";
input a b c d;
run;
proc append base=data1 data=est&num;
run;
%mend;
%data(num=100);
With the above program, I only get data1 combined with
data100. What's wrong with my program? I would really appreciate your help. Many
thanks. |
|
s********p Posts: 637 | 44 She is pulling data from large tables; in most cases, she will create a new
dataset containing the pulled variables for further analysis.
I am not sure whether temporary files are created when no new dataset is needed. I
don't use the system option (fullstimer), but I just tried the following,
proc sql;
select p1.id from
data1 p1
inner join
data2 p2 on
p1.id=p2.id
;
quit;
and checked whether temporary files were generated, and found that "#tf0005.sas7butl" was created and its size changed.
ll /saswork/SAS_workDD3300003446
total 7000
-rw-rw-r-- ... |
|
c*****m Posts: 4817 | 45 If the widths are fixed, you can use read.fwf, for example:
> data1 = read.fwf("M:/test.txt", widths=c(1,1,1,1,1))
> data1
V1 V2 V3 V4 V5
1 A B C D E
2 A B C D E
3 A B C D E
I |
|
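k*****u Posts: 1688 | 46 1: In SAS I build a model with data1. How do I apply the fitted parameters to data2 for prediction? The crude approach I thought of is to
write the parameters down and then apply them in a data step. What is the proper way to do this?
2: In data1 the variable x takes the values CA, OR, WA, and I turned x into dummy variables. But in data2, besides those three
values, x also takes NV and TX. In that case, should x still go into the model, or should it simply be dropped? How is this
usually handled in practice? |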
|
f******u Posts: 250 | 47 You do not need a macro.
data data1(keep=x);
input variableName $14. variableLength : $12. price;
x=catx(' ', variableName,variableLength,price);
datalines;
FundingAmount 0-15000 0.217
ObjectAmount 15000-30000 0.318
FundingAmount 30000-60000 0.519
;
run;
data data2(keep=x);
input firstname : $7.lastname : $7. age;
x=catx(' ',firstname,lastname,age);
datalines;
Ashley Liu 17
brandon green 30
susan Chen 28
;
run;
data data3;
set data1 data2;
run;
data _null_;
fil... |
|
A****1 Posts: 33 | 48 If there are only a few outliers, you can use the annotate facility.
data my_labels;
retain xsys ysys '2' function 'label' position '1' style "'Arial/bo'"
color 'blue';
set data1;
if read>75 then do;
text='Value1'; x=write; y=read; output;
end;
run;
proc gplot data=data1;
plot read*write/annotate=my_labels;
run;
quit;
|
|