y****2 Posts: 34 | 1 data <- matrix(c(1, 1, 2, 2, 1, 3, 4, 2, 1, 5, 6, 3, 2, 7, 8, 3, 2, 9, 10, 4),
ncol=4, byrow=TRUE)
colnames(data) <- c("id", "x1", "x2", "e")
### step1:
data[,2:3] <- data[,2:3]*data[,4]
data1 <- data[,1:3]
### step2:
data2 <- aggregate(data1[,2:3], list(id=data1[,1]), sum)
### step3:
data3 <- split(data2[,2:3], f=list(data2[,1]))
data3 <- lapply(data3, as.vector, "numeric")
mprod <- function(x){x %*% t(x)}
data4 <- lapply(data3, mprod)
### step4:
data5 <- 0
for(i in 1:length(data4)){
data5 <- data5 + |
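The four steps above can be cross-checked outside R. The following is a minimal Python sketch, not the poster's code. It assumes that the truncated step 4 loop simply adds each per-id matrix from data4 into data5; the function name `summed_outer_products` is made up for illustration.

```python
from collections import defaultdict

# Rows are (id, x1, x2, e), copied from the matrix in the post.
rows = [
    (1, 1, 2, 2),
    (1, 3, 4, 2),
    (1, 5, 6, 3),
    (2, 7, 8, 3),
    (2, 9, 10, 4),
]


def summed_outer_products(rows):
    # Steps 1-2: weight x1 and x2 by e, then sum the weighted values per id.
    totals = defaultdict(lambda: [0, 0])
    for id_, x1, x2, e in rows:
        totals[id_][0] += x1 * e
        totals[id_][1] += x2 * e
    # Steps 3-4: outer product v v^T per id, accumulated into one 2x2 matrix.
    # ASSUMPTION: the truncated loop adds each per-id matrix to the running sum.
    result = [[0, 0], [0, 0]]
    for v in totals.values():
        for i in range(2):
            for j in range(2):
                result[i][j] += v[i] * v[j]
    return result


result = summed_outer_products(rows)
print(result)
```

For the sample matrix the per-id sums are (23, 30) and (57, 64), so the printed total is the sum of their two outer products.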
|
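s*****n Posts: 2174 | 2 1. names(data)[1] <- "newname" is enough. If you don't like numeric indexes, you can also write
names(data)[names(data)=="var1"] <- "newname" or
names(data) <- gsub("var1", "newname", names(data)); either works.
2. What you describe has a precondition: the BY variables must be the same. Consider a merge across data1, data2, and data3, where data1 and data2 are matched on var1 and var2, while data1 and data3 are matched on var3. For this kind of more complicated merge, where the BY variables differ from one pair of data sets to the next, it is hard to define a single function that handles multiple data sets, unless the function itself takes a great many parameters.
3. Besides SAS, does any other language have the "most recent data" concept you mention?
Is it the data set most recently assigned (written), or the one most recently read? For example:
data3 <- merge(data1, data2)
print(data2 |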
|
t**********r Posts: 182 | 3 Many thanks for your hint!! I made it. Here is the code.
proc sql;
create table data3 as
select data1.*, data2.rating, day1-day2 as diff
from data1, data2
where data1.var1=data2.var1 and data1.var2=data2.var2
and date1-date2>0
group by data1.var1, data1.var2, data1.date1
having diff=min(diff);
NOTE: The query requires remerging summary statistics back with the original
data.
NOTE: Table WORK.data3 created, with 48144 rows and 9 |
|
|
m**********2 Posts: 2252 | 5 I have a table like this and want to directly select the sum of Data1, Data2, ... for each ID. How can I do that?
Original:
ID  Data1  Data2  Data3
1    3341   1926    336
1     972    562    130
1    1580    760    213
1     742    345     87
1    7776   4237    889
2   27129  15347   3002
2    1428    825    139
2    3211   1918    361
2    1114    598    114
2    1219    602    110
3    1238    684    127
After the select:
ID  Data1  Data2  Data3
1   14411   7830   1655
2   34101  19290   3726
3    1238    684    127
Thanks!!! |
|
t**********r Posts: 182 | 6 Has figured it out. Thanks.
===============
proc sql;
create table data3 as
select data1.*, data2.rating, day1-day2 as diff
from data1, data2
where data1.var1=data2.var1 and data1.var2=data2.var2
and date1-date2>0
group by data1.var1, data1.var2, data1.date1
having diff=min(diff);
NOTE: The query requires remerging summary statistics back with the original
data.
NOTE: Table WORK.data3 created, with 48144 rows and 9 columns.
quit;
NOTE: PROCEDURE SQL used (Tota |
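The query above relies on SAS remerging summary statistics, which many other SQL engines do not do. As a hedged illustration of the same "closest earlier date2 for each date1" idea, here is a Python sketch using the standard-library sqlite3 module. It swaps in a correlated subquery for the HAVING clause, and the sample rows and integer dates are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; dates are plain day numbers for simplicity.
conn.executescript("""
    CREATE TABLE data1 (var1 TEXT, var2 TEXT, date1 INTEGER);
    CREATE TABLE data2 (var1 TEXT, var2 TEXT, date2 INTEGER, rating TEXT);
    INSERT INTO data1 VALUES ('a', 'x', 10), ('a', 'x', 20), ('b', 'y', 5);
    INSERT INTO data2 VALUES
        ('a', 'x', 3, 'low'), ('a', 'x', 8, 'mid'), ('a', 'x', 15, 'high'),
        ('b', 'y', 7, 'late');
""")

# For each data1 row, keep the data2 row whose date2 is the latest one
# strictly before date1, i.e. the smallest positive date1 - date2.
query = """
    SELECT d1.var1, d1.var2, d1.date1, d2.rating, d1.date1 - d2.date2 AS diff
    FROM data1 AS d1
    JOIN data2 AS d2
      ON d1.var1 = d2.var1 AND d1.var2 = d2.var2
    WHERE d2.date2 = (
        SELECT MAX(p.date2)
        FROM data2 AS p
        WHERE p.var1 = d1.var1 AND p.var2 = d1.var2 AND p.date2 < d1.date1
    )
    ORDER BY d1.var1, d1.var2, d1.date1
"""
nearest = conn.execute(query).fetchall()
for row in nearest:
    print(row)
```

Rows of data1 with no earlier date2 (the 'b', 'y' row here) drop out of the result, which matches the inner-join behavior of the original query.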
|
p***r Posts: 920 | 7 or you can do it in another brutal way
*WIDE TO LONG;
PROC TRANSPOSE DATA=data1 OUT=data2;
BY var_id;
VAR _ALL_;
RUN;
data data3;
set data2;
log_var=log(col1);
run;
*LONG TO WIDE;
*_NAME_ is the default name column written by the first PROC TRANSPOSE;
PROC TRANSPOSE DATA=data3 OUT=data4;
BY var_id;
ID _NAME_;
VAR log_var;
RUN; |
|
f******u Posts: 250 | 8 you do not need a macro.
data data1(keep=x);
* variableName is read from columns 1-14, so the names below are padded to that width;
input variableName $14. variableLength : $12. price;
x=catx(' ', variableName,variableLength,price);
datalines;
FundingAmount 0-15000 0.217
ObjectAmount  15000-30000 0.318
FundingAmount 30000-60000 0.519
;
run;
data data2(keep=x);
input firstname : $7. lastname : $7. age;
x=catx(' ',firstname,lastname,age);
datalines;
Ashley Liu 17
brandon green 30
susan Chen 28
;
run;
data data3;
set data1 data2;
run;
data _null_;
fil... Read full post |
|
t*********u Posts: 26311 | 9 data3 = FOREACH data2 GENERATE group, SUM(data1.cpc) as cost:int;
data4 = ORDER data3 BY cost ASC;
Running
DESCRIBE data4;
gives no error,
but switching to DUMP data4;
fails with:
2014-06-11 17:22:26,525 ERROR org.apache.pig.tools.pigstats.SimplePigStats: ERROR: java.lang.RuntimeException: java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer
2014-06-11 17:22:26,525 ERROR org.apache.pig.tools.pigstats.PigStatsUtil: 1 map reduce job(s) failed!
2014-06-11 17:22:26,573 ERROR org.apache.pig.tools.grunt.Gru... Read full post |
|
t**********r Posts: 182 | 10 Want to merge two data sets using proc sql:
Data1:
var1 var2 date1
Data2:
var1 var2 date2 rating
(Note: var1 and var2 are the same variables in these two data sets)
Question:
Select rating in data2 to data1; meeting the following criteria:
1. date1 - date2 >0
2. date1 - date2 has the minimum value.
I wrote the following code; but it won't work:
proc sql;
create table data3 as
select data1.*, data2.rating, date1-date1 as diff
from data1, data2
where data1.var1=data2.var1 and data1.var2=data2.var2 |
|
m********5 Posts: 619 | 11 Your conditions 1 and 2 look like they overlap to me....
proc sql;
create table data3 as
select a.*, b.*
from data1 as a, data2 as b
where a.var1=b.var1 and a.var2=b.var2
and a.date1-b.date2>=minvalue and a.date1-b.date2>0;
quit; |
|
|
c*****d Posts: 6045 | 13 There's a baozi reward on offer, so here goes:
select id, sum(data1), sum(data2), sum(data3)
from table
group by id |
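The GROUP BY query above can be checked against the sample table from the question. This is a small Python sketch using the standard-library sqlite3 module. The table name `readings` is a made-up stand-in, since `table` itself is a reserved word in most SQL dialects.

```python
import sqlite3

# Sample rows (ID, Data1, Data2, Data3) copied from the question.
rows = [
    (1, 3341, 1926, 336),
    (1, 972, 562, 130),
    (1, 1580, 760, 213),
    (1, 742, 345, 87),
    (1, 7776, 4237, 889),
    (2, 27129, 15347, 3002),
    (2, 1428, 825, 139),
    (2, 3211, 1918, 361),
    (2, 1114, 598, 114),
    (2, 1219, 602, 110),
    (3, 1238, 684, 127),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (id INTEGER, data1 INTEGER, data2 INTEGER, data3 INTEGER)"
)
conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?)", rows)

# One output row per id, with each data column summed within that id.
totals = conn.execute(
    "SELECT id, SUM(data1), SUM(data2), SUM(data3) "
    "FROM readings GROUP BY id ORDER BY id"
).fetchall()
for row in totals:
    print(row)
```

The three printed rows match the "after the select" table in the question.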
|
z***e Posts: 5393 | 14 I need to set a debug flag in production so that every detail gets logged.
The problem is that "every detail" means a lot of data, spread over many steps.
The simple approach would be:
void debug(boolean debugFlag, String message) {
if (debugFlag) {
log(message);
}
}
The callers then end up with many calls like these:
debug(flag, "Data1: "+ ...+....+...);
debug(flag, "Data2: "+ ...+....+...);
debug(flag, "Data3: "+ ...+....+...);
But if the debug flag is not set, I don't know whether a string concat like "Data1: "+ ...+....+...
still gets executed. In C/C++ I could simply inline this, but Java can't inline.
I changed it to:
void debug(boolean debugFlag, String... message) {
...
}
and then concat the strings inside. Is that any better? |
|
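w****g Posts: 597 | 15 Screenshot of the problem attached:
1. The window in the upper right is the installer's "step 4 of 7" screen. All the options shown are partitions of the internal disk /sda, and none are partitions of the external USB disk /sdb. If I want to install on the external USB disk /sdb, where can I select it?
2. The terminal in the lower left shows the disk information from 'fdisk -l', and on the desktop background are the four partitions of the external USB disk: sdb1, sdb3, sdb4, sdb5. All four are formatted as NTFS, with volume labels "Charlie", "DATA1", "DATA2", "DATA3". Among the partitions /dev/sdb1,2,3,4,5, which one is the install target "DATA2"?
Thanks. |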
|
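N****w Posts: 21578 | 16 boot up with the live CD, plug in your USB HD
run a file manager program, the 4 partitions will be loaded automatically
and you will see the "name" ...
partitions of the internal disk, and none are partitions of the external USB disk /sdb. If I want to install on the external USB disk /sdb, where can I select it?
four partitions of the external USB disk: sdb1, sdb3, sdb4, sdb5. All four are formatted as NTFS, with volume labels "Charlie", "DATA1", "DATA2", "DATA3". Among the partitions /dev/sdb1,2,3,4,5, which one is the install target "DATA2"? |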
|
R****a Posts: 199 | 17 Can you take something like the following:
yourownexecutable data1 data2 data3 ....
I guess that is not a good solution for you if you have too many data items.
But this does satisfy your request. |
|
|
|
|
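q**j Posts: 10612 | 21 Does R have matrices with more than 2 dimensions?
y=c(1,2,3,4,5,6,7,8,9,10,11,12)
x = matrix(y, c(2,2,3))
> dim(x)
[1] 2 6
Does this mean the answer is no? Do I have to use a list here?
Also: suppose I have Data1, Data2, Data3 ... Data10,
and for i in 1:10 I want
Temp = Datai. How should I write this? In MATLAB you can build a string and then eval
that string, but R's eval doesn't seem to work that way. Please advise.
(This can be done with a list, but I'd like to know a more general approach.) |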
|
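s*****n Posts: 2174 | 22 You have so many questions because your thinking is too rooted in SAS. If you look at R from inside that framework, you will naturally run into a lot of questions. Haven't you used Matlab? Next time, try approaching R with a Matlab mindset and you won't have so many complaints.
Question 1: If you only want to change a column name in a data frame, just use names(). It is one of the most commonly used things. I was the one who overcomplicated your question: I was talking about renaming the object itself.
Question 3: How do you merge three data sets, say data1, data2, data3, in SQL and SAS? A merge of n data sets has 2^n combinations; how would you provide that flexibility?
Question 6: The reason you have the notion of a "most recent data set" or "current data set" is that SAS has the data step. How would you define that in R? |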
|
y******0 Posts: 401 | 24 proc sql;
create table data3 as
select data1.var1,data1.var2, data2.rating, min(date1-date2) as diff
from data1, data2
where data1.var1=data2.var1
and data1.var2=data2.var2
and date1>date2
group by 1,2,3;
quit; |
|
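t**g Posts: 1164 | 25 There are 3 datasets (data1, data2, data3),
all containing the same variables (loan number, credit score, balance).
I need to write a macro
that combines these 3 datasets into one dataset
and prints out the average of balance by credit score.
How should I write the code?
Thanks |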
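The logic being asked for (stack the three datasets, then average balance within each credit score) can be sketched as follows. This is Python with the standard-library sqlite3 module, not a SAS macro. The sample rows, the column names `loan_number`, `credit_score`, `balance`, and the helper `average_balance_by_score` are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical sample rows: (loan_number, credit_score, balance).
samples = {
    "data1": [(1001, 700, 100.0), (1002, 650, 200.0)],
    "data2": [(1003, 700, 300.0)],
    "data3": [(1004, 650, 400.0), (1005, 720, 50.0)],
}
for name, rows in samples.items():
    conn.execute(
        f"CREATE TABLE {name} "
        "(loan_number INTEGER, credit_score INTEGER, balance REAL)"
    )
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", rows)


def average_balance_by_score(conn, table_names):
    # Stack the tables with UNION ALL (the role a macro loop would play),
    # then average balance within each credit score.
    stacked = " UNION ALL ".join(
        f"SELECT credit_score, balance FROM {t}" for t in table_names
    )
    query = (
        f"SELECT credit_score, AVG(balance) FROM ({stacked}) "
        "GROUP BY credit_score ORDER BY credit_score"
    )
    return conn.execute(query).fetchall()


averages = average_balance_by_score(conn, ["data1", "data2", "data3"])
for score, avg in averages:
    print(score, avg)
```

Passing the table names as a list keeps the function usable for any number of identically structured datasets, which is the flexibility a macro would otherwise provide.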
|
d*******1 Posts: 854 | 26 proc sql noprint;
create table data3 as
select a.id, a.var_1,b.var_2 from data1 a, data2 b
where a.id=b.id order by a.id,a.var_1,b.var_2;
quit; |
|
s***r Posts: 1121 | 27 How can I merge 3 datasets using PROC SQL?
Dataset 1:
Plant Unique ID   date1 (YYYYMMDD)   Variable1
001               20060914           .....
001               20080801           .....
001               20080822
001               20100101
002               20011119
002               20020101
002               20030808
003               20091212
005               20000816
005               20001225
005               20010205
005               20030203
005               20030501           ....
...
...
....
Dataset 2:
Plant Unique ID   date2 (YYYYMMDD)   Variable2
001               20050314 ... Read full post |
|
d********t Posts: 837 | 28 Reduce(rbind, list(data1,data2,data3,data4))
example:
Reduce(rbind, list(data.frame(x1=c(1,2,3),x2=c(2,3,4)),
data.frame(x1=c(5,6,7),x2=c(7,5,4)),
data.frame(x1=c(5,4,3),x2=c(7,6,5)))) |
|