Why is merging two datasets of over a million rows each painfully slow in SAS? - Statistics board - 未名存档

This page is an excerpt and archive of the corresponding thread on 未名空间.


F****3 (posts: 1504) #1

A very simple program, but it ran all afternoon without finishing. Both have1 and have2 have few variables, but many observations.

    proc sql;
      create table want as
      select *
      from have1 as a left join have2 as b
        on b.serie_beg <= a.serie <= b.serie_end;
    quit;

Replacing the left join with an inner join makes no difference. serie_beg, serie, and serie_end are all numeric values in the billions. Why does it take so long?

h***i (posts: 3844) #2

[In reply to F***3's original post]

SQL is rather crude for this; use a DATA step.

h******s (posts: 3420) #3

[In reply to F***3's original post]

Don't A and B have an id? Something like "on a.id = b.id and b.serie_beg <= ..." would be more reasonable. Joining on nothing but a range like this is effectively many-to-many: with no equality key, every row of A has to be compared against every row of B. Of course it is slow.

a****g (posts: 8131) #4

[In reply to F***3's original post]

A DATA step is faster in this case.

D*********Y (posts: 3382) #5

[In reply to a****g: "data step is faster in this case"]

Why would a DATA step be faster?
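
B******y (posts: 9065) #6

[In reply to D*********Y: "Why would a DATA step be faster?"]

It comes from the internal structure of the language: SAS's own language is the fastest, while SQL is a ported-in language. (This is what a SAS veteran told me. I don't really understand it myself, so I take no responsibility for whether it is correct...)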

g**u (posts: 205) #7

[In reply to a****g: "data step is faster in this case"]

Why? Doesn't a DATA step require sorting first?
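
F****3 (posts: 1504) #8

Thanks, everyone! The files are already sorted, but not by serie. The id is effectively serie itself; the only requirement is that the value fall within a given range... so it looks like serie is all I can use. Do I need to sort by serie first? When I subset have1 to 1000 obs, the join finished quickly.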

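l******m (posts: 111) #9

I don't quite understand what this join is based on. Is the requirement just that a's serie fall within b's serie range? If so, could you first find the minimum and maximum of b's serie values and then write it as a DATA step? I haven't seen the raw data, so I'm not sure this is right.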
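
One way to read the suggestion above is as a cheap pre-filter: compute the overall lower and upper bound of have2 once, then drop every have1 row that cannot match any interval. This is only a sketch of that reading, not a replacement for the range join itself. The macro variables lo and hi and the output name have1_in_range are hypothetical, and the TRIMMED keyword assumes SAS 9.3 or later. The explicit format is there because the values are in the billions and the default conversion to a macro variable could otherwise lose digits.

```sas
/* Store the overall bounds of have2 in two macro variables. */
proc sql noprint;
  select min(serie_beg) format=best32.,
         max(serie_end) format=best32.
    into :lo trimmed, :hi trimmed
  from have2;
quit;

/* Keep only the have1 rows that could fall inside some interval. */
data have1_in_range;
  set have1;
  where serie between &lo and &hi;
run;
```

The filtered table still has to be matched against the individual intervals; the pre-filter only helps when a large share of have1 lies outside the range that have2 covers.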

s******r (posts: 1524) #10

[In reply to F***3's original post]

What you should try is not some new way of joining. I assume you are running the code on a PC and memory is limited. Your condition is a.serie between b.serie_beg and b.serie_end, so for every record in a you need to scan all of b, which is unnecessary in most cases. You can query the distribution of b.serie_beg and b.serie_end, then try

    have2(where=(serie_beg between 1000 and 2000)) as b
    have1(where=(serie >= 1000)) as a
    ....

If you break the join into 10 steps, the memory requirement of each step would drop to about 1% if the data are uniformly distributed, because each step pairs roughly a tenth of a with a tenth of b. And if there are duplicate records, you can also keep only the unique records to reduce processing time.
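
The slicing idea above could be sketched as follows. This is a minimal sketch under stated assumptions, not s******r's exact code: the macro join_slice, its parameters, the slice boundaries 1000/2000/3000, and the output names want_1 and want_2 are all hypothetical. The filter on have2 is also a variant of the one in the post: it keeps every interval that overlaps the slice, so that an interval starting before the slice is not lost.

```sas
/* Hypothetical macro: run the range join for one slice [lo, hi) of serie. */
%macro join_slice(lo=, hi=, out=);
  proc sql;
    create table &out as
    select a.*, b.serie_beg, b.serie_end
    from have1(where=(serie >= &lo and serie < &hi)) as a
         left join
         /* keep only the intervals that overlap this slice */
         have2(where=(serie_end >= &lo and serie_beg < &hi)) as b
      on b.serie_beg <= a.serie and a.serie <= b.serie_end;
  quit;
%mend join_slice;

%join_slice(lo=1000, hi=2000, out=want_1);
%join_slice(lo=2000, hi=3000, out=want_2);

/* Stack the per-slice results. */
data want;
  set want_1 want_2;
run;
```

In practice the slice boundaries would be chosen from the observed distribution of serie_beg and serie_end, as the post suggests, so that each slice holds a similar number of rows.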
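
F****3 (posts: 1504) #11

Thanks, everyone. I'll look into it. What is certain is that the tables are too big. Is there a system setting to raise the memory limit? I'm working on Linux and can't use memlib.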

F****3 (posts: 1504) #12

Sorry to ask, but how exactly should the sort-sort-merge be done with a DATA step? I don't know how to write b.serie_beg <= a.serie <= b.serie_end in a DATA step; I have always used SQL. Thanks!

l****u (posts: 529) #13

    data want;
      set a;                    /* one row of a per iteration */
      do i = 1 to p;
        set b point=i nobs=p;   /* nobs= must be on b, the table being looped */
        if serie_beg <= serie <= serie_end then output;
      end;
    run;
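
The step above still compares every row of a with every row of b, so it does the same amount of work as the SQL join. A sort-based alternative, closer to what post #12 asks for, is sketched below. It is a sketch under a strong assumption that the thread does not confirm: the intervals in have2 do not overlap, so each serie can match at most one interval. The names a_sorted, b_sorted, key, cur_beg, cur_end, match_beg, and match_end are hypothetical, and only the key columns are carried through for brevity. The step behaves like a left join: unmatched rows of have1 are kept with missing match_beg and match_end.

```sas
proc sort data=have1 out=a_sorted; by serie;     run;
proc sort data=have2 out=b_sorted; by serie_beg; run;

data want(keep=serie match_beg match_end);
  /* Interleave both tables on one shared key. When keys tie, rows of
     b_sorted are read first because it is listed first in SET. */
  set b_sorted(in=from_b keep=serie_beg serie_end rename=(serie_beg=key))
      a_sorted(keep=serie rename=(serie=key));
  by key;
  retain cur_beg cur_end;          /* most recently started interval */
  if from_b then do;
    cur_beg = key;
    cur_end = serie_end;
  end;
  else do;
    serie = key;
    /* cur_beg <= key already holds because of the sort order */
    if not missing(cur_beg) and key <= cur_end then do;
      match_beg = cur_beg;
      match_end = cur_end;
    end;
    output;                        /* left join: keep unmatched rows too */
  end;
run;
```

After the two sorts, each table is read once, so the cost is dominated by sorting rather than by the number of row pairs. If the intervals can overlap, this sweep is not sufficient, because only the most recently started interval is checked.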
