l*********s posts: 5409 | 1 1. Sort, then match-merge in a DATA step vs. a PROC SQL inner join?
2. Sort a whole big dataset vs. split it into smaller subsets, sort each one
individually, then recombine? |
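To make question 1 concrete, here is a minimal sketch of a sort-then-merge inner join. It is written in Python purely for illustration (it is not SAS code), and the function name `sort_merge_inner_join` is hypothetical. Both inputs are sorted on the key and then advanced together in a single pass, which is the idea behind PROC SORT followed by a DATA step MERGE with BY. One caveat: for keys repeated on both sides this sketch emits every pair, as an SQL inner join does, whereas a DATA step MERGE does not produce that Cartesian product, so the two approaches can give different results on many-to-many data.

```python
def sort_merge_inner_join(left, right, key):
    """Inner-join two lists of dicts on `key` by sorting both, then merging.

    Hypothetical helper for illustration. Duplicate keys on both sides
    yield all pairs (SQL inner-join semantics, not DATA step MERGE).
    """
    left_sorted = sorted(left, key=lambda r: r[key])
    right_sorted = sorted(right, key=lambda r: r[key])
    out = []
    i = j = 0
    while i < len(left_sorted) and j < len(right_sorted):
        lk = left_sorted[i][key]
        rk = right_sorted[j][key]
        if lk < rk:
            i += 1  # left key too small: advance left
        elif lk > rk:
            j += 1  # right key too small: advance right
        else:
            # Find the run of equal keys on each side.
            i_end = i
            while i_end < len(left_sorted) and left_sorted[i_end][key] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right_sorted) and right_sorted[j_end][key] == rk:
                j_end += 1
            # Emit every pairing within the matching runs.
            for lrow in left_sorted[i:i_end]:
                for rrow in right_sorted[j:j_end]:
                    out.append({**lrow, **rrow})
            i, j = i_end, j_end
    return out


left = [{"id": 2, "a": "x"}, {"id": 1, "a": "y"}, {"id": 3, "a": "z"}]
right = [{"id": 3, "b": 30}, {"id": 1, "b": 10}, {"id": 4, "b": 40}]
print(sort_merge_inner_join(left, right, "id"))
```

Sorting costs O(n log n) per side, after which the merge pass itself is linear, which is why the sort is usually the expensive part on big data.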
L***4 posts: 13 | 2 It depends on many factors: the size of the base table and
lookup table(s), your CPU/IO/memory capacity, and the programming needs.
Typically, if the larger dataset is under 1 GB, any method is fine, and I
choose the one that is easiest to write and read. Otherwise, I try to avoid
sorting the large data (if you have to sort, a split sort or hash sort uses
fewer resources). For example, if the base table is huge and there are several
small lookup tables, I like SET with KEY=. If you have two huge datasets and
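The no-sort lookup pattern described above can be sketched as follows, again in Python for illustration only. The function `hash_lookup_join` is hypothetical, and a Python `dict` merely stands in for a SAS index (SET ... KEY=) or DATA step hash object; it is not their actual implementation. Each small lookup table is loaded into memory keyed on the join key, and the huge base table is read once in its original order, so it never has to be sorted. The sketch assumes keys are unique within each lookup table and keeps only base rows that match every lookup (inner-join behavior).

```python
from typing import Dict, Iterable, Iterator, List


def hash_lookup_join(base: Iterable[dict], lookups: List[List[dict]], key: str) -> Iterator[dict]:
    """Enrich each base row from several small lookup tables without sorting.

    Hypothetical helper. Assumes keys are unique within each lookup table.
    """
    # Load each small lookup table into an in-memory dict keyed on `key`.
    indexes: List[Dict[object, dict]] = [{row[key]: row for row in table} for table in lookups]
    # Stream the large base table once, in its original order.
    for row in base:
        merged = dict(row)
        for index in indexes:
            match = index.get(row[key])
            if match is None:
                break  # no match in this lookup: drop the row
            merged.update(match)
        else:
            # Runs only if every lookup matched (no break).
            yield merged


base = [{"id": 3, "v": "c"}, {"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
lk1 = [{"id": 1, "x": 10}, {"id": 3, "x": 30}]
lk2 = [{"id": 1, "y": 100}, {"id": 2, "y": 200}, {"id": 3, "y": 300}]
print(list(hash_lookup_join(base, [lk1, lk2], "id")))
```

The trade-off is memory: this only pays off while the lookup tables fit comfortably in RAM, which matches the "huge base table, several small lookup tables" case in the post.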
|
l*********s 发帖数: 5409 | 3 Thank you very much ^__^
|
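p********a posts: 5352 | 4 I basically use PROC SQL most of the time; it's flexible and easy
to use. Besides, the old-timers don't know SQL, so let them not understand it.
Still, for really, really big data, MERGE is a bit faster. I've run lots of tests, and SORT/MERGE is indeed a bit faster than SQL.
You can also use approaches like PROC FORMAT or indexes to improve efficiency. |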
b*******r posts: 152 | 5 There was a heated debate on SQL vs. MERGE on this board not long ago.
Search for it and you should get a sense of the trade-offs. |
A*******s posts: 3942 | 6 For question 2:
PROC SORT has had a THREADS option since SAS 9, so you don't have to split the data
yourself.
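Question 2 describes what is usually called an external merge sort. Here is a minimal Python sketch under that reading. The function `split_sort_merge` and its `chunk_size` parameter are hypothetical, and this is not a description of how PROC SORT works internally. The input is split into chunks, each chunk is sorted on its own, and the sorted chunks are then combined with a k-way merge via the standard library's `heapq.merge`.

```python
import heapq
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def split_sort_merge(rows: Iterable[T], chunk_size: int, key: Callable[[T], object]) -> Iterator[T]:
    """Sort by splitting into chunks, sorting each chunk, then k-way merging.

    Hypothetical helper. Chunks are kept in memory so the sketch stays
    self-contained; a real external sort would spill each one to disk.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    chunks: List[List[T]] = []
    current: List[T] = []
    for row in rows:
        current.append(row)
        if len(current) == chunk_size:
            chunks.append(sorted(current, key=key))  # sort one subset
            current = []
    if current:
        chunks.append(sorted(current, key=key))
    # heapq.merge lazily merges already-sorted inputs into one sorted stream.
    return heapq.merge(*chunks, key=key)


print(list(split_sort_merge([5, 3, 9, 1, 7, 2, 8], 3, key=lambda x: x)))
```

The recombining step has to be a merge rather than a plain append: concatenating individually sorted chunks is not sorted overall unless the chunks were split by key range.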
|
l*********s 发帖数: 5409 | 7 Good to learn, thank you.
|
h*e posts: 10233 | 8 In my experience, PROC SORT plus MERGE is somewhat faster on large datasets. My data routinely has
several million records, which should count as fairly large. |