Page 9 - Roundup of discussions about merge - 话题女王

p********a
Posts: 5352

From thread: Statistics board - Using SAS to merge Excel cells?

If you don't want to use VBA, do the merge in SAS first and then use DDE.

p*******r
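Posts: 1951

From thread: Statistics board - Using SAS to merge Excel cells?

How do you run VBA commands from within SAS? I'd like to try it if it isn't too complicated. Merging cells with VBA seems relatively easy.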

z**********i
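Posts: 12276

From thread: Statistics board - A question about a dataset merge

I have two very large datasets, one 1.3 GB and the other 11 GB, and I am merging them in SAS.
The job has been running for more than two days, and the new dataset is already over 500 GB.
Could someone with experience tell me how much longer it will take?
Is there a good way to handle datasets this large?
Many thanks!!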

h***x
Posts: 586

From thread: Statistics board - A question about a dataset merge

Something must be wrong; post your code!
Basically, an 11 GB file needs about 20 minutes of I/O time. It shouldn't take
more than 2 hours to merge these two files.
I guess there might be duplicates in both files. Try deduplicating first.
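
The "duplicates in both files" guess can be checked before running the merge. Below is a minimal Python sketch (not SAS), assuming each dataset is a list of rows with a hypothetical `id` key; the toy rows stand in for the real claims and cancer files. It reports the keys that repeat in both inputs, which is the condition that turns a merge into many-to-many.

```python
from collections import Counter


def keys_duplicated_in_both(left, right, key="id"):
    """Return the sorted key values that occur more than once in BOTH inputs."""
    left_counts = Counter(row[key] for row in left)
    right_counts = Counter(row[key] for row in right)
    return sorted(
        k for k, n in left_counts.items()
        if n > 1 and right_counts.get(k, 0) > 1
    )


# Hypothetical toy rows, not the poster's data.
claims = [{"id": 1}, {"id": 1}, {"id": 2}, {"id": 3}, {"id": 3}]
cancer = [{"id": 1}, {"id": 1}, {"id": 2}, {"id": 3}]

print(keys_duplicated_in_both(claims, cancer))
```

An empty result would suggest the merge is at worst one-to-many, so the blow-up would have to come from somewhere else.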

s*r
Posts: 2757

From thread: Statistics board - A question about a dataset merge

It may be some kind of Cartesian merge,
given that it has already produced 500 GB.

z**********i
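Posts: 12276

From thread: Statistics board - A question about a dataset merge

One dataset is a claims file covering both inpatient and physician claims, so each patient has several claims.
The other dataset is a breast cancer file with these patients' breast cancer diagnosis information, including the diagnosis date; each person may also have several rows (I'm not sure).
It looks like a many-to-many merge.
What I want to look at is each patient's overall health in the year before the cancer diagnosis.
Thanks, everyone, for the help!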

o****o
Posts: 8077

From thread: Statistics board - A question about a dataset merge

What are your merge criteria?
Do you want all claims within exactly 1 year of the breast cancer confirmation
for each patient?
What I don't quite understand is this: if there are multiple confirmation dates,
are you going to append the claim data for each of them? That will make a huge final
data set.

c*******o
Posts: 8869

From thread: Statistics board - A question about a dataset merge

But MERGE doesn't do that; only SQL produces this kind of result.

c*******o
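Posts: 8869

From thread: Statistics board - A question about a dataset merge

A many-to-many merge is extremely poor programming practice. One important check when clinical SAS programmers do QC is to look for 'more than' in the log (as in the note "MERGE statement has more than one data set with repeats of BY values"). If it is there, the code must be fixed.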

z**********i
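Posts: 12276

From thread: Statistics board - A question about a dataset merge

Sort cancer by ID; sort claims by ID and claim_date.
Merge by ID.
Some people in claims have had several tumors, so there may be several diagnosis dates; in the next step I select the diagnosis date for breast cancer.
Thanks for the replies!!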

z**********i
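Posts: 12276

From thread: Statistics board - A question about a dataset merge

Let me think about it some more. I hadn't looked carefully at the breast cancer dataset; perhaps I should remove its duplicates before merging.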

z**********i
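Posts: 12276

From thread: Statistics board - A question about a dataset merge

An update on this problem: it was actually a one-to-many merge, not many-to-many. I later cut the large dataset down to just the few variables I needed, and the merge finished in ten-odd minutes. It originally had over a thousand variables.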

g********e
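Posts: 8

From thread: Statistics board - A question about SAS data merge

try:
/* An iterative %do is only valid inside a macro definition, so wrap the step in one. */
/* Studio1-Studio100 must each be sorted by Provider beforehand. */
%macro merge_all;
data New;
merge %do i=1 %to 100;
Studio&i
%end;
;
by Provider;
run;
%mend merge_all;
%merge_all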

D******6
Posts: 6211

From thread: Statistics board - A question about doing a data merge in SAS

It seems that SAS MERGE is many-to-many... try it yourself and see.

t*********e
Posts: 313

From thread: Statistics board - How to prevent values being truncated when merging two files

I have a variable called building in two files. In one file, the longest value of
building (the name of a building) is 36 bytes, but the variable has no format
or informat. In the second file, building has format/informat $17. When
the two files are merged, the values of building are truncated to 17
bytes, so I guess the $17. format/informat is the cause.
Can anyone tell me how to avoid this truncation?
Thanks much! -James

A*******s
Posts: 3942

From thread: Statistics board - How to prevent values being truncated when merging two files

If you are in SAS, try putting a LENGTH statement before the MERGE statement.

y****n
Posts: 46

From thread: Statistics board - How to prevent values being truncated when merging two files

When you use MERGE and the data sets share more variables than those in the BY
statement, you may run into trouble. Since SAS can hold only one value for each
variable, the convention is that only the value from the last-mentioned data set
is saved.

x***n
Posts: 103

From thread: Statistics board - How to prevent values being truncated when merging two files

data **;
format Building $36.;
merge data1 data2;

D******n
Posts: 2836

From thread: Statistics board - [SAS] row merging

A I L R
a 0 1 2
a 0 3 4
a 1 5 6
a 1 2 3
a 0 4 5
a 0 5 9
b .....
b .....
b ....
...
||
\/
A I L R
a 0 1 4
a 1 5 6
a 1 2 3
a 0 4 9
b .....
b .....
b ....
...
Basically, within each A group (such as a, b, c, ...), there are 0s and 1s for
the I attribute. I want to merge each consecutive run of 0s (for example,
the first 2 obs) into one entry, keeping the L from the first obs and the R from
the last obs.
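
The collapsing rule described above can be sketched outside SAS. This is a minimal Python illustration, not a SAS solution; the function name and the tuple layout `(A, I, L, R)` are assumptions made for the example, and the rows are the `a` group from the post.

```python
from itertools import groupby


def collapse_zero_runs(rows):
    """rows: (A, I, L, R) tuples in their original order."""
    out = []
    # Consecutive rows that share the same (A, I) pair form one run.
    for (a, i), run in groupby(rows, key=lambda r: (r[0], r[1])):
        run = list(run)
        if i == 0:
            # Keep L from the first row and R from the last row of the run.
            out.append((a, 0, run[0][2], run[-1][3]))
        else:
            out.extend(run)  # rows with I = 1 pass through unchanged
    return out


rows = [
    ("a", 0, 1, 2), ("a", 0, 3, 4),
    ("a", 1, 5, 6), ("a", 1, 2, 3),
    ("a", 0, 4, 5), ("a", 0, 5, 9),
]

for row in collapse_zero_runs(rows):
    print(*row)
```

For the `a` group this yields the four rows shown in the desired output: `a 0 1 4`, `a 1 5 6`, `a 1 2 3`, `a 0 4 9`.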

a********6
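Posts: 14468

From thread: Statistics board - How do I merge data sets in Excel?

In Excel it seems to be called a Table. This kind of merge also applies to simple databases. I'm just surprised that something that should be doable in one step can't be done directly in Excel.
Converting the file to a database program or SPSS just for such a small thing feels like overkill.
If anyone has experience with this, please share. Thanks!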

G*****u
Posts: 1222

From thread: Statistics board - How do I merge data sets in Excel?

What kind of data do you want to merge?
You could try functions such as VLOOKUP or INDEX/MATCH to bring the data together.

p******r
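Posts: 1279

From thread: Statistics board - A question about match-merging two data sets in SAS

Say I want to combine data set A and data set B (match-merge).
According to the SAS Base tutorial, when the BY values of the two observations that the tracking pointers point to do not
match, what SAS writes into the PDV (program data vector) is described as: "writes the observation
that contains the value which comes first." But I found that one example in the tutorial does not
behave this way. (To keep the typing simple, I changed the specific names and so on.)
data set A data set B
Name EmmID Name Date
Anke 11 Anke 21
John 23 John 15
Alex 12 John 23
John 12
Tom 10
For the example above, what I want to ask is, mer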

S********a
Posts: 359

From thread: Statistics board - [Baozi] The (in= ) in a MERGE statement

data datain;
merge data1 (in=aa)
data2 (in=bb);
by x;
run;
What does the in inside the parentheses do? A double baozi as thanks!

s**********l
Posts: 395

From thread: Statistics board - Why must these two datasets be combined with SAS SQL rather than SAS MERGE?

Policy data:
Policy# age Policy take effective date
1 24 01/01/2008
1 24 01/07/2008
2 50 06/15/2007
3 25 04/11/2006...
Claim data:
Policy# loss Claim date
1 1000 02/05/2008
2 2000 06/30/2007
3 3000 07/15/2007
4 4000 ...
Given the two data sets above, we are asked to combine them
to find the age for each claim loss. (This requires that the claim date
be within 180 days of the policy's effective date; in other words, the
claim date minus the policy's effective date should be sm...

R*********i
Posts: 7643

From thread: Statistics board - Why must these two datasets be combined with SAS SQL rather than SAS MERGE?

As more than one individual can be covered by the same policy number in
dataset 1 (Policy data), there might also be more than one claim under the
same policy number in dataset 2 (Claim data), so you want a many-to-many
merge, which is handled nicely by an SQL full join. Good luck!
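
The kind of conditional many-to-many match discussed here can be sketched in plain Python rather than SQL. This is only an illustration under stated assumptions: the rows below are made-up values, not the poster's data, and the 180-day window is read as "0 to 180 days after the effective date", which the truncated question does not fully confirm.

```python
from datetime import date

# Hypothetical rows: (policy_no, age, effective_date)
policies = [
    (1, 24, date(2008, 1, 1)),
    (1, 30, date(2008, 7, 1)),
    (2, 50, date(2007, 6, 15)),
]
# Hypothetical rows: (policy_no, loss, claim_date)
claims = [
    (1, 1000, date(2008, 2, 5)),
    (2, 2000, date(2007, 6, 30)),
    (3, 3000, date(2007, 7, 15)),
]


def match_claims(policies, claims, window_days=180):
    """Pair each claim with every policy row of the same number inside the window."""
    matched = []
    for claim_no, loss, claim_date in claims:
        for policy_no, age, effective in policies:
            days = (claim_date - effective).days
            if policy_no == claim_no and 0 <= days <= window_days:
                matched.append((claim_no, loss, age))
    return matched


print(match_claims(policies, claims))
```

Every claim is compared with every policy row for the same number, which is the pairing an SQL join condition expresses and a positional DATA step merge does not.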

w*********r
Posts: 488

From thread: Statistics board - Can someone explain the difference between SET and MERGE in a DATA step?

For example,
data aa;
set bb cc;
by var1;
run;
and
data aa;
merge bb cc;
by var1;
run;
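I ran a simple experiment in SAS myself, and the two gave the same result. Granted, the data sets I made up were
very simple. Under what circumstances do the results differ? Or are the two really the same
thing?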
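
A rough way to see the difference is to model both statements in Python. This is a sketch, not SAS: it assumes the usual description that SET with BY interleaves the rows of both inputs in key order, while MERGE with BY joins rows that share a key into one row. The column names `x` and `y` and the toy rows are made up, and keys are assumed unique within each input.

```python
import heapq

bb = [{"var1": 1, "x": "b1"}, {"var1": 2, "x": "b2"}]
cc = [{"var1": 1, "y": "c1"}, {"var1": 3, "y": "c3"}]


def interleave(left, right, key="var1"):
    """Model of SET bb cc; BY var1: rows stay separate, ordered by key."""
    return list(heapq.merge(left, right, key=lambda r: r[key]))


def match_merge(left, right, key="var1"):
    """Model of MERGE bb cc; BY var1 with unique keys in each input."""
    combined = {}
    for row in left + right:  # later inputs overwrite shared columns
        combined.setdefault(row[key], {}).update(row)
    return [combined[k] for k in sorted(combined)]


print(len(interleave(bb, cc)))   # one output row per input row
print(match_merge(bb, cc))       # one output row per distinct key
```

In this model the two only coincide when the inputs share no key values and have the same columns, which may be why a very simple test showed no difference.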

h******e
Posts: 1791

From thread: Statistics board - a sas merge question

Add a variable n to each dataset, set to _N_, and then merge.

k*******a
Posts: 772

From thread: Statistics board - a sas merge question

Here is my solution, maybe too stupid:)
data a;
input planid level1 $;
datalines;
100 A
100 B
100 C
200 D
200 E
200 F
;
data b;
input planid level2 $;
datalines;
100 D
100 Z
200 H
;
data a;
set a;
by planid;
if first.planid then count=0;
count+1;
data b;
set b;
by planid;
if first.planid then count=0;
count+1;
data c;
merge a b;
by planid count;
drop count;
proc print data=c;run;

R*****g
Posts: 99

From thread: Statistics board - a sas merge question

data ab;
merge a b; *don't add BY statement;
run;

R*********i
Posts: 7643

From thread: Statistics board - A question about proc merge.

Came in b/c of your title - "Proc merge" was a famous interview trick where
the interviewer asks how to use the procedure so that s/he can identify if
the interviewee ever used SAS.

l*******s
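Posts: 437

From thread: Statistics board - A question about merge

The problem:
I need to merge two tables. One has kids_id, parent_id, and the login time; the other holds text-message
information: the time the message was sent, its content, the sender, kids_id, and so on.
I want to join the two tables on kids_id and parent_id, with the extra condition that the login time is within
plus or minus one minute of the message time (based on what I observed in most cases). Among more than 300 kids there is exactly one whose
parent sent two messages within half a minute, so in the Teradata join this kid, who should have only
two message rows, shows four.
Is there any way to solve this?
Thanks!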

a****g
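Posts: 8131

From thread: Statistics board - A question about merge

Are you talking about a many-to-many merge?
In that case SQL returns a×b rows (a is the number of matching records in table1; b is the number of matching
records in table2).
You could try an inner join or a left/right join, depending on your goal and on the data itself.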

l****y
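Posts: 108

From thread: Statistics board - A question about merging data!!!

In my last interview I was asked:
When is an SQL merge the better choice?
When is the DATA step the better choice?
I have no experience with large-scale data processing. The two seem about the same to me, and I don't know which is faster.
A DATA step merge needs a sort first; SQL doesn't.
So I said to use SQL for large data and the DATA step merge for small data.
That doesn't seem quite right, though. Please help!!!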

k*******a
Posts: 772

From thread: Statistics board - A question about merging data!!!

These two methods give different results in a many-to-many merge.
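
The difference can be made concrete with a small Python sketch (not SAS code). It assumes the commonly described behavior: an SQL join pairs every matching row on the left with every matching row on the right, while a DATA step MERGE walks both inputs in parallel within a BY group and keeps reusing the last row of the shorter input once it runs out. The function names and toy rows are hypothetical.

```python
def sql_inner_join(left, right):
    """Pair every left row with every right row that has the same key."""
    return [(lk, lv, rv) for lk, lv in left for rk, rv in right if lk == rk]


def data_step_merge(left, right):
    """Pair rows by position within each key; the shorter side repeats its last row."""
    out = []
    for key in sorted({k for k, _ in left} | {k for k, _ in right}):
        lvals = [v for k, v in left if k == key]
        rvals = [v for k, v in right if k == key]
        for i in range(max(len(lvals), len(rvals))):
            lv = lvals[min(i, len(lvals) - 1)] if lvals else None
            rv = rvals[min(i, len(rvals) - 1)] if rvals else None
            out.append((key, lv, rv))
    return out


a = [(1, "a1"), (1, "a2")]
b = [(1, "b1"), (1, "b2"), (1, "b3")]

print(len(sql_inner_join(a, b)))  # 2 * 3 = 6 rows
print(data_step_merge(a, b))      # 3 rows
```

With two rows on one side and three on the other, the join model returns six rows while the positional merge model returns three, so the outputs differ in both size and content.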

l****y
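Posts: 108

From thread: Statistics board - A question about merging data!!!

So merge is the point-and-shoot,
and SQL is the DSLR?
Which one is actually faster? Apart from the different results, what other pros and cons are there?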

D******n
Posts: 2836

From thread: Statistics board - [SAS] Merge and rename

Sometimes one needs to merge two files side by side and compare them, but more
often than not they share the same names for most of the attributes. I can
rename the variables in one of the data sets, but it would be nice if there were just an
option or a quick trick to rename all of them.
Is there anything like that?
Thanks.

s*********y
Posts: 34

From thread: Statistics board - The post comparing dataset merge with the outer join in proc sql

Did anyone save the post I saw a while ago comparing dataset merge with the outer join in proc sql?
Could you post it again?

d******9
Posts: 404

From thread: Statistics board - The post comparing dataset merge with the outer join in proc sql

Datastep merge and SQL join are very different, esp when having duplicates.
Be careful.

w**2
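Posts: 147

From thread: Statistics board - A question about fuzzy match merge in SAS

I'm working on a project that requires match-merging a large number of addresses. The addresses are written in all sorts of odd
ways, so I'd like to ask whether there is a good fuzzy-matching method. I tried phonetic matching,
but it doesn't find all the matches, and it also returns some that don't match at all. Thanks.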

P****D
Posts: 11146

From thread: Statistics board - A question about fuzzy match merge in SAS

If they really are written every which way, clean up the addresses themselves first and then merge.
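
The "clean up first, then match" idea can be sketched with Python's standard library. This is a minimal illustration, not a recommendation of a specific method: the abbreviation table is a made-up example, and `difflib.SequenceMatcher` is swapped in here as a simple string-similarity score in place of the phonetic matching mentioned in the question.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table; a real project would need a much longer one.
ABBREVIATIONS = {"street": "st", "avenue": "ave", "road": "rd", "apartment": "apt"}


def normalize(address):
    """Lowercase, drop punctuation, and map long forms to abbreviations."""
    words = re.sub(r"[^a-z0-9 ]", " ", address.lower()).split()
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)


def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized addresses."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


print(normalize("123 Main Street, Apt. 4"))
print(similarity("123 Main Street, Apt. 4", "123 MAIN ST APT 4"))
```

After normalization, many variants collapse to the same string and can be joined exactly; the similarity score is then only needed for the remainder, with a threshold chosen by inspecting real pairs.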

x*******u
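Posts: 500

From thread: Statistics board - SAS merge problem

I have two datasets with no variables in common.
Data A has 440 records; data B has 1,500 records.
The result I want from the merge is data A's records repeated 1,500 times, once for each record in data B.
For example,
data A
1
2
3;
data B
A
B
;
The final data is
A 1
A 2
A 3
B 1
B 2
B 3
Is there a simple command for this? Thanks.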
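
What is being asked for is a Cartesian product (cross join). As a language-neutral illustration, here is a minimal Python sketch using the toy values from the post; the variable names are assumptions.

```python
from itertools import product

data_a = [1, 2, 3]
data_b = ["A", "B"]

# Every row of B paired with every row of A, with B varying slowest.
crossed = list(product(data_b, data_a))

for b_value, a_value in crossed:
    print(b_value, a_value)
```

The row count is the product of the two sizes, so 440 and 1,500 records would give 660,000 rows. In SQL terms this corresponds to joining the two tables with no join condition.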

d*********k
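Posts: 1239

From thread: Statistics board - What is the difference between merge and join in SQL?

What is the difference between merge and join in SQL?
This comes up a lot in interviews. What exactly is the difference between them? Thanks.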

t*****w
Posts: 254

From thread: Statistics board - What is the difference between merge and join in SQL?

They may generate the same result. In essence, merge is the full join, while join
may include other types of horizontal join.

y**3
Posts: 267

From thread: Statistics board - sas few to many merging

I have two data sets;
ID month1
1 2
1 4
2 3
2 7
2 9
data set 2:
ID month2
1 1
1 2
1 3
1 4
1 5
2 1
etc
Wanna merge to:
ID month1 month2
1 . 1
1 2 2
1 . 3
1 4 4
1 . 5
2 . 1
etc
thanks
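
The desired output amounts to keeping every row of data set 2 and filling in month1 only where the same (ID, month) pair exists in data set 1. A minimal Python sketch of that rule, using the rows shown in the post (`None` stands in for the missing value `.`; the function name is made up):

```python
def align_months(few, many):
    """Return (ID, month1, month2) for every row of `many`."""
    have = set(few)
    return [(i, m if (i, m) in have else None, m) for i, m in many]


data1 = [(1, 2), (1, 4), (2, 3), (2, 7), (2, 9)]          # (ID, month1)
data2 = [(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 1)]  # (ID, month2)

for row in align_months(data1, data2):
    print(*row)
```

Put differently, this is a one-to-one match on two keys, ID and month, rather than a many-to-many match on ID alone.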

W*****r
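Posts: 193

From thread: Statistics board - Question: how can many CSV files (>100) be merged into one file with "proc sql"?

I want to merge the files on the id in each CSV.
I know how to do the simple case,
for example: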
proc import datafile="D:file1.csv"
out=new1 dbms=csv replace; getnames=yes;
run;
proc print data=new1;
run;
proc import datafile="D:file2.csv"
out=new2 dbms=csv replace; getnames=yes;
run;
proc print data=new2;
run;
data dn;
merge new1 new2;
by id;
run;
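The file "dn" contains all the variables from new1 and new2, matched on the shared id.
The question is: what if there are many CSV files, say 100+?
Can it be done with proc sql plus a macro?
How?
Many thanks.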
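
The general shape of the answer is a loop over the files that folds each one into a running result keyed by id. Here is a minimal Python sketch of that loop, not a SAS or proc sql solution; the in-memory `io.StringIO` objects are stand-ins for real files, and the column names are invented.

```python
import csv
import io


def merge_csv_sources(sources, key="id"):
    """Fold any number of CSV sources into one row per key value."""
    merged = {}
    for source in sources:
        for row in csv.DictReader(source):
            # Later sources add their columns to the row for the same id.
            merged.setdefault(row[key], {}).update(row)
    # Keys are strings here, so the sort order is lexicographic.
    return [merged[k] for k in sorted(merged)]


# Hypothetical stand-ins for file1.csv, file2.csv, file3.csv.
files = [
    io.StringIO("id,x\n1,10\n2,20\n"),
    io.StringIO("id,y\n1,a\n3,c\n"),
    io.StringIO("id,z\n2,p\n"),
]

for row in merge_csv_sources(files):
    print(row)
```

With real files, each source would be opened with `open(path, newline="")` inside the loop; a SAS macro that loops over the file names would play the same role.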

f*******m
Posts: 94

From thread: Statistics board - Question: how can many CSV files (>100) be merged into one file with "proc sql"?

see here:
https://communities.sas.com/t5/Base-SAS-Programming/Help-merging-multiple-CSV-files-to-a-dataset/td-p/184804

B*O
Posts: 105

From thread: DataSciences board - Merging two data frames in R is too slow

Using match is faster.
> long =data.frame(a=seq(1,1000000), b=rnorm(10) )
> short=data.frame(a=seq(1,2000, 2), c=rnorm(10) )
> system.time(merge(x=long, y=short, all.x=T))
user system elapsed
7.253 0.071 7.385
> system.time(long$c<-rep(as.numeric(NA),dim(long)[1]))
user system elapsed
0.002 0.000 0.001
> system.time(long$c[match(short$a,long$a)]<-short$c)
user system elapsed
0.119 0.000 0.119

r********n
Posts: 7441

From thread: DataSciences board - Merging two data frames in R is too slow

For a simple lookup-style merge like this, awk is obviously faster.

u********e
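Posts: 4950

From thread: _Stockcafeteria board - The bears' nightmare: Mergers & Acquisitions

Today's market has once again demonstrated the bears' nightmare: Mergers & Acquisitions
----------------------------------------
(1) Case 1: Texas Instruments (TXN) to buy National Semiconductor (NSM)
for $6.5 billion
Price change: TXN up 1.7%; NSM up 71%
Fund flow: net inflow of $1.7 billion
Market impact: semiconductor stocks up 2% to 3% across the board
(2) Case 2: Diamond (DMND) to buy Pringles from P&G (PG) for $1.5 billion
Price change: pending
Fund flow: pending
Market impact: pending
----------------------------------------
As the analysis above shows, every M&A deal effectively means a company paying a high price to grab shares in the market;
shorts will be squeezed, with nowhere to run (suppose that yesterday you...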

g********d
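Posts: 19244

From thread: Automobile board - [Collection] Got hit by another car but was found fully at fault; looking for analysis

☆─────────────────────────────────────☆
lxy52136 (sunny) wrote on (Sun Nov 3 14:24:24 2013, US Eastern):
Nov 4 update: a satellite map of the accident site is posted in reply #84.
The accident happened on Interstate 95 between exits 52 and 51 (the car had passed exit 52 and was going to leave at exit 51).
We have full coverage; repairing our own car will cost us $500 out of pocket.
I posted the accident report in reply #68; reply #67 on page 4 has three post-accident photos with our license plate blurred.
As for the damage to the other car, I asked my husband: the front was damaged; he didn't look at the side.
Around 7:30 the evening before last, my husband was driving home alone on the highway. About 1 mile before the exit,
he changed lanes into the rightmost lane, and after driving on for about half a mile he collided with another car. He felt a
strong sense of pressure inside the car and only regained clear awareness some 5 or 6 seconds after the crash, when he saw a car speeding
past on his right. It was going so fast that he thought it might be fleeing, so he instinctively drove after it, but the car stopped
on the emergency shoulder a short way ahead, and my husband pulled up behind it. The driver was very flustered when he got out and, while they waited
for the police, asked my husband whether...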
