a******n 发帖数: 11246 | 1 Suppose I have a table with one column called acct_id.
This column is already sorted.
How do I efficiently check whether there are any duplicate values?
I will give a double-yolk baozi to the most efficient method. Thanks |
c*********r 发帖数: 1802 | 2 Try proc freq:
proc freq data=a;
    tables acct_id;
run; |
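A slightly fuller sketch of this suggestion, assuming the source table is named a (the output name id_counts is made up here). NOPRINT plus OUT= avoids printing millions of frequency rows and leaves a small dataset holding only the duplicated IDs:

proc freq data=a noprint;
    tables acct_id / out=id_counts(keep=acct_id count);
run;

data dup_ids;
    set id_counts;
    if count > 1;   /* keep only acct_id values that occur more than once */
run;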
x***x 发帖数: 3401 | 3 proc sort data=a nodupkey;
    by acct_id;
run;
This will eliminate any duplicated values of acct_id. |
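To detect duplicates rather than silently drop them, a variant of this call could write the de-duplicated rows to a separate dataset (the name a_nodup is an assumption) so the original table is left untouched; the SORT step's log note about observations deleted for duplicate key values then tells you whether any duplicates existed:

proc sort data=a out=a_nodup nodupkey;
    by acct_id;
run;
/* If the log reports 0 observations deleted for duplicate key values, acct_id is unique. */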
a******n 发帖数: 11246 | 4 This will give the frequency of every acct_id, right?
This table has millions of acct_id values.
I just want to find out whether, by any chance, one acct_id has more than one record...
【In reply to c*********r】
|
a******n 发帖数: 11246 | 5 Then compare the row counts?
But if the row counts are not the same, I still don't know which acct_id
has more than one record...
【In reply to x***x】
|
x***x 发帖数: 3401 | 6 So you want to print out the duplicated records?
Then try:
proc sql;
    select acct_id
    from a
    group by acct_id
    having count(*) > 1;
quit;
【In reply to a******n】
|
a******n 发帖数: 11246 | 7 Thanks. This is what I am using now.
Still waiting for the result to come out :(
It seems that this method does not take advantage of the fact
that the column is already sorted.
【In reply to x***x】
|
x***x 发帖数: 3401 | 8 If you want to take advantage of the fact that the column is sorted, you can use
a 'RETAIN' statement in a DATA step. But I don't think it is any more
efficient.
【In reply to a******n】
|
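A minimal sketch of the RETAIN idea, assuming the sorted table is named a: carry the previous row's acct_id across iterations and output any row that matches it.

data dups;
    set a(keep=acct_id);
    retain prev_id;                                  /* keep prev_id's value across iterations */
    if _n_ > 1 and acct_id = prev_id then output;    /* adjacent equal keys mean a duplicate */
    prev_id = acct_id;
    drop prev_id;
run;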
L****n 发帖数: 3545 | 9 You can try the COMPRESS=YES option.
Also, a sorted column doesn't speed up the process by itself, but an index does.
【In reply to a******n】
|
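A sketch of both suggestions, assuming the table lives in WORK and is named a; here COMPRESS=YES is set as a system option for newly created datasets, and the index is built with PROC DATASETS:

options compress=yes;              /* store datasets created from here on compressed */

proc datasets library=work nolist;
    modify a;
    index create acct_id;          /* simple index on acct_id */
quit;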
a******n 发帖数: 11246 | 10 i see...thanks.
will baozi you soon.
【In reply to x***x】
|
a******n 发帖数: 11246 | 11 good. thanks.
baozi soon.
【In reply to L****n】
|
x***x 发帖数: 3401 | 12 Another way to speed things up is to create a temporary data set that only has
ACCT_ID before going into the SQL step. This helps cut some of the overhead in
the SQL step.
data b;
    set a(keep=acct_id);
run;
proc sql;
.... |
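Presumably the elided SQL is the same GROUP BY query suggested earlier, now run against the slimmed-down table b; a sketch of the completed step:

proc sql;
    select acct_id
    from b
    group by acct_id
    having count(*) > 1;
quit;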
L****n 发帖数: 3545 | 13 no, I want it now.
Or do you consider me your nurse, so we can do business together?
lol
【In reply to a******n】
|
a******n 发帖数: 11246 | 14 I don't even do that myself anymore. Work keeps me too busy.
【In reply to L****n】
|
L****n 发帖数: 3545 | 15 Already working?
Awesome!!!!!!!!!!
Little miracle doctor, that's impressive!
【In reply to a******n】
|
a*****3 发帖数: 601 | 16 Since it's already sorted, how about using
if first.acct_id ne last.acct_id;
If that looks passable, toss me a couple of copper coins.
【In reply to a******n】
|
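A runnable sketch of this first./last. idea, assuming the sorted table is named a (note that inside a DATA step the not-equal operator is ne, since <> there means MAX). This variant keeps one row per duplicated acct_id:

data dup_ids;
    set a(keep=acct_id);
    by acct_id;                               /* no re-sort needed because a is already sorted */
    if first.acct_id and not last.acct_id;    /* true only for the first row of a multi-row group */
run;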
a******n 发帖数: 11246 | 17 OK. Baozi sent.
I will compare the speed of this method and the proc sql one :D
【In reply to a*****3】
|
a*****3 发帖数: 601 | 18 That goes without saying, mine will definitely run blazingly fast. Thanks for the angel baozi!
【In reply to a******n】
|
x**********0 发帖数: 2260 | 19 Nice. I thought of this but forgot the code. Heh.
【In reply to x***x】
|
g****8 发帖数: 2828 | 20 You don't need to compare them yourself.
The log will tell you how many duplicates there are.
【In reply to a******n】
|
g****8 发帖数: 2828 | 21 Create a table in that SQL step; it will be faster than just using SELECT. SAS
needs some time to print out a large dataset.
【In reply to a******n】
|
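A sketch of this CREATE TABLE variant of the earlier query (the output name dup_ids is made up here); the small result table can then be printed or counted separately:

proc sql;
    create table dup_ids as     /* write the result to a dataset instead of printing it */
    select acct_id
    from a
    group by acct_id
    having count(*) > 1;
quit;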
a******n 发帖数: 11246 | 22 Good point!
【In reply to g****8】
|
a******n 发帖数: 11246 | 23 Thanks... learned something new ~
【In reply to g****8】
|
g****8 发帖数: 2828 | 24 Haha, thanks for the double-yolk baozi.
【In reply to a******n】
|
p********r 发帖数: 1465 | 25 proc sort ... nodupkey out= dupout=;
    by acct_id;
run;
out= gets the de-duplicated records; dupout= gets the duplicate records that were removed.
Using compress was already mentioned by an earlier poster.
Even at the million-row level it doesn't take too long; the result comes out quickly. |
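A filled-in sketch of that call (the dataset names a, a_nodup, and a_dups are assumptions): a_nodup keeps one row per acct_id, and a non-empty a_dups means duplicates exist.

proc sort data=a out=a_nodup dupout=a_dups nodupkey;
    by acct_id;   /* a_dups receives every extra observation removed for a duplicate key */
run;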
m*****y 发帖数: 229 | 26 Don't know whether this works in PROC SQL: count(distinct acct_id) |
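COUNT(DISTINCT ...) is supported in PROC SQL, so a one-query yes/no check could look like the sketch below (table name a assumed); duplicates exist exactly when n_rows is greater than n_ids.

proc sql;
    select count(*) as n_rows,                 /* total rows */
           count(distinct acct_id) as n_ids    /* distinct acct_id values */
    from a;
quit;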