由买买提看人间百态

boards

本页内容为未名空间相应帖子的节选和存档,一周内的贴子最多显示50字,超过一周显示500字 访问原贴
Statistics版 - PROC SQL运行速度问题.
相关主题
从SQL server调数据到SAS 处理后再转回SQL server里SAS question:Please give me your suggestions!
Questions about opreate large data set, thank you!一个不小心 Adv考了个100
一个SAS 和 SQL server 问题Proc SQL 能不能直接读 非 SAS format 的数据,txt , csv 等
面试的SQL问题一般问些啥?问个SAS的问题
有谁对SAS和R都懂得吗?珍惜生命,远离hsbc。
Insurance company job openingsHow to do "data quality check in a methodical manner" By SAS or Proc SQL ?
关于SAS Advance 9 的考试体会学SAS之前需要先懂得SQL嘛?
Help! proc sql;cout not null value.Oracle
相关话题的讨论汇总
话题: sql话题: sas话题: database话题: proc
进入Statistics版参与讨论
1 (共1页)
y********0
发帖数: 638
1
最近用proc sql通过ODBC连接数据库,用的A left join B查询,其中B是SQL
architecture数据库下的一个table. 速度极其的慢,大约要一两个小时,最近老被管理
员揪,说没有看见where-
filter,会造成服务器负担.
我的proc sql 下使用了where,但是管理员端收到的query里没有看到任何的filter. 想
问一下:
1. sas 下的proc sql是什么原理阿,是发送query到服务器端,然后接受结果么? 还是
sas自己本身集成了SQL module,把那个服务器端的server当作linked server? (这个问
题有可能问的不对,错了请指正).
2. SAS odbc连接原理和sql server management studio 原理一样吗?同样的query, 用
sql sever mgt studio只需要一两分钟.但是SAS真的很慢阿.
谢谢啦.
t*****w
发帖数: 254
2
SAS is not good to handle large datasets,just use Sql server management
studio to output a small dataset for SAS proc data step.
s******e
发帖数: 343
3
看你的查询语句里有没有SAS function,比如说在where里面日期格式转换然后比较什
么的,如果有的话去掉只用纯SQL,看看会不会有变化。
s*r
发帖数: 2757
4
there are 2 places to put where under proc sql
A*******s
发帖数: 3942
5
用pass through

【在 y********0 的大作中提到】
: 最近用proc sql通过ODBC连接数据库,用的A left join B查询,其中B是SQL
: architecture数据库下的一个table. 速度极其的慢,大约要一两个小时,最近老被管理
: 员揪,说没有看见where-
: filter,会造成服务器负担.
: 我的proc sql 下使用了where,但是管理员端收到的query里没有看到任何的filter. 想
: 问一下:
: 1. sas 下的proc sql是什么原理阿,是发送query到服务器端,然后接受结果么? 还是
: sas自己本身集成了SQL module,把那个服务器端的server当作linked server? (这个问
: 题有可能问的不对,错了请指正).
: 2. SAS odbc连接原理和sql server management studio 原理一样吗?同样的query, 用

c******y
发帖数: 3269
6
more specifically, explicit sql-passthrough

【在 A*******s 的大作中提到】
: 用pass through
m***c
发帖数: 118
7
proc sql首先产生笛卡尔,然后在其中寻找match,如果没有很好的filter,运行自然比
较慢,而且容易死机。
k*z
发帖数: 4704
8
这个问题我们已经遇到过了,解决办法很简单,让DBA or ELT developer把你要的join
做成view放到数据库上去。如果经常用可以写store procedure,在SAS batch里call.
你需要做的是要在一个database里把join做好,也就是你要把Join做成一个view放到
database上去,否则你把两个tables在SAS里做join,SAS会把数据从SQL都带过来,然后
join,基本上就是你需要把所有的数据都带到服务器或者本地然后join,而不是在sql里
join好了,然后查询。
如果你的一个表在lib a, 一个表在lib b,那么就更糟糕了,sas会把数据都带到服务器
或者本机,做 Cartesian join,然后查询额。
c******y
发帖数: 3269
9
You are right about "需要把所有的数据都带到服务器或者本地" if it is SAS
implicit SQL-passthrough, in the means of libname, without in-database
processing.
To resolve the issue, there are several approaches.
1. views/stored procedures as you mentioned, which requires additional
maintenance and only fits for "frequently-used" joins.
2. implicit SQL-passthrough with in-database processing, which requires
knowledge on additional SAS syntax, and has a limitation on the usage.
3. explicit SQL-passthrough, which only passes a database-native SQL command
to the database, and the database will do the join (aka, "in-database
processing"), then SAS will only fetch the final result from the database.
The last approach requires database-native SQL knowledge, but requires least
effort for LZ since LZ is already using SQL server management studio.

join

【在 k*z 的大作中提到】
: 这个问题我们已经遇到过了,解决办法很简单,让DBA or ELT developer把你要的join
: 做成view放到数据库上去。如果经常用可以写store procedure,在SAS batch里call.
: 你需要做的是要在一个database里把join做好,也就是你要把Join做成一个view放到
: database上去,否则你把两个tables在SAS里做join,SAS会把数据从SQL都带过来,然后
: join,基本上就是你需要把所有的数据都带到服务器或者本地然后join,而不是在sql里
: join好了,然后查询。
: 如果你的一个表在lib a, 一个表在lib b,那么就更糟糕了,sas会把数据都带到服务器
: 或者本机,做 Cartesian join,然后查询额。

k*z
发帖数: 4704
10
in database processing只支持teradata和greenlum吧

command

【在 c******y 的大作中提到】
: You are right about "需要把所有的数据都带到服务器或者本地" if it is SAS
: implicit SQL-passthrough, in the means of libname, without in-database
: processing.
: To resolve the issue, there are several approaches.
: 1. views/stored procedures as you mentioned, which requires additional
: maintenance and only fits for "frequently-used" joins.
: 2. implicit SQL-passthrough with in-database processing, which requires
: knowledge on additional SAS syntax, and has a limitation on the usage.
: 3. explicit SQL-passthrough, which only passes a database-native SQL command
: to the database, and the database will do the join (aka, "in-database

相关主题
Insurance company job openingsSAS question:Please give me your suggestions!
关于SAS Advance 9 的考试体会一个不小心 Adv考了个100
Help! proc sql;cout not null value.Proc SQL 能不能直接读 非 SAS format 的数据,txt , csv 等
进入Statistics版参与讨论
c******y
发帖数: 3269
11
Implicit SQL-passthrough in-database processing is best supported in
teradata indeed, while not quite friendly in MS SQL server considering the
very limited usage.
So I recommends LZ to use explicit SQL-passthrough instead, which is
naturally "in-database processing", and cost LZ least effort

【在 k*z 的大作中提到】
: in database processing只支持teradata和greenlum吧
:
: command

k*z
发帖数: 4704
12
能详细说说去别么,大牛,这个问题也很困扰我。

【在 c******y 的大作中提到】
: Implicit SQL-passthrough in-database processing is best supported in
: teradata indeed, while not quite friendly in MS SQL server considering the
: very limited usage.
: So I recommends LZ to use explicit SQL-passthrough instead, which is
: naturally "in-database processing", and cost LZ least effort

c******y
发帖数: 3269
13
I'm not 大牛, and no longer use MS SQL server. Below is a real-life example,
hope it helps.
Background: VPN network to connect database, a bottleneck to overcome.
implicit sql-passthrough is exchanging data intensively, which choked the
VPN network, where you would see huge difference between real time and cpu
time.
explicit sql-passthrough only sends a command to the database, the database
do the data manipulation within itself, and sends final data back to SAS,
costing way much less network resource.
/*implicit sql-passthrough*/
libname lib odbc user=&user password=&pwd DATASRC=&dsn ;
proc sql;
create table temp as
select col
from lib.table
where col="XXX"
;
quit;
libname lib clear;
/*
NOTE: PROCEDURE SQL used (Total process time):
real time 1:35.54
user cpu time 1.00 seconds
system cpu time 0.52 seconds
*/
/*explicit sql-passthrough*/
proc sql;
connect to odbc as lib (user=&user password=&pwd DATASRC=&dsn) ;
create table temp2 as
select * from connection to lib
(
select col
from lib.table
where col="XXX"
);
disconnect from lib;
quit;
/*
NOTE: PROCEDURE SQL used (Total process time):
real time 0.17 seconds
user cpu time 0.01 seconds
system cpu time 0.01 seconds
*/

【在 k*z 的大作中提到】
: 能详细说说去别么,大牛,这个问题也很困扰我。
k*z
发帖数: 4704
14
如果你要做一些计算呢?
比如平均值,怎么办/
在from (select....)里面直接做么?

example,
database

【在 c******y 的大作中提到】
: I'm not 大牛, and no longer use MS SQL server. Below is a real-life example,
: hope it helps.
: Background: VPN network to connect database, a bottleneck to overcome.
: implicit sql-passthrough is exchanging data intensively, which choked the
: VPN network, where you would see huge difference between real time and cpu
: time.
: explicit sql-passthrough only sends a command to the database, the database
: do the data manipulation within itself, and sends final data back to SAS,
: costing way much less network resource.
: /*implicit sql-passthrough*/

c******y
发帖数: 3269
15
In my opinion, yes.
Most DBs outperform SAS in ETL, should take advantage of that if aggregate
functions are supported by DB.
Also, many BI reports' data are processed directly by DBs, which is more
efficient to deliver the results.

【在 k*z 的大作中提到】
: 如果你要做一些计算呢?
: 比如平均值,怎么办/
: 在from (select....)里面直接做么?
:
: example,
: database

k*z
发帖数: 4704
16
last questin for you. sorry bother too much.
how about i want to join two tables from two odbc datascr?
or I have to write two proc sqls to bring two tables then join?
proc sql;
connect to odbc as lib1 (user=&user1 password=&pwd1 DATASRC=&dsn1) ;
connect to odbc as lib2 (user=&user2 password=&pwd2 DATASRC=&dsn2) ;
create table temp2 as
select * from connection to lib
(
select col
from lib.table
where col="XXX"
);
disconnect from lib;
quit;

【在 c******y 的大作中提到】
: In my opinion, yes.
: Most DBs outperform SAS in ETL, should take advantage of that if aggregate
: functions are supported by DB.
: Also, many BI reports' data are processed directly by DBs, which is more
: efficient to deliver the results.

c******y
发帖数: 3269
17
The two datasrcs refer to two types of RDBMSs?
In this case, if one of them supports dblink (for example, Oracle), then you
can do it within that RDBMS.
If they are two databases or schemas within the same RDBMS, then you usually
can give the full path, e.g. database.schema.table, to join them

【在 k*z 的大作中提到】
: last questin for you. sorry bother too much.
: how about i want to join two tables from two odbc datascr?
: or I have to write two proc sqls to bring two tables then join?
: proc sql;
: connect to odbc as lib1 (user=&user1 password=&pwd1 DATASRC=&dsn1) ;
: connect to odbc as lib2 (user=&user2 password=&pwd2 DATASRC=&dsn2) ;
: create table temp2 as
: select * from connection to lib
: (
: select col

y********0
发帖数: 638
18
Update: 谢谢大家的讨论,收益匪浅.最终是用的Explicit SQL Pass-Through.
1. 像chairsky 说的,所有的SQL Server Mgt Studio代码完全copy到SAS,几乎不需要改
动;象一般的TSQL代码,创建temp table,CTE啥,都能直接copy过来.平常sas 下的query
需要30分钟,现在不到一分钟.
2. 同样的RDBMS的数据库,只需要标明full path,可以解决问题.至于两个不同的RDBMSs
,还没机会试验.
再次谢谢大家的所有的发言!
k*****n
发帖数: 361
19
顶这个
S****M
发帖数: 2198
20
我司平时用Teradata,但在SAS里还是用explicit pass-through最多。这样SQL在SQL
Assistant里写好debug完就能直接在SAS里跑。Explicit的问题(或许我孤陋寡闻)是
没法上传数据,这时候就只能用libname了。
1 (共1页)
进入Statistics版参与讨论
相关主题
Oracle有谁对SAS和R都懂得吗?
求救:SAS programmer面试题!Insurance company job openings
which route in SAS is faster?关于SAS Advance 9 的考试体会
sas的sql语法Help! proc sql;cout not null value.
从SQL server调数据到SAS 处理后再转回SQL server里SAS question:Please give me your suggestions!
Questions about opreate large data set, thank you!一个不小心 Adv考了个100
一个SAS 和 SQL server 问题Proc SQL 能不能直接读 非 SAS format 的数据,txt , csv 等
面试的SQL问题一般问些啥?问个SAS的问题
相关话题的讨论汇总
话题: sql话题: sas话题: database话题: proc