SQL求教：如何求出各组Top N的记录？ - Statistics版

本页内容为未名空间相应帖子的节选和存档，一周内的贴子最多显示50字，超过一周显示500字访问原贴

Statistics版 - SQL求教：如何求出各组Top N的记录？

相关主题
● SAS Proc SQL count问题	● SQL中啥时候用group by, 啥时候用self-join?
● 这组数据用sas程序怎么转化谢谢！	● 完全不懂SAS，急请教一个问题
● 在用SAS RUN一个SQL,可是出现了错误,请高手指教	● 请教一个SQL 的问题
● MySQL语句请教 (转载)	● [SQL] just figured out how to loop through all columns in a table
● 了解RPA的同学来指点我一下	● SQL combine two tables into one table and add a new column (转载)
● SQL相关的Interview问题请教	● compare two large tables SQL (转载)
● 這個SQL sub-query 是否畫蛇添足 ?	● SQL find distinct values in large table (转载)
● 为什么结果是这样?	● SQL copy a table into a new table and add a new column (转载)

相关话题的讨论汇总
话题: rank话题: select话题: year话题: sql话题: score

进入Statistics版参与讨论

(共1页)

p****e
发帖数: 165

就一张表，叫Application，以下为column names, 就是某高中每一届的同学申请各大
学的入学成绩。
Name, Year_Month, School, Score
Lily 08/2010 MIT 1570
Jim 02/2009 Berkeley 1490
...
问题：现在需要选出每一年申请各个学校入学分数排名前10名同学的Name and Score。
我卡在两个地方：
1）如何从Year_Month中职提取Year？
2）如何分组后求各组的top 10?
感谢各位大牛相助，不甚感激！用SAS SQL或者其他各版本SQL都可以滴，感谢！

a*****3
发帖数: 601

那我就用oracle怎么样？
1.提取年用extract( year from )函数
2求topN row 用rownum< N
要是写通了赏个铜板怎么样？

s******r
发帖数: 1524

1.
pure sql;
extract(year) from ....
qualify rank() over (partition by year, school order by score desc) <=10
2. SAS proc rank?

【在 p****e 的大作中提到】

: 就一张表，叫Application，以下为column names, 就是某高中每一届的同学申请各大
: 学的入学成绩。
: Name, Year_Month, School, Score
: Lily 08/2010 MIT 1570
: Jim 02/2009 Berkeley 1490
: ...
: 问题：现在需要选出每一年申请各个学校入学分数排名前10名同学的Name and Score。
: 我卡在两个地方：
: 1）如何从Year_Month中职提取Year？
: 2）如何分组后求各组的top 10?

a*****3
发帖数: 601

牛!! 毕竟是oracle大拿，推荐本oracle书吧，容易上手的。

【在 s******r 的大作中提到】

: 1.
: pure sql;
: extract(year) from ....
: qualify rank() over (partition by year, school order by score desc) <=10
: 2. SAS proc rank?

d******9
发帖数: 404

In this case, it is better to use SAS data step than Proc SQL.

a********t
发帖数: 1810

很EASY。第一步是把MM/YYYY换成YYYY，然后用
SELECT COUNT(*) from...
MS ACCESS 做起来最快。把它发给我，我给你做
EXCEL也能做，但要写一些FORMULA，ACCESS 写好SQL，RUN一下就行了

g*********r
发帖数: 2847

1. base on the format of the Yrmo field. if it is string (char,varchar,etc),
right([yrmo],4) works
2. rank() over(partition by ... order by ...)

a*****3
发帖数: 601

你这个rank()放在哪里？ where clause?恐怕不行吧？

),

【在 g*********r 的大作中提到】

: 1. base on the format of the Yrmo field. if it is string (char,varchar,etc),
: right([yrmo],4) works
: 2. rank() over(partition by ... order by ...)

g*********r
发帖数: 2847

select *,
rank = rank () over(....................)
it will add a new column called rank and then you can use where clause

【在 a*****3 的大作中提到】

: 你这个rank()放在哪里？ where clause?恐怕不行吧？
:
: ),

a*****3
发帖数: 601

可楼主要的是拓扑10

【在 g*********r 的大作中提到】

:
: select *,
: rank = rank () over(....................)
: it will add a new column called rank and then you can use where clause

相关主题
● SQL相关的Interview问题请教	● SQL中啥时候用group by, 啥时候用self-join?
● 這個SQL sub-query 是否畫蛇添足 ?	● 完全不懂SAS，急请教一个问题
● 为什么结果是这样?	● 请教一个SQL 的问题
进入Statistics版参与讨论

g*********r
发帖数: 2847

it should work.

【在 a*****3 的大作中提到】

: 可楼主要的是拓扑10

p****e
发帖数: 165

谢谢givemeoffer, 我用MySQL，所以好像没有rank() over partition
用open source的SQL，如MySQL, PostgreSQL怎么求呢？
这里有一篇文章讲用mysql求的，得编个循环的自定义函数：
http://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstlea
好麻烦呀，很多小公司都木有钱装oracle, sql server啊~

),

【在 g*********r 的大作中提到】

: 1. base on the format of the Yrmo field. if it is string (char,varchar,etc),
: right([yrmo],4) works
: 2. rank() over(partition by ... order by ...)

g*********r
发帖数: 2847

Actually, I have no experience on MySQL, sorry.
循环的自定义函数 will definitely work.
However, if the data set is not huge, i would recommend using Excel

p****e
发帖数: 165

Excel的话，用哪个function来算各组的top n呢？谢谢啦~

【在 g*********r 的大作中提到】

: Actually, I have no experience on MySQL, sorry.
: 循环的自定义函数 will definitely work.
: However, if the data set is not huge, i would recommend using Excel

g*********r
发帖数: 2847

1.sort by year,school,score
2.in a new column, write a formula to rank top n in each group and leave
others blank

【在 p****e 的大作中提到】

: Excel的话，用哪个function来算各组的top n呢？谢谢啦~

a********t
发帖数: 1810

MS Access is OK, select count(*)...

a***d
发帖数: 336

2nd this~

【在 d******9 的大作中提到】

: In this case, it is better to use SAS data step than Proc SQL.

P****D
发帖数: 11146

A sub query will do it.
SELECT *
FROM ( SELECT a.name, a.score, RANK () OVER
( PARTITION BY a.year, a.school
ORDER BY score
) AS rank
FROM a)
WHERE rank<=10;

【在 a*****3 的大作中提到】

: 可楼主要的是拓扑10

a*****3
发帖数: 601

和楼上的争了半天都是风马牛不相及 - 还是贱妾最明白朕的心思；朕可真是没看错人.
我老给楼上念一念：
747页， 17章，第一节： advanced SQL concepts, analytical functions, and the
WITH clause, :
You perform query processing with analytical functions in several steps,
First, joins, WHERE, GROUPBY and having clauses are carried out. The
analytical functions then use these results.
看见了，不能在where里直接用这些rank() functions

【在 P****D 的大作中提到】

: A sub query will do it.
: SELECT *
: FROM ( SELECT a.name, a.score, RANK () OVER
: ( PARTITION BY a.year, a.school
: ORDER BY score
: ) AS rank
: FROM a)
: WHERE rank<=10;

i****e
发帖数: 46

有 SQL server的话，可以用rank, dense_rank 和 row_number来做，这三个处理tie
的时候不一样。也可以用 top 10。
没有的话，用sas proc rank也挺方便.
proc rank data=x out=y descending tie=low; /*tie can be high, dense etc
depending on you requirement*/
by year school;
var score;
rank rank_count;
run;
data y;
set y;
if rank_count<=10;
run;

【在 p****e 的大作中提到】

: 谢谢givemeoffer, 我用MySQL，所以好像没有rank() over partition
: 用open source的SQL，如MySQL, PostgreSQL怎么求呢？
: 这里有一篇文章讲用mysql求的，得编个循环的自定义函数：
: http://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstlea
: 好麻烦呀，很多小公司都木有钱装oracle, sql server啊~
:
: ),

相关主题
● [SQL] just figured out how to loop through all columns in a table	● SQL find distinct values in large table (转载)
● SQL combine two tables into one table and add a new column (转载)	● SQL copy a table into a new table and add a new column (转载)
● compare two large tables SQL (转载)	● SQL run a stored procedure by fetching from a cursor row by (转载)
进入Statistics版参与讨论

g*********r
发帖数: 2847

人.
the
兄弟你亮了，没人在where clause 里面用rank()。pharmD说的和我之前说的都是一个
意思，
在select 里用的 rank=rank() over....就是加了一列，列名叫rank,然后where 里用
的是rank这个列名不是rank function

【在 a*****3 的大作中提到】

: 和楼上的争了半天都是风马牛不相及 - 还是贱妾最明白朕的心思；朕可真是没看错人.
: 我老给楼上念一念：
: 747页， 17章，第一节： advanced SQL concepts, analytical functions, and the
: WITH clause, :
: You perform query processing with analytical functions in several steps,
: First, joins, WHERE, GROUPBY and having clauses are carried out. The
: analytical functions then use these results.
: 看见了，不能在where里直接用这些rank() functions

g*********r
发帖数: 2847

picky一小下， order by score DESC
FROM a) 后面要命名
别拿锅打我的头。。。。

【在 P****D 的大作中提到】

: A sub query will do it.
: SELECT *
: FROM ( SELECT a.name, a.score, RANK () OVER
: ( PARTITION BY a.year, a.school
: ORDER BY score
: ) AS rank
: FROM a)
: WHERE rank<=10;

P****D
发帖数: 11146

啊，对！谢谢！
没空打你，我得先去打占我便宜的alex。

【在 g*********r 的大作中提到】

:
: picky一小下， order by score DESC
: FROM a) 后面要命名
: 别拿锅打我的头。。。。

a*****3
发帖数: 601

这个可真是‘手不折新荷，枉受攀花辱。’
我老可是见证了pharmd从玉女到欲女；从青涩id到嫁为人妇;
多少次看得我
青山依旧在几度夕阳红
是非成败转头空.
明天是pharmd新婚整整一周年，
我看应该每人说句祝福的话，
然后版主发包子。

【在 P****D 的大作中提到】

: 啊，对！谢谢！
: 没空打你，我得先去打占我便宜的alex。

g*********r
发帖数: 2847

内个，那啥，那就祝PharmD那啥快乐呗，要是有个啥饭局是最好，虽然今天单位管饭

L****n
发帖数: 3545

you can order by and select top N......
limit obs N (?? don't remember exactly)

【在 p****e 的大作中提到】

s*****n
发帖数: 2174

SELECT * FROM
(SELECT
RIGHT(Year_Month, 4) AS year,
School,
Name,
Score,
ROW_NUMBER() OVER(
PARTITION BY year, school
ORDER BY score DESC) AS rk
FROM
your_table_name) temp_table
WHERE
temp_table.rk <= 10;

【在 p****e 的大作中提到】

g*********r
发帖数: 2847

1. rank() is better than row_number() because there might be several people
with same score.
2. don't think you can use alias (for this example, "year") within the same
query where the alias is created. In other words, you have to put RIGHT(Year
_Month, 4) instead of year after "PARTITION BY"

【在 s*****n 的大作中提到】

: SELECT * FROM
: (SELECT
: RIGHT(Year_Month, 4) AS year,
: School,
: Name,
: Score,
: ROW_NUMBER() OVER(
: PARTITION BY year, school
: ORDER BY score DESC) AS rk
: FROM

s*****n
发帖数: 2174

your reason 1 is exactly why ROW_NUMBER() is preferred, since this will
guarantee to return
10 rows, regardless of ties.
it probably depends on the system you are using. I do this all the time in
my everyday work:
SELECT
group_id,
month_id,
sum(some_field) AS total_count,
ROW_NUMBER() (PARTITION BY group_id ORDER BY total_count DESC)...
FROM
...
GROUP BY
group_id, month_id;

people
same
Year

【在 g*********r 的大作中提到】

:
: 1. rank() is better than row_number() because there might be several people
: with same score.
: 2. don't think you can use alias (for this example, "year") within the same
: query where the alias is created. In other words, you have to put RIGHT(Year
: _Month, 4) instead of year after "PARTITION BY"

g*********r
发帖数: 2847

我的理解,10个最高分的话，如果第10有2个一样分的，应该都选出来。
第二个可能如你所说，我用SQL server 10.0是不行

【在 s*****n 的大作中提到】

: your reason 1 is exactly why ROW_NUMBER() is preferred, since this will
: guarantee to return
: 10 rows, regardless of ties.
: it probably depends on the system you are using. I do this all the time in
: my everyday work:
: SELECT
: group_id,
: month_id,
: sum(some_field) AS total_count,
: ROW_NUMBER() (PARTITION BY group_id ORDER BY total_count DESC)...

相关主题
● 求问一个关于SQL的exist的问题	● 这组数据用sas程序怎么转化谢谢！
● SAS ADV 63题中第9题的答案疑问	● 在用SAS RUN一个SQL,可是出现了错误,请高手指教
● SAS Proc SQL count问题	● MySQL语句请教 (转载)
进入Statistics版参与讨论

p****e
发帖数: 165

感谢指导，完全理解了！
再补充一下，在MySQL这个免费数据库中，没有现成的row_number() over可以用，所以
得用自定义函数，我只学会了可以partition by 一个column的，自己尝试了好几种方
法想通过自定义函数partition by多个column还没有找到可以partition by多个的。还
是SQL server强大~我又另开了个贴讨论这个问题。最后，再次感谢你的解答。

【在 s*****n 的大作中提到】

: SELECT * FROM
: (SELECT
: RIGHT(Year_Month, 4) AS year,
: School,
: Name,
: Score,
: ROW_NUMBER() OVER(
: PARTITION BY year, school
: ORDER BY score DESC) AS rk
: FROM

p****e
发帖数: 165

麻烦问一下givemeoffer,用什么函数可以在excel中rank top n in each group and
leave
others blank？比如我已经sort by year, school score了
谢谢!

【在 g*********r 的大作中提到】

:
: 我的理解,10个最高分的话，如果第10有2个一样分的，应该都选出来。
: 第二个可能如你所说，我用SQL server 10.0是不行

p****e
发帖数: 165

这如果用Excel处理，你说的新建一个column后，就用sumproduct函数对不？我懂得怎
么做了。谢谢啦~

【在 g*********r 的大作中提到】

:
: 我的理解,10个最高分的话，如果第10有2个一样分的，应该都选出来。
: 第二个可能如你所说，我用SQL server 10.0是不行

R*********r
发帖数: 225

MySQL没有row_number, 你可以做self join，count下产生index，然后再给index 加条件。
假设有表
test1:
college_id
student_id
gpa
目标：找出每个学校里面的学生的gpa最高的10个学生
SELECT X1.college_id, X1.student_id, COUNT(DISTINCT x2.student_id) Index
FROM (
SELECT college_id, student_id, gpa
) X1 INNER JOIN
SELECT college_id, student_id, gpa
) X2 ON
X1.college_id = X2.college_id AND
X1.gpa <= X2.gpa
GROUP BY
X1.college_id, X1.student_id
HAVING COUNT(DISTINCT x2.student_id) <=10
;

【在 p****e 的大作中提到】

: 感谢指导，完全理解了！
: 再补充一下，在MySQL这个免费数据库中，没有现成的row_number() over可以用，所以
: 得用自定义函数，我只学会了可以partition by 一个column的，自己尝试了好几种方
: 法想通过自定义函数partition by多个column还没有找到可以partition by多个的。还
: 是SQL server强大~我又另开了个贴讨论这个问题。最后，再次感谢你的解答。

(共1页)

进入Statistics版参与讨论

相关主题
● SQL copy a table into a new table and add a new column (转载)	● 了解RPA的同学来指点我一下
● SQL run a stored procedure by fetching from a cursor row by (转载)	● SQL相关的Interview问题请教
● 求问一个关于SQL的exist的问题	● 這個SQL sub-query 是否畫蛇添足 ?
● SAS ADV 63题中第9题的答案疑问	● 为什么结果是这样?
● SAS Proc SQL count问题	● SQL中啥时候用group by, 啥时候用self-join?
● 这组数据用sas程序怎么转化谢谢！	● 完全不懂SAS，急请教一个问题
● 在用SAS RUN一个SQL,可是出现了错误,请高手指教	● 请教一个SQL 的问题
● MySQL语句请教 (转载)	● [SQL] just figured out how to loop through all columns in a table

相关话题的讨论汇总
话题: rank话题: select话题: year话题: sql话题: score

#	版面	帖数(主题数)
-	全站	4871 (796)
1	Military	3777 (569)
2	Stock	341 (51)
3	Joke	117 (17)
4	History	116 (3)
5	Automobile	100 (9)
6	USANews	55 (9)
7	Midlife	45 (1)
8	Headline	41 (41)
9	Dreamer	33 (13)
10	FleaMarket	32 (20)
11	Living	30 (7)

boards

未名新帖统计// 7月16日

历史上的今天