由买买提看人间百态

boards

本页内容为未名空间相应帖子的节选和存档,一周内的贴子最多显示50字,超过一周显示500字 访问原贴
Database版 - 请教 query
相关主题
我一直以为beijing是个mmMySQL Question
sql面试题1mysql challenge
用SSIS EXPORT 到 EXCEL 2010 有2000个COLUMN,可能吗Re: question about CAST and Multiset
问一个queryORA0-01422
问一个linq to sql编程问题[转载] Hehe,不好意思,another question:)
generate unique integer ID from columns in SQL table (转载Access database size
BJ,那个COLUMNSTORE INDEX是不是BITMAP INDEX啊how to get the result in the middle of resultset?
请教个VBA问题what is optimizer statistics in Oracle?
相关话题的讨论汇总
话题: hobby话题: hobbies话题: users话题: age话题: table
进入Database版参与讨论
1 (共1页)
g*********n
发帖数: 43
1
Two tables, the first one "Users" saves the user name and some properties of
the user, age, gender, . . and the last column is "hobbies", to save the
hobbies for this particular user. Since hobbies can be quite diverse, in
this table we put an id of the hobbies.
name age gender hobby_id
A 20 M 1
B 45 F 1
C 52 M 2
. . .
And we define another table to map this id into real hobbies:
hobby_id hobby
1 swimming
1 pingpong
1 soccer
. . . other hobbies associated with id "1"
2 badminton
2 travel
2 pingpong (a hobby can be associated with multiple id)
. . . other hobbies associated with id "2"
3 TV
3 movie
. . .
Requirement:
1. for all combinations of (age, gender), return how many users have the
corresponding age and gender.
This can be achieved by
"select age, gender, count(*)
from Users
group by 1, 2;"
2. for all combinations of (age, hobby), return how many users have the
corresponding age and hobby.
Theoretically, this can be achieved by
select age, hobbies.hobby, count(*)
from Users join hobbies on Users.hobby_id = hobbies.hobby_id
group by 1,2;
But the problem is, the join is just too expensive:
(1) the Users table has 200k rows
(2) each hobby_id may map to 200 hobbies
thus the join will result in 200k * 200 = 40 million rows and causes
connection timeout.
It looks like we cannot use one single query to return all the results, and
have to break down the query into smaller ones?
d***e
发帖数: 793
2
Can you do this?
select a.age, hobbies.hobby, a.ct
(
select age,hobby_id, count(*) ct
from Users
group by 1,2;
)a
join hobbies
on a.hobby_id = hobbies.hobby_id
g*********n
发帖数: 43
3
Good point. The subquery
select age,hobby_id, count(*) ct
from Users
group by 1,2;
can compress table Users into a, then we join "a" and "hobbies".
This indeed helps. But the thing is, even after the compression, table a
still has 25k rows:), and the join is still 25k * 200 = 5 million rows.

【在 d***e 的大作中提到】
: Can you do this?
: select a.age, hobbies.hobby, a.ct
: (
: select age,hobby_id, count(*) ct
: from Users
: group by 1,2;
: )a
: join hobbies
: on a.hobby_id = hobbies.hobby_id

i*****w
发帖数: 75
4
I am a bit confused by the question:
If multiple hobbies can share the same hobby_id, then, for name A, it's
hobby_id =1 in the user table, does that mean person A has all the hobbies
defined in the hobby table with hobby_id =1? If so, then person A and B have
exactly the same hobbies?
Based on your design, then when you join your user table and the hobby table
by hobby_id, if one hobby_id has n hobbies, then for a given user, you will
have n rows of data, which caused the CROSS JOIN effect.
Since you mentioned that one hobby can belong to multiple hobby_id, so it
makes me feel like that you actually wanted to find Hobby Groups. For
example, Person A and B share the same hobbies of swimming, pingpong and
soccer, so you may try to store the data like below:
User table:
User Age Gender HobbyGroupID
A 20 M 100
B 45 F 100
C 52 M 101
Then have a HobbyGroup table:
HobbyGroupID HobbyGroupTypeDes
100 Hobby Group 100
101 Hobby Group 101
and a Hobby table:
HobbyID HobbyDescription
1 Swimming
2 Pingpong
3 Soccer
4 Travel
Then you can have a bridge table:
HobbyGroupID HobbyID
100 1
100 2
100 3
101 1
101 3
Note that HobbyID 1 and 3 are in multiple HobbyGroups in the previous bridge
table. Doing this way, you can group by in the user table, then make the
calculations in the bridge table, and finally put these two together.
Just my 2 cents.

of

【在 g*********n 的大作中提到】
: Two tables, the first one "Users" saves the user name and some properties of
: the user, age, gender, . . and the last column is "hobbies", to save the
: hobbies for this particular user. Since hobbies can be quite diverse, in
: this table we put an id of the hobbies.
: name age gender hobby_id
: A 20 M 1
: B 45 F 1
: C 52 M 2
: . . .
: And we define another table to map this id into real hobbies:

p********n
发帖数: 11
5
HobbyGroup is kind of no much use. Why not simply create a user table and a
hobby table, and create a (mapping) table to link a user with hobbies?
User
Hobby
UserHobby
P********8
发帖数: 12
6
discretizing age might help, such as 21-25,26-30.....
anyway it still depends on what's the business requirements.

【在 g*********n 的大作中提到】
: Good point. The subquery
: select age,hobby_id, count(*) ct
: from Users
: group by 1,2;
: can compress table Users into a, then we join "a" and "hobbies".
: This indeed helps. But the thing is, even after the compression, table a
: still has 25k rows:), and the join is still 25k * 200 = 5 million rows.

1 (共1页)
进入Database版参与讨论
相关主题
what is optimizer statistics in Oracle?问一个linq to sql编程问题
a question for JDBCgenerate unique integer ID from columns in SQL table (转载
about joinsBJ,那个COLUMNSTORE INDEX是不是BITMAP INDEX啊
Question 3: merging data, getting rid of dup?请教个VBA问题
我一直以为beijing是个mmMySQL Question
sql面试题1mysql challenge
用SSIS EXPORT 到 EXCEL 2010 有2000个COLUMN,可能吗Re: question about CAST and Multiset
问一个queryORA0-01422
相关话题的讨论汇总
话题: hobby话题: hobbies话题: users话题: age话题: table