怎样快速得到两个表的交集 - Database版 - 未名存档

本页内容为未名空间相应帖子的节选和存档，一周内的贴子最多显示50字，超过一周显示500字访问原贴

Database版 - 怎样快速得到两个表的交集

相关主题
● 请问个join的问题	● 新手请教SQL 语法问题- alias 和 join
● 一个有关查询的语句	● MySQL语句请教
● How to Import a Datatable as fast as possible?	● SQL question...
● SQL copy a table into a new table and add a new column	● 请问这个query怎么做
● 请教大虾问题哈，包子谢哈	● 问个JOIN的问题
● mysql 问题 (转载)	● 请教set和select 的区别
● how to write this query	● 菜鸟问题，急
● SQL combine two tables into one table and add a new column	● SQL 查询已经解决.谢谢Modeler,mirthc,cheungche

相关话题的讨论汇总
话题: select话题: index话题: table2话题: table1话题: join

进入Database版参与讨论

1

(共1页)

x****e 发帖数: 55	1 我有两个表，都只有一个属性 “ID”，每张表差不多都有5万条记录比如 ID 10 13 300 100 12 1001 。。。另外一个表也一样，只是记录的数据可能不同现在要求2个表的交集，可以这么做 select * from table1, table2 where table1.ID = table2.ID; 但是因为表比较大，上面这个命令费时间太多(因为对所有记录两两比较， O(n^2)) 按理说如果两个表先进行排序，后在求交集的话，时间复杂度只是线性的(O(2n)) 但是我不知道怎么用SQL语言实现，请各位大牛帮忙多谢
c*****d 发帖数: 6045	2 select * from table1 INTERSECT select * from table2
w*r 发帖数: 2421	3 if both table have indexex enabled on ID column, you can simply do SELECT * FROM (SELECT ID FROM TB1) A INNER JOIN (SELECT ID FROM TB2) B ON (A.ID = B.ID) most of databases should handle this situation by direct index hit/index join automatically... BTW, 50k records are tiny according to today's hardware/ software platform. this type of join should never cause any problem if you have proper index schema enabled on the tables. 【在 x****e 的大作中提到】 : 我有两个表，都只有一个属性 “ID”，每张表差不多都有5万条记录 : 比如 : ID : 10 : 13 : 300 : 100 : 12 : 1001 : 。。。
x****e 发帖数: 55	4 我在自己的PC上用MYSQL做了一下测试，对于都是5万条记录的两张表 table1, table2 第一个方法： select * from table1, table2 where table1.ID = table2.ID; 大约需要600秒第二个方法： create index table1_index on table1 (id); create index table2_index on table2 (id); SELECT * FROM (SELECT ID FROM TB1) A INNER JOIN (SELECT ID FROM TB2) B ON (A .ID = B.ID) 大约需要350秒的确快了第三个方法：用JAVA编程，随机生成两个size都是5万的array 先分别对两个array排序（从小到大），然后从小比到大，时间复杂度是2×50000 最终运算只需要不到1秒！！！为什么数据库的效率会比JAVA的程序慢那么多？哪位兄台能否解释一下？多谢
I******e 发帖数: 101	5 You basically need sort merge join or hash join to do this fast. But, MySQL does not support any of these. Do you have other databases? After you create index, can you do: explain select * from table1, table2 where table1.ID = table2.ID; and tell us the resule. BTW, if you output a lot of records, to measure performance, you should do: select count() from table1, table2 where table1.ID = table2.ID; (A 【在 x**e 的大作中提到】 : 我在自己的PC上用MYSQL做了一下测试， : 对于都是5万条记录的两张表 table1, table2 : 第一个方法： : select from table1, table2 where table1.ID = table2.ID; : 大约需要600秒 : 第二个方法： : create index table1_index on table1 (id); : create index table2_index on table2 (id); : SELECT * FROM (SELECT ID FROM TB1) A INNER JOIN (SELECT ID FROM TB2) B ON (A : .ID = B.ID)
x****e 发帖数: 55	6 MySQL sort merge join正是我想要的，是否只有ORACLE才有这个命令？结果是： id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE table1 index table1_index table1_index 9 47506 Using index 1 SIMPLE table2 ref table2_index table2_index 9 codis. table1.id 1 Using where; Using index 似乎采用这个命令，速度非常快，也是不到1秒这个跟 create table jointable select table1.id from table1, table2 where table1.ID = table2.ID; 好像一样的快但select table1.i 【在 I*****e 的大作中提到】 : You basically need sort merge join or hash join to do this fast. But, MySQL : does not support any of these. Do you have other databases? : After you create index, can you do: : explain select from table1, table2 where table1.ID = table2.ID; : and tell us the resule. : BTW, if you output a lot of records, to measure performance, you should do: : select count(*) from table1, table2 where table1.ID = table2.ID; : : (A
I******e 发帖数: 101	7 All major databases all have it. But hash join is much faster in most cases. From the explain, it looks like that MySQL is still using nested index join. The slowness should be in the output. If you use Python or Java API to retrieve rows, it will be much faster.
x****e 发帖数: 55	8 非常感谢！
w*r 发帖数: 2421	9 mysql...well, free software usually need lot tuning effort. I believe mysql need some tuning to make the db engine choose hash join on index. Other commercial db software should use merge or hash join for the index automatically in your scenario. get away from mysql if you do not want to waste your time to fine tune the RDBMS behavior. (A 【在 x***e 的大作中提到】 : 我在自己的PC上用MYSQL做了一下测试， : 对于都是5万条记录的两张表 table1, table2 : 第一个方法： : select from table1, table2 where table1.ID = table2.ID; : 大约需要600秒 : 第二个方法： : create index table1_index on table1 (id); : create index table2_index on table2 (id); : SELECT * FROM (SELECT ID FROM TB1) A INNER JOIN (SELECT ID FROM TB2) B ON (A : .ID = B.ID)
I******e 发帖数: 101	10 This is a hard lesson to me: Mysql does not support hash join or sort merge join. In most system, Mysql is always I/O bounded.

1

(共1页)

进入Database版参与讨论

相关主题
● SQL 查询已经解决.谢谢Modeler,mirthc,cheungche	● 请教大虾问题哈，包子谢哈
● 请教高手，包子谢	● mysql 问题 (转载)
● 请求SQL语句	● how to write this query
● 问个 sp_send_dbmail 的问题	● SQL combine two tables into one table and add a new column
● 请问个join的问题	● 新手请教SQL 语法问题- alias 和 join
● 一个有关查询的语句	● MySQL语句请教
● How to Import a Datatable as fast as possible?	● SQL question...
● SQL copy a table into a new table and add a new column	● 请问这个query怎么做

相关话题的讨论汇总
话题: select话题: index话题: table2话题: table1话题: join

未名新帖统计// 7月16日

#	版面	帖数(主题数)
-	全站	4871 (796)
1	Military	3777 (569)
2	Stock	341 (51)
3	Joke	117 (17)
4	History	116 (3)
5	Automobile	100 (9)
6	USANews	55 (9)
7	Midlife	45 (1)
8	Headline	41 (41)
9	Dreamer	33 (13)
10	FleaMarket	32 (20)
11	Living	30 (7)

* 这里只显示发帖超过25的版面，努力灌水吧:-)