由买买提看人间百态

boards

本页内容为未名空间相应帖子的节选和存档,一周内的贴子最多显示50字,超过一周显示500字 访问原贴
Database版 - 怎样快速得到两个表的交集
相关主题
请问个join的问题新手请教SQL 语法问题- alias 和 join
一个有关查询的语句MySQL语句请教
How to Import a Datatable as fast as possible?SQL question...
SQL copy a table into a new table and add a new column请问这个query怎么做
请教大虾问题哈,包子谢哈问个JOIN的问题
mysql 问题 (转载)请教set和select 的区别
how to write this query菜鸟问题,急
SQL combine two tables into one table and add a new columnSQL 查询已经解决.谢谢Modeler,mirthc,cheungche
相关话题的讨论汇总
话题: select话题: index话题: table2话题: table1话题: join
进入Database版参与讨论
1 (共1页)
x****e
发帖数: 55
1
我有两个表,都只有一个属性 “ID”,每张表差不多都有5万条记录
比如
ID
10
13
300
100
12
1001
。。。
另外一个表也一样,只是记录的数据可能不同
现在要求2个表的交集,可以这么做
select * from table1, table2 where table1.ID = table2.ID;
但是因为表比较大,上面这个命令费时间太多(因为对所有记录两两比较, O(n^2))
按理说如果两个表先进行排序,后在求交集的话,时间复杂度只是线性的(O(2n))
但是我不知道怎么用SQL语言实现,请各位大牛帮忙
多谢
c*****d
发帖数: 6045
2
select * from table1
INTERSECT
select * from table2
w*r
发帖数: 2421
3
if both table have indexex enabled on ID column, you can simply do
SELECT * FROM
(SELECT ID FROM TB1) A INNER JOIN (SELECT ID FROM TB2) B
ON (A.ID = B.ID)
most of databases should handle this situation by direct index hit/index
join
automatically... BTW, 50k records are tiny according to today's hardware/
software platform. this type of join should never cause any problem if you
have proper index schema enabled on the tables.

【在 x****e 的大作中提到】
: 我有两个表,都只有一个属性 “ID”,每张表差不多都有5万条记录
: 比如
: ID
: 10
: 13
: 300
: 100
: 12
: 1001
: 。。。

x****e
发帖数: 55
4
我在自己的PC上用MYSQL做了一下测试,
对于都是5万条记录的两张表 table1, table2
第一个方法:
select * from table1, table2 where table1.ID = table2.ID;
大约需要600秒
第二个方法:
create index table1_index on table1 (id);
create index table2_index on table2 (id);
SELECT * FROM (SELECT ID FROM TB1) A INNER JOIN (SELECT ID FROM TB2) B ON (A
.ID = B.ID)
大约需要350秒
的确快了
第三个方法:
用JAVA编程,随机生成两个size都是5万的array
先分别对两个array排序(从小到大),然后从小比到大,时间复杂度是2×50000
最终运算只需要不到1秒 !!!
为什么数据库的效率会比JAVA的程序慢那么多?
哪位兄台能否解释一下?
多谢
I******e
发帖数: 101
5
You basically need sort merge join or hash join to do this fast. But, MySQL
does not support any of these. Do you have other databases?
After you create index, can you do:
explain select * from table1, table2 where table1.ID = table2.ID;
and tell us the resule.
BTW, if you output a lot of records, to measure performance, you should do:
select count(*) from table1, table2 where table1.ID = table2.ID;

(A

【在 x****e 的大作中提到】
: 我在自己的PC上用MYSQL做了一下测试,
: 对于都是5万条记录的两张表 table1, table2
: 第一个方法:
: select * from table1, table2 where table1.ID = table2.ID;
: 大约需要600秒
: 第二个方法:
: create index table1_index on table1 (id);
: create index table2_index on table2 (id);
: SELECT * FROM (SELECT ID FROM TB1) A INNER JOIN (SELECT ID FROM TB2) B ON (A
: .ID = B.ID)

x****e
发帖数: 55
6

MySQL
sort merge join正是我想要的,是否只有ORACLE才有这个命令?
结果是:
id select_type table type possible_keys key key_len ref
rows Extra
1 SIMPLE table1 index table1_index table1_index 9
47506 Using index
1 SIMPLE table2 ref table2_index table2_index 9 codis.
table1.id 1 Using where; Using index
似乎采用这个命令,速度非常快,也是不到1秒
这个跟 create table jointable select table1.id from table1, table2 where
table1.ID = table2.ID; 好像一样的快
但select table1.i

【在 I******e 的大作中提到】
: You basically need sort merge join or hash join to do this fast. But, MySQL
: does not support any of these. Do you have other databases?
: After you create index, can you do:
: explain select * from table1, table2 where table1.ID = table2.ID;
: and tell us the resule.
: BTW, if you output a lot of records, to measure performance, you should do:
: select count(*) from table1, table2 where table1.ID = table2.ID;
:
: (A

I******e
发帖数: 101
7
All major databases all have it. But hash join is much faster in most cases.
From the explain, it looks like that MySQL is still using nested index join.
The slowness should be in the output. If you use Python or Java API to
retrieve rows, it will be much faster.
x****e
发帖数: 55
8
非常感谢!
w*r
发帖数: 2421
9
mysql...well, free software usually need lot tuning effort. I believe mysql
need some tuning to make the db engine choose hash join on index. Other
commercial db software should use merge or hash join for the index
automatically in your scenario. get away from mysql if you do not want to
waste your time to fine tune the RDBMS behavior.

(A

【在 x****e 的大作中提到】
: 我在自己的PC上用MYSQL做了一下测试,
: 对于都是5万条记录的两张表 table1, table2
: 第一个方法:
: select * from table1, table2 where table1.ID = table2.ID;
: 大约需要600秒
: 第二个方法:
: create index table1_index on table1 (id);
: create index table2_index on table2 (id);
: SELECT * FROM (SELECT ID FROM TB1) A INNER JOIN (SELECT ID FROM TB2) B ON (A
: .ID = B.ID)

I******e
发帖数: 101
10
This is a hard lesson to me: Mysql does not support hash join or sort merge
join. In most system, Mysql is always I/O bounded.
1 (共1页)
进入Database版参与讨论
相关主题
SQL 查询已经解决.谢谢Modeler,mirthc,cheungche请教大虾问题哈,包子谢哈
请教高手,包子谢mysql 问题 (转载)
请求SQL语句how to write this query
问个 sp_send_dbmail 的问题SQL combine two tables into one table and add a new column
请问个join的问题新手请教SQL 语法问题- alias 和 join
一个有关查询的语句MySQL语句请教
How to Import a Datatable as fast as possible?SQL question...
SQL copy a table into a new table and add a new column请问这个query怎么做
相关话题的讨论汇总
话题: select话题: index话题: table2话题: table1话题: join