由买买提看人间百态

boards

本页内容为未名空间相应帖子的节选和存档,一周内的贴子最多显示50字,超过一周显示500字 访问原贴
Database版 - 怎么去除duplicates
相关主题
Remove duplicate from oracle tableSQL help.
一道面试题,求助Re: How to find a duplicate record in Ac
In toad, oracle rowid 问题
菜鸟弱问:你们说的DBA和美国老土说的DBA是一回事吗?新手一问,关于delete和truncate table
这个新功能我喜欢请教一个SQL Server的面试题
五月份免费SQL培训时间 (转载)Another one
这个sql怎么写?Stored Procedure?
How to find all duplicate record in SQL?truncate和delete在ORACLE里有什么区别?
相关话题的讨论汇总
话题: rowid话题: delete话题: value话题: tn话题: table
进入Database版参与讨论
1 (共1页)
d********y
发帖数: 2114
1
假如有一个表,没有primary key。
属性例如,firstName, lastName, major, dateOfBirth。
怎么去除这个表里面重复的?
除了select distinct然后存到新表里面。
M*******a
发帖数: 294
2
GROUP BY那些属性
s****y
发帖数: 581
3
if your DB is Oracle, suppose your table name is table_name:
delete from table_name tn_1
where exists(select *
from table_name tn_2
where (
(tn_2.first_name = tn_1.first_name)
or
(tn_2.first_name is null and tn_1.first_name is null)
)
and
(
(tn_2.last_name = tn_1.last_name)
or
(tn_2.last_name is null and tn_1.last_name is null)
)
and
(
(tn_2.major = tn_1.major)
or
(tn_2.major is null and tn_1.major is null)
)
and
(
(tn_2.dateOfBirth = tn_1.dateOfBirth)
or
(tn_2.dateOfBirth is null and tn_1.dateOfBirth is null)
)
and tn_2.ROWID < tn_1.ROWID
)
d********y
发帖数: 2114
4
这个rowID是table本身的属性么?
删记录会立刻更新这个rowID么?

【在 s****y 的大作中提到】
: if your DB is Oracle, suppose your table name is table_name:
: delete from table_name tn_1
: where exists(select *
: from table_name tn_2
: where (
: (tn_2.first_name = tn_1.first_name)
: or
: (tn_2.first_name is null and tn_1.first_name is null)
: )
: and

s****y
发帖数: 581
5
rowid is the physical address which is unique and can't be changed until you
do compress table or export/import.

【在 d********y 的大作中提到】
: 这个rowID是table本身的属性么?
: 删记录会立刻更新这个rowID么?

B*****g
发帖数: 34098
6
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTIO

you

【在 s****y 的大作中提到】
: rowid is the physical address which is unique and can't be changed until you
: do compress table or export/import.

B*****g
发帖数: 34098
7
第一件事,add pk

【在 d********y 的大作中提到】
: 假如有一个表,没有primary key。
: 属性例如,firstName, lastName, major, dateOfBirth。
: 怎么去除这个表里面重复的?
: 除了select distinct然后存到新表里面。

s****y
发帖数: 581
8
In LZ's question, rowid is safe enough to do delete duplicates based on the
uniqueness of the rowid at the same point of time.
In asktom scenario, user try to do something based on assumption that rowid
has never changed historically.Apparently, it is not safe since so many
things can lead to row movement/reoccupied.

【在 B*****g 的大作中提到】
: http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTIO
:
: you

B*****g
发帖数: 34098
9
假设有一对dup,一个rowid是1,一个是2.
delete先执行到rwoid=1,这时候不delete。
然后经过10分钟执行到了rowid=2,此时rowid=1的那个record由于某种愿因变成了
rowid=3,这个时候会delete吗?

the
rowid

【在 s****y 的大作中提到】
: In LZ's question, rowid is safe enough to do delete duplicates based on the
: uniqueness of the rowid at the same point of time.
: In asktom scenario, user try to do something based on assumption that rowid
: has never changed historically.Apparently, it is not safe since so many
: things can lead to row movement/reoccupied.

s****y
发帖数: 581
10
if the statement has not been commited, the row with rowid=1 should be
locked how can it be changed?

【在 B*****g 的大作中提到】
: 假设有一对dup,一个rowid是1,一个是2.
: delete先执行到rwoid=1,这时候不delete。
: 然后经过10分钟执行到了rowid=2,此时rowid=1的那个record由于某种愿因变成了
: rowid=3,这个时候会delete吗?
:
: the
: rowid

相关主题
五月份免费SQL培训时间 (转载)SQL help.
这个sql怎么写?Re: How to find a duplicate record in Ac
How to find all duplicate record in SQL?oracle rowid 问题
进入Database版参与讨论
B*****g
发帖数: 34098
11
rowid=1又没有被delete,不会被lock。其实你关于rowid的解释挺好,要是需要更深了
解的同学可以看tom的link,写的很清楚,还有例子。至于楼主的问题,最简单就是加
一个pk。这些rowid讨论和问题其实不搭边,说的太复杂反而容易把新手搞糊涂。
***********************************
--Test for delete lock
CREATE TABLE test_delete (ID NUMBER(1), VAL VARCHAR2(1));
INSERT INTO test_delete VALUES(1, 'A');
INSERT INTO test_delete VALUES(2, 'A');
COMMIT;
SELECT * FROM test_delete;
DELETE FROM test_delete t1
WHERE EXISTS (SELECT * FROM test_delete t2 WHERE t2.val = t1.val AND t1.
rowid > t2.rowid);
-- No commit here
SELECT * FROM test_delete;
DELETE FROM test_delete WHERE ID = 1;
SELECT * FROM test_delete;
******************************************

【在 s****y 的大作中提到】
: if the statement has not been commited, the row with rowid=1 should be
: locked how can it be changed?

y****w
发帖数: 3747
12
正规解法不就是delete from (...) t where t.rn > 1么。
如果重复很多,到xx临界点,原地delete就不如来回load两遍数据了。

【在 d********y 的大作中提到】
: 假如有一个表,没有primary key。
: 属性例如,firstName, lastName, major, dateOfBirth。
: 怎么去除这个表里面重复的?
: 除了select distinct然后存到新表里面。

B*****g
发帖数: 34098
13
至少在oracle里不行,必须有key,你在DB2里试试

【在 y****w 的大作中提到】
: 正规解法不就是delete from (...) t where t.rn > 1么。
: 如果重复很多,到xx临界点,原地delete就不如来回load两遍数据了。

y****w
发帖数: 3747
14
db2可以。
$ db2 "select id, value, rownumber()over(partition by id, value) from t1"
ID VALUE 3
----------- ----------- --------------------
1 1 1
2 1 1
2 1 2
3 1 1
3 1 2
3 1 3
6 record(s) selected.
$ db2 "delete from (select id, value, rownumber()over(partition by id, value
) as rn from t1) tt where rn > 1"
DB20000I The SQL command completed successfully.
$ db2 "select * from t1"
ID VALUE
----------- -----------
1 1
2 1
3 1
3 record(s) selected.

【在 B*****g 的大作中提到】
: 至少在oracle里不行,必须有key,你在DB2里试试
v*****r
发帖数: 1119
15
作为 tom 的 fan, 最喜欢他解释问题时的言简意赅,但发现 tom 偶尔也会迷惑人。他
这个帖子开始解释的很好,rowid 会变,什么情况下会变,然后给出一个例子:
“It is perfectly safe to use the rowid in ALL CASES however, assuming you
combine it with
the primary key as well:
update t
set...
where rowid = :x and primary_key = :pk;

看到这就忍不住笑了,rowid = :x, 既然您老人家刚解释了rowid会变,为什么又给出
这么一个 bad example. This update query is assuming the rowid value is
stored somewhere before being passed to update statement, to protect the
update transaction you will have to lock the rows when fetching their rowids
(since rowid can change), to avoid the locking, which is necessary to
safeguard the update, Tom then added primary_key to the update.
Tom 这个 update query also made this assumption: If DBA did something (table
re-org, export/import) that changed the rowid values of rows in table t, I
want my update statement fail silently.
Is that really what you want?
In the real world, I haven't seen any needs to store the rowid value (either
physically in the table or temporary stored in your code), the only
exception is "exceptions" handling, the old exceptions table and error log
table you created using dbms_errlog are storing rowid, but they are stored
temporarily for troubleshooting. In your code, you almost never need to and
should avoid fetching/storing rowid values unless you are as clear as Tom
who always knows the consequences even when giving a bad update example.
回到楼主的question: 怎么去除duplicates in table, 我最喜欢的方法是:
delete from
where rowid not in ( select min(rowid)
from

group by key_column1,key_column2, key_column3...);
@Beijing,
Not sure I understand the purpose of your test case: Test for delete lock.
It seems you think rows are not locked when just using rowid values in where
clause. They are locked on DML before committing regardless using rowid or
not.

【在 B*****g 的大作中提到】
: http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTIO
:
: you

B*****g
发帖数: 34098
16
加一个完全dup的record试试,比如
ID VALUE 3
----------- ----------- --------------------
1 1 1
1 1 1
1 1 1

【在 y****w 的大作中提到】
: db2可以。
: $ db2 "select id, value, rownumber()over(partition by id, value) from t1"
: ID VALUE 3
: ----------- ----------- --------------------
: 1 1 1
: 2 1 1
: 2 1 2
: 3 1 1
: 3 1 2
: 3 1 3

B*****g
发帖数: 34098
17
In my test
r1 rowid = 1
r2 rowid = 2
是重复的,只须保留一个。
假设delete先到r1,一看没有比rowid=1小的,好,保留r1。这时候r1没有被lock。然
后经过漫长的时间终于到了r2,现在还能不能保证r1的rowid是1?
如果能,是什么使得r1的rowid不变?
如果不能保证,r1的rowid会不会变成3以致使r2也会被保留?或者这时候的r2的rowid会
不会已经变成0了?
我的例子也不一定就对,只是理论上一种假设,还没有想到怎么测试。大家有可能看不
懂我的例子是因为大家盯在delete上,没有注意到ckeck rowid大小上。

【在 v*****r 的大作中提到】
: 作为 tom 的 fan, 最喜欢他解释问题时的言简意赅,但发现 tom 偶尔也会迷惑人。他
: 这个帖子开始解释的很好,rowid 会变,什么情况下会变,然后给出一个例子:
: “It is perfectly safe to use the rowid in ALL CASES however, assuming you
: combine it with
: the primary key as well:
: update t
: set...
: where rowid = :x and primary_key = :pk;
: ”
: 看到这就忍不住笑了,rowid = :x, 既然您老人家刚解释了rowid会变,为什么又给出

y****9
发帖数: 144
18
I like analytical function solution, repeat in Oracle:
SQL>
SQL>select id,value, row_number() over (partition by id, value order by
rowid) rn from t;
ID VALUE RN
---------- ---------- ----------
1 1 1
2 1 1
2 1 2
3 1 1
3 1 2
3 1 3
6 rows selected.
SQL>
SQL>
SQL>
SQL>delete from t where rowid in (
2 select rowid from
3 (select rowid, row_number() over (partition by id, value order by
rowid) rn from t)
4 where rn !=1
5 );
3 rows deleted.
SQL>select * from t;
ID VALUE
---------- ----------
1 1
2 1
3 1
SQL>

【在 y****w 的大作中提到】
: db2可以。
: $ db2 "select id, value, rownumber()over(partition by id, value) from t1"
: ID VALUE 3
: ----------- ----------- --------------------
: 1 1 1
: 2 1 1
: 2 1 2
: 3 1 1
: 3 1 2
: 3 1 3

v*****r
发帖数: 1119
19
可以保证,in your case, before committing the delete, test_delete table will
be locked against any DDL change. If you try anything that might change
rowid, you will be greeted with error message:
ORA-00054: resource busy and acquire with NOWAIT specified
B*****g
发帖数: 34098
20
我的例子很清楚呀,没有commit,照样可以把id=1 delete 了。

will

【在 v*****r 的大作中提到】
: 可以保证,in your case, before committing the delete, test_delete table will
: be locked against any DDL change. If you try anything that might change
: rowid, you will be greeted with error message:
: ORA-00054: resource busy and acquire with NOWAIT specified

相关主题
新手一问,关于delete和truncate tableStored Procedure?
请教一个SQL Server的面试题truncate和delete在ORACLE里有什么区别?
Another oneOracle char AND varchar2 datatype question.
进入Database版参与讨论
y****9
发帖数: 144
21

I am not necessarily understanding what you said as I did not go through all
the details. But regarding what you said above few sentence - if at the
time you issue the query (Q1), you saw r1 with rowid=1, then no mather who
updated r1 rowid= whatever value later, commit or not commit, for the Q1's
sake, rowid=1 is always true. Is this a common understanding?

【在 B*****g 的大作中提到】
: In my test
: r1 rowid = 1
: r2 rowid = 2
: 是重复的,只须保留一个。
: 假设delete先到r1,一看没有比rowid=1小的,好,保留r1。这时候r1没有被lock。然
: 后经过漫长的时间终于到了r2,现在还能不能保证r1的rowid是1?
: 如果能,是什么使得r1的rowid不变?
: 如果不能保证,r1的rowid会不会变成3以致使r2也会被保留?或者这时候的r2的rowid会
: 不会已经变成0了?
: 我的例子也不一定就对,只是理论上一种假设,还没有想到怎么测试。大家有可能看不

y****9
发帖数: 144
22
BTW if there are DML against a table, there will be a TM lock on the table,
I cannot think of a situation that rowid of that table can be changed.

all

【在 y****9 的大作中提到】
:
: I am not necessarily understanding what you said as I did not go through all
: the details. But regarding what you said above few sentence - if at the
: time you issue the query (Q1), you saw r1 with rowid=1, then no mather who
: updated r1 rowid= whatever value later, commit or not commit, for the Q1's
: sake, rowid=1 is always true. Is this a common understanding?

B*****g
发帖数: 34098
23
你这个有道理,我就是在想怎么证明这一点是正确还是错误

all

【在 y****9 的大作中提到】
: BTW if there are DML against a table, there will be a TM lock on the table,
: I cannot think of a situation that rowid of that table can be changed.
:
: all

y****9
发帖数: 144
24

rowid is a special case, not good in this case to demonstrate Oracle
consistent read feature.
if you say at the time of query Q1: row1 with col1=1; then at later time
some other session update the row1 with col1=10, no matter commited or no
commited. For Q1, row1 always col1=1, this is achieved by reconstructing a
CR block from UNDO.

【在 B*****g 的大作中提到】
: 你这个有道理,我就是在想怎么证明这一点是正确还是错误
:
: all

v*****r
发帖数: 1119
25
That is expected behavior, because the first delete didn't lock row id=1.
Let's go though two deletes in your case:
1. First delete is randomly picking up row to delete, it happens to pickup
id=2 because of your inserting sequence. In terms of all the locks used with
the first delete using rowid, it is same as "delete from test_case where id
=2":
id=2 row is locked against concurrent write, test_delete table is also
locked in shared-exclusive mode to against DDL change.
2. second delete runs though okay since id=1 row is not locked in first
delete (the first delete locked id=2 row by chance)
The problem with rowid is rowid can change in the following situations
1. DDL to the table.
2. DML doing concurrent write that row causing row migration.
Within one transaction, rowid for a row is guaranteed to not being able to
be changed, so it is safe to use whether caching it or not in that single
transaction. But if you cache the rowid value (in memory or table) and use
the cached value across the transactions, you might delete wrong row.

【在 B*****g 的大作中提到】
: 我的例子很清楚呀,没有commit,照样可以把id=1 delete 了。
:
: will

y****w
发帖数: 3747
26
"3" 是那个rownumber. 在这里只能是1,2,3. 如果你要3列的表,每个都是1,那个子查
询就是
1 1 1 1
1 1 1 2
1 1 1 3
了。
我写那个delete时候先写了下sub-query,然后直接贴上来了,然后就懒得改了。
很奇怪为什么oracle不支持,非得要索引。这种看起来应该是很通用的解决方案。 直
接这么用不行么?为什么非得去用rowid不用rownnumber?直接这么跑报什么错误?

【在 B*****g 的大作中提到】
: 加一个完全dup的record试试,比如
: ID VALUE 3
: ----------- ----------- --------------------
: 1 1 1
: 1 1 1
: 1 1 1

B*****g
发帖数: 34098
27
I know this for normal columns, just not sure about rowid. Did one test,
seems in same SQL (no any SP) works. So for one simple sql your guys are
right, rowid can not be changed which is same as normal column.
CREATE TABLE test_delete (ID NUMBER(1), VAL VARCHAR2(1));
INSERT INTO test_delete VALUES(1, 'A');
INSERT INTO test_delete VALUES(2, 'B');
INSERT INTO test_delete VALUES(3, 'C');
COMMIT;
CREATE OR REPLACE FUNCTION LOCK_TEST(vID NUMBER) RETURN VARCHAR2 IS
tmpVar VARCHAR2(100);
tmpVar1 VARCHAR2(100);
tmpVar2 VARCHAR2(100);
BEGIN
IF vID != 1 THEN
FOR I IN 1 .. 100000 LOOP
-- use some time
SELECT rowid
INTO tmpVar
FROM test_delete
WHERE ID = vID;
END LOOP;
END IF;

SELECT ID||'--'||VAL||'--'||rowid
INTO tmpVar
FROM test_delete
WHERE ID = 1;
RETURN tmpVar;
EXCEPTION
WHEN OTHERS THEN
RAISE;
END LOCK_TEST;
/
--RUN sql:
SELECT ID ||'-'||rowid RID, LOCK_TEST(ID) FID, VAL,
(SELECT ID ||'-'||rowid FROM test_delete WHERE ID = 1) NID
FROM test_delete a
ORDER BY id
--During runing SQL, in another session do following
DELETE FROM test_delete WHERE ID = 1;
COMMIT;
--Results, NID not got changed but FID got changed
RID,FID,VAL,NID
'1-AAMpnhAAeAAAO7cAAA','1--A--AAMpnhAAeAAAO7cAAA','A','1-AAMpnhAAeAAAO7cAAA',
'2-AAMpnhAAeAAAO7eAAA',null,'B','1-AAMpnhAAeAAAO7cAAA',
'3-AAMpnhAAeAAAO7cAAB',null,'C','1-AAMpnhAAeAAAO7cAAA',

【在 y****9 的大作中提到】
:
: rowid is a special case, not good in this case to demonstrate Oracle
: consistent read feature.
: if you say at the time of query Q1: row1 with col1=1; then at later time
: some other session update the row1 with col1=10, no matter commited or no
: commited. For Q1, row1 always col1=1, this is achieved by reconstructing a
: CR block from UNDO.

v*****r
发帖数: 1119
28
ROWID is just a function that returns a row's physical location, which
includs: file location + block location + row_offset_within_a_block.
If a rows move to different physical location, its rowid changes, simple as
that.
During a transaction on a table, all rows' physical locations is protected
by locking. Even you only modify one row in a table, all rows' rowids are
guaranteed not to be changed since there is no way you can trigger row
moving once you have a DML hitting that table.
Across transactions, there is no protection against row from moving
physically. It is just not safe to cache a rowid and think the cached rowid
will always points to same physical location as time going on.

【在 B*****g 的大作中提到】
: I know this for normal columns, just not sure about rowid. Did one test,
: seems in same SQL (no any SP) works. So for one simple sql your guys are
: right, rowid can not be changed which is same as normal column.
: CREATE TABLE test_delete (ID NUMBER(1), VAL VARCHAR2(1));
: INSERT INTO test_delete VALUES(1, 'A');
: INSERT INTO test_delete VALUES(2, 'B');
: INSERT INTO test_delete VALUES(3, 'C');
: COMMIT;
: CREATE OR REPLACE FUNCTION LOCK_TEST(vID NUMBER) RETURN VARCHAR2 IS
: tmpVar VARCHAR2(100);

B*****g
发帖数: 34098
29
我一直confuse你说的transaction,因为我想着不到commit or rollback,不算一个tr
ansaction。我想你指的是指这个sql noly

as
rowid

【在 v*****r 的大作中提到】
: ROWID is just a function that returns a row's physical location, which
: includs: file location + block location + row_offset_within_a_block.
: If a rows move to different physical location, its rowid changes, simple as
: that.
: During a transaction on a table, all rows' physical locations is protected
: by locking. Even you only modify one row in a table, all rows' rowids are
: guaranteed not to be changed since there is no way you can trigger row
: moving once you have a DML hitting that table.
: Across transactions, there is no protection against row from moving
: physically. It is just not safe to cache a rowid and think the cached rowid

y****9
发帖数: 144
30
Don't agree. the point is ROWID also followd Oracle consistent reads
mechanism.
It is not saying ROWID of some rows of a table cannot be changed during soem
DML on the table.
If in session 1 I update a row, can't I move other rows around by delete/
insert?
相关主题
菜鸟急问ORACLE里FUNCTION返回ref cursor的问题一道面试题,求助
请教: SQL SUMIn toad,
Remove duplicate from oracle table菜鸟弱问:你们说的DBA和美国老土说的DBA是一回事吗?
进入Database版参与讨论
B*****g
发帖数: 34098
31
这个对,我最后一个test应该证明了。不过像我上面说的,他可能指的是per query。
我记得原来看过,就像我那个function里面就会出问题,不过现在忘了oracle是怎么处理这个部分
的。而且test是在9i下做的,等我有空到11g上测试一下。

protected
rowids
are
row
during
soem
delete/
192.76.]

【在 y****9 的大作中提到】
: Don't agree. the point is ROWID also followd Oracle consistent reads
: mechanism.
: It is not saying ROWID of some rows of a table cannot be changed during soem
: DML on the table.
: If in session 1 I update a row, can't I move other rows around by delete/
: insert?

m****d
发帖数: 372
32
以前处理过这个问题,用的也是analytical function,与此类似。
delete from $table_name where rowid in
(
select rowid from
(select rank() over (partition by $all_columns order by rowid) rank_n,
rowid as "rowid"
from $table_name
)
where rank_n > 1
)

【在 y****9 的大作中提到】
: I like analytical function solution, repeat in Oracle:
: SQL>
: SQL>select id,value, row_number() over (partition by id, value order by
: rowid) rn from t;
: ID VALUE RN
: ---------- ---------- ----------
: 1 1 1
: 2 1 1
: 2 1 2
: 3 1 1

v*****r
发帖数: 1119
33
That is right. That means rowids should not be cached for later reference
even within the same transaction, not just across transaction.

soem

【在 y****9 的大作中提到】
: Don't agree. the point is ROWID also followd Oracle consistent reads
: mechanism.
: It is not saying ROWID of some rows of a table cannot be changed during soem
: DML on the table.
: If in session 1 I update a row, can't I move other rows around by delete/
: insert?

a***y
发帖数: 2803
34
这是个什么表啊? 一个表肯定要有primary key,create table的时候,如果不想重复输
入一行数据,用unique(col1,col2,....)可以避免重复的数据被输入表里面.

【在 d********y 的大作中提到】
: 假如有一个表,没有primary key。
: 属性例如,firstName, lastName, major, dateOfBirth。
: 怎么去除这个表里面重复的?
: 除了select distinct然后存到新表里面。

a***y
发帖数: 2803
35
要看是ddl还是dml吧.

tr

【在 B*****g 的大作中提到】
: 我一直confuse你说的transaction,因为我想着不到commit or rollback,不算一个tr
: ansaction。我想你指的是指这个sql noly
:
: as
: rowid

B*****g
发帖数: 34098
36
DDL自动一个commit

【在 a***y 的大作中提到】
: 要看是ddl还是dml吧.
:
: tr

s****y
发帖数: 581
37
Not necessary.some association table does not necessary need PK which can
only play as an identifier.

【在 a***y 的大作中提到】
: 这是个什么表啊? 一个表肯定要有primary key,create table的时候,如果不想重复输
: 入一行数据,用unique(col1,col2,....)可以避免重复的数据被输入表里面.

1 (共1页)
进入Database版参与讨论
相关主题
truncate和delete在ORACLE里有什么区别?这个新功能我喜欢
Oracle char AND varchar2 datatype question.五月份免费SQL培训时间 (转载)
菜鸟急问ORACLE里FUNCTION返回ref cursor的问题这个sql怎么写?
请教: SQL SUMHow to find all duplicate record in SQL?
Remove duplicate from oracle tableSQL help.
一道面试题,求助Re: How to find a duplicate record in Ac
In toad, oracle rowid 问题
菜鸟弱问:你们说的DBA和美国老土说的DBA是一回事吗?新手一问,关于delete和truncate table
相关话题的讨论汇总
话题: rowid话题: delete话题: value话题: tn话题: table