Home >Database >Oracle >Summarize and organize common methods for removing duplicate data from Oracle database

Summarize and organize common methods for removing duplicate data from Oracle database

WBOY
WBOYforward
2022-08-22 17:59:303363browse

This article brings you relevant knowledge about Oracle, which mainly introduces the duplicate data in the table that is often cleared during data cleaning. So how to deal with it in Oracle? Let’s take a look at it together, I hope it will be helpful to everyone.

Summarize and organize common methods for removing duplicate data from Oracle database

Recommended tutorial: "Oracle Video Tutorial"

Create test data

create table nayi224_180824(col_1 varchar2(10), col_2 varchar2(10), col_3 varchar2(10));
insert into nayi224_180824
select 1, 2, 3 from dual union all
select 1, 2, 3 from dual union all
select 5, 2, 3 from dual union all
select 10, 20, 30 from dual ;
commit;
select*from nayi224_180824;
COL_1 COL_2 COL_3
1 2 3
1 2 3
5 2 3
10 20 30

for the specified Column, check the result set after deduplication

distinct

select distinct t1.* from nayi224_180824 t1;
##COL_1COL_2COL_3 102030123523
The method is very limited because it can only deduplicate all query columns. If I want to deduplicate col_2 and col3, then my result set can only have col_2 and col_3 columns, but not col_1.

select distinct t1.col_2, col_3 from nayi224_180824 t1

COL_2COL_3##220But it is also the simplest and easiest way to understand.
3
30

row_number()

select *
  from (select t1.*,
               row_number() over(partition by t1.col_2, t1.col_3 order by 1) rn
          from nayi224_180824 t1) t1
 where t1.rn = 1
;

COL_1110 ##It’s a lot more troublesome to write, but it has greater flexibility .
COL_2 COL_3 RN
2 3 1
20 30 1
For the specified column, find all duplicate rows

count having

select *
  from nayi224_180824 t
 where (t.col_2, t.col_3) in (select t1.col_2, t1.col_3
                                from nayi224_180824 t1
                               group by t1.col_2, t1.col_3
                              having count(1) > 1)

COL_1COL_2COL_3123 123523If you need to check the table twice, the efficiency will be relatively low. Not recommended.
count over

select *
  from (select t1.*,
               count(1) over(partition by t1.col_2, t1.col_3) rn
          from nayi224_180824 t1) t1
 where t1.rn > 1
;

COL_1COL_2COL_3RN123312335233#You only need to check the table once, recommended.
Delete all duplicate rows

delete from nayi224_180824 t
 where t.rowid in (
                   select rid
                     from (select t1.rowid rid,
                                   count(1) over(partition by t1.col_2, t1.col_3) rn
                              from nayi224_180824 t1) t1
                    where t1.rn > 1);

The above statement is slightly modified.

Delete duplicate data and retain one

Analytical function method

delete from nayi224_180824 t
 where t.rowid in (select rid
                     from (select t1.rowid rid,
                                  row_number() over(partition by t1.col_2, t1.col_3 order by 1) rn
                             from nayi224_180824 t1) t1
                    where t1.rn > 1);

has the consistent high flexibility of analytical functions. You can do whatever you want with the grouping and change the orderby clause to achieve requirements like "retain the maximum id".

group by

delete from nayi224_180824 t
 where t.rowid not in
       (select max(rowid) from nayi224_180824 t1 group by t1.col_2, t1.col_3);

Sacrifice some flexibility in exchange for higher efficiency.

Recommended tutorial: "

Oracle Video Tutorial

"

The above is the detailed content of Summarize and organize common methods for removing duplicate data from Oracle database. For more information, please follow other related articles on the PHP Chinese website!

Statement:
This article is reproduced at:jb51.net. If there is any infringement, please contact admin@php.cn delete