Home  >  Article  >  Database  >  How to deduplicate data in Oracle

How to deduplicate data in Oracle

青灯夜游
青灯夜游Original
2023-01-04 14:42:2513505browse

Deduplication method: 1. Use distinct keyword to remove duplication, syntax "SELECT DISTINCT field name FROM table name;"; 2. Use window function row_number () over() to remove duplication; 3. Use "group by" clause to deduplicate, the syntax is "select field name from table name group by field name;"; 4. Use rowid to deduplicate pseudo columns.

How to deduplicate data in Oracle

The operating environment of this tutorial: Windows 7 system, Oracle 11g version, Dell G3 computer.

Business Scenario

Need to query certain data. Since three tables are required for related queries, the query results are as follows:

How to deduplicate data in Oracle
Original SQL statement

SELECT 
  D.ORDER_NUM AS "申请单号" ,
  D.CREATE_TIME ,
  D.EMP_NAME AS "申请人",
  (SELECT extractvalue(t1.row_data,'/root/row/FI13_wasteName')
  FROM dat_table_row t1
  WHERE d.document_id = t1.document_id
  AND t1.table_id     = 'dynamicRowsIdPTFLXX'
  ) AS "废料名称",
  (SELECT extractvalue(t1.row_data,'/root/row/FI13_units')
  FROM dat_table_row t1
  WHERE d.document_id = t1.document_id
  AND t1.table_id     = 'dynamicRowsIdPTFLXX'
  ) AS "单位",
  (SELECT extractvalue(t1.row_data,'/root/row/FI13_estimate')
  FROM dat_table_row t1
  WHERE d.document_id = t1.document_id
  AND t1.table_id     = 'dynamicRowsIdPTFLXX'
  ) AS "预估数量",
  (SELECT extractvalue(t1.row_data,'/root/row/FI13_stockRemoval')
  FROM dat_table_row t1
  WHERE d.document_id = t1.document_id
  AND t1.table_id     = 'dynamicRowsIdPTFLXX'
  ) AS "累计出库数量",
  (SELECT extractvalue(t1.row_data,'/root/row/FI13_receivingTime')
  FROM dat_table_row t1
  WHERE d.document_id = t1.document_id
  AND t1.table_id     = 'dynamicRowsIdCGYTX'
  ) AS "收购方收货时间",
  (SELECT extractvalue(t2.row_data,'/root/row/FI13_collectionTime')
  FROM dat_table_row t2
  WHERE d.document_id = t2.document_id
  AND t2.table_id     = 'dynamicRowsIdPTSJSKSJ'
  ) AS "实际收款时间"
FROM dat_document d,
  dat_table_row dtr
WHERE d.form_name       ='FI14'
AND d.document_id       =dtr.document_id
AND (D.DOCUMENT_STATUS != 'deleted'
OR D.DOCUMENT_STATUS   IS NULL )
  --AND TO_CHAR(d.create_time,'yyyy-MM-dd') BETWEEN '2020-01-01' AND '2021-03-26'
AND d.order_num = 'FI1420210708002' --FI1420210708002
ORDER BY d.CREATE_TIME DESC;

Method 1: distinct deduplication

SELECT DISTINCT can be used to filter the result set Duplicate rows, ensuring that the values ​​in the specified column or columns returned in the SELECT clause are unique.

The syntax of the DISTINCT statement is as follows:

SELECT DISTINCT column_1,
    column_2,
        ...
        FROM
    table_name;

Example:

SELECT 
  D.ORDER_NUM AS "申请单号" ,
  D.CREATE_TIME ,
  D.EMP_NAME AS "申请人",
  (SELECT extractvalue(t1.row_data,'/root/row/FI13_wasteName')
  FROM dat_table_row t1
  WHERE d.document_id = t1.document_id
  AND t1.table_id     = 'dynamicRowsIdPTFLXX'
  ) AS "废料名称",
  (SELECT extractvalue(t1.row_data,'/root/row/FI13_units')
  FROM dat_table_row t1
  WHERE d.document_id = t1.document_id
  AND t1.table_id     = 'dynamicRowsIdPTFLXX'
  ) AS "单位",
  (SELECT extractvalue(t1.row_data,'/root/row/FI13_estimate')
  FROM dat_table_row t1
  WHERE d.document_id = t1.document_id
  AND t1.table_id     = 'dynamicRowsIdPTFLXX'
  ) AS "预估数量",
  (SELECT extractvalue(t1.row_data,'/root/row/FI13_stockRemoval')
  FROM dat_table_row t1
  WHERE d.document_id = t1.document_id
  AND t1.table_id     = 'dynamicRowsIdPTFLXX'
  ) AS "累计出库数量",
  (SELECT extractvalue(t1.row_data,'/root/row/FI13_receivingTime')
  FROM dat_table_row t1
  WHERE d.document_id = t1.document_id
  AND t1.table_id     = 'dynamicRowsIdCGYTX'
  ) AS "收购方收货时间",
  (SELECT extractvalue(t2.row_data,'/root/row/FI13_collectionTime')
  FROM dat_table_row t2
  WHERE d.document_id = t2.document_id
  AND t2.table_id     = 'dynamicRowsIdPTSJSKSJ'
  ) AS "实际收款时间"
FROM dat_document d,
  dat_table_row dtr
WHERE d.form_name       ='FI14'
AND d.document_id       =dtr.document_id
AND (D.DOCUMENT_STATUS != 'deleted'
OR D.DOCUMENT_STATUS   IS NULL )
  --AND TO_CHAR(d.create_time,'yyyy-MM-dd') BETWEEN '2020-01-01' AND '2021-03-26'
AND d.order_num = 'FI1420210708002' --FI1420210708002
ORDER BY d.CREATE_TIME DESC;

Note: DISTINCT must be followed by the ORDER BY field. Oracle first performs DISTINCT to remove duplicates, and then uses ORDER Sorted by BY. Therefore, if the field that needs to be sorted in ORDER BY is not in the field after distinct, an error will naturally be thrown.

The error message is as follows:

How to deduplicate data in Oracle

##Method 2: row_number() over()

Grammar format

select * from
(select A.*, row_number() over(partition by A.name1 order by A.name12 desc) rn from A)
where rn = 1

Example

select * from (
select 
  d.order_num as "申请单号" ,
  d.create_time ,
  d.emp_name as "申请人",
  (select extractvalue(t1.row_data,'/root/row/FI13_wasteName')
  from dat_table_row t1
  where d.document_id = t1.document_id
  and t1.table_id     = 'dynamicRowsIdPTFLXX'
  ) as "废料名称",
  (select extractvalue(t1.row_data,'/root/row/FI13_units')
  from dat_table_row t1
  where d.document_id = t1.document_id
  and t1.table_id     = 'dynamicRowsIdPTFLXX'
  ) as "单位",
  (select extractvalue(t1.row_data,'/root/row/FI13_estimate')
  from dat_table_row t1
  where d.document_id = t1.document_id
  and t1.table_id     = 'dynamicRowsIdPTFLXX'
  ) as "预估数量",
  (select extractvalue(t1.row_data,'/root/row/FI13_stockRemoval')
  from dat_table_row t1
  where d.document_id = t1.document_id
  and t1.table_id     = 'dynamicRowsIdPTFLXX'
  ) as "累计出库数量",
  (select extractvalue(t1.row_data,'/root/row/FI13_receivingTime')
  from dat_table_row t1
  where d.document_id = t1.document_id
  and t1.table_id     = 'dynamicRowsIdCGYTX'
  ) as "收购方收货时间",
  (select extractvalue(t2.row_data,'/root/row/FI13_collectionTime')
  from dat_table_row t2
  where d.document_id = t2.document_id
  and t2.table_id     = 'dynamicRowsIdPTSJSKSJ'
  ) as "实际收款时间",
  row_number() over(partition by d.order_num  order by d.create_time desc) rn 
from dat_document d,
  dat_table_row dtr
where d.form_name       ='FI14'
and d.document_id       =dtr.document_id
and (d.document_status != 'deleted'
or d.document_status   is null )
  --AND TO_CHAR(d.create_time,'yyyy-MM-dd') BETWEEN '2020-01-01' AND '2021-03-26'
and d.order_num = 'FI1420210708002' --FI1420210708002
) where rn = 1;
Query results


How to deduplicate data in Oracle

Method 3: group by

select 字段名 from 表名
group by 字段名;

Method 4: Using rowid (pseudo column deduplication)

select id,name,age from test t1
where t1.rowid in (select min(rowid) from test t2 where t1.name=t2.name and t1.age=t2.age);

Recommended tutorial: "

Oracle tutorial

The above is the detailed content of How to deduplicate data in Oracle. For more information, please follow other related articles on the PHP Chinese website!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn