Efficiently remove duplicate rows without unique identifiers in Netezza
When a large table contains duplicate rows, finding an efficient way to remove them can be challenging. The following query works in SQL dialects that allow deleting through a common table expression, such as SQL Server, but what about Netezza?
Original SQL query
<code class="language-sql">WITH TempEmp AS (
    SELECT name,
           ROW_NUMBER() OVER (PARTITION BY name, address, zipcode ORDER BY name) AS duplicateRecCount
    FROM mytable
)
DELETE FROM TempEmp
WHERE duplicateRecCount > 1;</code>
Netezza Solution
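Netezza does not support a DELETE statement following a WITH clause. Instead, consider joining the table to itself with the USING keyword. The statement below deletes every row for which an identical row with a larger ctid exists, so one copy of each duplicate group survives. It relies on ctid, the system column PostgreSQL uses to identify a row's physical location; if your Netezza release does not expose ctid, the hidden rowid column may serve the same purpose.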
<code class="language-sql">DELETE FROM table_with_dups T1
USING table_with_dups T2
WHERE T1.ctid < T2.ctid
  AND T1.name = T2.name
  AND T1.address = T2.address
  AND T1.zipcode = T2.zipcode;</code>
Preview results
To review the records before deleting them, replace DELETE with SELECT * and USING with a comma. In each returned pair, the T1 columns belong to a row that would be deleted:
<code class="language-sql">SELECT *
FROM table_with_dups T1, table_with_dups T2
WHERE T1.ctid < T2.ctid
  AND T1.name = T2.name
  AND T1.address = T2.address
  AND T1.zipcode = T2.zipcode;</code>
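Because the self-join returns one row per matching pair, a group of three identical rows appears in the preview more than once. As a complementary check, a standard GROUP BY / HAVING query can list each duplicated combination once, together with its row count. This is a minimal sketch against the same table_with_dups table; the copies alias is made up for illustration, and the query assumes the compared columns contain no NULL values:
<code class="language-sql">-- One output row per duplicated (name, address, zipcode) combination.
-- "copies" is an illustrative alias, not part of the original solution.
SELECT name, address, zipcode, COUNT(*) AS copies
FROM table_with_dups
GROUP BY name, address, zipcode
HAVING COUNT(*) > 1;</code>
Since the DELETE keeps one row per group, the number of rows it removes should equal the sum of copies - 1 over the groups returned here.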
Performance Notes
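If few duplicates are expected, this approach should perform better than one based on a NOT IN (...) clause, whose subquery has to produce a large number of rows. Also note that NULL never compares equal to NULL, so rows with NULL in a compared column are not matched by the join and will not be deleted. If a key column can contain NULL values, wrap both sides of its comparison in COALESCE() with a placeholder value that does not occur in the real data, for example: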
<code class="language-sql">AND COALESCE(T1.col_with_nulls, '[NULL]') = COALESCE(T2.col_with_nulls, '[NULL]')</code>
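Put together, the full statement might look like the sketch below. It assumes that col_with_nulls is a character column on table_with_dups that is part of the duplicate key, and that the placeholder '[NULL]' never occurs as a real value; both are illustrative assumptions rather than details of any actual schema:
<code class="language-sql">-- Sketch: same self-join DELETE, with a NULL-safe comparison on one column.
-- col_with_nulls and the '[NULL]' placeholder are hypothetical.
DELETE FROM table_with_dups T1
USING table_with_dups T2
WHERE T1.ctid < T2.ctid
  AND T1.name = T2.name
  AND T1.address = T2.address
  AND COALESCE(T1.col_with_nulls, '[NULL]') = COALESCE(T2.col_with_nulls, '[NULL]');</code>
For a non-character column, the placeholder would need to be a value of that column's type that cannot appear in the data.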