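Efficiently Deleting Duplicate Rows Without a Unique Identifier in Netezza
When a large table contains duplicate rows, finding the most efficient way to remove them can be challenging. The query below has proven effective in other SQL dialects, but does it work in Netezza?
Original SQL Query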
<code class="language-sql">WITH TempEmp AS (
    SELECT name,
           ROW_NUMBER() OVER (PARTITION BY name, address, zipcode ORDER BY name) AS duplicateRecCount
    FROM mytable
)
DELETE FROM TempEmp
WHERE duplicateRecCount > 1;</code>
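Netezza Solution
Netezza does not accept a DELETE statement that follows a WITH clause. Consider the following alternative, which uses the USING keyword to join the table to itself. A row is deleted whenever another row with the same name, address, and zipcode has a larger ctid, so one row per duplicate group (the one with the largest ctid) survives: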
<code class="language-sql">DELETE FROM table_with_dups T1
USING table_with_dups T2
WHERE T1.ctid < T2.ctid
  AND T1.name = T2.name
  AND T1.address = T2.address
  AND T1.zipcode = T2.zipcode;</code>
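As a rough illustration of what the statement does, the sketch below loads a few rows and then checks what remains. The sample values are hypothetical, and the column types are an assumption, since the article does not give the table definition.

<code class="language-sql">-- Hypothetical sample data; column types are assumed, not taken from the article.
CREATE TABLE table_with_dups (
    name    VARCHAR(50),
    address VARCHAR(100),
    zipcode VARCHAR(10)
);

INSERT INTO table_with_dups VALUES ('Ann', '1 Main St', '10001');
INSERT INTO table_with_dups VALUES ('Ann', '1 Main St', '10001');
INSERT INTO table_with_dups VALUES ('Ann', '1 Main St', '10001');
INSERT INTO table_with_dups VALUES ('Bob', '2 Oak Ave', '20002');

-- After the DELETE ... USING statement has run, every
-- (name, address, zipcode) combination should appear exactly once.
SELECT name, address, zipcode, COUNT(*) AS copies
FROM table_with_dups
GROUP BY name, address, zipcode;</code>

With this data, the two extra copies of the Ann row would be removed and the single Bob row left alone. Netezza also exposes a hidden rowid column; if ctid is not accepted on your system, substituting rowid in the comparison is an untested assumption worth trying.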
Previewing the Results
To review the rows before deleting them, replace DELETE with SELECT * and replace the USING keyword with a comma, as follows:
<code class="language-sql">SELECT *
FROM table_with_dups T1, table_with_dups T2
WHERE T1.ctid < T2.ctid
  AND T1.name = T2.name
  AND T1.address = T2.address
  AND T1.zipcode = T2.zipcode;</code>
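Because the preview join returns one row per matching pair, a group of n identical rows appears as n(n-1)/2 result rows, which can be hard to read. A simpler check, sketched here with standard GROUP BY and HAVING, lists each duplicated combination once with its count:

<code class="language-sql">-- One output row per duplicated (name, address, zipcode) combination.
SELECT name, address, zipcode, COUNT(*) AS copies
FROM table_with_dups
GROUP BY name, address, zipcode
HAVING COUNT(*) > 1
ORDER BY copies DESC;</code>

The sum of (copies - 1) over these groups is the number of rows the DELETE would be expected to remove. Note that GROUP BY places NULLs in the same group, whereas the equality join does not match them, so the two checks can disagree when key columns contain NULL.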
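Performance Considerations
If few duplicates are expected, this solution outperforms one that uses a NOT IN (...) clause, because the NOT IN subquery typically has to return one row for every row that is kept, which is nearly the whole table when duplicates are rare. In addition, an equality comparison never matches when either side is NULL, so if the key columns can contain NULL values, handle the comparison with COALESCE(), for example: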
<code class="language-sql">AND COALESCE(T1.col_with_nulls, '[NULL]') = COALESCE(T2.col_with_nulls, '[NULL]')</code>
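Put together, the full statement might look like the sketch below. It assumes, purely for illustration, that address is the only nullable key column and that the sentinel string '[NULL]' never occurs as a real address. For numeric or date columns, the sentinel would need to be a value of the matching type.

<code class="language-sql">DELETE FROM table_with_dups T1
USING table_with_dups T2
WHERE T1.ctid < T2.ctid
  AND T1.name = T2.name
  -- address is assumed nullable here; two NULL addresses are treated as equal
  AND COALESCE(T1.address, '[NULL]') = COALESCE(T2.address, '[NULL]')
  AND T1.zipcode = T2.zipcode;</code>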