Efficiently Deleting Duplicate Rows in Netezza Without a Unique Identifier
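When a large table contains duplicate rows, finding the most efficient way to delete them can be a challenge. The query below works in SQL Server, which allows deleting through a CTE, but what is the equivalent in Netezza?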
Original SQL query
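<code class="language-sql">-- SQL Server style: number the copies in each group, then delete all but the first.
-- Netezza does not accept this form.
WITH TempEmp AS (
    SELECT name,
           ROW_NUMBER() OVER (PARTITION BY name, address, zipcode
                              ORDER BY name) AS duplicateRecCount
    FROM mytable
)
DELETE FROM TempEmp
WHERE duplicateRecCount > 1;</code>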
Netezza solution
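Netezza does not accept a DELETE statement that follows a WITH clause. Instead, join the table to itself with the USING keyword. Each row is compared with the other rows that have the same name, address, and zipcode, and it is deleted whenever another copy with a larger ctid exists, so exactly one copy (the one with the highest ctid) survives in each group: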
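<code class="language-sql">-- Delete every row that has an identical twin with a larger ctid;
-- the copy with the highest ctid in each group is kept.
DELETE FROM table_with_dups T1
USING table_with_dups T2
WHERE T1.ctid < T2.ctid
  AND T1.name = T2.name
  AND T1.address = T2.address
  AND T1.zipcode = T2.zipcode;</code>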
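You do not need a Netezza system to see why this self-join keeps exactly one copy per group. The sketch below is an illustration under stated assumptions, not Netezza code: it runs against an in-memory SQLite database through Python's built-in `sqlite3` module, SQLite's `rowid` stands in for `ctid`, and because SQLite has no `DELETE ... USING`, the same self-join is moved into an `IN` subquery. The sample rows and the helpers `make_table` and `dedupe` are hypothetical.

```python
import sqlite3

# Hypothetical sample data: Alice appears twice, Bob three times, Carol once.
ROWS = [
    ("Alice", "1 Main St", "10001"),
    ("Alice", "1 Main St", "10001"),
    ("Bob", "2 Oak Ave", "20002"),
    ("Bob", "2 Oak Ave", "20002"),
    ("Bob", "2 Oak Ave", "20002"),
    ("Carol", "3 Pine Rd", "30003"),
]


def make_table(rows):
    """Create a table with no unique key; SQLite assigns rowids 1, 2, 3, ... in insert order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_with_dups (name TEXT, address TEXT, zipcode TEXT)")
    conn.executemany("INSERT INTO table_with_dups VALUES (?, ?, ?)", rows)
    return conn


def dedupe(conn):
    """Delete each row that has an identical twin with a larger rowid.

    The subquery is the same self-join as the Netezza DELETE ... USING,
    with rowid in place of ctid. Returns the number of rows deleted.
    """
    cur = conn.execute(
        """
        DELETE FROM table_with_dups
        WHERE rowid IN (
            SELECT T1.rowid
            FROM table_with_dups T1
            JOIN table_with_dups T2
              ON T1.rowid < T2.rowid
             AND T1.name = T2.name
             AND T1.address = T2.address
             AND T1.zipcode = T2.zipcode
        )
        """
    )
    return cur.rowcount


conn = make_table(ROWS)
print(dedupe(conn))  # 3
print(conn.execute("SELECT rowid, name FROM table_with_dups ORDER BY rowid").fetchall())
# [(2, 'Alice'), (5, 'Bob'), (6, 'Carol')]
```

The surviving rowids (2, 5, 6) are the highest in each group, which is exactly what the `T1.ctid < T2.ctid` condition implies.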
Previewing the results
To review the affected records before deleting them, replace DELETE with SELECT * and replace USING with a comma:
<code class="language-sql">SELECT *
FROM table_with_dups T1, table_with_dups T2
WHERE T1.ctid < T2.ctid
  AND T1.name = T2.name
  AND T1.address = T2.address
  AND T1.zipcode = T2.zipcode;</code>
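One detail is worth checking before trusting the preview: the self-join returns one result row per (T1, T2) pair, so a row with three or more copies appears once for every copy with a larger identifier. The preview can therefore list more rows than the DELETE actually removes; the number of distinct T1 identifiers is what the DELETE removes. The sketch below makes this visible, again as a SQLite stand-in run through Python's `sqlite3` with `rowid` in place of `ctid`; the data and the `preview_pairs` helper are hypothetical.

```python
import sqlite3

# Hypothetical data: Alice appears twice, Bob three times.
ROWS = [("Alice", "1 Main St", "10001")] * 2 + [("Bob", "2 Oak Ave", "20002")] * 3

# The preview query, with SQLite's rowid standing in for ctid.
PREVIEW_SQL = """
    SELECT T1.rowid, T2.rowid, T1.name
    FROM table_with_dups T1, table_with_dups T2
    WHERE T1.rowid < T2.rowid
      AND T1.name = T2.name
      AND T1.address = T2.address
      AND T1.zipcode = T2.zipcode
    ORDER BY T1.rowid, T2.rowid
"""


def preview_pairs(rows):
    """Return (pairs, rowids_to_delete) for the given sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_with_dups (name TEXT, address TEXT, zipcode TEXT)")
    conn.executemany("INSERT INTO table_with_dups VALUES (?, ?, ?)", rows)
    pairs = conn.execute(PREVIEW_SQL).fetchall()
    # Each distinct T1 rowid is one row the DELETE would remove.
    to_delete = sorted({t1 for t1, _t2, _name in pairs})
    return pairs, to_delete


pairs, to_delete = preview_pairs(ROWS)
print(pairs)      # [(1, 2, 'Alice'), (3, 4, 'Bob'), (3, 5, 'Bob'), (4, 5, 'Bob')]
print(to_delete)  # [1, 3, 4]
```

Four preview rows correspond to only three deletions, because Bob's first copy (rowid 3) pairs with both later copies.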
Performance considerations
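If you expect only a few duplicates, this self-join performs better than a solution built on a NOT IN (...) clause, because the NOT IN subquery has to return a large number of rows (for example, one row per distinct key combination, which is nearly the whole table when duplicates are rare). Also, if any key column can contain NULL values, wrap the comparison in COALESCE(): NULL = NULL does not evaluate to true, so rows that match only through their NULLs would never be paired. The substitute value must never occur in the column and must match its data type, so '[NULL]' suits character columns while numeric or date columns need a suitable literal of their own. For example: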
<code class="language-sql">AND COALESCE(T1.col_with_nulls, '[NULL]') = COALESCE(T2.col_with_nulls, '[NULL]')</code>
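To see the effect of the COALESCE() wrapper, the sketch below counts how many rows the self-join would mark for deletion when two otherwise identical rows both have a NULL zipcode. As before, it is a SQLite stand-in run through Python's `sqlite3` (`rowid` for `ctid`), not Netezza code; the data and the `count_deletable` helper are hypothetical.

```python
import sqlite3

# Hypothetical data: two identical rows whose zipcode is NULL.
ROWS = [("Dana", "4 Elm St", None), ("Dana", "4 Elm St", None)]


def count_deletable(rows, null_safe):
    """Count rows the self-join would delete, with or without COALESCE on zipcode."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_with_dups (name TEXT, address TEXT, zipcode TEXT)")
    conn.executemany("INSERT INTO table_with_dups VALUES (?, ?, ?)", rows)
    if null_safe:
        # '[NULL]' is a sentinel that must never occur as a real zipcode.
        zip_match = "COALESCE(T1.zipcode, '[NULL]') = COALESCE(T2.zipcode, '[NULL]')"
    else:
        # A plain = yields NULL (not true) when both sides are NULL.
        zip_match = "T1.zipcode = T2.zipcode"
    sql = f"""
        SELECT COUNT(DISTINCT T1.rowid)
        FROM table_with_dups T1
        JOIN table_with_dups T2
          ON T1.rowid < T2.rowid
         AND T1.name = T2.name
         AND T1.address = T2.address
         AND {zip_match}
    """
    return conn.execute(sql).fetchone()[0]


print(count_deletable(ROWS, null_safe=False))  # 0
print(count_deletable(ROWS, null_safe=True))   # 1
```

Without the wrapper the duplicate silently survives; with it, the older copy is found and would be deleted.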