Home >Database >Mysql Tutorial >How to Efficiently Identify Duplicate Records Based on Multiple Columns in SQL?

How to Efficiently Identify Duplicate Records Based on Multiple Columns in SQL?

Patricia Arquette
Patricia ArquetteOriginal
2025-01-04 19:11:39860browse

How to Efficiently Identify Duplicate Records Based on Multiple Columns in SQL?

Identifying Duplicate Records Across Multiple Columns

The challenge posed is to replicate the functionality of the provided SQL code, which aims to identify and retrieve duplicate records based on the combination of two columns, namely "name" and "city." The goal is to exclude cases where only one column, either "name" or "city," matches and only consider records where both columns are identical.

To address this requirement, the provided answer leverages a subquery to count the occurrences of both "name" and "city" combinations within the "stuff" table. This subquery counts the rows grouped by both "name" and "city" and filters out those combinations with a count greater than 1, indicating duplicates.

select name, city, count(*) as qty
from [stuff]
group by name, city
having count(*) > 1

The main query then joins the original "stuff" table with the subquery results, matching on both "name" and "city." This join operation effectively identifies the duplicate records that meet the specified criteria. The final result eliminates false positives by excluding cases where only one column matches.

select s.id, t.*
from [stuff] s
join (
...
    select name, city, count(*) as qty
    from [stuff]
    group by name, city
    having count(*) > 1
) t on s.name = t.name and s.city = t.city

As a result, the query successfully retrieves the desired output:

id      name  city   
904834  jim   London  
904835  jim   London  
90145   Fred  Paris   
90132   Fred  Paris
90133   Fred  Paris

This approach provides an efficient and versatile solution for identifying duplicate records based on multiple columns, ensuring the accuracy and completeness of query results.

The above is the detailed content of How to Efficiently Identify Duplicate Records Based on Multiple Columns in SQL?. For more information, please follow other related articles on the PHP Chinese website!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn