Home >Database >Mysql Tutorial >How to Efficiently Identify Duplicate Records Based on Multiple Columns in SQL?
Identifying Duplicate Records Across Multiple Columns
The challenge posed is to replicate the functionality of the provided SQL code, which aims to identify and retrieve duplicate records based on the combination of two columns, namely "name" and "city." The goal is to exclude cases where only one column, either "name" or "city," matches and only consider records where both columns are identical.
To address this requirement, the provided answer leverages a subquery to count the occurrences of both "name" and "city" combinations within the "stuff" table. This subquery counts the rows grouped by both "name" and "city" and filters out those combinations with a count greater than 1, indicating duplicates.
select name, city, count(*) as qty from [stuff] group by name, city having count(*) > 1
The main query then joins the original "stuff" table with the subquery results, matching on both "name" and "city." This join operation effectively identifies the duplicate records that meet the specified criteria. The final result eliminates false positives by excluding cases where only one column matches.
select s.id, t.* from [stuff] s join ( ... select name, city, count(*) as qty from [stuff] group by name, city having count(*) > 1 ) t on s.name = t.name and s.city = t.city
As a result, the query successfully retrieves the desired output:
id name city 904834 jim London 904835 jim London 90145 Fred Paris 90132 Fred Paris 90133 Fred Paris
This approach provides an efficient and versatile solution for identifying duplicate records based on multiple columns, ensuring the accuracy and completeness of query results.
The above is the detailed content of How to Efficiently Identify Duplicate Records Based on Multiple Columns in SQL?. For more information, please follow other related articles on the PHP Chinese website!