Comparing Two Dataframes and Identifying Differences
In this scenario, you have two dataframes, df1 and df2, with the same columns. Your goal is to determine which rows exist in df2 but not in df1 by comparing their Date and Fruit values.
Direct Comparison
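Using df1 != df2 does not work here: element-wise comparison requires identically labeled dataframes (the same index and the same columns), and pandas raises a ValueError otherwise. Removing the Date index does not resolve the issue either, because the row labels of the two dataframes still do not match.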
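To see why the direct comparison fails, here is a minimal sketch. The two small dataframes are made-up sample data, not the ones from the original question; the only thing that matters is that their row labels differ. The exact error message depends on the pandas version, so the sketch just prints whatever pandas reports.

```python
import pandas as pd

# Hypothetical sample data: df2 has one more row than df1,
# so the two frames are not identically labeled.
df1 = pd.DataFrame({"Fruit": ["Banana", "Orange"], "Num": [22.1, 8.6]})
df2 = pd.DataFrame({"Fruit": ["Banana", "Orange", "Apple"], "Num": [22.1, 8.6, 7.6]})

try:
    df1 != df2
    comparison_failed = False
except ValueError as exc:
    # pandas refuses to compare frames whose labels do not match
    comparison_failed = True
    print(exc)
```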
Concatenation and Grouping
To find the differences, you can concatenate the dataframes into a single dataframe df:
<code class="python">import pandas as pd

df = pd.concat([df1, df2])
df = df.reset_index(drop=True)</code>
Group df by all its columns to identify unique records:
<code class="python">df_gpby = df.groupby(list(df.columns))</code>
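To make the grouping step concrete, the sketch below builds two small dataframes and prints the size of each group. The data is hypothetical, chosen only to have the Date, Fruit, Num and Color columns mentioned in this article. Rows that appear in both dataframes end up in a group of size 2, while rows that appear in only one dataframe form a group of size 1.

```python
import pandas as pd

# Hypothetical sample data (not taken from the original question).
df1 = pd.DataFrame({
    "Date": ["2013-11-24"] * 4,
    "Fruit": ["Banana", "Orange", "Apple", "Celery"],
    "Num": [22.1, 8.6, 7.6, 10.2],
    "Color": ["Yellow", "Orange", "Green", "Green"],
})
df2 = pd.DataFrame({
    "Date": ["2013-11-24"] * 4 + ["2013-11-25"] * 2,
    "Fruit": ["Banana", "Orange", "Apple", "Celery", "Apple", "Orange"],
    "Num": [22.1, 8.6, 7.6, 10.2, 22.1, 8.6],
    "Color": ["Yellow", "Orange", "Green", "Green", "Red", "Orange"],
})

# Stack both frames and renumber the rows 0..9.
df = pd.concat([df1, df2]).reset_index(drop=True)

# Group by every column: identical rows fall into the same group.
group_sizes = df.groupby(list(df.columns)).size()
print(group_sizes)
```

Note that this only separates shared rows from unique ones if neither dataframe contains duplicate rows of its own.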
Filtering Unique Records
Next, retrieve the indices of unique records, which are those with a group size of 1:
<code class="python">idx = [x[0] for x in df_gpby.groups.values() if len(x) == 1]</code>
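Finally, you can use these indices to select the rows that occur only once in the concatenated dataframe. These are the rows found in just one of the two dataframes; in this example, that leaves only the rows that are exclusive to df2: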
<code class="python">df.reindex(idx)</code>
This will return a dataframe containing the desired differences:
         Date   Fruit   Num   Color
9  2013-11-25  Orange   8.6  Orange
8  2013-11-25   Apple  22.1     Red
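If you want to compare on the Date and Fruit columns only, and keep strictly the rows of df2 that have no match in df1, a left merge with indicator=True is one possible alternative. This is a different technique from the groupby approach above, shown here as a sketch on hypothetical sample data; the names key_cols, keys_in_df1 and only_in_df2 are illustrative.

```python
import pandas as pd

# Hypothetical sample data (not taken from the original question).
df1 = pd.DataFrame({
    "Date": ["2013-11-24"] * 4,
    "Fruit": ["Banana", "Orange", "Apple", "Celery"],
    "Num": [22.1, 8.6, 7.6, 10.2],
    "Color": ["Yellow", "Orange", "Green", "Green"],
})
df2 = pd.DataFrame({
    "Date": ["2013-11-24"] * 4 + ["2013-11-25"] * 2,
    "Fruit": ["Banana", "Orange", "Apple", "Celery", "Apple", "Orange"],
    "Num": [22.1, 8.6, 7.6, 10.2, 22.1, 8.6],
    "Color": ["Yellow", "Orange", "Green", "Green", "Red", "Orange"],
})

# Columns that define whether two rows count as "the same".
key_cols = ["Date", "Fruit"]

# Distinct key combinations present in df1.
keys_in_df1 = df1[key_cols].drop_duplicates()

# Left merge keeps every row of df2; the _merge column records
# whether a matching key was found in df1.
merged = df2.merge(keys_in_df1, on=key_cols, how="left", indicator=True)

# Keep rows with no match in df1 and drop the helper column.
only_in_df2 = merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")
print(only_in_df2)
```

Unlike the groupby approach, this ignores Num and Color when deciding whether a row already exists in df1, and it never returns rows that are exclusive to df1.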