How to Compare DataFrames for Differences in Rows?

Mary-Kate Olsen
2024-10-19 21:13:29

Comparing DataFrames for Differences in Rows

When two dataframes have identical row and column labels, the elementwise comparison df1 != df2 is sufficient. However, if the dataframes contain different sets of rows, a different approach is needed to identify the differences.
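As a minimal sketch of the elementwise case, the snippet below compares two small dataframes that share the same index and columns. The data is hypothetical and only serves to illustrate the comparison. If the labels did not match, pandas would raise a ValueError instead of returning a mask.

```python
import pandas as pd

# Hypothetical data: two frames with the same index and column labels.
df1 = pd.DataFrame({"Fruit": ["Apple", "Banana"], "Num": [22.1, 8.6]})
df2 = pd.DataFrame({"Fruit": ["Apple", "Banana"], "Num": [22.1, 9.0]})

# Elementwise comparison: True marks a cell whose value differs.
diff_mask = df1 != df2

# Keep only the rows of df2 where at least one cell differs from df1.
changed_rows = df2[diff_mask.any(axis=1)]
print(changed_rows)
```

Here only the second row is reported, because its Num value changed.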

Concat, Group, and Filter

One method to compare dataframes for row differences is to concatenate them, group by all columns, and keep the rows that appear only once. The following code illustrates this:

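<code class="python">import pandas as pd
# Stack both dataframes and give every row a unique integer label.
df = pd.concat([df1, df2])
df = df.reset_index(drop=True)
# Group rows that are identical across all columns.
df_gpby = df.groupby(list(df.columns))
# Keep the label of each row that occurs exactly once.
idx = [x[0] for x in df_gpby.groups.values() if len(x) == 1]
result = df.reindex(idx)</code>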

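The concatenated dataframe (df) is grouped by all of its columns (df_gpby), so rows with identical values in every column fall into the same group. The groups attribute is a dictionary that maps each group key to the index labels of its rows, and groups.values() yields those label collections. A group of length 1 (len(x) == 1) is a row that occurs exactly once in the concatenated dataframe, which means it exists in only one of the two dataframes, provided neither dataframe contains duplicate rows of its own. Finally, reindexing df with the filtered labels (idx) produces a dataframe containing the row differences.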
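The steps can be tried end to end with a small self-contained sketch. The two dataframes below are hypothetical stand-ins, not the article's example data: df2 repeats both rows of df1 and adds one new row.

```python
import pandas as pd

# Hypothetical data: df2 holds everything in df1 plus one extra row.
df1 = pd.DataFrame({
    "Date": ["2013-11-24", "2013-11-24"],
    "Fruit": ["Banana", "Orange"],
    "Num": [22.1, 8.6],
})
df2 = pd.DataFrame({
    "Date": ["2013-11-24", "2013-11-24", "2013-11-25"],
    "Fruit": ["Banana", "Orange", "Apple"],
    "Num": [22.1, 8.6, 22.1],
})

# Stack the frames and relabel the rows 0..4.
df = pd.concat([df1, df2]).reset_index(drop=True)
groups = df.groupby(list(df.columns)).groups

# Each value holds the labels of rows that are identical in every column.
idx = [labels[0] for labels in groups.values() if len(labels) == 1]
result = df.reindex(idx)
print(result)
```

Only the Apple row survives the filter, since the Banana and Orange rows each appear twice after concatenation.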

Example Output

Using the example dataframes provided:

>>> result
         Date   Fruit   Num   Color
9  2013-11-25  Orange   8.6  Orange
8  2013-11-25   Apple  22.1     Red

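In this example, both rows are in df2 but not in df1. In general, the method returns rows that are unique to either dataframe, so the result can also contain rows that are in df1 but not in df2.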
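The concat and group result does not record which dataframe a row came from. One way to check this, using a different technique from the one described above, is an outer merge with indicator=True. The sketch below uses hypothetical data and assumes both dataframes share the same columns.

```python
import pandas as pd

# Hypothetical data: Orange exists only in df1, Apple only in df2.
df1 = pd.DataFrame({"Fruit": ["Banana", "Orange"], "Num": [22.1, 8.6]})
df2 = pd.DataFrame({"Fruit": ["Banana", "Apple"], "Num": [22.1, 22.1]})

# Outer merge on all shared columns; the _merge column records
# whether a row was found in the left frame, the right frame, or both.
merged = df1.merge(df2, how="outer", indicator=True)

only_df1 = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
only_df2 = merged[merged["_merge"] == "right_only"].drop(columns="_merge")
print(only_df1)
print(only_df2)
```

The left_only and right_only labels separate the two directions of the difference, which the single result dataframe from the grouping approach cannot do by itself.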
