Home >Backend Development >Python Tutorial >How to Easily Identify and Display Differences Between DataFrames

How to Easily Identify and Display Differences Between DataFrames

DDD
DDDOriginal
2024-10-22 20:50:05405browse

How to Easily Identify and Display Differences Between DataFrames

Compare DataFrames and Display Differences Side-by-Side

In the pursuit of identifying data discrepancies, the need often arises to compare two dataframes and highlight the changes between them. Consider the following example:

"StudentRoster Jan-1":
id    Name   score                    isEnrolled           Comment
111   Jack   2.17                     True                 He was late to class
112   Nick   1.11                     False                Graduated
113   Zoe    4.12                     True

"StudentRoster Jan-2":
id    Name   score                    isEnrolled           Comment
111   Jack   2.17                     True                 He was late to class
112   Nick   1.21                     False                Graduated
113   Zoe    4.12                     False                On vacation

To achieve the desired output, first determine the rows that have undergone any change:

ne = (df1 != df2).any(1)

Next, identify the specific entries that have changed:

ne_stacked = (df1 != df2).stack()
changed = ne_stacked[ne_stacked]
changed.index.names = ['id', 'col']

Proceed to extract the original and updated values for the changed entries:

difference_locations = np.where(df1 != df2)
changed_from = df1.values[difference_locations]
changed_to = df2.values[difference_locations]

Finally, present the differences in a user-friendly tabular format:

pd.DataFrame({'from': changed_from, 'to': changed_to}, index=changed.index)

This approach provides a comprehensive summary of the differences between two dataframes, highlighting both the changed values and their locations, enabling quick and efficient analysis of data discrepancies.

The above is the detailed content of How to Easily Identify and Display Differences Between DataFrames. For more information, please follow other related articles on the PHP Chinese website!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn