How to Compare Pandas DataFrames and Visualize Differences?
Comparing DataFrames and Visualizing Differences with Side-by-Side Comparison
Given two pandas DataFrames with the same index and columns, the task is to identify the changes between them and present them in a user-friendly format. The goal is to output an HTML table that lists every changed entry, showing both the original and the updated value.
Identifying Row Changes
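To achieve this, the first step is to determine which rows have changed. The != operator (equivalent to the ne method) compares two identically labeled DataFrames element by element and returns a boolean mask that is True wherever the values differ; reducing that mask with any across the columns flags every row that contains at least one change. Note that NaN != NaN evaluates to True, so a cell that is missing in both DataFrames is also reported as changed.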
<code class="python">ne = (df1 != df2).any(axis=1)</code>
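As a minimal sketch of this step, the snippet below builds two small DataFrames and derives the row-level mask. The sample data, the labels 101 to 103, and the names diff_mask and changed_rows are assumptions made for illustration and do not come from the original article.

```python
import pandas as pd

# Hypothetical sample data: two snapshots with identical index and columns.
df1 = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cy"], "score": [90, 85, 70]},
    index=[101, 102, 103],
)
df2 = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cyd"], "score": [90, 88, 70]},
    index=[101, 102, 103],
)

# Element-wise mask: True wherever the two frames disagree.
diff_mask = df1 != df2

# Row-level mask: True for rows with at least one differing cell.
ne = diff_mask.any(axis=1)

# Rows of the original frame that were modified in df2.
changed_rows = df1[ne]
print(changed_rows)
```

Passing axis=1 by keyword matters here: recent pandas versions no longer accept the axis as a positional argument to any.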
Locating Modified Entries
Once the rows with changes have been identified, the specific entries that were modified can be located with the stack() function. Stacking the element-wise mask produces a boolean Series indexed by (row label, column name) pairs, and filtering that Series by its own values keeps only the entries that have changed.
<code class="python">ne_stacked = (df1 != df2).stack()
changed = ne_stacked[ne_stacked]
changed.index.names = ['id', 'col']</code>
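The following sketch shows what the stacked and filtered mask looks like on a small example. The sample frames are hypothetical and are repeated only so that the snippet runs on its own.

```python
import pandas as pd

# Hypothetical sample data with identical index and columns.
df1 = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cy"], "score": [90, 85, 70]},
    index=[101, 102, 103],
)
df2 = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cyd"], "score": [90, 88, 70]},
    index=[101, 102, 103],
)

# One boolean per cell, indexed by (row label, column name).
ne_stacked = (df1 != df2).stack()

# Boolean indexing with the Series itself keeps only the True entries.
changed = ne_stacked[ne_stacked]
changed.index.names = ["id", "col"]

# The index now lists exactly the cells that differ.
print(changed.index.tolist())
```

Only the index of changed is of interest afterwards, since every remaining value is True by construction.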
Extracting Changed Values
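Next, the original and updated values can be extracted using NumPy's where function. Called on the element-wise mask, np.where returns the row and column positions of the differing cells in row-major order, which is the same order as the stacked index above. Indexing the underlying arrays of both DataFrames with these positions yields the "from" values (original) and the "to" values (modified).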
<code class="python">import numpy as np

difference_locations = np.where(df1 != df2)
changed_from = df1.values[difference_locations]
changed_to = df2.values[difference_locations]</code>
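To illustrate how the positions returned by np.where map back to labels, here is a small sketch on hypothetical sample data. The names rows and cols are assumptions introduced for readability; the article itself keeps the tuple in difference_locations.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with identical index and columns.
df1 = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cy"], "score": [90, 85, 70]},
    index=[101, 102, 103],
)
df2 = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cyd"], "score": [90, 88, 70]},
    index=[101, 102, 103],
)

# np.where returns a tuple (row positions, column positions).
difference_locations = np.where(df1 != df2)
rows, cols = difference_locations

# Positional lookups into the raw arrays give the old and new values.
changed_from = df1.values[difference_locations]
changed_to = df2.values[difference_locations]

# Translating positions to labels shows which cell each value belongs to.
for r, c, old, new in zip(df1.index[rows], df1.columns[cols], changed_from, changed_to):
    print(r, c, old, "->", new)
```

Because the frames mix strings and integers, df1.values is an object array here, so the extracted values keep their original Python types.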
Generating HTML Table
Finally, the extracted changes can be assembled into a DataFrame indexed by the changed (id, col) pairs and converted to an HTML table. pandas' to_html method returns the table markup as a string, which can be written to a file or embedded in a page to present the original and updated values side by side.
<code class="python">html = pd.DataFrame({'from': changed_from, 'to': changed_to}, index=changed.index).to_html()</code>
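Putting the steps together, one possible end-to-end sketch looks like the following. The helper name diff_table and the sample data are hypothetical, and the function assumes both DataFrames share the same index and columns.

```python
import numpy as np
import pandas as pd


def diff_table(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per changed cell with 'from' and 'to' columns."""
    # Locate the changed cells as (row label, column name) pairs.
    ne_stacked = (df1 != df2).stack()
    changed = ne_stacked[ne_stacked]
    changed.index.names = ["id", "col"]

    # Pull the old and new values from the same positions, in the same order.
    rows, cols = np.where(df1 != df2)
    return pd.DataFrame(
        {"from": df1.values[rows, cols], "to": df2.values[rows, cols]},
        index=changed.index,
    )


# Hypothetical sample data with identical index and columns.
df1 = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cy"], "score": [90, 85, 70]},
    index=[101, 102, 103],
)
df2 = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cyd"], "score": [90, 88, 70]},
    index=[101, 102, 103],
)

result = diff_table(df1, df2)

# to_html returns the table markup as a string.
html = result.to_html()
print(html)
```

The resulting string can be saved to an .html file and opened in a browser, or embedded in a larger report.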