How to Find Rows in One Pandas DataFrame That Are Not in Another?

Barbara Streisand
2024-12-09 07:59:11

Obtaining DataFrame Rows Not Present in Another DataFrame

To get the rows of one DataFrame (df1) that do not appear in another DataFrame (df2), left-join df1 to df2 with a merge indicator and keep the rows marked as coming from df1 only:

import pandas as pd

# Create the two DataFrames.
df1 = pd.DataFrame(data={'col1': [1, 2, 3, 4, 5, 3], 'col2': [10, 11, 12, 13, 14, 10]})
df2 = pd.DataFrame(data={'col1': [1, 2, 3], 'col2': [10, 11, 12]})

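# Left-join df1 to the de-duplicated df2 so that each row of df1 matches at most one row of df2.
# indicator=True adds a '_merge' column recording where each row was found.
df_all = df1.merge(df2.drop_duplicates(), on=['col1', 'col2'], how='left', indicator=True)

# Boolean mask: True for rows that appear only in df1.
condition = df_all['_merge'] == 'left_only'

# Filter df1 by position. The merge result has a fresh 0..n-1 index, so passing a
# NumPy array avoids misalignment when df1 does not have the default index.
result = df1[condition.to_numpy()]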

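This approach returns only the rows of df1 that have no match in df2, comparing col1 and col2 together as a pair. With the sample data, the result is the rows (4, 13), (5, 14) and (3, 10). Solutions that test each column independently, for example checking whether col1 appears in df2's col1 and col2 appears in df2's col2, can give wrong results: they would treat (3, 10) as present in df2 because 3 and 10 each occur there, although never in the same row.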
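
The difference between the two strategies can be checked directly. The following is a minimal sketch using the same sample data; the variable names (merged, by_row, by_column, and so on) are illustrative choices rather than part of the original solution, and the column-wise isin check is just one example of a per-column approach.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 3], 'col2': [10, 11, 12, 13, 14, 10]})
df2 = pd.DataFrame({'col1': [1, 2, 3], 'col2': [10, 11, 12]})

# Row-wise comparison: left merge with an indicator column.
merged = df1.merge(df2.drop_duplicates(), on=['col1', 'col2'], how='left', indicator=True)
by_row = df1[(merged['_merge'] == 'left_only').to_numpy()]

# Column-wise comparison (the pitfall): each column is tested on its own,
# so a row counts as "present" whenever both values occur somewhere in df2.
in_both_columns = df1['col1'].isin(df2['col1']) & df1['col2'].isin(df2['col2'])
by_column = df1[~in_both_columns]

# Convert to plain (col1, col2) tuples for easy comparison.
by_row_pairs = list(by_row.itertuples(index=False, name=None))
by_column_pairs = list(by_column.itertuples(index=False, name=None))

print(by_row_pairs)
print(by_column_pairs)
```

Here the merge-based result contains (4, 13), (5, 14) and (3, 10), while the column-wise check returns only (4, 13) and (5, 14): it drops (3, 10) because 3 and 10 both occur in df2, just not together.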
