
How Do Pandas DataFrames Merge Using Different Join Types?

Mary-Kate Olsen
2024-12-27 13:17:11


Pandas Merging 101

Understanding Merging

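Merging combines two DataFrames on shared key columns (or index levels) to produce a new DataFrame, much like a SQL JOIN. The how argument of merge selects the join type: 'inner' (the default), 'left', 'right', or 'outer', corresponding to INNER, LEFT, RIGHT, and FULL OUTER joins.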

Basic Join Types

a. INNER JOIN

  • Keeps only the rows whose key appears in both DataFrames. This is merge's default (how='inner').
  • Example:

    import numpy as np
    import pandas as pd

    left = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': np.random.randn(4)})
    right = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': np.random.randn(4)})
    left.merge(right, on='key')  # how='inner' is the default
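To make the result predictable, here is a minimal runnable sketch of the same inner join. It assumes fixed integers in place of np.random.randn(4); the data is illustrative only.

```python
import pandas as pd

# Fixed integers replace np.random.randn(4) so the output is reproducible (assumed data).
left = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})
right = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': [5, 6, 7, 8]})

# Default how='inner': only keys found in both frames (B and D) are kept.
inner = left.merge(right, on='key')

# Both frames have a non-key 'value' column, so pandas appends the suffixes _x and _y.
print(inner['key'].tolist())      # ['B', 'D']
print(list(inner.columns))        # ['key', 'value_x', 'value_y']
print(inner['value_y'].tolist())  # [5, 6]
```

The _x/_y suffixes are merge's defaults and can be changed with its suffixes argument.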

b. LEFT OUTER JOIN

  • Keeps every row of the left DataFrame; where a left key has no match in the right DataFrame, the right-hand columns are filled with NaN.
  • Example:

    left.merge(right, on='key', how='left')
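A small sketch of the left join on the same assumed fixed data shows where the NaN values land and one side effect they have on column types.

```python
import pandas as pd

# Fixed stand-in data (assumed values, not from the original example).
left = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})
right = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': [5, 6, 7, 8]})

left_join = left.merge(right, on='key', how='left')

# All four left keys survive, in the left frame's order.
print(left_join['key'].tolist())             # ['A', 'B', 'C', 'D']
# A and C have no partner on the right, so their right-side value is NaN.
print(left_join['value_y'].isna().tolist())  # [True, False, True, False]
# Introducing NaN upcasts the right-side integer column to float64.
print(left_join['value_y'].dtype)            # float64
```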

c. RIGHT OUTER JOIN

  • Keeps every row of the right DataFrame; where a right key has no match in the left DataFrame, the left-hand columns are filled with NaN.
  • Example:

    left.merge(right, on='key', how='right')
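The right join mirrors the left join. This sketch, again on assumed fixed data, checks which keys survive and which side receives NaN.

```python
import pandas as pd

# Fixed stand-in data (assumed values).
left = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})
right = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': [5, 6, 7, 8]})

right_join = left.merge(right, on='key', how='right')

# Every right key is kept in the right frame's order; A and C (left-only) are dropped.
print(right_join['key'].tolist())             # ['B', 'D', 'E', 'F']
# E and F have no partner on the left, so the left-side value is NaN.
print(right_join['value_x'].isna().tolist())  # [False, False, True, True]
```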

d. FULL OUTER JOIN

  • Keeps every row from both DataFrames; where a key exists on only one side, the other side's columns are filled with NaN.
  • Example:

    left.merge(right, on='key', how='outer')
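On the same assumed fixed data, the full outer join holds the union of the keys, with NaN on whichever side lacks a match.

```python
import pandas as pd

# Fixed stand-in data (assumed values).
left = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})
right = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': [5, 6, 7, 8]})

outer = left.merge(right, on='key', how='outer')

# Union of the keys from both frames.
print(outer['key'].tolist())  # ['A', 'B', 'C', 'D', 'E', 'F']
# value_x is missing for E and F, value_y for A and C.
print(outer[['value_x', 'value_y']].isna().sum().tolist())  # [2, 2]
```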

Excluding Data with Left/Right Excluding Joins

To keep only the rows of one DataFrame that have no match in the other, perform a LEFT or RIGHT OUTER JOIN with indicator=True, which adds a _merge column recording where each row came from ('left_only', 'right_only', or 'both'). Then filter on that column and drop it.
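The excluding joins below all rely on the _merge column that indicator=True adds. A minimal sketch on assumed fixed data shows what that column contains after an outer join.

```python
import pandas as pd

# Fixed stand-in data (assumed values).
left = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})
right = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': [5, 6, 7, 8]})

# indicator=True adds a categorical '_merge' column describing each row's origin.
outer = left.merge(right, on='key', how='outer', indicator=True)
origin = dict(zip(outer['key'], outer['_merge']))
print(origin)
# {'A': 'left_only', 'B': 'both', 'C': 'left_only', 'D': 'both', 'E': 'right_only', 'F': 'right_only'}
```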

e. Left-Excluding JOIN

  • Keeps only the rows of the left DataFrame whose key does not appear in the right DataFrame.
  • Example:

    (left.merge(right, on='key', how='left', indicator=True)
     .query('_merge == "left_only"')
     .drop(columns='_merge'))
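Running the left-excluding chain on assumed fixed data makes one detail visible: the right-hand columns survive the merge but contain only NaN, because by construction these rows had no match.

```python
import pandas as pd

# Fixed stand-in data (assumed values).
left = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})
right = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': [5, 6, 7, 8]})

left_excl = (left.merge(right, on='key', how='left', indicator=True)
             .query('_merge == "left_only"')   # rows with no partner on the right
             .drop(columns='_merge'))

print(left_excl['key'].tolist())          # ['A', 'C']
print(left_excl['value_y'].isna().all())  # True
```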

f. Right-Excluding JOIN

  • Keeps only the rows of the right DataFrame whose key does not appear in the left DataFrame.
  • Example:

    (left.merge(right, on='key', how='right', indicator=True)
     .query('_merge == "right_only"')
     .drop(columns='_merge'))
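When only the unmatched rows themselves are needed, a boolean mask built with Series.isin is a common alternative to the merge-and-filter pattern. This is a different technique from the one above: it skips the merge and returns only the original frame's columns, without the all-NaN columns from the other side. The sketch assumes fixed data and a single key column.

```python
import pandas as pd

# Fixed stand-in data (assumed values).
left = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})
right = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': [5, 6, 7, 8]})

# Rows whose key is absent from the other frame, selected with a negated isin mask.
left_only = left[~left['key'].isin(right['key'])]
right_only = right[~right['key'].isin(left['key'])]

print(left_only['key'].tolist())   # ['A', 'C']
print(right_only['key'].tolist())  # ['E', 'F']
print(list(right_only.columns))    # ['key', 'value']
```

With several key columns the merge-based version is usually simpler, since isin compares one column at a time.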

g. ANTI JOIN

  • Keeps only the rows whose key appears in exactly one of the two DataFrames, that is, the FULL OUTER JOIN minus the INNER JOIN.
  • Example:

    (left.merge(right, on='key', how='outer', indicator=True)
     .query('_merge != "both"')
     .drop(columns='_merge'))
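The anti-join chain can be wrapped in a small helper. anti_join below is a hypothetical name, not a pandas function. The sketch checks on assumed fixed data that the surviving keys equal the symmetric difference of the two key sets.

```python
import pandas as pd


def anti_join(a, b, on):
    """Hypothetical helper: rows whose key appears in exactly one of a or b."""
    return (a.merge(b, on=on, how='outer', indicator=True)
            .query('_merge != "both"')   # drop keys present on both sides
            .drop(columns='_merge'))


# Fixed stand-in data (assumed values).
left = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})
right = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': [5, 6, 7, 8]})

anti = anti_join(left, right, 'key')
print(anti['key'].tolist())  # ['A', 'C', 'E', 'F']
# Same keys as the symmetric difference of the two key sets.
print(set(anti['key']) == set(left['key']) ^ set(right['key']))  # True
```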

Handling Duplicate Key Columns

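When the key columns have different names and you merge with left_on/right_on, both key columns appear in the output. To keep only one, make the left key the index and merge with left_index=True. Here left2 and right2 are assumed to be left and right with their key columns renamed to keyLeft and keyRight:

    left2 = left.rename(columns={'key': 'keyLeft'})
    right2 = right.rename(columns={'key': 'keyRight'})
    left3 = left2.set_index('keyLeft')
    left3.merge(right2, left_index=True, right_on='keyRight')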
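This sketch contrasts the two approaches on assumed fixed data. The renamed frames left2 and right2 are illustrative setup, not part of any pandas API.

```python
import pandas as pd

# Fixed stand-in data (assumed values), with the key columns renamed on each side.
left = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})
right = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': [5, 6, 7, 8]})
left2 = left.rename(columns={'key': 'keyLeft'})
right2 = right.rename(columns={'key': 'keyRight'})

# left_on/right_on keeps both key columns in the result.
both_keys = left2.merge(right2, left_on='keyLeft', right_on='keyRight')
print(list(both_keys.columns))  # ['keyLeft', 'value_x', 'keyRight', 'value_y']

# With the left key moved into the index, only keyRight remains as a column.
one_key = left2.set_index('keyLeft').merge(right2, left_index=True, right_on='keyRight')
print('keyLeft' in one_key.columns)  # False
print(one_key['keyRight'].tolist())  # ['B', 'D']
```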

Merging on Multiple Columns

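To join on multiple columns, pass a list of column names to on (or to left_on and right_on when the key names differ between the two DataFrames). A row matches only when all the listed columns match.

    left.merge(right, on=['key1', 'key2'], ...)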
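A minimal sketch of a two-column merge follows. The frames and the column names key1, key2, lval, and rval are hypothetical. The pair ('K0', 'b') matches only on key1, so the inner merge drops it.

```python
import pandas as pd

# Hypothetical frames with a two-part key (key1, key2).
left = pd.DataFrame({'key1': ['K0', 'K0', 'K1'], 'key2': ['a', 'b', 'a'], 'lval': [1, 2, 3]})
right = pd.DataFrame({'key1': ['K0', 'K1', 'K1'], 'key2': ['a', 'a', 'b'], 'rval': [4, 5, 6]})

# A row matches only when key1 AND key2 are equal on both sides.
merged = left.merge(right, on=['key1', 'key2'])
print(merged[['key1', 'key2']].values.tolist())  # [['K0', 'a'], ['K1', 'a']]
print(merged['lval'].tolist())                   # [1, 3]
print(merged['rval'].tolist())                   # [4, 5]
```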

Additional Merge Functions

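  • pd.merge_ordered: merges ordered data such as time series, with optional filling of gaps (for example fill_method='ffill').
  • pd.merge_asof: matches each left row to the nearest right key instead of requiring an exact match (by default, the last right key less than or equal to the left key); both inputs must be sorted on the key.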
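Both functions are easiest to understand with small integer keys. The trades/quotes and a/b frames below are hypothetical data chosen so the matching is easy to trace by hand.

```python
import pandas as pd

# merge_asof: each trade gets the latest quote at or before its time 't'.
trades = pd.DataFrame({'t': [1, 5], 'qty': [100, 200]})
quotes = pd.DataFrame({'t': [0, 3, 6], 'bid': [10.0, 10.5, 11.0]})
# Both frames must already be sorted by 't'; the default direction is 'backward'.
asof = pd.merge_asof(trades, quotes, on='t')
print(asof['bid'].tolist())  # [10.0, 10.5]

# merge_ordered: outer-style merge on ordered keys, forward-filling gaps.
a = pd.DataFrame({'t': [1, 3, 5], 'x': [10, 30, 50]})
b = pd.DataFrame({'t': [2, 3], 'y': [200, 300]})
ordered = pd.merge_ordered(a, b, on='t', fill_method='ffill')
print(ordered['t'].tolist())                # [1, 2, 3, 5]
print(ordered['x'].astype(float).tolist())  # [10.0, 10.0, 30.0, 50.0]
# The first y has nothing earlier to fill from, so it stays NaN.
print(ordered['y'].tolist())                # [nan, 200.0, 300.0, 300.0]
```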

See the pandas documentation on merge, join, and concat for further examples and edge cases.
