Home  >  Article  >  Backend Development  >  How to Merge DataFrames Based on a Column, Preserving the First DataFrame\'s Information?

How to Merge DataFrames Based on a Column, Preserving the First DataFrame\'s Information?

Mary-Kate Olsen
Mary-Kate OlsenOriginal
2024-10-31 05:17:02917browse

How to Merge DataFrames Based on a Column, Preserving the First DataFrame's Information?

How to Merge DataFrames Based on a Column, Preserving the First DataFrame's Information

Given two dataframes with overlapping but not identical rows, you can merge them based on a specific column to add information from one to the other. In this case, you want to merge two dataframes, df1 and df2, where df1 contains information about age and df2 contains information about sex. Here's how to do it while keeping the information from the first dataframe.

Solution

To merge the dataframes and retain the first dataframe's information, use the map function along with a Series created by set_index:

<code class="python">df1['Sex'] = df1['Name'].map(df2.set_index('Name')['Sex'])</code>

For example:

<code class="python">df1 = pd.DataFrame({'Name': ['Tom', 'Sara', 'Eva', 'Jack', 'Laura'], 
                     'Age': [34, 18, 44, 27, 30], 
                     'Sex': None})

df2 = pd.DataFrame({'Name': ['Tom', 'Paul', 'Eva', 'Jack', 'Michelle'], 
                     'Sex': ['M', 'M', 'F', 'M', 'F']})

df1['Sex'] = df1['Name'].map(df2.set_index('Name')['Sex'])

print(df1)</code>

Output:

   Name  Age Sex
0   Tom  34   M
1  Sara  18  NaN
2   Eva  44   F
3  Jack  27   M
4 Laura  30  NaN

Alternative Solution

Another way to achieve the same result is by merging the dataframes using a left join:

<code class="python">df = df1.merge(df2[['Name', 'Sex']], on='Name', how='left')</code>

Example:

<code class="python">df = df1.merge(df2[['Name', 'Sex']], on='Name', how='left')

print(df)</code>

Output:

   Name  Age Sex
0   Tom  34   M
1  Sara  18  NaN
2   Eva  44   F
3  Jack  27   M
4 Laura  30  NaN

Handling Multiple Join Columns

If you need to merge on multiple columns, use the same left join approach with the additional join columns specified:

<code class="python">df = df1.merge(df2[['Name', 'Sex', 'Year', 'Code']], on=['Year', 'Code'], how='left')</code>

Error Handling: Duplicate Join Columns

If you encounter an error due to duplicate join columns, handle it by removing duplicates or using a dictionary for mapping.

The above is the detailed content of How to Merge DataFrames Based on a Column, Preserving the First DataFrame\'s Information?. For more information, please follow other related articles on the PHP Chinese website!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn