
How to merge and join data using DataFrame in Python?

王林 | Reposted: 2023-05-07 21:04:17 | 3614 views

merge()

1. Conventional merge

①Method 1

Specify a key column with on. Rows from the two tables that share the same value in that column are combined into a single row.

import pandas as pd

df1 = pd.DataFrame({'id': ['001', '002', '003'],
                    'num1': [120, 101, 104],
                    'num2': [110, 102, 121],
                    'num3': [105, 120, 113]})
df2 = pd.DataFrame({'id': ['001', '002', '003'],
                    'num4': [80, 86, 79]})
print(df1)
print("=======================================")
print(df2)
print("=======================================")
df_merge = pd.merge(df1, df2, on='id')
print(df_merge)


②Method 2

The same merge can also be performed on the index instead of a column: set both left_index and right_index to True. (Both default to False. left_index=True uses the index of the left table as its key, and right_index=True uses the index of the right table as its key.)

import pandas as pd

df1 = pd.DataFrame({'id': ['001', '002', '003'],
                    'num1': [120, 101, 104],
                    'num2': [110, 102, 121],
                    'num3': [105, 120, 113]})
df2 = pd.DataFrame({'id': ['001', '002', '003'],
                    'num4': [80, 86, 79]})
print(df1)
print("=======================================")
print(df2)
print("=======================================")

df_merge = pd.merge(df1, df2, left_index=True, right_index=True)
print(df_merge)


Compared with method ①, the result of method ② contains duplicate columns: id is no longer the key, so both copies are kept and pandas renames them id_x and id_y.
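If the duplicated id columns are unwanted, one option is to move id into the index before merging on it. The following is a minimal sketch on a trimmed version of the same data; the names dup and merged are introduced here only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['001', '002', '003'],
                    'num1': [120, 101, 104]})
df2 = pd.DataFrame({'id': ['001', '002', '003'],
                    'num4': [80, 86, 79]})

# Plain index merge: 'id' exists in both tables but is not the key,
# so pandas keeps both copies and appends its default suffixes.
dup = pd.merge(df1, df2, left_index=True, right_index=True)
print(dup.columns.tolist())

# Make 'id' the index first, so it acts as the key and appears only once.
merged = pd.merge(df1.set_index('id'), df2.set_index('id'),
                  left_index=True, right_index=True)
print(merged.columns.tolist())
print(merged.index.tolist())
```

Calling merged.reset_index() afterwards turns id back into an ordinary column if that layout is preferred.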

Important parameters

pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False)

Parameter	Description
left	Left table to merge: a DataFrame or a named Series
right	Right table to merge: a DataFrame or a named Series
how	Merge method: left, right, outer, or inner (the default)
on	Name of the key column; it must exist in both tables
left_on	Name of the key column in the left table
right_on	Name of the key column in the right table
left_index	Whether to use the left table's index as its key; default False
right_index	Whether to use the right table's index as its key; default False

Note that left_index and right_index cannot be specified together with on.
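The left_on and right_on parameters are listed above but not demonstrated. A minimal sketch, assuming two hypothetical tables (scores and extra) whose key columns have different names; the column name student_id is invented for this example.

```python
import pandas as pd

# Hypothetical tables: the key is called 'id' on the left
# and 'student_id' on the right.
scores = pd.DataFrame({'id': ['001', '002', '003'],
                       'num1': [120, 101, 104]})
extra = pd.DataFrame({'student_id': ['001', '002', '003'],
                      'num4': [80, 86, 79]})

merged = pd.merge(scores, extra, left_on='id', right_on='student_id')
print(merged.columns.tolist())

# Both key columns are kept; drop one if it is redundant.
merged = merged.drop(columns='student_id')
print(merged)

# Combining 'on' with the index flags is rejected by pandas.
try:
    pd.merge(scores, scores, on='id', left_index=True, right_index=True)
    conflict_rejected = False
except pd.errors.MergeError:
    conflict_rejected = True
print(conflict_rejected)
```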

Merge methods: left, right, outer, inner

Prepare data

Prepare a new set of data:

import pandas as pd

df1 = pd.DataFrame({'id': ['001', '002', '003'],
                    'num1': [120, 101, 104],
                    'num2': [110, 102, 121],
                    'num3': [105, 120, 113]})
df2 = pd.DataFrame({'id': ['001', '004', '003'],
                    'num4': [80, 86, 79]})
print(df1)
print("=======================================")
print(df2)
print("=======================================")


inner (default)

Uses the intersection of the keys from both datasets: only ids present in both tables are kept.

df_merge = pd.merge(df1, df2, on='id')
print(df_merge)


outer

Uses the union of the keys from both datasets: every id from either table is kept, and missing values are filled with NaN.

df_merge = pd.merge(df1, df2, on='id', how="outer")
print(df_merge)


left

Uses only the keys from the left dataset.

df_merge = pd.merge(df1, df2, on='id', how='left')
print(df_merge)


right

Uses only the keys from the right dataset.

df_merge = pd.merge(df1, df2, on='id', how='right')
print(df_merge)
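The four methods can be compared side by side by looking at which ids survive each merge. This is a small sketch on a trimmed version of the data above; the dictionary ids_by_how is a name introduced here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['001', '002', '003'], 'num1': [120, 101, 104]})
df2 = pd.DataFrame({'id': ['001', '004', '003'], 'num4': [80, 86, 79]})

# Collect the ids that remain after each merge method.
ids_by_how = {}
for how in ['inner', 'outer', 'left', 'right']:
    merged = pd.merge(df1, df2, on='id', how=how)
    ids_by_how[how] = sorted(merged['id'])
    print(how, ids_by_how[how])
```

Here '002' exists only in the left table and '004' only in the right one, so they are the ids that appear or disappear as how changes.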


2. Many-to-one merge

import pandas as pd

df1 = pd.DataFrame({'id': ['001', '002', '003'],
                    'num1': [120, 101, 104],
                    'num2': [110, 102, 121],
                    'num3': [105, 120, 113]})
df2 = pd.DataFrame({'id': ['001', '001', '003'],
                    'num4': [80, 86, 79]})
print(df1)
print("=======================================")
print(df2)
print("=======================================")


Note that the id '001' appears twice in df2.

Merge

df_merge = pd.merge(df1, df2, on='id')
print(df_merge)

The merge still uses the default inner method, that is, the intersection of the keys from the two datasets. Rows with duplicate keys appear as multiple rows in the merged result: the single '001' row of df1 is paired with each of the two '001' rows of df2.
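When a key is expected to be unique on one side, the validate parameter of pd.merge can check that assumption at merge time. A hedged sketch on a trimmed version of the data above; the variable names ok and rejected are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['001', '002', '003'], 'num1': [120, 101, 104]})
df2 = pd.DataFrame({'id': ['001', '001', '003'], 'num4': [80, 86, 79]})

# Keys are unique in df1 but not in df2, so 'one_to_many' passes.
ok = pd.merge(df1, df2, on='id', validate='one_to_many')
print(len(ok))  # '001' matches twice and '003' once

# 'one_to_one' requires unique keys on both sides and is rejected here.
try:
    pd.merge(df1, df2, on='id', validate='one_to_one')
    rejected = False
except pd.errors.MergeError:
    rejected = True
print(rejected)
```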

3. Many-to-many merge

For example, both table 1 (df1) and table 2 (df2) contain multiple rows with the same id.

import pandas as pd
df1 = pd.DataFrame({'id': ['001', '002', '002', '002', '003'],
                    'num1': [120, 101, 104, 114, 123],
                    'num2': [110, 102, 121, 113, 126],
                    'num3': [105, 120, 113, 124, 128]})
df2 = pd.DataFrame({'id': ['001', '001', '002', '003', '001'],
                    'num4': [80, 86, 79, 88, 93]})
print(df1)
print("=======================================")
print(df2)
print("=======================================")


df_merge = pd.merge(df1, df2, on='id')
print(df_merge)
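In a many-to-many merge, each key contributes (rows on the left) x (rows on the right) rows to the result. The sketch below checks this on a trimmed version of the data above; left_counts, right_counts and expected_rows are names introduced for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['001', '002', '002', '002', '003'],
                    'num1': [120, 101, 104, 114, 123]})
df2 = pd.DataFrame({'id': ['001', '001', '002', '003', '001'],
                    'num4': [80, 86, 79, 88, 93]})

# How often each id occurs in each table.
left_counts = df1['id'].value_counts()
right_counts = df2['id'].value_counts()

# Multiply the per-key counts; a key missing on one side would become NaN
# and is dropped, matching the behaviour of an inner merge.
expected_rows = int((left_counts * right_counts).dropna().sum())

merged = pd.merge(df1, df2, on='id')
print(expected_rows, len(merged))
```

For this data the counts are 1 x 3 for '001', 3 x 1 for '002' and 1 x 1 for '003'.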


concat()

pd.concat(objs, axis=0, join='outer', ignore_index: bool = False, keys=None, levels=None, names=None, verify_integrity: bool = False, sort: bool = False, copy: bool = True)

Parameter	Description
objs	A sequence or mapping of Series or DataFrame objects
axis	Default is 0, which stacks the objects vertically (along the row index). 1 places them side by side (along the columns).
join	Default is "outer"; it can also be "inner"
ignore_index	Default is False, which keeps the original index labels. Set to True to discard them and renumber the result from 0.

Other important parameters are explained through examples.

1. Joining tables with the same fields end to end

First, prepare three DataFrames:

import pandas as pd
df1 = pd.DataFrame({'id': ['001', '002', '003'],
                    'num1': [120, 114, 123],
                    'num2': [110, 102, 121],
                    'num3': [113, 124, 128]})
df2 = pd.DataFrame({'id': ['004', '005'],
                    'num1': [120, 101],
                    'num2': [113, 126],
                    'num3': [105, 128]})
df3 = pd.DataFrame({'id': ['007', '008', '009'],
                    'num1': [120, 101, 125],
                    'num2': [113, 126, 163],
                    'num3': [105, 128, 114]})


print(df1)
print("=======================================")
print(df2)
print("=======================================")
print(df3)

Concatenate

dfs = [df1, df2, df3]
result = pd.concat(dfs)
print(result)
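By default the original index labels are carried over, so they repeat in the concatenated result. The ignore_index parameter from the table above renumbers the rows instead. A minimal sketch on a trimmed version of the data; kept and renumbered are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['001', '002', '003'], 'num1': [120, 114, 123]})
df2 = pd.DataFrame({'id': ['004', '005'], 'num1': [120, 101]})

# Default: each table keeps its own labels, so 0 and 1 occur twice.
kept = pd.concat([df1, df2])
print(kept.index.tolist())

# ignore_index=True discards the old labels and numbers the rows from 0.
renumbered = pd.concat([df1, df2], ignore_index=True)
print(renumbered.index.tolist())
```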

If you want to mark which table, or which category, each row came from after concatenation, you can pass the keys parameter to concat.

result = pd.concat(dfs, keys=['table1', 'table2', 'table3'])
print(result)

The added keys and the original index labels now form tuples, which together make up the new index.

print(result.index)
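Because the keys become the outer level of the index, they can be used to pull a source table back out of the result. A short sketch on a trimmed version of the data; from_table2 and row are names introduced here.

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['001', '002', '003'], 'num1': [120, 114, 123]})
df2 = pd.DataFrame({'id': ['004', '005'], 'num1': [120, 101]})

result = pd.concat([df1, df2], keys=['table1', 'table2'])

# Select every row that came from the second table.
from_table2 = result.loc['table2']
print(from_table2)

# Select one row by its (key, original label) tuple.
row = result.loc[('table1', 2)]
print(row['id'])
```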

2. Horizontal concatenation (row alignment)

Prepare two DataFrames:

import pandas as pd
df1 = pd.DataFrame({'num1': [120, 114, 123],
                    'num2': [110, 102, 121],
                    'num3': [113, 124, 128]}, index=['001', '002', '003'])
df2 = pd.DataFrame({'num3': [117, 120, 101, 126],
                    'num5': [113, 125, 126, 133],
                    'num6': [105, 130, 128, 128]}, index=['002', '003', '004', '005'])

print(df1)
print("=======================================")
print(df2)

When axis is the default value 0:

result = pd.concat([df1, df2])
print(result)

Horizontal concatenation requires setting axis to 1:

result = pd.concat([df1, df2], axis=1)
print(result)

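Compare the differences between the two outputs above.

With axis=0, the default vertical concatenation, row labels that occur in both tables all appear in the result.
With axis=1, horizontal concatenation, column names that occur in both tables all appear in the result.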
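The difference is easy to see from the shapes and labels of the two results. The sketch below reuses the two DataFrames from this section; vertical and horizontal are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({'num1': [120, 114, 123],
                    'num2': [110, 102, 121],
                    'num3': [113, 124, 128]}, index=['001', '002', '003'])
df2 = pd.DataFrame({'num3': [117, 120, 101, 126],
                    'num5': [113, 125, 126, 133],
                    'num6': [105, 130, 128, 128]}, index=['002', '003', '004', '005'])

vertical = pd.concat([df1, df2])            # axis=0: rows are stacked
horizontal = pd.concat([df1, df2], axis=1)  # axis=1: columns are placed side by side

# Row labels '002' and '003' occur twice; columns are the union of both tables.
print(vertical.shape, vertical.index.tolist())

# Column 'num3' occurs twice; rows are the union of both indexes.
print(horizontal.shape, horizontal.columns.tolist())
```

In both cases, cells that have no source value are filled with NaN.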

3. Intersection merge

result = pd.concat([df1, df2], axis=1, join='inner')
print(result)
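With join='inner' and axis=1, only the row labels present in both tables are kept. The sketch below checks this against Index.intersection, using the two DataFrames from the previous section; shared is a name introduced for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'num1': [120, 114, 123],
                    'num2': [110, 102, 121],
                    'num3': [113, 124, 128]}, index=['001', '002', '003'])
df2 = pd.DataFrame({'num3': [117, 120, 101, 126],
                    'num5': [113, 125, 126, 133],
                    'num6': [105, 130, 128, 128]}, index=['002', '003', '004', '005'])

result = pd.concat([df1, df2], axis=1, join='inner')

# Row labels that exist in both tables.
shared = df1.index.intersection(df2.index)

print(result.index.tolist())
print(shared.tolist())
print(result.isna().any().any())  # no NaN, since every kept row exists in both tables
```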


Statement: This article is reproduced from yisu.com. If there is any infringement, please contact admin@php.cn for removal.
