How to merge and join data using DataFrame in Python?
Method ①: specify a reference (key) column and merge the other columns of both tables based on it.
    import pandas as pd

    df1 = pd.DataFrame({'id': ['001', '002', '003'], 'num1': [120, 101, 104], 'num2': [110, 102, 121], 'num3': [105, 120, 113]})
    df2 = pd.DataFrame({'id': ['001', '002', '003'], 'num4': [80, 86, 79]})
    print(df1)
    print("=======================================")
    print(df2)
    print("=======================================")
    # Merge on the shared 'id' column
    df_merge = pd.merge(df1, df2, on='id')
    print(df_merge)
Method ②: the same merge can also be done on the index instead of a column. Just set both left_index and right_index to True. (Both default to False. left_index=True uses the index of the left table as its merge key, and right_index=True uses the index of the right table as its merge key.)
    import pandas as pd

    df1 = pd.DataFrame({'id': ['001', '002', '003'], 'num1': [120, 101, 104], 'num2': [110, 102, 121], 'num3': [105, 120, 113]})
    df2 = pd.DataFrame({'id': ['001', '002', '003'], 'num4': [80, 86, 79]})
    print(df1)
    print("=======================================")
    print(df2)
    print("=======================================")
    # Merge on the row index of both tables
    df_merge = pd.merge(df1, df2, left_index=True, right_index=True)
    print(df_merge)
Compared with the column-based merge (method ①), the result of the index-based merge (method ②) contains duplicate columns: each table keeps its own id column, and pandas renames them id_x and id_y.
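The difference between the two methods can be checked by comparing the column names of each result. The following is a minimal sketch using trimmed-down versions of the tables above; the variable names `by_index` and `by_column` are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['001', '002', '003'], 'num1': [120, 101, 104]})
df2 = pd.DataFrame({'id': ['001', '002', '003'], 'num4': [80, 86, 79]})

# Index-based merge: both tables keep their own 'id' column, so pandas
# disambiguates them with the default suffixes '_x' and '_y'.
by_index = pd.merge(df1, df2, left_index=True, right_index=True)
index_cols = list(by_index.columns)

# Column-based merge: the shared key column appears only once.
by_column = pd.merge(df1, df2, on='id')
column_cols = list(by_column.columns)

print(index_cols)
print(column_cols)
```

If the index-based form is preferred, one option is to call `set_index('id')` on both tables first, so that `id` becomes the index rather than a duplicated column.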
pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False)
Parameter | Description |
---|---|
left | Left table to merge: a DataFrame or Series |
right | Right table to merge: a DataFrame or Series |
how | Merge method: left (left merge), right (right merge), outer (outer merge) or inner (inner merge, the default) |
on | Name of the key column, which must exist in both tables |
left_on | Name of the key column in the left table |
right_on | Name of the key column in the right table |
left_index | Whether to use the index of the left table as its key; the default is False |
right_index | Whether to use the index of the right table as its key; the default is False |
Note that left_index and right_index cannot be specified together with on.
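The sketch below illustrates two of the parameters from the table: `left_on`/`right_on` for key columns with different names, and the restriction on combining `on` with the index flags. The `orders` table and its `order_id` column are hypothetical names made up for this example.

```python
import pandas as pd

# 'orders' and 'order_id' are made-up names for illustration.
orders = pd.DataFrame({'order_id': ['001', '002', '003'], 'num1': [120, 101, 104]})
scores = pd.DataFrame({'id': ['001', '002', '003'], 'num4': [80, 86, 79]})

# The key columns have different names, so name each side explicitly.
merged = pd.merge(orders, scores, left_on='order_id', right_on='id')
merged_cols = list(merged.columns)
print(merged_cols)

# Passing on= together with left_index/right_index is rejected by pandas.
try:
    pd.merge(scores, scores, on='id', left_index=True, right_index=True)
    combination_allowed = True
except pd.errors.MergeError as exc:
    combination_allowed = False
    print("MergeError:", exc)
```

Both key columns are kept in the result when `left_on` and `right_on` differ; one of them can be dropped afterwards if it is redundant.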
Prepare data
Prepare a new set of data:
    import pandas as pd

    df1 = pd.DataFrame({'id': ['001', '002', '003'], 'num1': [120, 101, 104], 'num2': [110, 102, 121], 'num3': [105, 120, 113]})
    df2 = pd.DataFrame({'id': ['001', '004', '003'], 'num4': [80, 86, 79]})
    print(df1)
    print("=======================================")
    print(df2)
    print("=======================================")
inner (default)
Uses the intersection of the keys from both datasets.

    df_merge = pd.merge(df1, df2, on='id')
    print(df_merge)

outer
Uses the union of the keys from both datasets.

    df_merge = pd.merge(df1, df2, on='id', how='outer')
    print(df_merge)

left
Uses only the keys from the left dataset.

    df_merge = pd.merge(df1, df2, on='id', how='left')
    print(df_merge)

right
Uses only the keys from the right dataset.

    df_merge = pd.merge(df1, df2, on='id', how='right')
    print(df_merge)
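To compare the four methods side by side, the sketch below collects the ids that survive each merge, using trimmed-down versions of the tables above. It also uses the `indicator=True` option of `pd.merge`, which is not covered in the text above; it adds a `_merge` column recording which table each row came from.

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['001', '002', '003'], 'num1': [120, 101, 104]})
df2 = pd.DataFrame({'id': ['001', '004', '003'], 'num4': [80, 86, 79]})

# Collect the ids kept by each merge method (sorted for easy comparison).
ids_by_how = {}
for how in ['inner', 'outer', 'left', 'right']:
    merged = pd.merge(df1, df2, on='id', how=how)
    ids_by_how[how] = sorted(merged['id'])
    print(how, ids_by_how[how])

# indicator=True labels each row as 'left_only', 'right_only' or 'both'.
outer = pd.merge(df1, df2, on='id', how='outer', indicator=True)
origin = dict(zip(outer['id'], outer['_merge'].astype(str)))
print(origin)
```

Keys that are missing on one side get NaN in that side's columns, which is why `num4` is empty for id 002 in the left and outer results.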
Next, consider data that contains duplicate keys:

    import pandas as pd

    df1 = pd.DataFrame({'id': ['001', '002', '003'], 'num1': [120, 101, 104], 'num2': [110, 102, 121], 'num3': [105, 120, 113]})
    df2 = pd.DataFrame({'id': ['001', '001', '003'], 'num4': [80, 86, 79]})
    print(df1)
    print("=======================================")
    print(df2)
    print("=======================================")

Here the id 001 appears twice in df2.
Merge
    df_merge = pd.merge(df1, df2, on='id')
    print(df_merge)
The merge still uses the default inner method, that is, the intersection of the keys from the two datasets. Rows with duplicate keys appear as multiple rows in the merged result: the single 001 row of df1 is paired with each of the two 001 rows of df2.
Duplicate ids can also occur in both tables at once, as in df1 and df2 below.
    import pandas as pd

    df1 = pd.DataFrame({'id': ['001', '002', '002', '002', '003'], 'num1': [120, 101, 104, 114, 123], 'num2': [110, 102, 121, 113, 126], 'num3': [105, 120, 113, 124, 128]})
    df2 = pd.DataFrame({'id': ['001', '001', '002', '003', '001'], 'num4': [80, 86, 79, 88, 93]})
    print(df1)
    print("=======================================")
    print(df2)
    print("=======================================")

    df_merge = pd.merge(df1, df2, on='id')
    print(df_merge)
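When keys repeat on both sides, each key contributes (rows in df1) × (rows in df2) rows to the result. The sketch below checks that arithmetic on trimmed-down versions of the tables above. It also shows the `validate` option of `pd.merge`, which is not discussed in the text above; it is one way to have pandas raise an error when duplicates are unexpected.

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['001', '002', '002', '002', '003'],
                    'num1': [120, 101, 104, 114, 123]})
df2 = pd.DataFrame({'id': ['001', '001', '002', '003', '001'],
                    'num4': [80, 86, 79, 88, 93]})

# Per key: occurrences in df1 multiplied by occurrences in df2.
per_key = df1['id'].value_counts().mul(df2['id'].value_counts(), fill_value=0)
expected_rows = int(per_key.sum())

merged = pd.merge(df1, df2, on='id')
rows_per_id = merged['id'].value_counts().to_dict()
print(expected_rows, len(merged), rows_per_id)

# validate='one_to_one' makes pandas raise if either side has duplicate keys.
try:
    pd.merge(df1, df2, on='id', validate='one_to_one')
    duplicates_rejected = False
except pd.errors.MergeError:
    duplicates_rejected = True
print(duplicates_rejected)
```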
pd.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=True)
Parameter | Description |
---|---|
objs | A sequence or mapping of Series or DataFrame objects |
axis | The default is 0, which stacks the tables vertically (rows are appended). Set to 1 to place the tables side by side (columns are appended). |
join | The default is "outer"; it can also be "inner" |
ignore_index | The default is False, which means the original index is retained (not ignored). Set to True to discard it and number the result from 0. |
Other important parameters are explained through examples.
First, prepare three DataFrames:
    import pandas as pd

    df1 = pd.DataFrame({'id': ['001', '002', '003'], 'num1': [120, 114, 123], 'num2': [110, 102, 121], 'num3': [113, 124, 128]})
    df2 = pd.DataFrame({'id': ['004', '005'], 'num1': [120, 101], 'num2': [113, 126], 'num3': [105, 128]})
    df3 = pd.DataFrame({'id': ['007', '008', '009'], 'num1': [120, 101, 125], 'num2': [113, 126, 163], 'num3': [105, 128, 114]})
    print(df1)
    print("=======================================")
    print(df2)
    print("=======================================")
    print(df3)
Concatenate

    dfs = [df1, df2, df3]
    result = pd.concat(dfs)
    print(result)
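The concatenation above keeps each table's original row labels, so the index of the result repeats (0, 1, 2, 0, 1, ...). The following minimal sketch, using two small made-up tables `a` and `b`, shows how `ignore_index=True` replaces those labels with a fresh 0..n-1 index.

```python
import pandas as pd

# 'a' and 'b' are small illustrative tables, not the article's data.
a = pd.DataFrame({'num1': [120, 114]})
b = pd.DataFrame({'num1': [101, 125]})

# Default: each table's own row labels are kept, so labels repeat.
kept = pd.concat([a, b])
kept_index = list(kept.index)

# ignore_index=True discards the old labels and renumbers from 0.
renumbered = pd.concat([a, b], ignore_index=True)
renumbered_index = list(renumbered.index)

print(kept_index)
print(renumbered_index)
```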
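To mark which table (or which category of data) each row came from after concatenation, you can also pass the keys parameter to concat.

    result = pd.concat(dfs, keys=['table1', 'table2', 'table3'])
    print(result)

The added keys and the original index now form tuples, which together become the new index.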
print(result.index)
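The tuple index produced by `keys` is a pandas MultiIndex, which makes it possible to pull one source table back out by its key. The sketch below uses trimmed-down versions of the tables above. The level names `source` and `row` passed to `names` are arbitrary choices for this example.

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['001', '002', '003'], 'num1': [120, 114, 123]})
df2 = pd.DataFrame({'id': ['004', '005'], 'num1': [120, 101]})

result = pd.concat([df1, df2], keys=['table1', 'table2'])
index_tuples = list(result.index)
print(index_tuples)

# Select only the rows that came from the second table.
table2_rows = result.loc['table2']
print(table2_rows)

# Naming the index levels allows the key to be turned into a regular column.
# 'source' and 'row' are arbitrary level names chosen for this example.
labelled = pd.concat([df1, df2], keys=['table1', 'table2'], names=['source', 'row'])
labelled = labelled.reset_index(level='source')
print(labelled)
```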
Prepare two DataFrames:

    import pandas as pd

    df1 = pd.DataFrame({'num1': [120, 114, 123], 'num2': [110, 102, 121], 'num3': [113, 124, 128]}, index=['001', '002', '003'])
    df2 = pd.DataFrame({'num3': [117, 120, 101, 126], 'num5': [113, 125, 126, 133], 'num6': [105, 130, 128, 128]}, index=['002', '003', '004', '005'])
    print(df1)
    print("=======================================")
    print(df2)
When axis is the default value 0:

    result = pd.concat([df1, df2])
    print(result)

For horizontal concatenation, set axis to 1:

    result = pd.concat([df1, df2], axis=1)
    print(result)
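Compare the differences between the two outputs above.
With axis=0, the default vertical concatenation, rows with duplicate index labels all appear in the result.
With axis=1, horizontal concatenation, columns with duplicate names all appear in the result.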
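If repeated row labels are not wanted, the `verify_integrity` parameter from the signature above can be used to detect them. The sketch below is a minimal illustration on one column of the two tables; with `verify_integrity=True`, pandas raises a `ValueError` when the indexes overlap.

```python
import pandas as pd

df1 = pd.DataFrame({'num3': [113, 124, 128]}, index=['001', '002', '003'])
df2 = pd.DataFrame({'num3': [117, 120, 101, 126]}, index=['002', '003', '004', '005'])

# Labels '002' and '003' occur in both inputs, so they repeat in the result.
stacked = pd.concat([df1, df2])
repeated = sorted(stacked.index[stacked.index.duplicated()])
print(repeated)

# verify_integrity=True turns overlapping labels into an error.
try:
    pd.concat([df1, df2], verify_integrity=True)
    overlap_detected = False
except ValueError as exc:
    overlap_detected = True
    print("ValueError:", exc)
```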
To keep only the index labels that appear in both tables, set join to 'inner':

    result = pd.concat([df1, df2], axis=1, join='inner')
    print(result)
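The effect of `join` depends on `axis`: it is applied to the labels of the other axis, that is, to the row index when axis=1 and to the column names when axis=0. The sketch below compares the combinations on the two tables above; the variable names are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'num1': [120, 114, 123], 'num2': [110, 102, 121],
                    'num3': [113, 124, 128]}, index=['001', '002', '003'])
df2 = pd.DataFrame({'num3': [117, 120, 101, 126], 'num5': [113, 125, 126, 133],
                    'num6': [105, 130, 128, 128]}, index=['002', '003', '004', '005'])

# axis=1, outer (default): union of row labels; gaps are filled with NaN.
wide_outer = pd.concat([df1, df2], axis=1)
outer_labels = sorted(wide_outer.index)

# axis=1, inner: only row labels present in both tables are kept.
wide_inner = pd.concat([df1, df2], axis=1, join='inner')
inner_labels = list(wide_inner.index)

# axis=0, inner: only column names present in both tables are kept.
tall_inner = pd.concat([df1, df2], axis=0, join='inner')
shared_columns = list(tall_inner.columns)

print(outer_labels)
print(inner_labels, list(wide_inner.columns))
print(shared_columns, len(tall_inner))
```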