How to Identify Rows Present in One Pandas DataFrame but Not Another?


Patricia Arquette

Jan 03, 2025 am 10:45 AM


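Identifying Uncommon Rows in Pandas DataFrames

When working with several DataFrames, you often need to identify the rows that exist in one DataFrame but not in another. Suppose we have two DataFrames, df1 and df2, where df2 is a subset of df1.

How can we extract the rows of df1 that are not present in df2?

Consider the following example: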

import pandas as pd

df1 = pd.DataFrame(data={'col1': [1, 2, 3, 4, 5, 3], 'col2': [10, 11, 12, 13, 14, 10]})
df2 = pd.DataFrame(data={'col1': [1, 2, 3], 'col2': [10, 11, 12]})

print("df1:")
print(df1)

print("\ndf2:")
print(df2)

Output:

df1:
   col1  col2
0     1    10
1     2    11
2     3    12
3     4    13
4     5    14
5     3    10

df2:
   col1  col2
0     1    10
1     2    11
2     3    12

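Our goal is to find the rows of df1 that are not present in df2.

Solution:

To identify the uncommon rows accurately, perform a left join of df1 with df2 on the col1 and col2 columns, removing duplicates from df2 first so that a df1 row cannot be matched more than once. In addition, pass indicator=True to add an extra column that records the source of each merged row.

The resulting DataFrame, df_all, contains every row of df1 plus a _merge column that records where each row was found: both for rows that also appear in df2, and left_only for rows that appear only in df1. Because this is a left join, rows found only in df2 are not included, so right_only never occurs.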

df_all = df1.merge(df2.drop_duplicates(), on=['col1', 'col2'], how='left', indicator=True)
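To see what the drop_duplicates() call changes, the sketch below compares the merge with and without it. The frame df2_dup is a hypothetical variant of df2, introduced only for this illustration, in which the row (1, 10) appears twice.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 3], 'col2': [10, 11, 12, 13, 14, 10]})
# Hypothetical variant of df2 (not in the original example): row (1, 10) is repeated.
df2_dup = pd.DataFrame({'col1': [1, 1, 2, 3], 'col2': [10, 10, 11, 12]})

keys = ['col1', 'col2']

# Without de-duplication, the df1 row (1, 10) matches two rows of df2_dup,
# so it appears twice in the merged result.
merged_raw = df1.merge(df2_dup, on=keys, how='left', indicator=True)

# With de-duplication, every df1 row appears exactly once.
merged_dedup = df1.merge(df2_dup.drop_duplicates(), on=keys, how='left', indicator=True)

print(len(df1), len(merged_raw), len(merged_dedup))  # prints: 6 7 6
```

The left_only rows are the same in both cases. De-duplicating mainly keeps the merged frame's row count and row positions in step with df1.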

We can now filter df_all with the boolean condition df_all['_merge'] == 'left_only' to extract the uncommon rows of df1.

df_uncommon = df_all[df_all['_merge'] == 'left_only']
print("\nUncommon rows in df1:")
print(df_uncommon)

This returns the desired output:

Uncommon rows in df1:
   col1  col2     _merge
3     4    13  left_only
4     5    14  left_only
5     3    10  left_only
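If the helper column is not wanted in the final result, it can be dropped after filtering. The following is a minimal sketch that repeats the example data so it runs on its own; the name df_result is an assumption made for this illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 3], 'col2': [10, 11, 12, 13, 14, 10]})
df2 = pd.DataFrame({'col1': [1, 2, 3], 'col2': [10, 11, 12]})

df_all = df1.merge(df2.drop_duplicates(), on=['col1', 'col2'], how='left', indicator=True)
df_uncommon = df_all[df_all['_merge'] == 'left_only']

# Remove the indicator column so only the original df1 columns remain.
df_result = df_uncommon.drop(columns='_merge')
print(df_result)
```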

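By combining a left join against a de-duplicated df2 with the _merge column, we can efficiently identify and extract the rows of df1 that are not present in df2.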
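The same steps can be wrapped in a small reusable function. This is only a sketch: the function name rows_only_in_left and its on parameter are hypothetical, not part of pandas, and it assumes that left has no column already named _merge.

```python
import pandas as pd


def rows_only_in_left(left, right, on=None):
    """Return the rows of `left` that have no match in `right` (hypothetical helper).

    `on` is the list of key columns; by default, all columns shared by both frames.
    """
    keys = list(on) if on is not None else [c for c in left.columns if c in right.columns]
    # De-duplicate the right side on the keys so each left row matches at most once,
    # which keeps the merged frame the same length and order as `left`.
    merged = left.merge(right[keys].drop_duplicates(), on=keys, how='left', indicator=True)
    mask = (merged['_merge'] == 'left_only').to_numpy()
    # Index `left` directly so its original index and columns are preserved.
    return left[mask]


df1 = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 3], 'col2': [10, 11, 12, 13, 14, 10]})
df2 = pd.DataFrame({'col1': [1, 2, 3], 'col2': [10, 11, 12]})

result = rows_only_in_left(df1, df2)
print(result)
```

Filtering left with a positional mask, rather than returning the merged frame, means the result keeps df1's own index labels and carries no _merge column.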
