
How to Find All Duplicate Items in a Pandas DataFrame Using 'isin' and 'sort_values'?

Susan Sarandon | Original | 2024-10-25 09:54:28 | 801 views


Listing All Duplicate Items in a Pandas DataFrame Using 'isin' and 'sort_values'

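This article addresses a common problem: finding every duplicate item in a list of records that may contain export errors. The goal is to retrieve the complete list of those duplicates for manual comparison and troubleshooting.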

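By default, Pandas' 'duplicated' method flags only the second and later occurrences of a value and leaves the first occurrence unmarked, so filtering on it alone misses one row per duplicated ID. Combining 'isin' with 'sort_values' displays every row associated with a duplicated ID.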
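To make the default behavior concrete, here is a minimal sketch on a small hand-made Series. The sample values are an assumption for illustration and are not taken from the article's CSV file. It also shows the method's `keep=False` option, which flags every occurrence of a repeated value.

```python
import pandas as pd

# Hypothetical sample data; only the 'ID' name follows the article.
ids = pd.Series([1, 2, 2, 3, 3, 3], name="ID")

# Default keep='first': the first occurrence of each value is not flagged.
first_mask = ids.duplicated()

# keep=False: every occurrence of a repeated value is flagged.
all_mask = ids.duplicated(keep=False)

print(first_mask.tolist())  # [False, False, True, False, True, True]
print(all_mask.tolist())    # [False, True, True, True, True, True]
```

The first mask skips one row per repeated ID, which is why the article's approach passes the flagged values through 'isin' to recover those rows.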

<code class="python"># Import the pandas library
import pandas as pd

# Read the data from the CSV file
df = pd.read_csv('dup.csv')

# Extract the 'ID' column
ids = df['ID']

# Use 'isin' to filter for rows where the 'ID' matches any of the duplicate IDs
df[ids.isin(ids[ids.duplicated()])].sort_values('ID')</code>

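This lists every row of the DataFrame whose 'ID' value is among the IDs flagged as duplicates. No rows are dropped: each duplicated ID appears as many times as it occurs in the data, and sorting by 'ID' places the matching rows next to each other for comparison.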
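The same steps can be tried without a CSV file. The sketch below uses a small in-memory DataFrame as a stand-in for 'dup.csv'; the 'name' column and all values are assumptions made for illustration. It also passes `kind="mergesort"` to 'sort_values', an addition not in the article, so that rows sharing an ID keep their original relative order.

```python
import pandas as pd

# Hypothetical stand-in for dup.csv; the 'name' column is invented.
df = pd.DataFrame({
    "ID": [3, 1, 2, 3, 2, 4],
    "name": ["c1", "a", "b1", "c2", "b2", "d"],
})

ids = df["ID"]

# Values that occur more than once (first occurrences are not flagged).
dup_ids = ids[ids.duplicated()]

# Keep every row whose ID is one of those values, then sort by ID.
# mergesort is stable, so ties keep their original row order.
all_dups = df[ids.isin(dup_ids)].sort_values("ID", kind="mergesort")

# For comparison: duplicated(keep=False) selects the same rows directly.
same_rows = df[df.duplicated("ID", keep=False)].sort_values("ID", kind="mergesort")

print(all_dups)
```

Here the rows with IDs 2 and 3 are returned twice each, while IDs 1 and 4 are left out because they occur only once.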

Alternative Method: Grouping by ID with 'groupby' and 'concat'

An alternative approach groups the DataFrame by 'ID' and then concatenates the groups that contain more than one row.

<code class="python"># Group the DataFrame by 'ID'
groups = df.groupby('ID')

# Identify groups with more than one row
large_groups = [group for _, group in groups if len(group) > 1]

# Concatenate the large groups
pd.concat(large_groups)</code>

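This method also retrieves every duplicated row; only groups with a single row, that is, IDs that occur once, are left out. By default, 'concat' stacks the selected groups vertically (axis=0), and because 'groupby' sorts by key by default, the rows come out ordered by 'ID'.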
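One practical detail: 'pd.concat' raises a ValueError when given an empty list, which happens if the data has no duplicates at all. The sketch below guards against that case and shows 'groupby(...).filter' as a possible one-step variant. The sample DataFrame and its 'name' column are assumptions for illustration, not the article's data.

```python
import pandas as pd

# Hypothetical stand-in for dup.csv; the 'name' column is invented.
df = pd.DataFrame({
    "ID": [3, 1, 2, 3, 2, 4],
    "name": ["c1", "a", "b1", "c2", "b2", "d"],
})

# Collect the groups that contain more than one row.
large_groups = [group for _, group in df.groupby("ID") if len(group) > 1]

# pd.concat fails on an empty list, so fall back to an empty frame
# with the same columns when no ID is repeated.
if large_groups:
    result = pd.concat(large_groups)
else:
    result = df.iloc[0:0]

# Variant: filter keeps the rows of qualifying groups in original row order.
filtered = df.groupby("ID").filter(lambda g: len(g) > 1)

print(result)
print(filtered)
```

Both results contain the same four rows. The 'concat' version orders them by group key, while 'filter' leaves them in their original positions, so a 'sort_values' call may still be wanted for side-by-side comparison.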
