How to Efficiently Split Comma-Separated Strings in Pandas DataFrames?

Linda Hamilton

Dec 19, 2024, 06:18 AM

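Splitting Comma-Separated String Entries in a Pandas DataFrame

Input data often consists of values separated by a character such as a comma. When working with Pandas DataFrames, you may need to split these string entries and create a separate row for each value. This article looks at methods for doing that efficiently.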

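Using the Pandas .explode() Method

Introduced in Pandas 0.25.0, with multi-column support added in 1.3.0, the .explode() method offers a simple and efficient way to expand a column of lists or arrays into one row per element. Because it operates on list-like values, a comma-separated string must first be split into a list, for example with .str.split(','). It works on a single column or on several columns at once, which gives flexibility when handling complex datasets.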

Syntax:

dataframe.explode(column_name)

Example:

import pandas as pd

# DataFrame with a column containing comma-separated values
df = pd.DataFrame({'var1': ['a,b,c', 'd,e,f'], 'var2': [1, 2]})

# Split each string into a list first; explode() leaves plain strings unchanged
df['var1'] = df['var1'].str.split(',')

# Explode the 'var1' column
df = df.explode('var1')

# Resulting DataFrame with a separate row for each value
print(df)
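
The multi-column form of .explode() can be sketched in the same way. This is a minimal illustration, assuming pandas 1.3.0 or later; the column name var3 and the sample values are hypothetical, and each row is assumed to hold the same number of items in every column being exploded (pandas raises a ValueError otherwise).

```python
import pandas as pd

# Hypothetical sample data: 'var1' and 'var2' hold comma-separated strings,
# and each row has the same number of items in both columns.
df = pd.DataFrame({
    'var1': ['a,b,c', 'd,e'],
    'var2': ['x,y,z', 'v,w'],
    'var3': [1, 2],
})

# Split each string column into lists; explode() only expands list-likes.
split_cols = ['var1', 'var2']
lists = df.assign(**{col: df[col].str.split(',') for col in split_cols})

# Explode both columns together (requires pandas 1.3.0 or later).
# ignore_index=True renumbers the rows instead of repeating the old index.
result = lists.explode(split_cols, ignore_index=True)
print(result)
```

Passing a list of columns keeps the items aligned row by row, so the first item of var1 stays paired with the first item of var2.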

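A Custom Vectorized Function for Exploding Multiple Columns

For more complex scenarios where several columns must be expanded, a custom vectorized function can offer a versatile solution. The function below assumes that, within each row, the lists in all of the listed columns have the same length.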

Function definition:

import numpy as np
import pandas as pd

def explode(df, lst_cols, fill_value='', preserve_index=False):
    # Columns that are not exploded and only need their values repeated
    idx_cols = df.columns.difference(lst_cols)

    # Calculate lengths of lists (taken from the first list column)
    lens = df[lst_cols[0]].str.len()

    # Repeat values for non-empty lists
    res = (pd.DataFrame({
                col: np.repeat(df[col].values, lens)
                for col in idx_cols},
                index=np.repeat(df.index.values, lens))
             .assign(**{col: np.concatenate(df.loc[lens > 0, col].values)
                            for col in lst_cols}))

    # Append rows with empty lists
    # (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
    if (lens == 0).any():
        res = (pd.concat([res, df.loc[lens == 0, idx_cols]], sort=False)
                 .fillna(fill_value))

    # Revert index order and reset index if requested
    res = res.sort_index()
    if not preserve_index:
        res = res.reset_index(drop=True)
    return res

Example:

# Dataframe with multiple columns containing lists
df = pd.DataFrame({
    'var1': [['a', 'b'], ['c', 'd']],
    'var2': [['x', 'y'], ['z', 'w']]
})

# Exploding 'var1' and 'var2' columns
df = explode(df, ['var1', 'var2'])

# Resulting dataframe with separate rows for each list item
print(df)
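
The empty-list handling in the custom function has a rough counterpart in the built-in method. The sketch below is an illustration under stated assumptions rather than a drop-in replacement: the sample data is hypothetical, and fillna('') is used here only to mimic the fill_value='' default of the function above.

```python
import pandas as pd

# Hypothetical sample data: the second row holds an empty list.
df = pd.DataFrame({
    'var1': [['a', 'b'], [], ['c']],
    'var2': [1, 2, 3],
})

# The built-in explode() turns an empty list into a single row holding NaN,
# so the row is kept rather than dropped.
exploded = df.explode('var1')

# Replace the NaN placeholder with an empty string and renumber the rows.
result = exploded.fillna({'var1': ''}).reset_index(drop=True)
print(result)
```

Restricting fillna to the exploded column avoids overwriting genuine missing values elsewhere in the DataFrame.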

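Transforming with a Custom Function

Another approach is to use .transform() to apply a custom function that splits each string entry into a list, and then expand that list into new rows: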

Custom function:

def split_fun(value):
    # Split one comma-separated string into a list of values
    return value.split(',')

Example:

# DataFrame with a column containing comma-separated values
df = pd.DataFrame({'var1': ['a,b,c', 'd,e,f'], 'var2': [1, 2]})

# Create a new column of lists by applying split_fun to each value of 'var1'
df['var1_split'] = df['var1'].transform(split_fun)

# Expand the new column into separate rows
# (pandas DataFrames have no unnest() method; explode() does this job)
df = df.explode('var1_split')

# Resulting DataFrame with a separate row for each value
print(df)
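
For a plain comma split, the custom function and the built-in string accessor should produce the same lists. The following sketch checks that on a small sample; the variable names via_transform, via_str, and same are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({'var1': ['a,b,c', 'd,e,f'], 'var2': [1, 2]})

def split_fun(value):
    # Split one comma-separated string into a list of values
    return value.split(',')

# Custom function applied to each value of the column
via_transform = df['var1'].transform(split_fun)

# Built-in vectorized string method
via_str = df['var1'].str.split(',')

# Compare the two results element by element
same = via_transform.tolist() == via_str.tolist()
print(same)

# Replace 'var1' with the lists and expand them into rows with a fresh index
result = df.assign(var1=via_str).explode('var1', ignore_index=True)
print(result)
```

A custom function is mainly worth keeping when the per-value logic goes beyond a simple split, such as trimming or filtering items.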

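Conclusion

Depending on the specific requirements and complexity of your dataset, several methods can be used to split comma-separated string entries in a Pandas DataFrame. The .explode() method, applied after splitting the strings into lists, offers a simple and efficient approach, while a custom vectorized function offers the flexibility to handle more complex scenarios.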
