
How to Pivot a Pandas DataFrame: A Comprehensive Guide to Reshaping Data?

By DDD, 2024-12-25 10:25:09

How can I pivot a dataframe?

What is pivot?

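  • Pivoting reshapes a DataFrame from long to wide format: the distinct values of one or more columns become the row and column labels of a new DataFrame
  • When several rows fall into the same cell of the new DataFrame, their values can be aggregated (for example with a mean or a sum)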

How do I pivot?

  • Several methods to pivot a DataFrame:

    • pd.DataFrame.pivot_table
    • pd.DataFrame.groupby + pd.DataFrame.unstack
    • pd.DataFrame.set_index + pd.DataFrame.unstack
    • pd.DataFrame.pivot (less flexible)
    • pd.crosstab (for cross tabulation)
    • pd.factorize + np.bincount (advanced, high performance)
    • pd.get_dummies + pd.DataFrame.dot (cross tabulation)
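
As a rough illustration of how the first few methods relate, the sketch below builds the same table three ways. The sample data is hypothetical; only the column names row, col and val0 come from the examples in this guide.

```python
import pandas as pd

# Hypothetical sample data; ('r0', 'c0') appears twice on purpose.
df = pd.DataFrame({
    'row': ['r0', 'r0', 'r0', 'r1', 'r1'],
    'col': ['c0', 'c0', 'c1', 'c0', 'c1'],
    'val0': [1.0, 3.0, 5.0, 7.0, 9.0],
})

# 1. pivot_table: reshape and aggregate in one call.
via_pivot_table = df.pivot_table(values='val0', index='row', columns='col', aggfunc='mean')

# 2. groupby + unstack: aggregate first, then move 'col' into the columns.
via_groupby = df.groupby(['row', 'col'])['val0'].mean().unstack('col')

# 3. crosstab with explicit values and aggfunc.
via_crosstab = pd.crosstab(df['row'], df['col'], values=df['val0'], aggfunc='mean')

print(via_pivot_table)
```

set_index + unstack and pivot produce the same shape as well, but only when every (row, col) pair is unique, because neither of them aggregates.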

Long format to wide format?

  • Long format:

    • Each row holds a single measurement
    • One column names the attribute being measured and another holds its value, so the same subject appears on several rows
  • Wide format:

    • Each subject occupies one row
    • Each attribute/measurement gets its own column
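
To make the two layouts concrete, here is a small sketch that converts a long frame into a wide one. The column names subject, variable and value and the numbers are invented for illustration.

```python
import pandas as pd

# Hypothetical long-format data: one row per (subject, variable) measurement.
long_df = pd.DataFrame({
    'subject': ['s0', 's0', 's1', 's1'],
    'variable': ['height', 'weight', 'height', 'weight'],
    'value': [170, 65, 180, 80],
})

# Wide format: one row per subject, one column per variable.
# pivot is enough here because each (subject, variable) pair occurs once.
wide_df = long_df.pivot(index='subject', columns='variable', values='value')
print(wide_df)
```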

Examples

Question 1: Why do I get ValueError: Index contains duplicate entries, cannot reshape?

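  • pd.DataFrame.pivot does not aggregate, so it raises this error when the same combination of index and columns values appears on more than one row
  • Example: If df has more than one row with the same row and col values and you pivot with df.pivot(index='row', columns='col'), you will get the error. Use pivot_table with an aggfunc, or remove the duplicates first.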
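
The following sketch reproduces the error on hypothetical data, shows one way to find the offending rows, and then resolves the duplicates by aggregating with pivot_table.

```python
import pandas as pd

# Hypothetical data in which the pair ('r0', 'c0') appears twice.
df = pd.DataFrame({
    'row': ['r0', 'r0', 'r1'],
    'col': ['c0', 'c0', 'c1'],
    'val0': [1.0, 3.0, 5.0],
})

# pivot cannot decide which of the two values belongs in cell (r0, c0).
try:
    df.pivot(index='row', columns='col', values='val0')
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Inspect the duplicated key pairs before choosing an aggregation.
dupes = df[df.duplicated(['row', 'col'], keep=False)]
print(dupes)

# pivot_table aggregates the duplicates instead of failing.
fixed = df.pivot_table(values='val0', index='row', columns='col', aggfunc='mean')
print(fixed)
```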

Question 2: How do I pivot df such that the col values are columns, row values are the index, and mean of val0 are the values?

  • Use pd.DataFrame.pivot_table:

    df.pivot_table(values='val0', index='row', columns='col', aggfunc='mean')

Question 3: How do I make it so that missing values are 0?

  • Use the fill_value argument in pd.DataFrame.pivot_table; it replaces the NaN left in cells that have no matching rows:

    df.pivot_table(values='val0', index='row', columns='col', fill_value=0, aggfunc='mean')
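
A minimal sketch of the difference fill_value makes, assuming a hypothetical frame in which one (row, col) combination never occurs:

```python
import pandas as pd

# Hypothetical data with no ('r1', 'c1') pair.
df = pd.DataFrame({
    'row': ['r0', 'r0', 'r1'],
    'col': ['c0', 'c1', 'c0'],
    'val0': [1.0, 2.0, 3.0],
})

# Without fill_value the empty cell is NaN.
with_nan = df.pivot_table(values='val0', index='row', columns='col', aggfunc='mean')

# With fill_value=0 the empty cell becomes 0.
with_zero = df.pivot_table(values='val0', index='row', columns='col',
                           fill_value=0, aggfunc='mean')

print(with_nan)
print(with_zero)
```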

Question 4: Can I get something other than mean, like maybe sum?

  • Use a different aggfunc argument in pd.DataFrame.pivot_table:

    df.pivot_table(values='val0', index='row', columns='col', fill_value=0, aggfunc='sum')

Question 5: Can I do more than one aggregation at a time?

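  • Provide a list of functions or function names to the aggfunc argument in pd.DataFrame.pivot_table. String names such as 'size' and 'mean' need no NumPy import and avoid the FutureWarning that recent pandas versions emit for NumPy callables like np.mean:

    df.pivot_table(values='val0', index='row', columns='col', fill_value=0, aggfunc=['size', 'mean'])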
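
When a list is passed, the result gets an extra column level named after each aggregation. The sketch below uses 'sum' and 'mean' on hypothetical data to show how to read that structure.

```python
import pandas as pd

# Hypothetical sample data; ('r0', 'c0') appears twice.
df = pd.DataFrame({
    'row': ['r0', 'r0', 'r0', 'r1', 'r1'],
    'col': ['c0', 'c0', 'c1', 'c0', 'c1'],
    'val0': [1.0, 3.0, 5.0, 7.0, 9.0],
})

multi = df.pivot_table(values='val0', index='row', columns='col',
                       fill_value=0, aggfunc=['sum', 'mean'])
print(multi)

# The outer column level holds the aggregation name, the inner one the 'col' value.
sums_only = multi['sum']
print(sums_only)
```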

Question 6: Can I aggregate over multiple value columns?

  • Pass multiple column names as a list to values in pd.DataFrame.pivot_table:

    df.pivot_table(values=['val0', 'val1'], index='row', columns='col', fill_value=0, aggfunc='mean')
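
With several value columns the result also gains a column level, this time named after each value column. A small sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical data with two value columns and no ('r1', 'c1') pair.
df = pd.DataFrame({
    'row': ['r0', 'r0', 'r1'],
    'col': ['c0', 'c1', 'c0'],
    'val0': [1.0, 2.0, 3.0],
    'val1': [10.0, 20.0, 30.0],
})

wide = df.pivot_table(values=['val0', 'val1'], index='row', columns='col',
                      fill_value=0, aggfunc='mean')
print(wide)

# Select the block that belongs to one value column.
val1_block = wide['val1']
print(val1_block)
```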

Question 7: Can I subdivide by multiple columns?

  • Pass multiple column names as a list to index or columns in pd.DataFrame.pivot_table:

    df.pivot_table(values='val0', index=['row', 'item'], columns='col', fill_value=0, aggfunc='mean')

Question 8: Or can I subdivide by multiple columns on both axes at once?

  • Pass lists to both index and columns in pd.DataFrame.pivot_table:

    df.pivot_table(values='val0', index=['key', 'row'], columns=['item', 'col'], fill_value=0, aggfunc='mean')
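
Passing lists to index and columns yields a MultiIndex on each axis. The sketch below, on hypothetical data, shows the resulting shape and two ways to look up values in it.

```python
import pandas as pd

# Hypothetical data; the column names follow the examples in this guide.
df = pd.DataFrame({
    'key': ['k0', 'k0', 'k1', 'k1'],
    'row': ['r0', 'r1', 'r0', 'r1'],
    'item': ['i0', 'i1', 'i0', 'i1'],
    'col': ['c0', 'c0', 'c1', 'c1'],
    'val0': [1.0, 2.0, 3.0, 4.0],
})

wide = df.pivot_table(values='val0', index=['key', 'row'],
                      columns=['item', 'col'], fill_value=0, aggfunc='mean')
print(wide)

# A single cell is addressed with one tuple per axis.
cell = wide.loc[('k0', 'r0'), ('i0', 'c0')]

# xs takes a cross-section on one level, here every column with col == 'c0'.
c0_only = wide.xs('c0', axis=1, level='col')
print(c0_only)
```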

Question 9: Can I aggregate the frequency in which the column and rows occur together, aka "cross tabulation"?

  • Use pd.crosstab, which counts how many rows share each combination of row and col:

    pd.crosstab(index=df['row'], columns=df['col'])
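
A cross tabulation counts occurrences rather than aggregating a value column. The sketch below, on hypothetical data, computes the counts with pd.crosstab and again with groupby + size + unstack for comparison.

```python
import pandas as pd

# Hypothetical sample data; only the key columns matter for counting.
df = pd.DataFrame({
    'row': ['r0', 'r0', 'r0', 'r1', 'r1'],
    'col': ['c0', 'c0', 'c1', 'c0', 'c1'],
})

# Frequency of each (row, col) combination.
counts = pd.crosstab(index=df['row'], columns=df['col'])
print(counts)

# The same counts via groupby: size() counts rows per group,
# unstack moves 'col' into the columns and fills absent pairs with 0.
same_counts = df.groupby(['row', 'col']).size().unstack('col', fill_value=0)
print(same_counts)
```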

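Question 10: How do I convert a DataFrame from long to wide by pivoting on ONLY two columns?

  • If the frame has only a label column and a value column, nothing identifies the rows of the wide result. Number the values within each label with groupby(...).cumcount() and pivot on that counter (n is an illustrative name):

    df.assign(n=df.groupby('col').cumcount()).pivot(index='n', columns='col', values='val0')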
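
One way to read this question is a frame that holds just a label column and a value column. Under that assumption, the sketch below numbers the values within each label using cumcount and pivots on that counter. The data and the helper column name n are hypothetical.

```python
import pandas as pd

# Hypothetical two-column frame: a label column and a value column.
df = pd.DataFrame({
    'col': ['c0', 'c0', 'c1', 'c1', 'c1'],
    'val0': [1, 2, 3, 4, 5],
})

# cumcount numbers the rows inside each 'col' group: 0, 1, 0, 1, 2.
numbered = df.assign(n=df.groupby('col').cumcount())

# The counter gives pivot a unique row label for every value.
wide = numbered.pivot(index='n', columns='col', values='val0')
print(wide)
```

Labels with fewer values than the longest group are padded with NaN at the bottom.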

Question 11: How do I flatten the multiple index to single index after pivot?

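  • Pivot on more than one column, then join the levels of each MultiIndex column label into a single string (this assumes the level values are strings):

    wide = df.pivot_table(values='val0', index='row', columns=['item', 'col'], fill_value=0, aggfunc='sum')
    wide.columns = wide.columns.map('|'.join)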
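
Flattening only matters once the pivot has produced multi-level labels. The sketch below, on hypothetical data with string labels, pivots on two columns and then joins each column tuple with a '|' separator.

```python
import pandas as pd

# Hypothetical data; 'item' and 'col' together form the column labels.
df = pd.DataFrame({
    'row': ['r0', 'r0', 'r1', 'r1'],
    'item': ['i0', 'i1', 'i0', 'i1'],
    'col': ['c0', 'c1', 'c0', 'c1'],
    'val0': [1.0, 2.0, 3.0, 4.0],
})

wide = df.pivot_table(values='val0', index='row', columns=['item', 'col'],
                      fill_value=0, aggfunc='sum')

# Each column label is a tuple such as ('i0', 'c0'); join it into 'i0|c0'.
flat = wide.copy()
flat.columns = wide.columns.map('|'.join)

# reset_index turns the 'row' index back into an ordinary column.
flat = flat.reset_index()
print(flat)
```

If the level values are not strings, convert them first, for example with a list comprehension that applies str to each part before joining.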
