How can I create clustered stacked bar plots in Python using Pandas and Matplotlib?

Patricia Arquette
2024-11-02 15:26:29

Creating Clustered Stacked Bar Plots with Pandas and Matplotlib

This article shows one way to create clustered stacked bar plots with Pandas and Matplotlib. The technique places several dataframes that share the same columns and index side by side: each dataframe contributes one stacked bar per index value, and the bars for the same index value are drawn next to each other.

Understanding the Problem

Consider the following scenario: You have two or more dataframes, each containing values for columns "I" and "J" with the same index. You want one stacked bar per dataframe and index value, with the bars that share an index value grouped together. With two dataframes, the bars appear in this order:

  • Bar for dataframe 1 at index A
  • Bar for dataframe 2 at index A
  • Bar for dataframe 1 at index B
  • Bar for dataframe 2 at index B
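
The ordering in the list above can be sketched in plain Python. The names index_labels and frame_names are placeholders chosen for illustration and are not part of the original code:

```python
from itertools import product

# Hypothetical labels, for illustration only.
index_labels = ["A", "B"]
frame_names = ["dataframe 1", "dataframe 2"]

# The index label varies slowest, so bars that share an index value
# end up next to each other, one bar per dataframe.
bar_order = [f"{name} at index {label}"
             for label, name in product(index_labels, frame_names)]

for position, description in enumerate(bar_order):
    print(position, description)
```

Swapping the two arguments of product would instead group the bars by dataframe, which is what separate stacked bar plots would give.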

Implementing the Solution

The following function draws every dataframe as a stacked bar plot on the same axes, then narrows and shifts the bars so that the dataframes sit side by side:

<code class="python">import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

def plot_clustered_stacked(dfall, labels=None, title="multiple stacked bar plot",  H="/", **kwargs):
    """Given a list of dataframes, with identical columns and index, create a clustered stacked bar plot.
    labels is a list of the names of the dataframe, used for the legend
    title is a string for the title of the plot
    H is the hatch used for identification of the different dataframe"""

    n_df = len(dfall)
    n_col = len(dfall[0].columns) 
    n_ind = len(dfall[0].index)
    axe = plt.subplot(111)

    for df in dfall : # for each data frame
        axe = df.plot(kind="bar",
                      linewidth=0,
                      stacked=True,
                      ax=axe,
                      legend=False,
                      grid=False,
                      **kwargs)  # make bar plots

    h,l = axe.get_legend_handles_labels() # get the handles we want to modify
    for i in range(0, n_df * n_col, n_col): # len(h) = n_col * n_df
        for j, pa in enumerate(h[i:i+n_col]):
            for rect in pa.patches: # for each index
                rect.set_x(rect.get_x() + 1 / float(n_df + 1) * i / float(n_col))
                rect.set_hatch(H * int(i / n_col))  # one more hatch mark per dataframe
                rect.set_width(1 / float(n_df + 1))

    axe.set_xticks((np.arange(0, 2 * n_ind, 2) + 1 / float(n_df + 1)) / 2.)
    axe.set_xticklabels(dfall[0].index, rotation=0)  # all dataframes share this index
    axe.set_title(title)

    # Add invisible data to add another legend
    n=[]
    for i in range(n_df):
        n.append(axe.bar(0, 0, color="gray", hatch=H * i))

    l1 = axe.legend(h[:n_col], l[:n_col], loc=[1.01, 0.5])
    if labels is not None:
        l2 = plt.legend(n, labels, loc=[1.01, 0.1]) 
    axe.add_artist(l1)
    return axe

# create fake dataframes
df1 = pd.DataFrame(np.random.rand(4, 5),
                   index=["A", "B", "C", "D"],
                   columns=["I", "J", "K", "L", "M"])
df2 = pd.DataFrame(np.random.rand(4, 5),
                   index=["A", "B", "C", "D"],
                   columns=["I", "J", "K", "L", "M"])
df3 = pd.DataFrame(np.random.rand(4, 5),
                   index=["A", "B", "C", "D"],
                   columns=["I", "J", "K", "L", "M"])

# Then, just call :
plot_clustered_stacked([df1, df2, df3], ["df1", "df2", "df3"])</code>

Using the Function

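To use this function, pass a list of dataframes as the first argument. The optional labels argument is a list of names, one per dataframe, shown in a second legend. The title argument sets the plot title (the default is "multiple stacked bar plot"), and H is the hatch character used to tell the dataframes apart (the default is "/"). Any other keyword arguments are forwarded to DataFrame.plot.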
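
Because the function assumes that every dataframe has the same columns and index, it can help to check this before plotting. The sketch below is one possible check; check_same_layout is a hypothetical helper and is not part of the original function:

```python
import pandas as pd

def check_same_layout(frames):
    """Raise ValueError unless all frames share the first frame's columns and index.

    Hypothetical helper, not part of the original function.
    """
    if not frames:
        raise ValueError("expected at least one dataframe")
    first = frames[0]
    for position, frame in enumerate(frames[1:], start=1):
        if not frame.columns.equals(first.columns):
            raise ValueError(f"dataframe {position} has different columns")
        if not frame.index.equals(first.index):
            raise ValueError(f"dataframe {position} has a different index")

# Two small dataframes with matching layout, made up for illustration.
a = pd.DataFrame({"I": [1, 2], "J": [3, 4]}, index=["A", "B"])
b = pd.DataFrame({"I": [5, 6], "J": [7, 8]}, index=["A", "B"])
check_same_layout([a, b])  # passes silently
```

Calling such a check on the list before passing it to the plotting function would turn a confusing plot into an explicit error message.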

Output

The resulting plot shows one cluster of bars per index value, with one bar per dataframe in each cluster. Within a bar, the column values are stacked on top of each other. The first dataframe is drawn without hatching, and each following dataframe gets one more repetition of the hatch pattern.
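
To see where the bars end up, the arithmetic from the function can be repeated in plain Python. This is a sketch under one assumption: that pandas draws each bar 0.5 wide and centred on positions 0, 1, 2, ..., so a bar starts at k - 0.25 before it is shifted. The helper cluster_geometry is hypothetical and exists only for illustration:

```python
def cluster_geometry(n_df, n_ind, hatch="/", default_width=0.5):
    """Return (bar_width, left_edges, hatches, tick_positions).

    Hypothetical helper that repeats the arithmetic of plot_clustered_stacked.
    default_width is an assumption about pandas' initial bar width.
    """
    # Every bar is narrowed to 1 / (n_df + 1), leaving a gap between clusters.
    bar_width = 1 / (n_df + 1)
    # left_edges[k][d] is the left edge of dataframe d's bar at index position k.
    left_edges = [[k - default_width / 2 + d * bar_width for d in range(n_df)]
                  for k in range(n_ind)]
    # Dataframe d gets the hatch character repeated d times.
    hatches = [hatch * d for d in range(n_df)]
    # Same expression as in set_xticks: (2k + 1 / (n_df + 1)) / 2.
    tick_positions = [(2 * k + bar_width) / 2 for k in range(n_ind)]
    return bar_width, left_edges, hatches, tick_positions

bar_width, left_edges, hatches, ticks = cluster_geometry(n_df=3, n_ind=4)
print(bar_width)      # 0.25
print(left_edges[0])  # [-0.25, 0.0, 0.25]
print(hatches)        # ['', '/', '//']
print(ticks)          # [0.125, 1.125, 2.125, 3.125]
```

With three dataframes, each cluster spans from k - 0.25 to k + 0.5, and the tick at k + 0.125 sits in the middle of that span.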

Additional Features

Customizing Colors:

You can customize the colors of the bars by passing a cmap argument to plot_clustered_stacked. The function does not handle cmap itself; it forwards the argument through **kwargs to DataFrame.plot, which accepts a Matplotlib colormap. Here's an example:

<code class="python">plot_clustered_stacked([df1, df2, df3], ["df1", "df2", "df3"], cmap=plt.cm.viridis)</code>

Conclusion

This approach provides a simple and effective method to create clustered stacked bar plots using Pandas and Matplotlib. It allows you to visualize multiple datasets side by side in a clear and informative manner. By adjusting the hatch patterns and colors, you can further customize the plot to fit your specific requirements.
