How to Create Clustered Stacked Bar Charts for Multiple DataFrames in Python?

Susan Sarandon
2024-11-02 19:07:30

Creating Clustered Stacked Bar Charts for Multiple DataFrames

Problem Statement

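When you have several DataFrames with identical columns and index, a clustered stacked bar chart lets you compare them side by side: for every index value, each DataFrame gets its own stacked bar, and the bars belonging to the same index value are grouped into one cluster. DataFrame.plot(kind="bar", stacked=True) handles one DataFrame at a time, and plotting several DataFrames onto the same axes simply draws their bars at the same positions, on top of each other. The challenge is to place the stacked bars of the different DataFrames next to each other, grouped by their shared index.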
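Both solutions below assume that every DataFrame has exactly the same index and columns, in the same order; the Matplotlib version, for instance, reads the number of columns and index values from the first DataFrame only. A minimal pre-flight check, assuming the DataFrames are passed as a list (check_aligned is a hypothetical helper, not part of pandas):

```python
import pandas as pd


def check_aligned(dfs):
    """Hypothetical helper: raise ValueError unless all frames share index and columns.

    Returns (number of DataFrames, number of columns, number of index values).
    """
    if not dfs:
        raise ValueError("need at least one DataFrame")
    first = dfs[0]
    for pos, df in enumerate(dfs[1:], start=1):
        # Index.equals compares the labels and their order, which fixes bar positions
        if not df.index.equals(first.index):
            raise ValueError(f"DataFrame {pos} has a different index")
        if not df.columns.equals(first.columns):
            raise ValueError(f"DataFrame {pos} has different columns")
    return len(dfs), len(first.columns), len(first.index)


# Illustrative data: two DataFrames with the same index and columns
df1 = pd.DataFrame({"I": [1, 2], "J": [3, 4]}, index=["A", "B"])
df2 = pd.DataFrame({"I": [5, 6], "J": [7, 8]}, index=["A", "B"])
print(check_aligned([df1, df2]))  # (2, 2, 2)
```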

Solution with Pandas and Matplotlib

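With Pandas and Matplotlib, we can plot each DataFrame as a stacked bar chart on the same axes, then narrow and shift every bar rectangle so that the DataFrames sit side by side, and give each DataFrame its own hatch pattern. The function below takes a list of DataFrames (dfall), optional legend labels for them, a plot title and the hatch character H; any extra keyword arguments are passed on to DataFrame.plot: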

<code class="python">import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

def plot_clustered_stacked(dfall, labels=None, title="multiple stacked bar plot", H="/", **kwargs):
    """Draw a clustered stacked bar plot from a list of DataFrames that share
    identical columns and index. labels names the DataFrames in the second
    legend; H is the hatch character used to tell the DataFrames apart."""
    n_df = len(dfall)
    n_col = len(dfall[0].columns)
    n_ind = len(dfall[0].index)
    bar_width = 1 / (n_df + 1)
    axe = plt.subplot(111)

    for df in dfall:  # one stacked bar plot per DataFrame, all on the same axes
        axe = df.plot(kind="bar",
                      linewidth=0,
                      stacked=True,
                      ax=axe,
                      legend=False,
                      grid=False,
                      **kwargs)

    # One handle (a BarContainer) per column per DataFrame: len(h) == n_col * n_df
    h, l = axe.get_legend_handles_labels()
    for i in range(0, n_df * n_col, n_col):
        d = i // n_col  # position of the DataFrame these handles belong to
        for pa in h[i:i + n_col]:
            for rect in pa.patches:  # one rectangle per index value
                rect.set_x(rect.get_x() + d * bar_width)  # shift sideways
                rect.set_hatch(H * d)  # the first DataFrame gets no hatch
                rect.set_width(bar_width)

    # Centre each tick under its cluster; the first DataFrame's bars were not shifted
    axe.set_xticks([rect.get_x() + n_df * bar_width / 2 for rect in h[0].patches])
    axe.set_xticklabels(dfall[0].index, rotation=0)
    axe.set_title(title)

    # Invisible zero-height bars supply the handles for the DataFrame legend
    n = []
    for d in range(n_df):
        n.append(axe.bar(0, 0, color="gray", hatch=H * d))

    l1 = axe.legend(h[:n_col], l[:n_col], loc=[1.01, 0.5])
    if labels is not None:
        axe.legend(n, labels, loc=[1.01, 0.1])  # this replaces l1 as the axes legend,
    axe.add_artist(l1)  # so l1 is added back as an extra artist
    return axe</code>

Solution with Seaborn

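Seaborn's barplot function has no built-in stacking, so it cannot stack bars per DataFrame either. A common workaround is to plot cumulative sums as overlapping bars: the tallest layer sits at the back and each shorter layer is drawn in front of it, so the visible part of each layer equals its own value. The steps are: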

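  1. Convert each DataFrame into "tidy" (long) format using pd.melt(), add a Name column identifying the DataFrame, and concatenate the results.
  2. Calculate the running total of each stacked bar using groupby and cumsum(), storing it in a new column called vcs.
  3. Iterate through the groups of variable (the original column names) and plot vcs with sns.barplot(), giving earlier layers a higher zorder so they stay in front of the taller ones.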
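<code class="python">import numpy as np
import pandas as pd
import seaborn as sns

# Example input (illustrative): three DataFrames with identical index and columns
rng = np.random.default_rng(0)
dfs = {name: pd.DataFrame(rng.random((4, 5)), index=list("ABCD"), columns=list("IJKLM"))
       for name in ["df1", "df2", "df3"]}

# Convert the DataFrames to tidy format, tagging each row with its DataFrame's name
dfall = pd.concat([pd.melt(df.reset_index(), id_vars="index").assign(Name=name)
                   for name, df in dfs.items()], ignore_index=True)
# Running total within each stacked bar, i.e. each (Name, index) pair
dfall["vcs"] = dfall.groupby(["Name", "index"])["value"].cumsum()

# One color per variable (column of the original DataFrames)
c = ["blue", "purple", "red", "green", "pink"]

# Draw one cumulative layer per variable; zorder=-i keeps the earlier,
# shorter layers in front of the later, taller ones
for i, (variable, g) in enumerate(dfall.groupby("variable")):
    ax = sns.barplot(data=g,
                     x="index",
                     y="vcs",
                     hue="Name",  # one bar per DataFrame in each cluster
                     palette=f"dark:{c[i]}",  # shades of c[i] separate the DataFrames
                     zorder=-i,
                     edgecolor="k")
ax.get_legend().remove()  # the per-layer hue legends are redundant</code>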
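The vcs column is simply each wide DataFrame's row-wise running total, df.cumsum(axis=1), rewritten in long form. A small pandas-only check of steps 1 and 2, assuming two toy DataFrames (the data and the names tidy and wide_totals are illustrative):

```python
import pandas as pd

# Toy wide DataFrames with identical index and columns (illustrative data)
dfs = {
    "df1": pd.DataFrame({"I": [1, 2], "J": [3, 4]}, index=["A", "B"]),
    "df2": pd.DataFrame({"I": [5, 6], "J": [7, 8]}, index=["A", "B"]),
}

# Step 1: tidy (long) format, tagged with the DataFrame name
tidy = pd.concat(
    [pd.melt(df.reset_index(), id_vars="index").assign(Name=name)
     for name, df in dfs.items()],
    ignore_index=True,
)

# Step 2: running total of the stacked values within each (Name, index) bar
tidy["vcs"] = tidy.groupby(["Name", "index"])["value"].cumsum()

# The same numbers computed directly on the wide frames
wide_totals = {name: df.cumsum(axis=1) for name, df in dfs.items()}

# Check: every long-form vcs value matches the wide-form cumulative sum
for _, row in tidy.iterrows():
    assert wide_totals[row["Name"]].loc[row["index"], row["variable"]] == row["vcs"]
```

Since pd.melt emits all rows of column I before those of column J, the groupby cumsum accumulates in the original column order, which is exactly the stacking order.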
