Can You Color Scatter Plots Based on Specific Column Values in Pandas with Matplotlib?

Barbara Streisand
2024-10-19 14:50:02

Coloring Scatter Plots by Column Values Using Pandas and Matplotlib

Matplotlib is a popular Python library for creating static, animated, and interactive visualizations. This article shows how to color the points of a scatter plot according to the values in a specific column of a Pandas DataFrame.

Imports and Data

To begin, we import the necessary libraries: Matplotlib (as plt), Pandas (as pd), and NumPy (as np). We then generate a sample DataFrame ("df") of random values with three columns: "Height (cm)", "Weight (kg)", and "Gender". The fixed random seed makes the data reproducible.

<code class="python">import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

np.random.seed(0)
N = 37
_genders = ["Female", "Male", "Non-binary", "No Response"]
df = pd.DataFrame({
    "Height (cm)": np.random.uniform(low=130, high=200, size=N),
    "Weight (kg)": np.random.uniform(low=30, high=100, size=N),
    "Gender": np.random.choice(_genders, size=N),
})</code>

Update (August 2021)

Seaborn is a plotting library built on top of Matplotlib. As of Seaborn 0.11.0, figure-level functions such as seaborn.relplot are recommended over using FacetGrid directly. The hue parameter colors the points by the values of a column.

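<code class="python">import seaborn as sns
# hue colors the points by "Gender"; hue_order fixes the order of the legend entries
sns.relplot(data=df, x="Weight (kg)", y="Height (cm)", hue="Gender", hue_order=_genders, aspect=1.61)
plt.show()</code>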

Using Matplotlib Directly (Original 2015 Answer)

If you prefer to use Matplotlib directly, you need to convert each category in the DataFrame column into a color value that Matplotlib's scatter function can plot. To do this:

  • Create a dictionary with unique categories from the column and colors.
  • Add a new "Color" column to the DataFrame, assigning each category a corresponding color.
  • Use the scatter function to plot the data, specifying the color column as the "c" argument.
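<code class="python">def dfScatter(df, xcol='Height (cm)', ycol='Weight (kg)', catcol='Gender'):
    fig, ax = plt.subplots()
    # Map each unique category to an evenly spaced number in [0, 1];
    # scatter converts these numbers to colors through its default colormap.
    categories = np.unique(df[catcol])
    colors = np.linspace(0, 1, len(categories))
    colordict = dict(zip(categories, colors))

    # Note: this adds a "Color" column to the DataFrame that was passed in.
    df["Color"] = df[catcol].map(colordict)
    ax.scatter(df[xcol], df[ycol], c=df["Color"])
    ax.set_xlabel(xcol)
    ax.set_ylabel(ycol)
    return fig

fig = dfScatter(df)
fig.savefig('fig1.png')</code>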
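
The numeric mapping above colors the points, but the plot has no legend telling the reader which color belongs to which category. One possible alternative, sketched here rather than taken from the original answer, is to call scatter once per category so that each group gets its own legend entry. The helper name scatter_by_category and the regenerated sample data are illustrative assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Illustrative sample data with the same column names as the article's DataFrame.
rng = np.random.default_rng(0)
genders = ["Female", "Male", "Non-binary", "No Response"]
df = pd.DataFrame({
    "Height (cm)": rng.uniform(130, 200, size=37),
    "Weight (kg)": rng.uniform(30, 100, size=37),
    "Gender": rng.choice(genders, size=37),
})


def scatter_by_category(df, xcol, ycol, catcol):
    """Hypothetical helper: draw one scatter layer per category."""
    fig, ax = plt.subplots()
    # groupby yields (category, sub-DataFrame) pairs in sorted category order.
    for name, group in df.groupby(catcol):
        # Each call takes the next color from Matplotlib's default color cycle.
        ax.scatter(group[xcol], group[ycol], label=name)
    ax.set_xlabel(xcol)
    ax.set_ylabel(ycol)
    ax.legend(title=catcol)
    return fig, ax


fig, ax = scatter_by_category(df, "Weight (kg)", "Height (cm)", "Gender")
```

The returned figure can be saved with fig.savefig as before. Unlike the dictionary approach, this version leaves the input DataFrame unchanged, at the cost of one scatter call per category.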

By following these steps, you can easily color scatter plots based on column values using Pandas and Matplotlib.
