How to efficiently convert a Pandas DataFrame with missing values into a NumPy array?

Mary-Kate Olsen
2024-11-05 02:42:02

Convert Pandas DataFrame with Missing Values to NumPy Array

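To convert a Pandas DataFrame with missing values to a NumPy array, use df.to_numpy(). It is the recommended replacement for the older df.values attribute and has several advantages:

  • Takes an explicit copy argument. The default, copy=False, avoids copying where possible, but the result is not guaranteed to be a view (a DataFrame with mixed column dtypes, for example, has to be copied into a single array).
  • Handles extension types: the dtype and na_value arguments control the NumPy dtype of the result and the value used for missing entries.
  • Returns an array with a single dtype, chosen as the common type of all columns unless dtype is given. Missing values in float columns remain np.nan.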

Example:

<code class="python">import pandas as pd
import numpy as np

# Create a DataFrame with missing values
df = pd.DataFrame({'A': [np.nan, np.nan, 0.1, 0.1, 0.1, 0.1],
                   'B': [0.2, np.nan, 0.2, 0.2, np.nan, np.nan],
                   'C': [np.nan, 0.5, 0.5, np.nan, 0.5, np.nan]})

# Convert to a NumPy array with missing values represented as `np.nan`
array = df.to_numpy()

# Result:
# array([[ nan,  0.2,  nan],
#        [ nan,  nan,  0.5],
#        [ 0.1,  0.2,  0.5],
#        [ 0.1,  0.2,  nan],
#        [ 0.1,  nan,  0.5],
#        [ 0.1,  nan,  nan]])</code>
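
The dtype and na_value arguments mentioned above matter most when columns have different dtypes or use a nullable extension type. The following is a minimal sketch, not a definitive recipe; the frames `mixed` and `nullable` are made up for illustration, and it assumes a pandas version recent enough to support na_value in DataFrame.to_numpy.

```python
import numpy as np
import pandas as pd

# Mixed int and float columns are upcast to one common dtype
mixed = pd.DataFrame({"A": [1, 2], "B": [0.5, np.nan]})
mixed_array = mixed.to_numpy()

# Nullable integer column: the missing entry is pd.NA rather than np.nan
nullable = pd.DataFrame({"A": pd.array([1, None, 3], dtype="Int64")})

# Ask for a float array and state how the missing entry should be encoded
nullable_array = nullable.to_numpy(dtype="float64", na_value=np.nan)

print(mixed_array.dtype)
print(nullable_array)
```

Passing dtype and na_value explicitly keeps the result independent of how a given pandas version chooses a default dtype for nullable columns.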

Preserving Dtypes:

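to_numpy returns an array with a single dtype, so it cannot keep a separate dtype for each column. To keep per-column dtypes, build a record (structured) array with np.rec.fromrecords; df.to_records() is a built-in alternative.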

<code class="python"># Create a DataFrame with mixed data types and a labelled index
df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6],
                   'C': [7.2, 8.1, 9.3]},
                  index=['a', 'b', 'c'])

# Move the index into a regular column so it becomes a field of the result
v = df.reset_index()

# Convert to a record array; the field names must follow the column order of v
struct_array = np.rec.fromrecords(v, names=v.columns.tolist())

# Result:
# rec.array([('a', 1, 4, 7.2), ('b', 2, 5, 8.1), ('c', 3, 6, 9.3)],
#           dtype=[('index', '<U1'), ('A', '<i8'), ('B', '<i8'), ('C', '<f8')])</code>
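
For comparison, the same kind of per-column result can be sketched with pandas' own DataFrame.to_records method. The frame below is illustrative only, and the exact integer width and index field dtype may vary by platform and pandas version.

```python
import pandas as pd

# Small frame with an integer column, a float column and a labelled index
df = pd.DataFrame({"A": [1, 2, 3], "C": [7.2, 8.1, 9.3]},
                  index=["a", "b", "c"])

# The index becomes the first field, named "index" by default
records = df.to_records()

# Drop the index if only the column data is needed
records_no_index = df.to_records(index=False)

print(records.dtype.names)
print(records_no_index.dtype.names)
```

Each field keeps its own dtype, so the integer column is not upcast to float as it would be in the single-dtype array that to_numpy returns.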
