How to Extract Numbers from Non-Numeric Strings in Pandas?

Mary-Kate Olsen
2024-10-24 14:15:02

Pandas: Extracting Numbers from Strings

When working with DataFrames in pandas, you often need to extract numeric information from cells that also contain non-numeric characters. pandas provides vectorized string methods, available through the .str accessor, that handle this without an explicit loop.

Using str.extract() for Number Extraction

One effective method for extracting numbers from strings is str.extract(). It takes a regular expression containing at least one capturing group and returns, for each row, the text matched by that group.

Consider the following data frame:

<code class="python">import pandas as pd
import numpy as np

# Sample column mixing digits with letters, plus one missing value
df = pd.DataFrame({'A': ['1a', np.nan, '10a', '100b', '0b']})
print(df)</code>

Output:

      A
0    1a
1   NaN
2   10a
3  100b
4    0b

To extract the numbers from each cell, you can use the following regular expression:

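<code class="python"># Raw string keeps the backslash intact; expand=False returns a Series
# rather than a one-column DataFrame
df['A'].str.extract(r'(\d+)', expand=False)</code>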

The regex pattern (\d+) matches a sequence of one or more digits. The parentheses create a capturing group, which tells str.extract() which part of the match to return. Only the first match in each string is returned.
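The behaviour described above can be checked with a short sketch. The Series `s` and its last value, 'x7y8', are made up for illustration and do not come from the data frame above; the snippet only shows how the return type depends on the expand parameter and that a string with two digit runs yields the first one.

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption, not part of the article's data frame)
s = pd.Series(['1a', np.nan, '10a', '100b', 'x7y8'])

# With a single capturing group, expand=False returns a Series
as_series = s.str.extract(r'(\d+)', expand=False)

# The default, expand=True, returns a DataFrame with one column per group
as_frame = s.str.extract(r'(\d+)')

print(type(as_series).__name__)
print(type(as_frame).__name__)

# Only the first run of digits is captured, so 'x7y8' gives '7'
print(as_series.iloc[4])
```

If every run of digits in a string is needed rather than just the first, str.extractall() or str.findall() may be a better fit.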

Output:

0      1
1    NaN
2     10
3    100
4      0
Name: A, dtype: object

As you can see, the numbers have been extracted from each cell, and the missing value remains NaN. Note that the extracted values are still strings (dtype: object), and that the pattern (\d+) matches whole numbers only: for a value such as 3.14 it would capture just 3.
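One way to address both points is sketched here, as an assumption rather than the only approach: pd.to_numeric() converts the extracted strings to numbers, and a slightly longer pattern with an optional decimal part handles simple floating-point values. The `measurements` Series is invented for illustration, and the pattern does not cover signs, thousands separators, or scientific notation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['1a', np.nan, '10a', '100b', '0b']})

# Extract the digits as strings, then convert them to a numeric column.
# The missing value stays missing after conversion.
digits = df['A'].str.extract(r'(\d+)', expand=False)
numbers = pd.to_numeric(digits)
print(numbers)

# Hypothetical data containing decimals
measurements = pd.Series(['3.5kg', '12 units', np.nan, 'approx 0.25'])

# \d+ followed by an optional ".digits" part; (?:...) is a non-capturing
# group, so the pattern still has exactly one capturing group
decimals = pd.to_numeric(
    measurements.str.extract(r'(\d+(?:\.\d+)?)', expand=False)
)
print(decimals)
```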