How to Efficiently Select Data from a Pandas DataFrame with Multiple Complex Criteria?

DDD
2024-12-02 20:45:14

Selecting with Complex Criteria from Pandas.DataFrame

Selections that combine several conditions come up often in data manipulation. Pandas handles them with boolean indexing: each condition produces a boolean Series, and those Series are combined element-wise to filter the DataFrame.

Example:

Consider the following DataFrame:

import pandas as pd
from random import randint

# Ten rows of random data: A in 1..9, B in 10..90 (steps of 10), C in 100..900 (steps of 100)
df = pd.DataFrame({'A': [randint(1, 9) for _ in range(10)],
                   'B': [randint(1, 9)*10 for _ in range(10)],
                   'C': [randint(1, 9)*100 for _ in range(10)]})

Problem:

Select the values of 'A' for the rows where 'B' is greater than 50 and 'C' is not equal to 900.

Solution:

Applying a comparison operator to a column returns a boolean Series, and the element-wise & operator combines the Series for each condition:

df["B"] > 50
(df["B"] > 50) & (df["C"] != 900)

or equivalently:

(df["B"] > 50) & ~(df["C"] == 900)
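The parentheses around each condition are required, because & binds more tightly than comparison operators such as > and != in Python. As a minimal sketch, the snippet below builds both masks on a small DataFrame with fixed values (chosen here for illustration, not taken from the random example above) and checks that the two formulations agree:

```python
import pandas as pd

# Fixed illustrative values (an assumption) so the result is reproducible.
df = pd.DataFrame({"A": [1, 2, 3, 4, 5],
                   "B": [10, 60, 70, 80, 50],
                   "C": [100, 900, 300, 900, 500]})

# Each comparison is parenthesized so that & combines two boolean Series.
mask_ne = (df["B"] > 50) & (df["C"] != 900)
mask_not_eq = (df["B"] > 50) & ~(df["C"] == 900)

# Series.equals compares values and index; both masks should be identical.
masks_match = mask_ne.equals(mask_not_eq)
print(mask_ne.tolist())
print(masks_match)
```

Only the row with B = 70 and C = 300 satisfies both conditions in this sample data.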

These boolean Series can then be used to index into the DataFrame:

df["A"][(df["B"] > 50) & (df["C"] != 900)]

or, using .loc:

df.loc[(df["B"] > 50) & (df["C"] != 900), "A"]
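For reading data, the two indexing forms should return the same result. The sketch below illustrates this on a DataFrame with fixed values (an assumption made for reproducibility; the variable names mask, chained, and via_loc are illustrative only):

```python
import pandas as pd

# Fixed illustrative values (an assumption) instead of random data.
df = pd.DataFrame({"A": [1, 2, 3, 4, 5, 6],
                   "B": [10, 60, 70, 80, 50, 90],
                   "C": [100, 900, 300, 900, 500, 200]})

# Build the combined mask once and reuse it for both indexing styles.
mask = (df["B"] > 50) & (df["C"] != 900)

chained = df["A"][mask]      # select the column, then filter rows
via_loc = df.loc[mask, "A"]  # filter rows and select the column in one step

print(via_loc)
print(chained.equals(via_loc))
```

Storing the mask in a variable avoids repeating the condition and keeps the indexing expression short.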

Note:

For write access, use .loc rather than chained indexing. A chained expression such as df["A"][mask] = value may assign to a temporary copy instead of the original DataFrame, so the change can be silently lost.
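A minimal sketch of write access through .loc, again on fixed illustrative data; the replacement value 0 is a hypothetical choice for the example:

```python
import pandas as pd

# Fixed illustrative values (an assumption) instead of random data.
df = pd.DataFrame({"A": [1, 2, 3, 4, 5, 6],
                   "B": [10, 60, 70, 80, 50, 90],
                   "C": [100, 900, 300, 900, 500, 200]})

mask = (df["B"] > 50) & (df["C"] != 900)

# A single .loc call selects rows and column together, so the assignment
# is applied to df itself rather than to an intermediate object.
df.loc[mask, "A"] = 0  # 0 is a hypothetical replacement value

print(df["A"].tolist())
```

Only the rows matching the mask are modified; all other values in 'A' are left unchanged.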
