How to Efficiently Select Data from a Pandas DataFrame Based on Column Values?

Linda Hamilton
2024-12-24 01:24:11

How to Select Data from a DataFrame Based on Column Values

In SQL, a typical query for selecting rows based on column values would look like:

SELECT *
FROM table
WHERE column_name = some_value

To achieve the same result in Pandas, there are several approaches:

Exact Value Matching

To select rows where the column value equals a specific value (some_value), use the == operator within .loc:

df.loc[df['column_name'] == some_value]
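
As a minimal sketch of exact matching, the snippet below builds a small made-up DataFrame and selects rows with ==. The column names city and sales and their values are assumptions chosen only for illustration; != gives the complementary rows.

```python
import pandas as pd

# Hypothetical data, used only to illustrate the pattern
df = pd.DataFrame({'city': ['Oslo', 'Rome', 'Oslo', 'Lima'],
                   'sales': [10, 20, 30, 40]})

# Boolean mask: True where the column equals the value
mask = df['city'] == 'Oslo'

# Rows where city is exactly 'Oslo' (original index labels are kept)
oslo = df.loc[mask]

# Rows where city is anything other than 'Oslo'
not_oslo = df.loc[df['city'] != 'Oslo']

print(oslo)
print(not_oslo)
```

The comparison produces a boolean Series aligned with the DataFrame's index, and .loc keeps only the rows where that Series is True.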

Value Inclusion and Exclusion

To select rows where the column value is contained in a list (some_values), use the isin function:

df.loc[df['column_name'].isin(some_values)]

To exclude specific values, negate the boolean Series returned by isin:

df = df.loc[~df['column_name'].isin(some_values)] # .loc returns a new DataFrame, so assign the result to keep it
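
The inclusion and exclusion patterns can be sketched side by side. The DataFrame, the column names city and sales, and the contents of some_values are hypothetical and exist only for this example.

```python
import pandas as pd

# Hypothetical data, used only to illustrate the pattern
df = pd.DataFrame({'city': ['Oslo', 'Rome', 'Oslo', 'Lima'],
                   'sales': [10, 20, 30, 40]})
some_values = ['Oslo', 'Lima']

# Keep rows whose city appears in some_values
included = df.loc[df['city'].isin(some_values)]

# Keep rows whose city does NOT appear in some_values (~ inverts the mask)
excluded = df.loc[~df['city'].isin(some_values)]

print(included)
print(excluded)

# Neither selection modifies df; it still has all four rows
print(len(df))
```

Because each selection returns a new object, the original df is only replaced if the result is assigned back to the same name.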

Combining Conditions

Multiple conditions can be combined with the element-wise operators & (AND) and | (OR); Python's and / or keywords do not work on a boolean Series:

df.loc[(df['column_name'] >= A) & (df['column_name'] <= B)]

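Note that the parentheses are required: & and | bind more tightly than comparison operators such as >= and <=, so without them Python would evaluate A & df['column_name'] first.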
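
A short sketch of combined conditions follows. The DataFrame, the column names name and score, and the bounds A and B are assumptions made up for this example. Series.between is shown as one possible shorthand for an inclusive range check; it is an alternative, not something the approach above requires.

```python
import pandas as pd

# Hypothetical data and bounds, used only to illustrate the pattern
df = pd.DataFrame({'name': list('abcde'),
                   'score': [5, 12, 18, 25, 30]})
A, B = 10, 25

# AND: rows with A <= score <= B (each comparison in its own parentheses)
in_range = df.loc[(df['score'] >= A) & (df['score'] <= B)]

# OR: rows with score below A or above B
out_of_range = df.loc[(df['score'] < A) | (df['score'] > B)]

# Series.between is inclusive on both ends by default,
# so this selects the same rows as in_range
same = df.loc[df['score'].between(A, B)]

print(in_range)
print(out_of_range)
```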

Example

Consider the DataFrame:

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

To select rows where 'A' equals 'foo':

print(df.loc[df['A'] == 'foo'])

Yields:

     A      B  C   D
0  foo    one  0   0
2  foo    two  2   4
4  foo    two  4   8
6  foo    one  6  12
7  foo  three  7  14

Optimization for Multiple Value Selection

If you need to select on the same column many times, it can be more efficient to make that column the index first. A single label can then be selected with df.loc['one']; to select several labels from the index, use df.index.isin:

df = df.set_index(['B'])
print(df.loc[df.index.isin(['one','two'])])

Yields:

       A  C   D
B
one  foo  0   0
one  bar  1   2
two  foo  2   4
two  foo  4   8
two  bar  5  10
one  foo  6  12
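
The index-based selection can be compared with the column-based one in a short sketch. It rebuilds the example DataFrame so that it runs on its own; the variable names indexed, ones, by_mask, and by_column are introduced here for illustration. Whether the indexed form is actually faster depends on the data size and on how often the lookup is repeated, so it is worth timing on your own data.

```python
import numpy as np
import pandas as pd

# Same example DataFrame as in the article
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

# Keep the original df and build an indexed copy
indexed = df.set_index('B')

# Single label: direct lookup on the index
ones = indexed.loc['one']

# Several labels: boolean mask on the index (row order is preserved)
by_mask = indexed.loc[indexed.index.isin(['one', 'two'])]

# The same rows selected from the column, without setting an index
by_column = df.loc[df['B'].isin(['one', 'two'])]

print(ones)
print(by_mask)
```

Both by_mask and by_column contain the same six rows; they differ only in whether B is the index or a regular column.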
