How Can I Efficiently Select Rows in a Pandas DataFrame Based on Column Values?
As with a table in a relational database, you often need to select rows from a DataFrame based on the values in a particular column. Pandas offers several ways to do this.
To retrieve rows whose column values match a specific value, leverage the == operator:
df.loc[df['column_name'] == some_value]
To select rows whose column value belongs to a collection of values, use isin:
df.loc[df['column_name'].isin(some_values)]
To combine multiple conditions in your selection, connect them with &:
df.loc[(df['column_name'] >= A) & (df['column_name'] <= B)]
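Note: The parentheses are required. Python's & operator binds more tightly than comparison operators such as >= and <=, so without them the expression is parsed as df['column_name'] >= (A & df['column_name']) <= B, which does not express the intended condition and typically raises an error.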
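As a minimal sketch of the range filter above, the following uses a small made-up DataFrame; the column name C and the bounds A = 2 and B = 5 are illustrative assumptions, not values from the article. Series.between, which is inclusive on both ends by default, is shown as an equivalent spelling of the same condition.

```python
import numpy as np
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({'C': np.arange(8)})

A, B = 2, 5  # illustrative bounds (assumed values)

# Each comparison is parenthesized because & binds more tightly than >= and <=.
in_range = df.loc[(df['C'] >= A) & (df['C'] <= B)]

# between() is inclusive on both ends by default, so it selects the same rows.
in_range_alt = df.loc[df['C'].between(A, B)]

print(in_range)
```

Which form to use is largely a matter of taste: the explicit & form extends naturally to conditions on different columns, while between is shorter for a single column.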
To exclude rows with specific column values, utilize !=:
df.loc[df['column_name'] != some_value]
Alternatively, to exclude rows whose column value belongs to a collection of values, negate the isin result using ~:
df = df.loc[~df['column_name'].isin(some_values)] # .loc returns a new object rather than modifying df in place, so assign the result
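The two exclusion patterns can be sketched side by side. The DataFrame and the excluded values below are made up for illustration; only the != and ~isin idioms come from the text above.

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({'B': ['one', 'one', 'two', 'three', 'two']})

# Exclude rows equal to a single value.
not_one = df.loc[df['B'] != 'one']

# Exclude rows whose value is in a collection; ~ inverts the boolean mask.
not_one_or_three = df.loc[~df['B'].isin(['one', 'three'])]

print(not_one_or_three)
```

Both selections leave the original df untouched, which is why the result has to be assigned to a name if you want to keep it.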
Consider the following DataFrame:
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8),
                   'D': np.arange(8) * 2})
print(df)
Selecting rows with 'A' value 'foo':
print(df.loc[df['A'] == 'foo'])
Selecting rows with 'B' values 'one' or 'three':
print(df.loc[df['B'].isin(['one','three'])])
If you filter on the same column repeatedly, it can be more efficient to set that column as the index first and then select by label:
df = df.set_index(['B'])
print(df.loc['one'])
To select rows matching any of several index values, use df.index.isin:
df.loc[df.index.isin(['one','two'])]
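The sketch below contrasts the two index-based selections on the article's example DataFrame. The variable names (indexed, one_rows, and so on) and the label 'four' are assumptions introduced for illustration. One practical difference: a label lookup with .loc raises a KeyError when the label is absent, whereas an index.isin mask simply yields an empty result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8),
                   'D': np.arange(8) * 2})

# Keep the original frame and work on an indexed copy (hypothetical name).
indexed = df.set_index('B')

# Label lookup: all rows whose index value is 'one'.
one_rows = indexed.loc['one']

# Boolean mask over the index: rows whose index value is 'one' or 'two'.
one_or_two = indexed.loc[indexed.index.isin(['one', 'two'])]

# A label that does not occur produces an empty frame instead of an error.
missing = indexed.loc[indexed.index.isin(['four'])]

print(one_or_two)
```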