Home >Backend Development >Python Tutorial >How Can I Efficiently Perform SQL's IN and NOT IN Filtering in Pandas?

How Can I Efficiently Perform SQL's IN and NOT IN Filtering in Pandas?

Susan Sarandon
Susan SarandonOriginal
2024-12-20 20:05:17311browse

How Can I Efficiently Perform SQL's IN and NOT IN Filtering in Pandas?

Object-Oriented SQL-Like Filtering in Pandas

Filtering Pandas dataframes is a crucial task in data analysis, and understanding how to effectively utilize SQL-like operators can greatly simplify this process. Specifically, the IN and NOT IN operators are widely used for selecting rows based on membership criteria.

Pandas Equivalent of SQL's IN and NOT IN

To achieve the equivalent of SQL's IN operator in Pandas, you can leverage the isin() method on a Series. This method checks whether each element in the Series is contained within a specified list or set. For example, to filter a dataframe based on a list of countries to include:

df = pd.DataFrame({'country': ['US', 'UK', 'Germany', 'China']})
countries_to_keep = ['UK', 'China']
df[df['country'].isin(countries_to_keep)]

This will return a dataframe containing only the rows where the country column values are present in the countries_to_keep list.

To perform the equivalent of SQL's NOT IN, simply negate the isin() method using the ~ operator:

df[~df['country'].isin(countries_to_keep)]

This will produce a dataframe excluding the rows where the country column values match the specified list.

Example

Consider the following dataframe:

df = pd.DataFrame({'country': ['US', 'UK', 'Germany', 'China']})
countries_to_keep = ['UK', 'China']

To select only rows with countries in countries_to_keep:

df_filtered_in = df[df['country'].isin(countries_to_keep)]

This will result in:

    country
1        UK
3     China

To select rows with countries not in countries_to_keep:

df_filtered_not_in = df[~df['country'].isin(countries_to_keep)]

This will produce:

    country
0        US
2   Germany

Conclusion

The isin() method in Pandas provides a convenient and efficient way to implement IN and NOT IN-style filtering, offering a powerful tool for manipulating dataframes based on membership criteria.

The above is the detailed content of How Can I Efficiently Perform SQL's IN and NOT IN Filtering in Pandas?. For more information, please follow other related articles on the PHP Chinese website!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn