Pandas: Dividing a DataFrame Based on Column Values
When working with Pandas DataFrames, you often need to split the data into subsets based on the values in a particular column. One common scenario is splitting a DataFrame around a threshold value. Here's how it can be achieved:
Creating Boolean Masks
The simplest method is to create a boolean mask with a comparison operator. Indexing the DataFrame with the mask returns only the rows where the mask is True, so two masks with opposite conditions give you two DataFrames.
For example, to split a DataFrame on a column named 'Sales' into rows whose values are greater than or equal to a threshold s and rows whose values are less than s:
<code class="python">import pandas as pd

df = pd.DataFrame({'Sales': [10, 20, 30, 40, 50], 'A': [3, 4, 7, 6, 1]})
print(df)

s = 30

# Boolean mask for rows where Sales >= s
sales_ge_mask = df['Sales'] >= s
# DataFrame with Sales >= s
df1 = df[sales_ge_mask]
print(df1)

# Boolean mask for rows where Sales < s
sales_lt_mask = df['Sales'] < s
# DataFrame with Sales < s
df2 = df[sales_lt_mask]
print(df2)</code>
Alternatively, you can build a single mask and invert it with the "~" operator to select the rows that do not satisfy the condition.
<code class="python"># Boolean mask for rows where Sales < s
sales_lt_mask = df['Sales'] < s

# DataFrame with Sales >= s
df1 = df[~sales_lt_mask]
print(df1)

# DataFrame with Sales < s
df2 = df[sales_lt_mask]
print(df2)</code>
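One detail worth checking when choosing between an inverted mask and an explicit second comparison is how missing values behave. Comparisons against NaN evaluate to False, so a NaN row fails both `Sales < s` and `Sales >= s`, but it passes the inverted mask. The sketch below illustrates this with a small DataFrame that contains one missing value; the names `df_nan`, `lt_mask`, `ge_mask`, `inverted`, and `explicit` are illustrative and not part of the example above.

```python
import numpy as np
import pandas as pd

# Illustrative data with one missing Sales value (an assumption for this sketch)
df_nan = pd.DataFrame({'Sales': [10, np.nan, 30, 40], 'A': [3, 4, 7, 6]})
s = 30

lt_mask = df_nan['Sales'] < s    # NaN < 30 evaluates to False
ge_mask = df_nan['Sales'] >= s   # NaN >= 30 also evaluates to False

inverted = df_nan[~lt_mask]      # keeps the NaN row along with Sales >= s
explicit = df_nan[ge_mask]       # drops the NaN row

print(len(inverted), len(explicit))  # prints: 3 2
```

If the column may contain missing values, the inverted mask guarantees that every row lands in exactly one of the two subsets, while the two explicit comparisons silently leave NaN rows out of both.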
Because a boolean mask can encode any row-wise condition, the same approach works for splits that go beyond a single threshold.
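As a rough sketch of how the masking approach extends to other conditions, the snippet below wraps the split in a small helper and applies it to a compound condition. The function name `split_by_mask` and the particular condition are hypothetical, chosen only for illustration.

```python
import pandas as pd

def split_by_mask(df, mask):
    """Hypothetical helper: return (rows where mask is True, all other rows)."""
    return df[mask], df[~mask]

df = pd.DataFrame({'Sales': [10, 20, 30, 40, 50], 'A': [3, 4, 7, 6, 1]})

# Compound condition: each comparison needs its own parentheses,
# because & binds more tightly than >= and >
mask = (df['Sales'] >= 30) & (df['A'] > 5)

matched, rest = split_by_mask(df, mask)
print(matched)  # rows with Sales 30 and 40
print(rest)     # rows with Sales 10, 20 and 50
```

Conditions are combined with `&` (and) and `|` (or) rather than the Python keywords `and` and `or`, which do not operate element-wise on a Series.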