How to Extract Columns with Matching Substrings in pandas DataFrame Iteratively and Using Regular Expressions?
Identifying Columns Containing Specific Substrings
To locate columns whose names contain a specified substring without requiring an exact match, an iterative approach can be employed. This involves examining each column name and identifying those that satisfy the search criterion.
Consider a DataFrame with column names such as 'spike-2', 'hey spike', and 'spiked-in'. To extract the column names containing the substring 'spike', the following Python code can be utilized:
<code class="python">import pandas as pd

# Initialize data
data = {'spike-2': [1, 2, 3],
        'hey spike': [4, 5, 6],
        'spiked-in': [7, 8, 9],
        'no': [10, 11, 12]}
df = pd.DataFrame(data)

# Iterate over column names and keep those containing the substring
spike_cols = [col for col in df.columns if 'spike' in col]

# Print resulting column names
print(spike_cols)</code>
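In this code, the list comprehension iterates over df.columns and keeps each column name for which 'spike' in col is true. The test is a plain, case-sensitive substring check, so a column such as 'no' is left out.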
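Once the matching names are collected, they can be used to select the columns themselves. The following is a minimal sketch, assuming the same example data as above; the names spike_df and spike_cols_ci are illustrative and not part of the original example, and the case-insensitive variant is an optional extension.

```python
import pandas as pd

df = pd.DataFrame({
    'spike-2': [1, 2, 3],
    'hey spike': [4, 5, 6],
    'spiked-in': [7, 8, 9],
    'no': [10, 11, 12],
})

# Collect the matching names, then index the DataFrame with that list
spike_cols = [col for col in df.columns if 'spike' in col]
spike_df = df[spike_cols]

# Illustrative variation: lowercase each name first for a case-insensitive match
spike_cols_ci = [col for col in df.columns if 'spike' in str(col).lower()]
```

Converting each name with str() before lowercasing guards against non-string column labels, which would otherwise raise an error in the substring test.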
Alternatively, to obtain a DataFrame with only the matching columns:
<code class="python">df2 = df.filter(regex='spike')</code>
This will create df2 containing only the columns whose names include 'spike'.
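Because the regex argument is treated as a regular expression and searched for anywhere in the label, it can be tightened with anchors, and pandas also offers a plain-substring option. The sketch below, again assuming the example data from this article, shows three related ways to select columns; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'spike-2': [1, 2, 3],
    'hey spike': [4, 5, 6],
    'spiked-in': [7, 8, 9],
    'no': [10, 11, 12],
})

# like= performs a plain substring match on the column labels (no regex)
by_like = df.filter(like='spike')

# regex= searches each label, so '^' restricts matches to names starting with 'spike'
starts_with = df.filter(regex=r'^spike')

# Boolean mask over the column Index; regex=False treats the pattern literally
mask = df.columns.str.contains('spike', regex=False)
by_mask = df.loc[:, mask]
```

The like= form is the safer choice when the substring may contain characters that are special in regular expressions, such as '.' or '+'.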