Removing Duplicate Columns in Python Dataframes
Duplicate columns can creep into a pandas DataFrame, adding redundancy and making column selection ambiguous. Fortunately, pandas offers short one-line solutions for keeping only the unique columns, whether duplicates are defined by name or by value.
Solution for Removing Duplicate Columns by Name
To remove duplicate columns based on their names, use the following line:
<code class="python">df = df.loc[:,~df.columns.duplicated()].copy()</code>
df.columns.duplicated() returns a boolean array that is True for every column whose name has already appeared earlier, so the first occurrence of each name counts as the original. The ~ operator inverts that array, and df.loc uses the result to select only the first occurrence of each name. The trailing .copy() returns an independent DataFrame, which avoids a SettingWithCopyWarning if you modify the result later.
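As a minimal sketch of the name-based approach, the example below builds a small DataFrame with a repeated column name and applies the same line. The sample data and the variable names mask and deduped are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: the name "a" appears twice.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["a", "b", "a"])

# True for every column name that already appeared further left.
mask = df.columns.duplicated()
print(mask.tolist())  # [False, False, True]

# Keep only the first occurrence of each name.
deduped = df.loc[:, ~mask].copy()
print(deduped.columns.tolist())  # ['a', 'b']
```

Index.duplicated() also accepts keep="last" to retain the last occurrence of each name instead, or keep=False to mark every repeated name as a duplicate.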
Solution for Removing Duplicate Columns by Value
Suppose you want to remove duplicate columns by checking their values, not just their names. This can be achieved using the following code:
<code class="python">df = df.loc[:,~df.apply(lambda x: x.duplicated(),axis=1).all()].copy()</code>
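This solution avoids transposing the DataFrame, which can be time-consuming for large DataFrames. Because axis=1 is passed, the lambda function runs on each row, not each column, and marks the values that have already appeared earlier in that row. Calling .all() then flags a column only if it is marked in every row, and the ~ operator inverts the result so that df.loc keeps the columns that were not flagged.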
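To make the row-wise logic easier to follow, here is a small sketch that splits the one-liner into its intermediate steps. The sample data and the names marked, flagged, and result are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical sample data: "y" repeats the values of "x".
df = pd.DataFrame({"x": [1, 2, 3], "y": [1, 2, 3], "z": [7, 8, 9]})

# For each row, mark values that already appeared earlier in that row.
marked = df.apply(lambda row: row.duplicated(), axis=1)

# A column is flagged only if it is marked in every row.
flagged = marked.all()
print(flagged.tolist())  # [False, True, False]

# Keep the columns that were not flagged.
result = df.loc[:, ~flagged].copy()
print(result.columns.tolist())  # ['x', 'z']
```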
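Note: Be cautious when using the value-based approach. It compares values row by row, not whole columns against each other, so it can drop a column whose values each match some earlier column in the same row even though no single earlier column is identical to it.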
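The sketch below shows one case where the value-based one-liner gives a surprising result. For comparison it uses df.T.duplicated(), a transpose-based check that is not part of the solution above and is included here only as a reference point. The sample data and the names by_rows and by_cols are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: "c" equals neither "a" nor "b",
# but each of its values matches an earlier value in the same row.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [1, 4]})

# Row-wise one-liner from the article: "c" is dropped.
by_rows = df.loc[:, ~df.apply(lambda row: row.duplicated(), axis=1).all()].copy()
print(by_rows.columns.tolist())  # ['a', 'b']

# Transpose-based comparison of whole columns: "c" is kept.
by_cols = df.loc[:, ~df.T.duplicated()].copy()
print(by_cols.columns.tolist())  # ['a', 'b', 'c']
```

If whole-column equality is what you need, the transpose-based check is the safer choice, at the cost of the transposition the one-liner was meant to avoid.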