How to use the pandas drop() method in Python to delete rows and columns
The pandas drop() method comes in handy during feature engineering and when splitting data sets. It removes rows or columns from a DataFrame by label.
The detailed syntax of drop() is as follows:
Use index to delete rows and columns to delete columns:
DataFrame.drop(labels=None, axis=0, index=None, columns=None, inplace=False)
Parameters:
labels: The label(s) of the rows or columns to delete, either a single label or a list of labels.
axis: The axis to delete from; 0 means rows, 1 means columns. The default is 0.
index: The index label(s) of the rows to delete, either a single label or a list of labels. Equivalent to labels with axis=0.
columns: The name(s) of the columns to delete, either a single column name or a list of column names. Equivalent to labels with axis=1.
inplace: Whether to modify the original DataFrame. The default is False, which leaves the original unchanged and returns a new DataFrame; with inplace=True the original is modified and None is returned.
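To make the parameters concrete, here is a minimal sketch on a small made-up DataFrame (the name toy and its values are illustrative assumptions, not part of the data set used in the scenarios). It shows that labels with axis and the index/columns keywords are two spellings of the same operation, and that the default inplace=False leaves the original untouched.

```python
import pandas as pd

# Toy data for illustration only (hypothetical names and values).
toy = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]},
    index=["r0", "r1", "r2"],
)

# Two equivalent ways to drop a column.
by_axis = toy.drop("b", axis=1)
by_keyword = toy.drop(columns="b")

# Two equivalent ways to drop rows.
rows_by_axis = toy.drop(["r0", "r2"], axis=0)
rows_by_keyword = toy.drop(index=["r0", "r2"])

# inplace=False (the default) returns a new DataFrame; toy is unchanged.
print(list(toy.columns))
print(list(by_axis.columns))
print(list(rows_by_axis.index))
```

The keyword forms tend to read more clearly because the call itself says whether rows or columns are being removed.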
Usage scenario 1: Delete unnecessary features.
For example: if some features have little impact on the results, you can delete the independent variables that are unrelated to the dependent variable; to avoid multicollinearity, you can delete independent variables that are strongly correlated with each other.
df = data.drop(['RowNumber', 'CustomerId', 'Surname'], axis=1)
df
Code explanation:
data is the data set; the list passed to drop() names the 3 fields to be deleted, so the columns do not need to be selected first;
axis=1 means the operation applies to columns;
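The multicollinearity case mentioned in this scenario can be sketched as follows. This is one common approach and not necessarily the one the article had in mind; the toy frame features, the column names x1 to x3, and the 0.9 cutoff are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy features: x2 is an exact multiple of x1, x3 is unrelated.
features = pd.DataFrame(
    {
        "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
        "x2": [2.0, 4.0, 6.0, 8.0, 10.0],
        "x3": [5.0, 1.0, 4.0, 2.0, 3.0],
    }
)

threshold = 0.9  # assumed cutoff; choose one that suits your data

# Absolute pairwise correlations, keeping only the upper triangle
# so that each pair of columns is examined once.
corr = features.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# A column is dropped if it correlates strongly with an earlier column.
to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
reduced = features.drop(to_drop, axis=1)

print(to_drop)
print(list(reduced.columns))
```

Only the later column of each strongly correlated pair is removed, so one representative of the pair stays in the data.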
Usage scenario 2: Delete the dependent variable
# Independent variables and dependent variable
x_data = df.drop(['Exited'], axis=1)
y_data = df['Exited']
x_data
Code explanation:
The argument to drop() is the field to delete, so this removes the column named "Exited" from df;
['Exited'] is the dependent variable we want to remove; a single field can be passed this way, as a one-element list;
Delete rows
Usage scenario 3: When dividing the data set, generate a training set, remove the samples assigned to the training set, and the rest is the test set.
# Split off the training set
train_data = data.sample(frac=0.8, random_state=0)
# Test set
test_data = data.drop(train_data.index)
Code explanation:
Passing row index labels to drop() deletes those rows;
train_data is the training set we sampled, and train_data.index holds its row index labels;
axis=0 means deleting rows; it is the default, so it can be omitted;
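As a quick check of the split above, the following sketch runs the same two calls on a small made-up DataFrame (the column names and values are assumptions for illustration) and confirms that the two sets do not overlap. It relies on the row index being unique; if index labels repeat, drop() removes every row carrying a dropped label.

```python
import pandas as pd

# Hypothetical toy data with a default RangeIndex (unique row labels).
data = pd.DataFrame({"feature": range(10), "Exited": [0, 1] * 5})

# Sample 80% of the rows for training; random_state makes it repeatable.
train_data = data.sample(frac=0.8, random_state=0)

# Drop the sampled row labels; what remains is the test set.
test_data = data.drop(train_data.index)

# The two sets share no row labels and together cover the whole data set.
overlap = set(train_data.index) & set(test_data.index)
print(len(train_data), len(test_data), len(overlap))
```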