
How to use the scikit-learn module for machine learning in Python 3.x

WBOY, 2023-07-30 09:37:49

Introduction:
Machine learning is a branch of artificial intelligence that enables computers to improve their performance on a task by learning from data. scikit-learn is a powerful Python machine learning library. It provides many commonly used algorithms and tools that help developers quickly build and deploy machine learning models. This article shows how to use scikit-learn in Python 3.x for machine learning, with code examples.

1. Install the scikit-learn module
To use the scikit-learn module, you first need to install it. You can use the pip tool to complete the installation. Just enter the following command in the command line:
```shell
pip install scikit-learn
```

2. Import the scikit-learn module
After the installation is complete, import the module in your Python script to use its functionality. Note that the package is installed as scikit-learn but imported under the name sklearn:
```python
import sklearn
```
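A quick way to confirm that the installation worked is to print the installed version. In practice, most code imports the specific classes it needs from submodules, as the later steps do. A minimal sketch:

```python
import sklearn
from sklearn.linear_model import LogisticRegression  # typical pattern: import classes from submodules

# Prints the installed version string; the exact value depends on your environment
print(sklearn.__version__)
```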

3. Load the data set
In machine learning, it is usually necessary to load the data set first, and then process and analyze it. scikit-learn provides some built-in datasets that can be used to practice and test algorithms. The following code demonstrates how to load Iris, a data set built into scikit-learn:
```python
from sklearn.datasets import load_iris

# Load the Iris data set
iris = load_iris()
```
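load_iris returns a Bunch object whose fields can be inspected before any modelling. The sketch below prints the data shape and label information. It also shows the return_X_y=True option, which returns the feature matrix and the target vector directly:

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# 150 samples, 4 numeric features per sample
print(iris.data.shape)        # (150, 4)
print(iris.feature_names)     # sepal/petal length and width, in cm
print(iris.target_names)      # ['setosa' 'versicolor' 'virginica']

# Class labels are the integers 0, 1, 2; each class has 50 samples
print(np.bincount(iris.target))  # [50 50 50]

# Alternative: get the (X, y) arrays directly instead of a Bunch
X, y = load_iris(return_X_y=True)
print(X.shape, y.shape)       # (150, 4) (150,)
```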

4. Data preprocessing
In machine learning, data preprocessing is an important step. It includes operations such as data cleaning, feature selection and normalization, which improve the quality of the data the model learns from. The following code snippet normalizes the data set by rescaling each feature to the range [0, 1] with MinMaxScaler:
```python
from sklearn.preprocessing import MinMaxScaler

# Create a MinMaxScaler object
scaler = MinMaxScaler()

# Normalize the data set: learn each feature's min and max, then rescale it to [0, 1]
normalized_data = scaler.fit_transform(iris.data)
```
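MinMaxScaler rescales each feature (column) independently with x' = (x - min) / (max - min), where min and max are learned per column during fit. The sketch below reproduces that formula with NumPy to show what fit_transform computes. It assumes the default feature_range=(0, 1):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler

X = load_iris().data

scaler = MinMaxScaler()  # default feature_range=(0, 1)
scaled = scaler.fit_transform(X)

# Same computation by hand, using per-column min and max
col_min = X.min(axis=0)
col_max = X.max(axis=0)
manual = (X - col_min) / (col_max - col_min)

print(np.allclose(scaled, manual))  # True

# The learned statistics are stored on the fitted scaler
print(np.allclose(scaler.data_min_, col_min), np.allclose(scaler.data_max_, col_max))  # True True
```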

5. Split the data set
In machine learning, the data set is usually divided into two parts. The training set is used to train the model, and the test set is used to evaluate its performance on data it has not seen. The following code shows how to split the data set into a training set and a test set:
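```python
from sklearn.model_selection import train_test_split

# Split the data set: 80% for training, 20% for testing
# (random_state fixes the shuffle so that the split is reproducible)
X_train, X_test, y_train, y_test = train_test_split(
    normalized_data, iris.target, test_size=0.2, random_state=42
)
```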
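train_test_split shuffles randomly, so the class proportions in the test set can drift from those in the full data. Passing the label array as stratify keeps the proportions equal. Iris has 50 samples per class, so with test_size=0.2 the 30 test samples split 10/10/10. A short sketch, where random_state=42 is an arbitrary seed chosen for reproducibility:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,     # 20% of 150 samples -> 30 test samples
    stratify=y,        # preserve class proportions in both parts
    random_state=42,   # fixed seed so the split is reproducible
)

print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)
print(np.bincount(y_test))          # [10 10 10]
```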

6. Train the model
scikit-learn provides many machine learning algorithms. Choose one that suits the characteristics of your data and the goal of the task, then train it. The following code shows an example of training a model with the logistic regression algorithm:
```python
from sklearn.linear_model import LogisticRegression

# Create a logistic regression model object
model = LogisticRegression()

# Train the model on the training set
model.fit(X_train, y_train)
```
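One caveat about the order of the steps above: the scaler in step 4 was fitted on the whole data set before the split. As a result, the minimum and maximum of the test rows influence the training features, which is a mild form of data leakage. A common remedy, sketched below as an alternative rather than as the article's method, is to split the raw data first and then chain the scaler and the classifier in a Pipeline. The scaling statistics are then learned from the training set only. The split settings (stratify, random_state=42) are assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = load_iris(return_X_y=True)

# Split the raw (unscaled) data first
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# fit() learns the scaler's min/max from X_train only, then trains the classifier
pipe = make_pipeline(MinMaxScaler(), LogisticRegression())
pipe.fit(X_train, y_train)

# score() applies the already-fitted scaler to X_test before predicting
print(pipe.score(X_test, y_test))  # mean accuracy on the test set
```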

7. Evaluate the model performance
After training, the model's performance needs to be evaluated. scikit-learn provides a variety of evaluation metrics that help judge how accurate and reliable the model is. The following code evaluates the model with accuracy, the fraction of test samples it classifies correctly:
```python
from sklearn.metrics import accuracy_score

# Use the test set to make predictions
y_pred = model.predict(X_test)

# Calculate accuracy: the fraction of correct predictions
accuracy = accuracy_score(y_test, y_pred)
```
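Accuracy condenses performance into a single number. To see which classes are confused with each other, scikit-learn also offers confusion_matrix and classification_report, which gives per-class precision, recall and F1. The sketch below rebuilds the article's steps in compact form so that it runs on its own. The fixed random_state=42 is an added assumption, and labels=[0, 1, 2] is passed explicitly so the output always covers all three species:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

iris = load_iris()
X = MinMaxScaler().fit_transform(iris.data)
X_train, X_test, y_train, y_test = train_test_split(
    X, iris.target, test_size=0.2, random_state=42
)
model = LogisticRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows = true class, columns = predicted class; the diagonal counts correct predictions
cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])
print(cm)

# Accuracy equals the diagonal sum divided by the number of test samples
print(np.trace(cm) / cm.sum(), accuracy_score(y_test, y_pred))

# Per-class precision, recall and F1, labelled with the species names
print(classification_report(y_test, y_pred, labels=[0, 1, 2],
                            target_names=list(iris.target_names)))
```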

8. Model tuning
Based on the evaluation results, we can tune the model's hyperparameters to improve its performance. scikit-learn provides tools for this, such as grid search (GridSearchCV), which tries every combination of candidate parameter values using cross-validation and keeps the best one. The following code shows how to use grid search to tune the model's parameters:
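```python
from sklearn.model_selection import GridSearchCV

# Define the parameter grid.
# LogisticRegression's default 'lbfgs' solver does not support the 'l1' penalty,
# so the 'saga' solver (which supports both 'l1' and 'l2') is fixed here as a
# single-value entry; max_iter is raised to give saga room to converge.
param_grid = {
    'C': [0.01, 0.1, 1, 10],
    'penalty': ['l1', 'l2'],
    'solver': ['saga'],
    'max_iter': [5000],
}

# Create a GridSearchCV object with 5-fold cross-validation
grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=5)

# Use the training set for grid search
grid_search.fit(X_train, y_train)

# Get the best model parameters
best_params = grid_search.best_params_
```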
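GridSearchCV evaluates every combination in param_grid with k-fold cross-validation. Here that means 4 values of C × 2 penalties × 5 folds = 40 cross-validation fits. It stores the mean validation score of the best combination in best_score_. With the default refit=True, it also refits that combination on the whole training set and exposes it as best_estimator_. The sketch below assumes solver='saga', which, unlike the default lbfgs solver, accepts both 'l1' and 'l2'. The values max_iter=5000 and random_state=42 are likewise assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import MinMaxScaler

iris = load_iris()
X = MinMaxScaler().fit_transform(iris.data)
X_train, X_test, y_train, y_test = train_test_split(
    X, iris.target, test_size=0.2, random_state=42
)

# 'saga' supports both L1 and L2 penalties for multiclass logistic regression
base = LogisticRegression(solver="saga", max_iter=5000)
param_grid = {"C": [0.01, 0.1, 1, 10], "penalty": ["l1", "l2"]}

grid_search = GridSearchCV(estimator=base, param_grid=param_grid, cv=5)
grid_search.fit(X_train, y_train)

print(grid_search.best_params_)                # dict with the winning 'C' and 'penalty'
print(grid_search.best_score_)                 # mean cross-validated accuracy of that combination
print(len(grid_search.cv_results_["params"]))  # 8 candidate combinations

# best_estimator_ has already been refit on all of X_train with best_params_
print(grid_search.best_estimator_.score(X_test, y_test))
```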

9. Use the model for prediction
After completing the training and tuning of the model, you can use the model for prediction. The following code shows how to use the trained model to predict new data:

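```python
# Create a new model object using the best model parameters
best_model = LogisticRegression(**best_params)

# Train the model on the entire (normalized) data set
best_model.fit(normalized_data, iris.target)

# Prepare new data (raw measurements in cm, in the same feature order as iris.data)
new_data = [[5.1, 3.5, 1.4, 0.2], [6.7, 3.1, 4.4, 1.4], [6.5, 3.0, 5.2, 2.0]]

# Predict new data; the scaler fitted in step 4 must be applied first,
# because the model was trained on normalized features
predictions = best_model.predict(scaler.transform(new_data))
```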
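predict returns integer class indices. Indexing iris.target_names with them gives readable species names, and predict_proba returns the probability of each class. Because the model was trained on Min-Max-scaled features, new raw measurements must go through the same fitted scaler first. To keep the sketch self-contained, it uses LogisticRegression's default hyperparameters as a stand-in for whatever the grid search selected, which is an assumption:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MinMaxScaler

iris = load_iris()
scaler = MinMaxScaler()
normalized_data = scaler.fit_transform(iris.data)

# Default hyperparameters used as a stand-in for grid_search.best_params_
best_model = LogisticRegression()
best_model.fit(normalized_data, iris.target)

# Raw measurements in cm: sepal length, sepal width, petal length, petal width
new_data = np.array([[5.1, 3.5, 1.4, 0.2], [6.7, 3.1, 4.4, 1.4], [6.5, 3.0, 5.2, 2.0]])

# Apply the scaler fitted on the training data before predicting
new_scaled = scaler.transform(new_data)
predictions = best_model.predict(new_scaled)
probabilities = best_model.predict_proba(new_scaled)  # one row per sample, one column per class

print(iris.target_names[predictions])  # species name for each new sample
print(probabilities.round(3))
```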

Conclusion:
This article introduced how to use the scikit-learn module in Python 3.x for machine learning. It walked through installing and importing the module, loading a data set, preprocessing the data, splitting it, training a model, evaluating its performance, tuning it and using it for prediction. With these steps, readers can learn how to apply scikit-learn to build and deploy machine learning models. With practice and continued learning, we can explore the field of machine learning further and achieve better results in real applications.
