How to use the scikit-learn machine learning library in Python
scikit-learn is one of the most popular machine learning libraries in Python. It provides a variety of machine learning algorithms and tools, including classification, regression, clustering, and dimensionality reduction.
The advantages of scikit-learn are:
Easy to use: scikit-learn's interface is simple and easy to understand, so users can get started with machine learning quickly.
Unified API: scikit-learn's API is highly consistent. The various algorithms are used in essentially the same way, which makes learning and using them more convenient.
Broad algorithm coverage: scikit-learn implements many classic machine learning algorithms and provides a wealth of tools and functions that make debugging and tuning easier.
Open source and free: scikit-learn is completely open source and free, and anyone can use and modify its code.
Efficient and stable: scikit-learn's implementations are efficient, can handle large data sets, and perform well in terms of stability and reliability.
scikit-learn is well suited to beginners in machine learning because its API is consistent and its models are relatively simple. My recommendation is to study alongside the official documentation, which not only describes the scope of application of each model but also provides code samples.
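The unified API mentioned above can be illustrated with a minimal sketch. The synthetic data and the choice of DecisionTreeRegressor as the second estimator are assumptions made for illustration only; the point is that both estimators are trained with fit and queried with predict.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Tiny synthetic data set (illustrative only): y = 2x + 1
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0

# Both estimators are trained and queried through the same two methods.
predictions = {}
for estimator in (LinearRegression(), DecisionTreeRegressor(random_state=0)):
    estimator.fit(X, y)
    predictions[type(estimator).__name__] = estimator.predict(np.array([[4.0]]))

print(predictions)
```

Swapping one estimator for another usually means changing only the line that constructs it.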
LinearRegression is a linear regression model suited to predicting continuous variables. The basic idea is to model the relationship between the independent variable and the dependent variable as a straight line, fit that line to the training data to find the coefficients of the linear equation, and then use the equation to make predictions on the test data.
LinearRegression is suitable for problems where the relationship between the independent and dependent variables is approximately linear, such as housing price prediction, sales prediction, and user behavior prediction. When the relationship is nonlinear, LinearRegression performs poorly. In that case, methods such as polynomial regression can be used instead, and ridge regression or Lasso regression can help when the model needs regularization.
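As a sketch of the polynomial regression option mentioned above, the following compares a plain LinearRegression with a degree-2 polynomial pipeline on synthetic quadratic data. The data and the degree are assumptions chosen for illustration; PolynomialFeatures and make_pipeline are standard scikit-learn utilities.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic nonlinear data (illustrative only): y = x^2
X = np.linspace(-3, 3, 25).reshape(-1, 1)
y = X.ravel() ** 2

# A straight line cannot follow the curve ...
linear = LinearRegression().fit(X, y)
# ... but a linear model on the expanded features [1, x, x^2] can.
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

linear_r2 = linear.score(X, y)
poly_r2 = poly.score(X, y)
print("linear R2:", linear_r2)
print("polynomial R2:", poly_r2)
```

The polynomial model is still fitted by LinearRegression; only the input features change.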
Other factors aside, there is a roughly linear relationship between study time and grades. Study time here means effective study time: as study time increases, grades also increase. So we prepare a data set of study times and grades. Part of the data set is shown below:
Learning time, score
0.5,15
0.75,23
1.0,14
1.25,42
1.5,21
1.75,28
1.75,35
2.0,51
2.25,61
2.5,49
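To experiment without the full file, the ten rows above can be loaded from an in-memory string. This is a minimal sketch that assumes the CSV header uses the column names read by the article's code (学习时间 for study time, 分数 for score); the real file may contain more rows.

```python
import io

import pandas as pd

# The ten sample rows listed above, with the assumed original column names.
csv_text = """学习时间,分数
0.5,15
0.75,23
1.0,14
1.25,42
1.5,21
1.75,28
1.75,35
2.0,51
2.25,61
2.5,49
"""

data = pd.read_csv(io.StringIO(csv_text))
print(data.shape)
print(data.describe())
```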
Determine the features and target
Of study time and grades, study time is the feature (the independent variable) and the grade is the label (the dependent variable). So we need to extract the features and labels from the prepared data set.
import pandas as pd
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Read the CSV file of study times and scores
data = pd.read_csv('data/study_time_score.csv')
# Extract the feature: study time (column '学习时间')
X = data['学习时间']
# Extract the target (label): score (column '分数')
Y = data['分数']
Split the data into a training set and a test set
After the feature and label data are prepared, use scikit-learn's train_test_split to divide the data set into a training set and a test set before training LinearRegression.
# Split the feature data and target data into a training set and a test set.
# test_size=0.25 assigns 25 percent of the data to the test set.
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25, random_state=0)
Select the model and fit the data
After preparing the training set and test set, we can choose an appropriate model and fit it to the training set so that it can predict the target corresponding to other feature values.
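# Choose the model: LinearRegression
model = LinearRegression()
# In scikit-learn, the feature input to a model must be a two-dimensional array,
# so convert the one-dimensional array to a two-dimensional one before use.
x_train = X_train.values.reshape(-1, 1)
# Fit the model
model.fit(x_train, Y_train)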
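The reshape step above is easy to overlook, so here is a minimal sketch of what it does and why it is needed. The four data points are taken from the sample rows; the variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x_1d = np.array([0.5, 0.75, 1.0, 1.25])   # shape (4,): one value per sample
y = np.array([15, 23, 14, 42])

# scikit-learn rejects a 1-D feature array with a ValueError.
try:
    LinearRegression().fit(x_1d, y)
    raised = False
except ValueError:
    raised = True

# reshape(-1, 1) gives shape (n_samples, 1): n rows, one feature column.
x_2d = x_1d.reshape(-1, 1)
model = LinearRegression().fit(x_2d, y)

print(x_1d.shape, "->", x_2d.shape, "| 1-D input rejected:", raised)
```

The -1 tells NumPy to infer the number of rows from the array length, so the same call works for any number of samples.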
Get the model parameters
Since the data set contains only two variables, study time and grades, this is a very simple linear model. The mathematical formula behind it is y = ax + b, where the dependent variable y is the grade and the independent variable x is the study time.
# Print the key model parameters
# Intercept: the intercept, i.e. b
# Coefficients: the variable weights, i.e. a
print('Intercept:', model.intercept_)
print('Coefficients:', model.coef_)
Backtest
The fit above uses only the training set data. Next, we use the test set data to backtest the fitted model. After fitting on the training set, we predict on the test set features and compare the predicted targets with the actual target values to measure how well the model fits.
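# Convert to a two-dimensional array with n rows and 1 column
x_test = X_test.values.reshape(-1, 1)
# Predict on the test set
Y_pred = model.predict(x_test)
# Print the test feature data
print(x_test)
# Print the predictions for the test features
print(Y_pred)
# Compare the predictions with the actual target values to measure goodness of fit.
# R2 (R-squared): coefficient of determination. Its maximum is 1 (a perfect fit);
# it can be negative when the model fits worse than always predicting the mean.
print("R2:", r2_score(Y_test, Y_pred))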
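For a one-feature model, model.predict amounts to evaluating a * x + b with the fitted parameters. The sketch below checks this by hand. It assumes only the ten sample rows listed earlier rather than the full CSV, so the fitted numbers will differ from the article's results; the 3.0-hour query is a made-up example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# The ten sample rows from the article (study hours, score).
hours = np.array([0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 1.75, 2.0, 2.25, 2.5])
scores = np.array([15, 23, 14, 42, 21, 28, 35, 51, 61, 49])

model = LinearRegression().fit(hours.reshape(-1, 1), scores)
a = model.coef_[0]      # slope
b = model.intercept_    # intercept

# Hypothetical query: predicted score for 3.0 hours of study.
manual = a * 3.0 + b
predicted = model.predict(np.array([[3.0]]))[0]

print("a =", a, "b =", b)
print("manual:", manual, "predict():", predicted)
```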
Program running results
The code above measures how well the LinearRegression model fits, that is, whether a linear model is suitable for this data. The program's output is as follows:
Prediction results: [47.43726068 33.05457106 49.83437561 63.41802692 41.84399249 37.84880093
 23.46611131 37.84880093 26.66226456 71.40841004 18.67188144 88.9872529
 63.41802692 42.6430308  21.86803469 69.81033341 66.61418017 33.05457106
 58.62379705 50.63341392 18.67188144 41.044954  ]
R2: 0.8935675710322939
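To make the R2 figure above more concrete, here is a minimal sketch of how the score is defined: one minus the ratio of the residual sum of squares to the total sum of squares. The true and predicted values below are made up for illustration and are not the article's data. mean_squared_error, which the article imports but does not use, is shown alongside.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up actual and predicted scores (illustrative only).
y_true = np.array([15.0, 23.0, 42.0, 51.0])
y_pred = np.array([18.0, 25.0, 38.0, 49.0])

ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
manual_r2 = 1.0 - ss_res / ss_tot

mse = mean_squared_error(y_true, y_pred)

# Baselines: always predicting the mean scores 0; predictions far off score below 0.
mean_baseline_r2 = r2_score(y_true, np.full_like(y_true, y_true.mean()))
bad_r2 = r2_score(y_true, np.full_like(y_true, 100.0))

print("manual R2:", manual_r2, "| r2_score:", r2_score(y_true, y_pred))
print("MSE:", mse)
print("mean baseline R2:", mean_baseline_r2, "| bad model R2:", bad_r2)
```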