How to use the scikit-learn machine learning library in Python
Preface
scikit-learn is one of the most popular machine learning libraries in Python. It provides a variety of machine learning algorithms and tools, covering classification, regression, clustering, dimensionality reduction, and more.
The advantages of scikit-learn are:
Easy to use: scikit-learn's interface is simple and easy to understand, so users can get started with machine learning quickly.
Unified API: The API is highly consistent. Different algorithms are used in basically the same way, which makes learning and using them more convenient.
Implements a large number of machine learning algorithms: scikit-learn implements the classic machine learning algorithms and provides a wealth of tools and functions that make debugging and tuning algorithms easier.
Open source and free: scikit-learn is completely open source and free, and anyone can use and modify its code.
Efficient and stable: scikit-learn's implementations are efficient, can handle large data sets, and perform well in terms of stability and reliability.
scikit-learn is well suited to getting started with machine learning because its API is so consistent and its models are relatively simple. I recommend studying it alongside the official documentation, which not only describes where each model applies but also provides code samples.
Linear Regression Model-LinearRegression
LinearRegression implements linear regression and is suitable for predicting continuous variables. The basic idea is to model the relationship between the independent variable and the dependent variable as a straight line, use the training data to find the coefficients of that line, and then use the resulting equation to make predictions on test data.
LinearRegression suits problems where the independent and dependent variables are linearly related, such as housing price prediction, sales prediction, and user behavior prediction. When the relationship is nonlinear, LinearRegression performs poorly. In that case polynomial regression can be used to capture the curvature, and regularized variants such as ridge regression and Lasso regression can help keep the more flexible model from overfitting.
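As a rough illustration of the alternatives mentioned above, the sketch below fits a straight line, a degree-2 polynomial regression, and a ridge-regularized version of the same polynomial to a synthetic quadratic curve. The data, the degree of 2, and `alpha=1.0` are assumptions chosen for the example, not values from this article.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic, noise-free quadratic data (an assumption made for illustration)
x = np.linspace(-3, 3, 50).reshape(-1, 1)
y = 2.0 * x.ravel() ** 2 + 1.0

# A plain straight-line fit cannot follow the curve
linear = LinearRegression().fit(x, y)

# Polynomial regression: expand x into [x, x^2], then fit a linear model
poly = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
).fit(x, y)

# Ridge regression: the same expanded features with an L2 penalty on the weights
ridge = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    Ridge(alpha=1.0),
).fit(x, y)

# score() returns R2 on the data passed in
linear_r2 = linear.score(x, y)
poly_r2 = poly.score(x, y)
ridge_r2 = ridge.score(x, y)
print("linear R2:", linear_r2)
print("polynomial R2:", poly_r2)
print("ridge R2:", ridge_r2)
```

On real, noisy data the degree and the regularization strength would normally be chosen by validation rather than fixed in advance.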
Prepare the data set
Other factors aside, there is a roughly linear relationship between study time and grades: as study time increases, grades increase too. Study time here means effective study time. So we prepare a data set of study times and grades. Part of the data is shown below. In the CSV file itself the two columns are named 学习时间 (study time) and 分数 (score), which is what the code in this article relies on:
Learning time, score
0.5,15
0.75,23
1.0,14
1.25,42
1.5,21
1.75,28
1.75,35
2.0,51
2.25,61
2.5,49
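Before fitting anything, it can help to check that the sample really does trend upward. The sketch below loads only the ten rows shown above from an in-memory string and computes their correlation. The English column names `study_time` and `score` are placeholders introduced for this example; the article's own file uses different headers.

```python
import io

import pandas as pd

# The ten sample rows shown above; the column names are placeholders
csv_text = """study_time,score
0.5,15
0.75,23
1.0,14
1.25,42
1.5,21
1.75,28
1.75,35
2.0,51
2.25,61
2.5,49
"""

data = pd.read_csv(io.StringIO(csv_text))

# Summary statistics for both columns
print(data.describe())

# Pearson correlation between study time and score
corr = data["study_time"].corr(data["score"])
print("correlation:", corr)
```

A clearly positive correlation supports trying a straight-line model first, although ten rows are far too few to draw firm conclusions.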
Use LinearRegression
Determine the features and target
Of study time and grades, study time is the feature (the independent variable) and the grade is the label (the dependent variable), so we need to extract the feature and the label from the prepared data set.
    import pandas as pd
    import numpy as np
    from sklearn.metrics import r2_score, mean_squared_error
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LinearRegression

    # Read the study time and score CSV data file
    data = pd.read_csv('data/study_time_score.csv')
    # Extract the feature: study time (column 学习时间)
    X = data['学习时间']
    # Extract the target (label): score (column 分数)
    Y = data['分数']
Divide the training set and the test set
Once the feature and label data are ready, split the data set into a training set and a test set before training scikit-learn's LinearRegression on it.
    """
    Split the feature data and the target data into a training set and a test set.
    test_size=0.25 assigns 25 percent of the data to the test set.
    """
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25, random_state=0)
Select the model and fit the data
With the training and test sets prepared, we can choose a suitable model and fit it to the training set, so that it can predict the target for other feature values.
    # Choose the model: LinearRegression
    model = LinearRegression()
    # In scikit-learn, the feature input of a model must be a two-dimensional array,
    # so the one-dimensional array is converted to n rows and 1 column before use.
    x_train = X_train.values.reshape(-1, 1)
    # Fit the model
    model.fit(x_train, Y_train)
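The two-dimensional input requirement can be seen directly in the sketch below, which uses the ten sample rows. It shows that a one-dimensional array is rejected, and that selecting the column with double brackets is an alternative to `reshape(-1, 1)`. The column names `study_time` and `score` are placeholders for this example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

data = pd.DataFrame({
    "study_time": [0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 1.75, 2.0, 2.25, 2.5],
    "score": [15, 23, 14, 42, 21, 28, 35, 51, 61, 49],
})
y = data["score"]

x_1d = data["study_time"].values   # one-dimensional: shape (n,)
x_2d = x_1d.reshape(-1, 1)         # two-dimensional: n rows, 1 column
x_frame = data[["study_time"]]     # double brackets keep a 2-D DataFrame

# Fitting on the one-dimensional array raises ValueError
try:
    LinearRegression().fit(x_1d, y)
    rejected_1d = False
except ValueError:
    rejected_1d = True

# Both two-dimensional forms produce the same fitted line
model_from_array = LinearRegression().fit(x_2d, y)
model_from_frame = LinearRegression().fit(x_frame, y)
print(x_1d.shape, x_2d.shape, x_frame.shape)
print(model_from_array.coef_, model_from_frame.coef_)
```

Which form to use is mostly a matter of taste. The DataFrame form keeps the column name attached to the feature, while `reshape` works on plain arrays.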
Get the model parameters
Since the data set contains only two variables, study time and grades, this is a very simple linear model. The formula behind it is y = ax + b, where the dependent variable y is the grade and the independent variable x is the study time.
    """
    Print the key model parameters
    Intercept: the intercept, i.e. b
    Coefficients: the feature weights, i.e. a
    """
    print('Intercept:', model.intercept_)
    print('Coefficients:', model.coef_)
Backtest
The model above was fitted using only the training set. Next, we use the test set to backtest the fit: we predict the targets for the test-set features and compare the predictions with the actual target values, which tells us how well the model fits.
    # Convert to a two-dimensional array with n rows and 1 column
    x_test = X_test.values.reshape(-1, 1)
    # Predict on the test set
    Y_pred = model.predict(x_test)
    # Print the test feature data
    print(x_test)
    # Print the predictions for those features
    print(Y_pred)
    # Compare the predictions with the actual targets to measure goodness of fit.
    # R2 (R-squared): at most 1, and the closer to 1, the better the fit.
    # On test data it can drop below 0 for a model worse than predicting the mean.
    print("R2:", r2_score(Y_test, Y_pred))
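For readers who want to see what `r2_score` computes, the sketch below reproduces it by hand as 1 - SS_res / SS_tot and also reports the mean squared error. The true values and predictions are made-up numbers for illustration, not output from the model in this article.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up true values and predictions, for illustration only
y_true = np.array([15.0, 23.0, 14.0, 42.0, 21.0])
y_pred = np.array([13.0, 18.0, 23.0, 28.0, 33.0])

# R2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
manual_r2 = 1.0 - ss_res / ss_tot
library_r2 = r2_score(y_true, y_pred)

# Mean squared error: the average squared residual
mse = mean_squared_error(y_true, y_pred)

# Always predicting the mean is the baseline that R2 compares against
mean_baseline_r2 = r2_score(y_true, np.full_like(y_true, y_true.mean()))

# Predictions far from every true value do worse than that baseline
bad_r2 = r2_score(y_true, np.full_like(y_true, 100.0))

print("manual R2:", manual_r2, "library R2:", library_r2)
print("MSE:", mse)
print("mean baseline R2:", mean_baseline_r2, "bad model R2:", bad_r2)
```

R2 is unitless, which makes it easy to compare across data sets, whereas the mean squared error is expressed in squared score units and shows how large the errors actually are.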
Program running results
The code above lets us judge how well the LinearRegression model fits, that is, whether this data is suitable for a linear model. The program output is as follows; the final value, about 0.89, is the R2 score on the test set:
Prediction results:
[47.43726068 33.05457106 49.83437561 63.41802692 41.84399249 37.84880093
 23.46611131 37.84880093 26.66226456 71.40841004 18.67188144 88.9872529
 63.41802692 42.6430308  21.86803469 69.81033341 66.61418017 33.05457106
 58.62379705 50.63341392 18.67188144 41.044954  ]
R2: 0.8935675710322939