
How to use scikit-learn machine learning library in Python.

Apr 22, 2023 pm 10:31 PM

Preface

scikit-learn is one of the most popular machine learning libraries in Python. It provides a variety of machine learning algorithms and tools, covering classification, regression, clustering, dimensionality reduction, and more.

The advantages of scikit-learn are:

  • Easy to use: scikit-learn's interface is simple and easy to understand, so users can get started with machine learning quickly.

  • Unified API: scikit-learn's API is highly consistent. Most algorithms are used in the same way, which makes learning and using them more convenient.

  • Implements a large number of machine learning algorithms: scikit-learn implements many classic machine learning algorithms and provides a wealth of tools and functions that make debugging and tuning them easier.

  • Open source and free: scikit-learn is completely open source and free, and anyone can use and modify its code.

  • Efficient and stable: scikit-learn's implementations are efficient, can handle large data sets, and are stable and reliable.

scikit-learn is well suited to beginners in machine learning because its API is so consistent and its models are relatively simple. I recommend studying it alongside the official documentation, which describes where each model applies and also provides code samples.

Linear Regression Model: LinearRegression

The LinearRegression model implements linear regression and is suitable for predicting continuous variables. The basic idea is to model the relationship between the independent variable and the dependent variable as a straight line, use the training data to fit that line and find the coefficients of the linear equation, and then use the equation to make predictions on the test data.

The LinearRegression model suits problems where the independent and dependent variables are linearly related, such as housing price prediction, sales prediction, and user behavior prediction. When the relationship is nonlinear, LinearRegression performs poorly. In that case, polynomial regression can be used to capture the curvature. Ridge and Lasso regression are still linear models; they add regularization, which helps with overfitting and correlated features rather than with nonlinearity itself.
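As a rough illustration of the nonlinear case, the sketch below fits a plain LinearRegression and a polynomial regression to the same curved data. The data is synthetic and noise-free, and the degree of 2 is an assumption chosen to match it; none of this comes from the article's data set.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic quadratic data (illustrative only, not the article's data set).
x = np.linspace(-3, 3, 50).reshape(-1, 1)
y = 2.0 * x.ravel() ** 2 + 1.0

# A straight line cannot follow the curve.
linear = LinearRegression().fit(x, y)

# Polynomial regression: expand x into [1, x, x^2], then fit a linear model.
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(x, y)

linear_r2 = linear.score(x, y)
poly_r2 = poly.score(x, y)
print("linear R2:", linear_r2)
print("polynomial R2:", poly_r2)
```

The polynomial model is still fitted by LinearRegression; only the features change, which is why it slots into the same API.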

Prepare the data set

Setting other factors aside, there is a roughly linear relationship between study time and grades. Study time here means effective study time: as it increases, grades increase too. So we prepare a data set of study times and grades. Part of the data set is shown below:

Learning time, score
0.5,15
0.75,23
1.0,14
1.25,42
1.5,21
1.75,28
1.75,35
2.0,51
2.25,61
2.5,49
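To experiment without the CSV file, the ten sample rows above can be loaded from an in-memory string. This is only a sketch: the English column names `study_time` and `score` are assumptions made for illustration, and the full data set has more rows than are shown here.

```python
import io

import pandas as pd

# The ten sample rows shown above. The column names are assumed
# for this sketch and may differ from the headers in the real file.
SAMPLE_CSV = """study_time,score
0.5,15
0.75,23
1.0,14
1.25,42
1.5,21
1.75,28
1.75,35
2.0,51
2.25,61
2.5,49
"""

data = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(data.shape)
print(data.head())
```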

Use LinearRegression

Determine the features and target

Of study time and grades, study time is the feature (the independent variable) and the grade is the label (the dependent variable). We therefore need to extract the features and labels from the prepared data set.

import pandas as pd
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# Read the study time and score data from the CSV file
data = pd.read_csv('data/study_time_score.csv')
# Extract the feature: study time (the CSV column is named 学习时间)
X = data['学习时间']
# Extract the target (label): score (the CSV column is named 分数)
Y = data['分数']

Divide the training set and the test set

Once the features and labels are ready, divide the data set into a training set and a test set before training scikit-learn's LinearRegression.

"""
Split the feature and target data into a training set and a test set.
test_size=0.25 assigns 25% of the data to the test set.
"""
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25, random_state=0)
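The behaviour of the split can be checked on a small array. The sketch below uses ten made-up samples, not the article's data, to show the resulting set sizes and that a fixed `random_state` makes the split reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten made-up samples with a known feature/label relationship (Y = 2 * X).
X = np.arange(10, dtype=float)
Y = 2.0 * X

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=0
)
print(len(X_train), len(X_test))

# The same random_state gives the same split on every run.
X_train2, X_test2, _, _ = train_test_split(X, Y, test_size=0.25, random_state=0)
print(np.array_equal(X_test, X_test2))
```

With 10 samples, a test fraction of 0.25 is rounded up to 3 test samples, leaving 7 for training. Features and labels are shuffled together, so each label stays paired with its feature.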

Select the model and fit the data

With the test set and training set prepared, we can choose a suitable model and fit it to the training set, so that it can predict the target for other feature values.

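# Choose the model: LinearRegression
model = LinearRegression()
# scikit-learn estimators expect the feature input to be a 2-D array, so the 1-D array must be reshaped into a 2-D array (n rows, 1 column) before it can be used.
x_train = X_train.values.reshape(-1, 1)
# Fit the model
model.fit(x_train, Y_train)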
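The reshape step can be seen in isolation in the following sketch. It uses a tiny hypothetical DataFrame with an assumed column name `study_time`, and shows two common ways to get the 2-D shape that `fit` expects.

```python
import pandas as pd

# Tiny hypothetical frame; the column names are assumptions for illustration.
df = pd.DataFrame({"study_time": [0.5, 0.75, 1.0], "score": [15, 23, 14]})

as_series = df["study_time"]                 # 1-D: shape (3,)
reshaped = as_series.values.reshape(-1, 1)   # 2-D: shape (3, 1)
as_frame = df[["study_time"]]                # 2-D: shape (3, 1)

print(as_series.shape, reshaped.shape, as_frame.shape)
```

Selecting with double brackets keeps a DataFrame, so it is already 2-D and needs no reshape; `reshape(-1, 1)` lets NumPy infer the number of rows.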

Get the model parameters

Since the data set contains only two variables, study time and grades, this is a very simple linear model. The formula behind it is y = ax + b, where the dependent variable y is the grade and the independent variable x is the study time.

"""
Print the key model parameters
Intercept: the intercept, i.e. b
Coefficients: the variable weights, i.e. a
"""
print('Intercept:', model.intercept_)
print('Coefficients:', model.coef_)
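To connect the fitted parameters to the formula y = ax + b, the sketch below fits a model on the ten sample rows listed earlier (not the full data set, so the numbers will differ from the article's) and checks that computing a * x + b by hand matches `model.predict`. The inputs in `new_hours` are made-up values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# The ten sample rows from the data set excerpt.
hours = np.array([0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 1.75, 2.0, 2.25, 2.5])
scores = np.array([15, 23, 14, 42, 21, 28, 35, 51, 61, 49])

model = LinearRegression().fit(hours.reshape(-1, 1), scores)
a = model.coef_[0]     # slope: one weight per feature, here a single one
b = model.intercept_   # intercept

# Made-up study times to predict for.
new_hours = np.array([[1.0], [3.0]])
manual = a * new_hours.ravel() + b
predicted = model.predict(new_hours)

print("a:", a, "b:", b)
print("manual:", manual)
print("predict:", predicted)
```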

Backtest

The model above was fitted using only the training set data. Next, we use the test set data to backtest the fit. After fitting on the training set, we predict on the test set features and compare the predicted targets with the actual target values to measure how well the model fits.

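# Reshape into a 2-D array with n rows and 1 column
x_test = X_test.values.reshape(-1, 1)
# Predict on the test set
Y_pred = model.predict(x_test)
# Print the test feature data
print(x_test)
# Print the predictions for those features
print(Y_pred)
# Compare the predictions with the actual target values to measure how well the model fits
# R2 (R-squared): goodness of fit. The best possible value is 1; the closer to 1, the better the fit. On test data it can be negative when the model does worse than predicting the mean.
print("R2:", r2_score(Y_test, Y_pred))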
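As a sketch of what `r2_score` computes, the block below reproduces it from its definition, R2 = 1 - SS_res / SS_tot, and also computes the mean squared error with `mean_squared_error`. The values in `y_true`, `y_pred` and `bad_pred` are made up for illustration and are not the article's results.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Illustrative values only.
y_true = np.array([15.0, 42.0, 28.0, 61.0])
y_pred = np.array([18.0, 38.0, 33.0, 55.0])

ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
manual_r2 = 1.0 - ss_res / ss_tot

mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)   # same units as the score

# A prediction far worse than the mean gives a negative R2.
bad_pred = np.full(4, 100.0)
bad_r2 = r2_score(y_true, bad_pred)

print("R2 (manual):", manual_r2)
print("R2 (sklearn):", r2_score(y_true, y_pred))
print("MSE:", mse, "RMSE:", rmse)
print("R2 of a poor prediction:", bad_r2)
```

Reporting RMSE next to R2 can be useful because it expresses the typical error in the units of the target.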
Program running results

The code above reports how well the LinearRegression model fits, that is, whether the data is suitable for fitting with a linear model. The running results of the program are as follows:

Prediction results:

[47.43726068 33.05457106 49.83437561 63.41802692 41.84399249 37.84880093
 23.46611131 37.84880093 26.66226456 71.40841004 18.67188144 88.9872529
 63.41802692 42.6430308  21.86803469 69.81033341 66.61418017 33.05457106
 58.62379705 50.63341392 18.67188144 41.044954  ]
R2: 0.8935675710322939


Statement
This article is reproduced from 亿速云. If there is any infringement, please contact admin@php.cn to have it removed.