
What are the common algorithms for supervised learning? How are they applied?


What is supervised learning?

Supervised learning is a subset of machine learning in which the input data used to train the model is labeled. As a result, a supervised model can predict outputs as accurately as possible.

The concept behind supervised learning can also be found in real life, for example when a teacher tutors a child. Suppose the teacher wants to teach the child to recognize images of cats and dogs. The teacher tutors the child by repeatedly showing an image of a cat or a dog while telling the child whether the image shows a dog or a cat.

This process of showing the images and naming them can be thought of as labeling the data. During the training process of a machine learning model, the model is told which data belongs to which category.

What is supervised learning used for? Supervised learning can be used for both regression and classification problems. Classification models allow the algorithm to determine which group a given data point belongs to; examples might include True/False or Dog/Cat.

Because regression models can predict future values based on historical data, they can be used to predict employee wages or real estate sales prices.

In this article, we will list some common algorithms used for supervised learning, as well as practical tutorials on such algorithms.

Linear Regression

Linear regression is a supervised learning algorithm that predicts an output value based on a given input value. Linear regression is used when the target (output) variable returns a continuous value.

There are two main types of linear regression: simple linear regression and multiple linear regression.

Simple linear regression uses only one independent (input) variable. An example is predicting a child's age given a height.

On the other hand, multiple linear regression can use multiple independent variables to predict its final outcome. An example is predicting the price of a given property based on its location, size, demand, etc.
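To make the multiple-variable case concrete, here is a minimal, hedged sketch using scikit-learn; the feature names and property values below are purely illustrative and are not taken from the article's data set.

import pandas as pd
from sklearn.linear_model import LinearRegression

# Illustrative property data: several independent variables, one continuous target.
homes = pd.DataFrame({
    'size_sqm': [50, 80, 120, 65],
    'bedrooms': [1, 2, 3, 2],
    'distance_km': [10, 5, 3, 8],
    'price': [150000, 260000, 420000, 210000],
})

reg = LinearRegression()
reg.fit(homes[['size_sqm', 'bedrooms', 'distance_km']], homes['price'])
print(reg.coef_, reg.intercept_)  # one coefficient per independent variable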

The following is the general linear regression formula:

y = β0 + β1x1 + β2x2 + … + βnxn + ε

where y is the predicted output, the x values are the input (independent) variables, the β values are the coefficients learned from the data, and ε is the error term. In simple linear regression there is only a single x term.

For the Python example, we will use linear regression to predict the y value for a given x value.

The data set we are given contains only two columns: x and y. Note that the y result will return continuous values.

The following is a screenshot of the given data set:

[Image: screenshot of the data set, showing the x and y columns]

Example of a linear regression model using Python

1. Import the necessary libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import linear_model
from sklearn.model_selection import train_test_split
import os

2. Reading and sampling our data set

To simplify the data set, we sample 50 rows of data and round the data values to 2 decimal places.

Please note that you should import the given dataset before completing this step.

df = pd.read_csv("../input/random-linear-regression/train.csv")
df = df.sample(50)
df = round(df, 2)

3. Filter Null and Infinite values

If the data set contains null values or infinite values, an error may occur. Therefore, we define a clean_dataset function to remove these values from the dataset.

def clean_dataset(df):
    assert isinstance(df, pd.DataFrame), "df needs to be a pd.DataFrame"
    df.dropna(inplace=True)
    indices_to_keep = ~df.isin([np.nan, np.inf, -np.inf]).any(axis=1)
    return df[indices_to_keep].astype(np.float64)

df = clean_dataset(df)

4. Choose our dependent and independent values

Please note that we convert the data to DataFrame format. A DataFrame is a two-dimensional data structure that aligns our data into rows and columns, as sketched below.
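The code for this step is not shown in the original article; below is a minimal sketch, assuming the data set's two columns are named x and y as in the screenshot above.

# Hedged sketch: select the independent (X) and dependent (Y) columns.
# Double brackets keep each selection as a two-dimensional DataFrame.
X = df[['x']]
Y = df[['y']]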

5. Split the data set

We divide the data set into training and test parts. The test data set size is chosen to be 20% of the total data set.

Please note that by setting random_state=1, the same data split will occur every time the model is run, resulting in the exact same training and test data set.

This is useful in situations where you want to further tune the model.

x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=1)

6. Build the linear regression model

Using the imported linear_model module, we can apply the linear regression algorithm, passing in the x and y training variables obtained for the given model.

lm = linear_model.LinearRegression()
lm.fit(x_train, y_train)

7. Plot our data as a scatter plot

df.plot(kind="scatter", x="x", y="y")

8. Plot our linear regression line

plt.plot(X, lm.predict(X), color="red")

[Image: scatter plot of the data points with the red best-fit regression line]

The blue points represent the data points, while the red line is the best-fit regression line drawn by the model. The linear regression algorithm always tries to draw the best-fit line so that results are predicted as accurately as possible.
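As an optional, hedged follow-up (not one of the article's original steps), the fitted line can also be scored on the held-out test split created in step 5; lm.score returns the R² (coefficient of determination) value.

# Optional addition: R^2 of the fitted model on the test split from step 5.
print(lm.score(x_test, y_test))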

Logistic Regression

Similar to linear regression, logistic regression predicts an output value based on the input variables. The main difference between the two algorithms is that the output of the logistic regression algorithm is a categorical (discrete) variable.
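For intuition, logistic regression passes a linear combination of the inputs through the logistic (sigmoid) function, which squashes any real-valued score into a probability between 0 and 1. The snippet below is a small illustrative sketch, not part of the tutorial's steps.

import numpy as np

def sigmoid(z):
    # Maps any real-valued score to the (0, 1) range.
    return 1 / (1 + np.exp(-z))

print(sigmoid(0))     # 0.5 -> exactly on the decision boundary
print(sigmoid(3.2))   # close to 1 -> confidently the positive class
print(sigmoid(-3.2))  # close to 0 -> confidently the negative class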

For the Python example, logistic regression will be used to split flowers into different categories/species. The given data set includes multiple features of different flowers.

The purpose of the model is to identify a given flower as one of the Iris-setosa, Iris-versicolor, or Iris-virginica species.

The following is a screenshot of the given data set:

[Image: screenshot of the iris data set, showing feature columns x0–x4 and the type column]

Example of a logistic regression model using Python

1. Import the necessary libraries

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings('ignore')

2. Import the data set

data = pd.read_csv('../input/iris-dataset-logistic-regression/iris.csv')

3. Select our dependent and independent values

For the independent values (X), we will include all available columns except the type column. As for the dependent value (y), only the type column will be included.

X = data[['x0', 'x1', 'x2', 'x3', 'x4']]
y = data[['type']]

4. Split the data set

Split the data set into two parts: 80% for the training data set and 20% for the test data set.

X_train,X_test,y_train,y_test = train_test_split(X,y, test_size=0.2, random_state=1)

5. Run the logistic regression model

Import the logistic regression algorithm from the linear_model library. Then fit the X and y training data to the logistic model.

from sklearn.linear_model import LogisticRegression
model = LogisticRegression(random_state=0)
model.fit(X_train, y_train)

6. Evaluate the performance of our model

print(model.score(X_test, y_test))

The returned value is 0.9845128775509371, which indicates the high performance of our model.

Please note that the higher the test score, the better the model's performance.

7. Plot the predicted results

import matplotlib.pyplot as plt
%matplotlib inline

# pred holds the model's class predictions for the test set
pred = model.predict(X_test)
plt.plot(range(len(X_test)), pred, 'o', c='r')

Output plot:

[Image: output plot showing the predicted classes for the test samples as red points]

In the logistic plot, the red dots represent the given data points. The points are clearly divided into 3 classes: the Virginica, versicolor, and setosa flower species.

Using this technique, the logistic regression model can easily classify the flower type based on where the flower sits on the plot.
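As a hedged usage sketch, the fitted model can also classify a single new flower. The feature values below are illustrative only, and the x0–x4 column names are assumptions carried over from step 3, not values from the original article.

# Illustrative feature values; columns x0-x4 are assumed from step 3.
new_flower = pd.DataFrame([[1, 5.1, 3.5, 1.4, 0.2]],
                          columns=['x0', 'x1', 'x2', 'x3', 'x4'])
print(model.predict(new_flower))  # prints the predicted species label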

Support Vector Machine

The support vector machine (SVM) algorithm is another well-known supervised machine learning model, created by Vladimir Vapnik, capable of solving both classification and regression problems. In practice, it is more often used for classification problems.

The SVM algorithm can separate the given data points into different groups. After the data is plotted, the algorithm draws the most appropriate line to divide the data into multiple categories, thereby analyzing the relationships within the data.

As shown in the figure below, the drawn line perfectly separates the data set into 2 different groups, blue and green.

[Image: a line separating the data points into a blue group and a green group]

The SVM model can draw a straight line or a hyperplane depending on the dimensionality of the plot. A line can only be used for a two-dimensional data set, which means a data set with only 2 columns.

If multiple features are used to make predictions, a higher dimension is required. When the data set has more than 2 dimensions, the support vector machine model will draw a hyperplane.
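To make the line/hyperplane idea concrete: with a linear kernel, the learned boundary is the set of points x where w·x + b = 0. The hedged sketch below fits a linear SVM on tiny, made-up 2-D data and reads back w and b from scikit-learn's coef_ and intercept_ attributes, which are exposed for linear kernels.

import numpy as np
from sklearn.svm import SVC

# Toy 2-D data: two clearly separated groups (illustrative values only).
X_toy = np.array([[1, 2], [2, 3], [2, 1], [6, 7], [7, 8], [8, 6]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

linear_svm = SVC(kernel='linear')
linear_svm.fit(X_toy, y_toy)

# The separating line satisfies w . x + b = 0.
print(linear_svm.coef_)       # w: one weight per feature/dimension
print(linear_svm.intercept_)  # b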

In the support vector machine Python example, species classification is performed for 3 different flower types. Our independent variables include all the features of the flowers, while the dependent variable is the species the flower belongs to.

The flower species include Iris-setosa, Iris-versicolor, and Iris-virginica.

The following is a screenshot of the data set:

[Image: screenshot of the iris data set, showing the flower feature columns and the species column]

Example of a support vector machine model using Python

1. Import the necessary libraries

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

2. Import the given data set

Please note that the data set should be imported before performing this step.

data = pd.read_csv('../input/iris-flower-dataset/IRIS.csv')

3. Split the data columns into dependent and independent variables

Take the X values as the independent variables, which include all columns except the species column.

The variable y contains only the species column, which the model will predict.

X = data.drop('species', axis=1)
y = data['species']

4. Split the data set into training and test data sets

Divide the data set into two parts, putting 80% of the data into the training data set and 20% into the test data set.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

5. Import the SVM and run the model

Import the support vector machine algorithm, then run it with the X and y training data sets obtained in the steps above.

from sklearn.svm import SVC
model = SVC()
model.fit(X_train, y_train)

6. Test the performance of the model

model.score(X_test, y_test)

To evaluate the performance of the model, the score function is used, passing in the X and y test values created in step 4.

The returned value is 0.9666666666667, which indicates the high performance of the model.

Please note that the higher the test score, the better the model's performance.
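Beyond the single accuracy number, per-class metrics can also be inspected. The hedged sketch below uses scikit-learn's classification_report with the fitted model and the test split created in step 4; it is an optional addition, not part of the original steps.

from sklearn.metrics import classification_report

# Precision, recall and F1 per flower species on the held-out test set.
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))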

Other popular supervised machine learning algorithms

Although the linear regression, logistic regression, and SVM algorithms are very reliable, there are still a few other supervised machine learning algorithms worth mentioning.

1. Decision Tree



The decision tree algorithm is a supervised machine learning model that uses a tree structure to make decisions. Decision trees are often used in classification problems, where the model decides which group a given item in the data set belongs to.

Please note that the tree format used is an inverted tree format, with the root at the top.
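As an illustrative, hedged sketch (not taken from the original article), a decision tree classifier could be fitted on the same feature/label split used in the SVM example above, assuming X_train, X_test, y_train, and y_test are still in scope.

from sklearn.tree import DecisionTreeClassifier

# Hedged sketch: reuses the train/test split from the SVM example.
tree_model = DecisionTreeClassifier(random_state=1)
tree_model.fit(X_train, y_train)
print(tree_model.score(X_test, y_test))  # accuracy on the held-out test set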

2. Random Forest


Considered a more complex algorithm, the random forest algorithm achieves its ultimate goal by building a large number of decision trees.

This means building multiple decision trees simultaneously, each returning its own result, which are then combined to obtain a better overall result.

For classification problems, the random forest model will generate multiple decision trees and classify a given object based on the classification group predicted by the majority of the trees.

The model can fix the overfitting problem caused by a single tree. The random forest algorithm can also be used for regression, although it may lead to undesirable results.
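A hedged sketch of the same idea with scikit-learn's RandomForestClassifier, again assuming the train/test split from the SVM example is in scope; n_estimators controls how many trees are built, and 100 is only an illustrative choice.

from sklearn.ensemble import RandomForestClassifier

# Hedged sketch: 100 trees each vote on the class of every test sample.
forest_model = RandomForestClassifier(n_estimators=100, random_state=1)
forest_model.fit(X_train, y_train)
print(forest_model.score(X_test, y_test))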

3. K-Nearest Neighbors


The k-nearest neighbors (KNN) algorithm is a supervised machine learning method that groups all given data points into separate groups. This grouping is based on common characteristics shared between different individuals. The KNN algorithm can be used for both classification and regression problems.

A classic example of KNN is classifying animal images into different groups.
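A hedged sketch with scikit-learn's KNeighborsClassifier, which assigns each sample the majority class of its k closest neighbours in feature space; it again assumes the train/test split from the SVM example is in scope, and k=5 is only an illustrative choice (it is also the library default).

from sklearn.neighbors import KNeighborsClassifier

# Hedged sketch: each test sample takes the majority class of its 5
# nearest training samples.
knn_model = KNeighborsClassifier(n_neighbors=5)
knn_model.fit(X_train, y_train)
print(knn_model.score(X_test, y_test))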

Summary

This article introduced supervised machine learning and the two types of problems it can solve, explained classification and regression problems, and gave some examples of each output data type.

It explained in detail what linear regression is and how it works, and provided a specific Python example that predicts the y value based on the independent x variable.

It then introduced the logistic regression model and showed an example of a classification model that classifies given flowers into specific species.

For the support vector machine algorithm, it showed how the model can be used to predict which of 3 different species a given flower belongs to. Finally, it listed other well-known supervised machine learning algorithms, such as decision trees, random forests, and the k-nearest neighbors algorithm.

Whether you are reading this article for study, for work, or just for fun, we believe that understanding these algorithms is a starting point for getting into the field of machine learning.

If you are interested and want to learn more about the field of machine learning, we recommend studying in greater depth how such algorithms work and how such models can be tuned to further improve their performance.

Translator introduction

Cui Hao, 51CTO community editor and senior architect, has 18 years of software development and architecture experience and 10 years of distributed architecture experience. Formerly a technical expert at HP. He is willing to share and has written many popular technical articles with more than 600,000 reads. Author of "Principles and Practice of Distributed Architecture".

Original title: Primary Supervised Learning Algorithms Used in Machine Learning, Author: Kevin Vu

