Home > Article > Technology peripherals > What are the common algorithms for supervised learning? How are they applied?
Supervised learning is a subset of machine learning. Supervised learning labels the input data of the machine learning model and exercises it. Therefore, the supervised model can predict the output of the model to the maximum extent.
The concept behind supervised learning can also be found in real life, such as teachers tutoring children. Suppose the teacher wants to teach children to recognize images of cats and dogs. S/he will tutor the child by continuously showing the child an image of a cat or a dog while informing the child whether the image is a dog or a cat.
The process of displaying and informing images can be thought of as labeling data. During the training process of the machine learning model, you will be told which data belongs to which category.
What is the use of supervised learning? Supervised learning can be used for both regression and classification problems. Classification models allow algorithms to determine which group given data belongs to. Examples might include True/False, Dog/Cat, etc.
Because regression models can predict future values based on historical data, they can be used to predict employee wages or real estate sales prices.
In this article, we will list some common algorithms used for supervised learning, as well as practical tutorials on such algorithms.
Linear regression is a supervised learning algorithm that predicts an output value based on a given input value. Linear regression is used when the target (output) variable returns a continuous value.
There are two main types of linear algorithms, simple linear regression and multiple linear regression.
Simple linear regression uses only one independent (input) variable. An example is predicting a child's age given a height.
On the other hand, multiple linear regression can use multiple independent variables to predict its final outcome. An example is predicting the price of a given property based on its location, size, demand, etc.
The following is the linear regression formula
For the Python example, we will use linear regression to predict the y value relative to a given x value.
The data set we are given contains only two columns: x and y. Note that the y result will return continuous values.
The following is a screenshot of the given data set:
import numpy as np <br>import pandas as pd <br>import matplotlib.pyplot as plt <br>import seaborn as sns from sklearn <br>import linear_model from sklearn.model_selection <br>import train_test_split import os
To simplify the data set, we sampled 50 A sample of the data rows and rounds the data values to 2 significant figures.
#Please note that you should import the given dataset before completing this step.
df = pd.read_csv("../input/random-linear-regression/train.csv") <br>df=df.sample(50) df=round(df,2)
If the data set contains null values and infinite values, an error may occur. Therefore, we will use the clean_dataset function to clean the dataset of these values.
def clean_dataset(df): <br>assert isinstance(df, pd.DataFrame), "df needs to be a pd.DataFrame" <br>df.dropna(inplace=True) <br>indices_to_keep = ~df.isin([np.nan, np.inf, -np.inf]).any(1) <br>return df[indices_to_keep].astype(np.float64)<br>df=clean_dataset(df)
Please note that we Convert the data to DataFrame format. The dataframe data type is a two-dimensional structure that aligns our data into rows and columns.
We divide the data set into training and Test part. The test data set size was chosen to be 20% of the total data set.
Please note that by setting random_state=1, the same data split will occur every time the model is run, resulting in the exact same training and test data set.
#This is useful in situations where you want to further tune the model.
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=1)
使用导入的线性回归模型,我们可以在模型中自由使用线性回归算法,绕过我们为给定模型获得的 x 和 y 训练变量。
lm=linear_model.LinearRegression() lm.fit(x_train,y_train)
df.plot(kind="scatter", x="x", y="y")
plt.plot(X,lm.predict(X), color="red")
蓝点表示数据点,而红线是模型绘制的最佳拟合线性回归线。线性模型算法总是会尝试绘制最佳拟合线以尽可能准确地预测结果。
与线性回归类似,逻辑回归根据输入变量预测输出值,两种算法的主要区别在于逻辑回归算法的输出是分类(离散)变量。
对于 Python的示例,会使用逻辑回归将“花”分成两个不同的类别/种类。在给定的数据集中会包括不同花的多个特征。
模型的目的是将给花识别为Iris-setosa、Iris-versicolor或 Iris-virginica 几个种类。
下面是给定数据集的截图:
import numpy as np <br>import pandas as pd from sklearn.model_selection <br>import train_test_split import warnings warnings.filterwarnings('ignore')
data = pd.read_csv('../input/iris-dataset-logistic-regression/iris.csv')
对于独立 value(x) ,将包括除类型列之外的所有可用列。至于我们的可靠值(y),将只包括类型列。
X = data[['x0','x1','x2','x3','x4']] <br>y = data[['type']]
将数据集分成两部分,80% 用于训练数据集,20% 用于测试数据集。
X_train,X_test,y_train,y_test = train_test_split(X,y, test_size=0.2, random_state=1)
从 linear_model 库中导入整个逻辑回归算法。然后我们可以将 X 和 y 训练数据拟合到逻辑模型中。
from sklearn.linear_model import LogisticRegression <br>model = LogisticRegression(random_state = 0) <br>model.fit(X_train, y_train)
print(lm.score(x_test, y_test))
返回值为0.9845128775509371,这表明我们模型的高性能。
请注意,随着测试分数的增加,模型的性能也会增加。
import matplotlib.pyplot as plt %matplotlib inline <br>plt.plot(range(len(X_test)), pred,'o',c='r')
输出图:
在逻辑图中,红点表示给定的数据点。这些点清楚地分为 3 类,Virginica、versicolor 和 setosa 花种。
使用这种技术,逻辑回归模型可以根据花在图表上的位置轻松对花类型进行分类。
支持向量机( SVM) 算法是另一个著名的监督机器学习模型,由 Vladimir Vapnik 创建,它能够解决分类和回归问题。实际上它更多地被用到解决分类问题。
SVM 算法能够将给定的数据点分成不同的组。算法在绘制出数据之后,可以绘制最合适的线将数据分成多个类别,从而分析数据之间的关系。
如下图所示,绘制的线将数据集完美地分成 2 个不同的组,蓝色和绿色。
SVM 模型可以根据图形的维度绘制直线或超平面。行只能用于二维数据集,这意味着只有 2 列的数据集。
如果是多个特征来预测数据集,就需要更高的维度。在数据集超过 2 维的情况下,支持向量机模型将绘制超平面。
在支持向量机 Python 的示例中,将对 3 种不同的花卉类型进行物种分类。我们的自变量包括花的所有特征,而因变量是花所属物种。
花卉品种包括Iris-setosa、 Iris-versicolor和Iris-virginica。
下面是数据集的截图:
import numpy as np <br>import pandas as pd from sklearn.model_selection <br>import train_test_split from sklearn.datasets <br>import load_iris
请注意,在执行此步骤之前,应该导入数据集。
data = pd.read_csv(‘../input/iris-flower-dataset/IRIS.csv’)
将 X 值作为自变量,其中包含除物种列之外的所有列。
因变量y仅包含模型预测的物种列。
X = data.drop(‘species’, axis=1) y = data[‘species’]
将数据集分为两部分,其中我们将 80% 的数据放入训练数据集中,将 20% 放入测试数据集中。
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
导入了支持向量机算法。然后,使用上面步骤中收到的 X 和 y 训练数据集运行它。
from sklearn.svm import SVC <br>model = SVC( ) <br>model.fit(X_train, y_train)
model.score(X_test, y_test)
为了评估模型的性能,将使用 score 函数。在第四步中创建的 X 和 y 测试值输入到 score 方法中。
返回值为0.9666666666667,这表明模型的高性能。
请注意,随着测试分数的增加,模型的性能也会增加。
Although linear, logistic and SVM algorithms are very reliable, they stillwill Mention some supervised machine learning algorithms.
Decision Tree Algorithm is a supervised machine learning model that uses a tree structure to make decisions. Decision trees are often used in classification problems where a model can decide which group a given item in a data set belongs to.
#Please note that the tree format used is an inverted tree format.
is considered a more complex algorithm, The random forest algorithm achieves its ultimate goal by building a large number of decision trees.
# means building multiple decision trees simultaneously, each returning its own results, which are then combined to get a better result.
#For classification problems, the random forest model will generate multiple decision trees and classify a given object based on the classification group predicted by the majority of the trees.
The model can fix overfitting caused by a single treeProblem. Also, the random forest algorithm can also be used for regression, although it may lead to undesirable results. 3. k-recent
Neighbor(KNN) algorithm is a supervised machine learning method that groups all given data into in a separate group. #This grouping is based on common characteristics between different individuals. The KNN algorithm can be used for both classification and regression problems.
KNN’s
ClassicExample isClassify animal images into different groups. This article introduces Supervised machine learning and how it can solve The two types of problems , and explain classification and regression problems, gives some examples of each output data type. DetailsExplains what linear regression is and how it works, and provides a Python A specific example that will predict the Y value based on the independent X variable. ThenandIntroduction#Logistic regression model, and give Shown is an example of a classification model that classifies a given image into specific flower species. For the support vector machine algorithm, can be used It predicts a given flower species of 3 different flower species. Finallylistsother famous supervised machine learning algorithms, such as decision-making Trees, random forests, and K-nearest neighbor algorithms. Whether you are studying or work Still reading this article for fun, we think Understanding these algorithms is the start of getting into the machine A beginning in the field of learning. If you are interested and want to learn more about the field of machine learning, we recommend youGo deeper Study how such algorithms work and how such models can be tuned to further improve their performance. Translator introduction Cui Hao, 51CTO community editor and senior architect, has 18 years of software development and architecture experience and 10 years of distributed architecture experience. Formerly a technical expert at HP. He is willing to share and has written many popular technical articles with more than 600,000 reads. Author of "Principles and Practice of Distributed Architecture" . Original title: ##Primary Supervised Learning Algorithms Used in Machine Learning, Author: Kevin Vu Summary
The above is the detailed content of What are the common algorithms for supervised learning? How are they applied?. For more information, please follow other related articles on the PHP Chinese website!