Home >Technology peripherals >AI >Super strong! The top ten machine learning algorithms you must know
Linear regression is the simplest and most widely used method for predictive modeling One of the machine learning algorithms.
It is a supervised learning algorithm used to predict the value of a dependent variable based on one or more independent variables.
The core of linear regression is to fit a linear model based on observed data.
The linear model is represented by the following equation:
The linear regression algorithm involves finding the best path through the data points Fitting line. This is usually done by minimizing the squared difference between the observed and predicted values.
from sklearn.datasets import load_diabetesfrom sklearn.model_selection import train_test_splitfrom sklearn.linear_model import LinearRegressionfrom sklearn.metrics import mean_squared_error, r2_score# Load the Diabetes datasetdiabetes = load_diabetes()X, y = diabetes.data, diabetes.target# Splitting the dataset into training and testing setsX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)# Creating and training the Linear Regression modelmodel = LinearRegression()model.fit(X_train, y_train)# Predicting the test set resultsy_pred = model.predict(X_test)# Evaluating the modelmse = mean_squared_error(y_test, y_pred)r2 = r2_score(y_test, y_pred)print("MSE is:", mse)print("R2 score is:", r2)
Logistic regression is used for classification problems. It predicts the probability that a given data point belongs to a certain category, such as yes/no or 0/1.
from sklearn.datasets import load_breast_cancerfrom sklearn.linear_model import LogisticRegressionfrom sklearn.model_selection import train_test_splitfrom sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score# Load the Breast Cancer datasetbreast_cancer = load_breast_cancer()X, y = breast_cancer.data, breast_cancer.target# Splitting the dataset into training and testing setsX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)# Creating and training the Logistic Regression modelmodel = LogisticRegression(max_iter=10000)model.fit(X_train, y_train)# Predicting the test set resultsy_pred = model.predict(X_test)# Evaluating the modelaccuracy = accuracy_score(y_test, y_pred)precision = precision_score(y_test, y_pred)recall = recall_score(y_test, y_pred)f1 = f1_score(y_test, y_pred)# Print the resultsprint("Accuracy:", accuracy)print("Precision:", precision)print("Recall:", recall)print("F1 Score:", f1)
Decision tree is a versatile and powerful machine learning algorithm. Can be used for classification and regression tasks.
They are popular for their simplicity, interpretability, and ability to handle both numerical and categorical data.
A decision tree consists of nodes representing decision points, branches representing possible outcomes, and leaves representing the final decision or prediction.
Each node in the decision tree corresponds to a feature, and the branches represent the possible values of the feature.
The algorithm for building a decision tree involves recursively splitting a data set into subsets based on the values of different features. The goal is to create homogeneous subsets where the target variable (the variable we want to predict) is similar in each subset.
The splitting process continues until stopping criteria are met, such as maximum depth, minimum number of samples, or no further improvements can be made.
from sklearn.datasets import load_winefrom sklearn.tree import DecisionTreeClassifier# Load the Wine datasetwine = load_wine()X, y = wine.data, wine.target# Splitting the dataset into training and testing setsX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)# Creating and training the Decision Tree modelmodel = DecisionTreeClassifier(random_state=42)model.fit(X_train, y_train)# Predicting the test set resultsy_pred = model.predict(X_test)# Evaluating the modelaccuracy = accuracy_score(y_test, y_pred)precision = precision_score(y_test, y_pred, average='macro')recall = recall_score(y_test, y_pred, average='macro')f1 = f1_score(y_test, y_pred, average='macro')# Print the resultsprint("Accuracy:", accuracy)print("Precision:", precision)print("Recall:", recall)print("F1 Score:", f1)
Naive Bayes classifiers are a family of simple "probabilistic classifiers" that use Bayes' theorem and the assumption of strong (naive) independence between features. It is especially used for text classification.
It calculates the probability of each class and the conditional probability of each class given each input value. These probabilities are then used to classify new values based on the highest probability.
from sklearn.datasets import load_digitsfrom sklearn.naive_bayes import GaussianNB# Load the Digits datasetdigits = load_digits()X, y = digits.data, digits.target# Splitting the dataset into training and testing setsX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)# Creating and training the Naive Bayes modelmodel = GaussianNB()model.fit(X_train, y_train)# Predicting the test set resultsy_pred = model.predict(X_test)# Evaluating the modelaccuracy = accuracy_score(y_test, y_pred)precision = precision_score(y_test, y_pred, average='macro')recall = recall_score(y_test, y_pred, average='macro')f1 = f1_score(y_test, y_pred, average='macro')# Print the resultsprint("Accuracy:", accuracy)print("Precision:", precision)print("Recall:", recall)print("F1 Score:", f1)
K 最近邻 (KNN) 是一种简单直观的机器学习算法,用于分类和回归任务。
它根据输入数据点与其在特征空间中最近邻居的相似性进行预测。
在 KNN 中,新数据点的预测由其 k 个最近邻的多数类(用于分类)或平均值(用于回归)确定。KNN 中的 “k” 表示要考虑的邻居数量,这是用户选择的超参数。
KNN 算法包括以下步骤
from sklearn.datasets import load_winefrom sklearn.model_selection import train_test_splitfrom sklearn.neighbors import KNeighborsClassifierfrom sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score# Load the Wine datasetwine = load_wine()X, y = wine.data, wine.target# Splitting the dataset into training and testing setsX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)# Creating and training the KNN modelknn_model = KNeighborsClassifier(n_neighbors=3)knn_model.fit(X_train, y_train)# Predicting the test set resultsy_pred_knn = knn_model.predict(X_test)# Evaluating the modelaccuracy_knn = accuracy_score(y_test, y_pred_knn)precision_knn = precision_score(y_test, y_pred_knn, average='macro')recall_knn = recall_score(y_test, y_pred_knn, average='macro')f1_knn = f1_score(y_test, y_pred_knn, average='macro')# Print the resultsprint("Accuracy:", accuracy_knn)print("Precision:", precision_knn)print("Recall:", recall_knn)print("F1 Score:", f1_knn)
支持向量机 (SVM) 是一种强大的监督学习算法,用于分类和回归任务。
它们在高维空间中特别有效,广泛应用于图像分类、文本分类和生物信息学等各个领域。
支持向量机的工作原理是找到最能将数据分为不同类别的超平面。
选择超平面以最大化边距,即超平面与每个类的最近数据点(支持向量)之间的距离。
SVM 还可以通过使用核函数将输入空间转换为可以线性分离的高维空间来处理非线性数据。
训练 SVM 的算法包括以下步骤:
from sklearn.svm import SVCbreast_cancer = load_breast_cancer()X, y = breast_cancer.data, breast_cancer.target# Splitting the dataset into training and testing setsX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)# Creating and training the SVM modelsvm_model = SVC()svm_model.fit(X_train, y_train)# Predicting the test set resultsy_pred_svm = svm_model.predict(X_test)# Evaluating the modelaccuracy_svm = accuracy_score(y_test, y_pred_svm)precision_svm = precision_score(y_test, y_pred_svm, average='macro')recall_svm = recall_score(y_test, y_pred_svm, average='macro')f1_svm = f1_score(y_test, y_pred_svm, average='macro')accuracy_svm, precision_svm, recall_svm, f1_svm# Print the resultsprint("Accuracy:", accuracy_svm)print("Precision:", precision_svm)print("Recall:", recall_svm)print("F1 Score:", f1_svm)
随机森林是一种集成学习技术,它结合了多个决策树来提高预测性能并减少过度拟合。
它们广泛用于分类和回归任务,并以其鲁棒性和多功能性而闻名。
随机森林是根据数据集的随机子集并使用特征的随机子集进行训练的决策树的集合。
森林中的每棵决策树独立地进行预测,最终的预测是通过聚合所有树的预测来确定的。
构建随机森林的算法包括以下步骤
from sklearn.ensemble import RandomForestClassifierbreast_cancer = load_breast_cancer()X, y = breast_cancer.data, breast_cancer.target# Splitting the dataset into training and testing setsX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)# Creating and training the Random Forest modelrf_model = RandomForestClassifier(random_state=42)rf_model.fit(X_train, y_train)# Predicting the test set resultsy_pred_rf = rf_model.predict(X_test)# Evaluating the modelaccuracy_rf = accuracy_score(y_test, y_pred_rf)precision_rf = precision_score(y_test, y_pred_rf, average='macro')recall_rf = recall_score(y_test, y_pred_rf, average='macro')f1_rf = f1_score(y_test, y_pred_rf, average='macro')# Print the resultsprint("Accuracy:", accuracy)print("Precision:", precision)print("Recall:", recall)print("F1 Score:", f1)
K 均值聚类是一种无监督学习算法,用于将数据分组为 “K” 个聚类。确定 k 个质心后,每个数据点被分配到最近的簇。
该算法将数据点分配给一个簇,使得数据点与簇质心之间的平方距离之和最小。
from sklearn.datasets import load_irisfrom sklearn.cluster import KMeansfrom sklearn.metrics import silhouette_score# Load the Iris datasetiris = load_iris()X = iris.data# Applying K-Means Clusteringkmeans = KMeans(n_clusters=3, random_state=42)kmeans.fit(X)# Predicting the cluster for each data pointy_pred_clusters = kmeans.predict(X)# Evaluating the modelinertia = kmeans.inertia_silhouette = silhouette_score(X, y_pred_clusters)print("Inertia:", inertia)print("Silhouette:", silhouette)
降维是通过使用主成分分析 (PCA) 来完成的。它将数据转换为新的坐标系,减少变量数量,同时尽可能多地保留原始数据的变化。
使用 PCA 可以找到使数据方差最大化的主要成分或轴。第一个主成分捕获最大方差,第二个主成分(与第一个主成分正交)捕获第二大方差,依此类推。
from sklearn.datasets import load_breast_cancerfrom sklearn.decomposition import PCAimport numpy as np# Load the Breast Cancer datasetbreast_cancer = load_breast_cancer()X = breast_cancer.data# Applying PCApca = PCA(n_compnotallow=2)# Reducing to 2 dimensions for simplicitypca.fit(X)# Transforming the dataX_pca = pca.transform(X)# Explained Varianceexplained_variance = pca.explained_variance_ratio_# Total Explained Variancetotal_explained_variance = np.sum(explained_variance)print("Explained variance:", explained_variance)print("Total Explained Variance:", total_explained_variance)
梯度提升是一种先进的机器学习技术。它依次构建多个弱预测模型(通常是决策树)。每个新模型都逐渐最小化整个模型的损失函数(误差)。
from sklearn.datasets import load_diabetesfrom sklearn.ensemble import GradientBoostingRegressorfrom sklearn.metrics import mean_squared_error, r2_score# Load the Diabetes datasetdiabetes = load_diabetes()X, y = diabetes.data, diabetes.target# Splitting the dataset into training and testing setsX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)# Creating and training the Gradient Boosting modelgb_model = GradientBoostingRegressor(random_state=42)gb_model.fit(X_train, y_train)# Predicting the test set resultsy_pred_gb = gb_model.predict(X_test)# Evaluating the modelmse_gb = mean_squared_error(y_test, y_pred_gb)r2_gb = r2_score(y_test, y_pred_gb)print("MSE:", mse_gb)
The above is the detailed content of Super strong! The top ten machine learning algorithms you must know. For more information, please follow other related articles on the PHP Chinese website!