
Overview of ensemble methods in machine learning

WBOY
2023-04-15

Imagine you are shopping online and find two stores selling the same product with the same rating. However, the first has been rated by just one person, while the second has been rated by 100 people. Which rating would you trust more? Which product would you end up buying? For most people, the answer is simple: the opinions of 100 people are surely more trustworthy than the opinion of just one. This is called the "wisdom of the crowd," and it is why the ensemble approach works.


Ensemble method

Typically, we build a single learner (a trained model) from the training data. The ensemble method instead has multiple learners solve the same problem and then combines them. These learners are called base learners and can use any underlying algorithm, such as neural networks, support vector machines, or decision trees. If all of the base learners use the same algorithm, they are called homogeneous base learners; if they use different algorithms, they are called heterogeneous base learners. Compared with a single base learner, an ensemble generalizes better and therefore produces better results.

Ensemble methods are usually built from weak learners, which is why base learners are sometimes called weak learners. The ensemble model, or strong learner, formed by combining these weak learners has lower bias and/or variance and achieves better performance. This ability to turn weak learners into a strong learner is what made ensemble methods popular, because weak learners are much easier to obtain in practice.

In recent years, ensemble methods have repeatedly won online competitions. Beyond competitions, they are also used in real-life applications such as computer vision tasks like object detection, recognition, and tracking.

Main types of ensemble methods

How are weak learners generated?

According to how the base learners are generated, ensemble methods fall into two broad categories: sequential ensemble methods and parallel ensemble methods. As the name suggests, in sequential ensemble methods the base learners are generated one after another and then combined to make predictions, as in boosting algorithms such as AdaBoost. In parallel ensemble methods the base learners are generated in parallel and then combined for prediction, as in bagging algorithms such as random forest, and in stacking. The figure below shows a simple architecture explaining the parallel and sequential approaches.

Figure: Parallel and sequential ensemble methods

Sequential methods exploit the dependence between weak learners to improve overall performance by successively reducing the residual error, so that later learners focus on the mistakes of earlier ones. Roughly speaking (for regression problems), the reduction in ensemble error achieved by boosting comes mainly from reducing the high bias of the weak learners, although some reduction in variance is sometimes observed as well. Parallel ensemble methods, on the other hand, reduce error by combining independent weak learners, that is, by exploiting the independence between them; this reduction comes from lowering the variance of the model. In short, boosting mainly reduces error by reducing the model's bias, while bagging mainly reduces error by reducing its variance. This matters because the choice of ensemble method depends on whether the weak learners suffer from high variance or high bias.
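
As a rough illustration (not from the original article), here is a minimal scikit-learn sketch comparing a single shallow decision tree with a bagged ensemble and a boosted ensemble of the same tree; the dataset and hyperparameters are illustrative assumptions only.

from sklearn.datasets import make_friedman1
from sklearn.ensemble import AdaBoostRegressor, BaggingRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_friedman1(n_samples=1000, noise=1.0, random_state=0)
weak = DecisionTreeRegressor(max_depth=3)  # a simple, high-bias weak learner

# Parallel: bagging averages trees grown on bootstrap samples -> mainly lowers variance.
bagging = BaggingRegressor(DecisionTreeRegressor(max_depth=3), n_estimators=100, random_state=0)
# Sequential: boosting fits each new tree to the remaining error -> mainly lowers bias.
boosting = AdaBoostRegressor(DecisionTreeRegressor(max_depth=3), n_estimators=100, random_state=0)

for name, model in [("single tree", weak), ("bagging", bagging), ("boosting", boosting)]:
    score = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name:12s} mean R^2: {score:.3f}")

With such a high-bias base tree, the boosted ensemble typically improves the score the most, which matches the bias/variance reasoning above.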

How to combine weak learners?

After generating these base learners, we do not simply pick the best one; instead we combine them for better generalization, and how we combine them plays an important role in the ensemble method.

Averaging: When the output is numeric, the most common way to combine base learners is averaging, which can be either a simple or a weighted average. For regression problems, a simple average sums the predictions of all base models and divides by the number of learners. A weighted average gives each base learner its own weight: each learner's prediction is multiplied by its weight, and the results are summed.
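
A minimal NumPy sketch of both combination rules, using made-up predictions and made-up weights purely for illustration:

import numpy as np

# Hypothetical predictions from three base regressors for five test points.
preds = np.array([
    [2.1, 3.0, 4.2, 5.1, 6.0],   # learner 1
    [1.9, 3.2, 4.0, 4.9, 6.3],   # learner 2
    [2.3, 2.9, 4.1, 5.2, 5.8],   # learner 3
])

simple_avg = preds.mean(axis=0)                  # every learner counts equally
weights = np.array([0.5, 0.3, 0.2])              # assumed weights, e.g. from validation performance
weighted_avg = np.average(preds, axis=0, weights=weights)

print(simple_avg)
print(weighted_avg)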

Voting: For nominal (label) outputs, voting is the most common way to combine base learners. Voting comes in several flavours, such as majority voting, plurality (relative-majority) voting, weighted voting, and soft voting. For classification problems, majority voting gives each learner one vote for a class label; whichever label receives more than 50% of the votes is the ensemble's prediction. If no label reaches more than 50%, a rejection option is returned, meaning the combined ensemble makes no prediction. In plurality voting, the label with the most votes is the prediction, and more than 50% of the votes is not required: if we have three output labels that receive, say, 40%, 30%, and 30% of the votes, the label with 40% is the ensemble's prediction. Weighted voting, like weighted averaging, assigns weights to classifiers according to their importance or the strength of the particular learner. Soft voting is used when learners output class probabilities (values between 0 and 1) rather than hard labels; it is further divided into simple soft voting (a plain average of the probabilities) and weighted soft voting (each learner's probabilities are multiplied by its weight before being summed).
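
Below is a hedged scikit-learn sketch of hard (plurality) voting versus weighted soft voting using the library's VotingClassifier; the base classifiers, weights, and synthetic dataset are assumptions chosen only to illustrate the idea.

from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)
estimators = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("dt", DecisionTreeClassifier(max_depth=3)),
    ("nb", GaussianNB()),
]

hard = VotingClassifier(estimators, voting="hard")                     # plurality vote on predicted labels
soft = VotingClassifier(estimators, voting="soft", weights=[2, 1, 1])  # weighted average of predicted probabilities

for name, clf in [("hard voting", hard), ("soft voting", soft)]:
    print(name, cross_val_score(clf, X, y, cv=5).mean())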

Learning: Another way to combine learners is through learning, which is what the stacking ensemble method does. In this approach, a separate learner called a meta-learner is trained on a new dataset to combine the base/weak learners that were generated from the original machine learning dataset.
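
A minimal stacking sketch with scikit-learn's StackingClassifier, assuming two heterogeneous base learners and a logistic-regression meta-learner trained on their out-of-fold predictions (the choice of models and data is illustrative):

from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)

base_learners = [                        # heterogeneous base/weak learners
    ("dt", DecisionTreeClassifier(max_depth=3)),
    ("svm", SVC(probability=True)),
]
meta_learner = LogisticRegression(max_iter=1000)  # combines the base learners' predictions

stack = StackingClassifier(estimators=base_learners, final_estimator=meta_learner, cv=5)
print(cross_val_score(stack, X, y, cv=5).mean())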

Note that all three ensemble methods, whether boosting, bagging, or stacking, can be built from homogeneous or heterogeneous weak learners. The most common practice is to use homogeneous weak learners for bagging and boosting, and heterogeneous weak learners for stacking. The figure below gives a good classification of the three main ensemble methods.

Figure: Classification of the main types of ensemble methods

Ensemble diversity

Ensemble diversity refers to how different the base learners are from one another, and it is crucial for building a good ensemble model. It has been shown theoretically that, under various combination methods, completely independent (fully diverse) base learners minimize error, whereas completely (highly) correlated learners bring no improvement. This is challenging in practice, because we train all the weak learners on the same problem using the same dataset, which leads to high correlation. On top of that, we need to make sure the weak learners are not truly poor models, as that may even degrade the ensemble's performance. On the other hand, combining only strong and accurate base learners may not work as well as mixing some weak learners with some strong ones. A balance therefore has to be struck between the accuracy of the base learners and the diversity among them.

How to achieve ensemble diversity?

1. Data processing

We can split the dataset into subsets for the base learners. If the machine learning dataset is large, we can simply divide it into equal parts and feed each part to a model. If the dataset is small, we can use random sampling with replacement to generate new datasets from the original one. The bagging method generates new datasets via the bootstrapping technique, which is essentially random sampling with replacement. Bootstrapping introduces some randomness, since each generated dataset will contain somewhat different values. Note, however, that most of the original values (roughly 63% in theory) still appear in each sample, so the generated datasets are not completely independent.
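
A small NumPy sketch of bootstrapping; the dataset size is an arbitrary assumption, and the printed fraction should come out near the theoretical ~63%:

import numpy as np

rng = np.random.default_rng(0)
data = np.arange(1000)                                       # original dataset (indices stand in for rows)

bootstrap = rng.choice(data, size=data.size, replace=True)   # random sampling with replacement
unique_fraction = np.unique(bootstrap).size / data.size

# Roughly 63% of the original rows appear in each bootstrap sample,
# so the generated datasets overlap heavily and are not fully independent.
print(f"unique original rows in this bootstrap sample: {unique_fraction:.1%}")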

2. Input features

Every dataset contains features that provide information about the data. Instead of using all features in a single model, we can create subsets of features, generate different datasets from them, and feed these to separate models. This approach is used by the random forest technique and is effective when the data contains a large number of redundant features; it becomes less effective when the dataset has only a few features.
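
A hedged sketch of this idea (often called the random-subspace approach): each tree sees only a random subset of the features, and the trees are combined by a simple majority vote. The dataset, subset size, and number of learners are illustrative assumptions.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
learners = []
for _ in range(10):
    cols = rng.choice(X.shape[1], size=8, replace=False)     # a random subset of 8 of the 20 features
    tree = DecisionTreeClassifier(max_depth=5).fit(X_train[:, cols], y_train)
    learners.append((cols, tree))

# Majority vote over the feature-subset learners.
votes = np.array([tree.predict(X_test[:, cols]) for cols, tree in learners])
ensemble_pred = (votes.mean(axis=0) >= 0.5).astype(int)
print("ensemble accuracy:", (ensemble_pred == y_test).mean())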

3. Learning parameters

This technique introduces randomness into the base learners by applying different parameter settings to the base learning algorithm, that is, through hyperparameter tuning. For example, individual neural networks can be given different regularization terms or different initial weights.
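
A minimal sketch of parameter-based diversity: the same network architecture is trained on the same data several times with different regularization strengths and different random initializations, and the resulting models are combined by simple soft voting (all settings here are assumptions for illustration).

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Same data, same algorithm: diversity comes only from the parameter settings.
configs = [
    {"alpha": 1e-4, "random_state": 1},   # different regularization strength
    {"alpha": 1e-2, "random_state": 2},   # and different initial weights
    {"alpha": 1e-1, "random_state": 3},
]
nets = [MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, **cfg).fit(X_train, y_train)
        for cfg in configs]

# Simple soft voting over the differently parameterized networks.
avg_proba = np.mean([net.predict_proba(X_test) for net in nets], axis=0)
print("ensemble accuracy:", (avg_proba.argmax(axis=1) == y_test).mean())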

Ensemble Pruning

Finally, ensemble pruning can, in some cases, help achieve better ensemble performance. Ensemble pruning means combining only a subset of the learners instead of all of the weak learners. Besides this, smaller ensembles save storage and computing resources, which improves efficiency.
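
A small sketch of one simple pruning strategy, keeping only the best-scoring learners from a bagged pool according to a held-out validation set (the pool size, tree depth, and number kept are illustrative assumptions, not a prescribed recipe):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Train a pool of weak learners on bootstrap samples of the training data.
rng = np.random.default_rng(0)
pool = []
for _ in range(20):
    idx = rng.choice(len(X_train), size=len(X_train), replace=True)
    pool.append(DecisionTreeClassifier(max_depth=3).fit(X_train[idx], y_train[idx]))

# Prune: keep only the 5 learners with the best validation accuracy.
scores = [clf.score(X_val, y_val) for clf in pool]
kept = [pool[i] for i in np.argsort(scores)[-5:]]

# Majority vote over the pruned ensemble.
votes = np.array([clf.predict(X_val) for clf in kept])
pruned_pred = (votes.mean(axis=0) >= 0.5).astype(int)
print(f"kept {len(kept)}/{len(pool)} learners, accuracy: {(pruned_pred == y_val).mean():.3f}")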

Finally

This article is only an overview of ensemble methods in machine learning. I hope readers will dig deeper and, more importantly, apply what they learn to real-life problems.

