Overview of ensemble methods in machine learning
Imagine you are shopping online and find two stores selling the same product with the same rating. However, the first store's rating comes from just one reviewer, while the second's comes from 100. Which rating would you trust more? Which product would you end up buying? For most people the answer is simple: the opinions of 100 people are more trustworthy than the opinion of just one. This is called the "wisdom of the crowd", and it is why the ensemble approach works.
Ensemble method
Typically, we train a single learner (a model fitted to the training data) to solve a problem. The ensemble method instead lets multiple learners solve the same problem and then combines them. These learners are called base learners, and they can be built with any underlying algorithm, such as neural networks, support vector machines, or decision trees. If all base learners use the same algorithm, the ensemble is called homogeneous; if they use different algorithms, it is called heterogeneous. Compared to a single base learner, an ensemble has better generalization ability and therefore produces better results.
An ensemble is usually built from weak learners, which is why base learners are sometimes called weak learners, while the ensemble model, or strong learner, that combines them has lower bias and/or variance and achieves better performance. The ability of the ensemble approach to turn weak learners into a strong learner has made it popular, because weak learners are much easier to obtain in practice.
In recent years, ensemble methods have repeatedly won online machine learning competitions. Beyond competitions, they are also used in real-life applications such as computer vision tasks like object detection, recognition, and tracking.
Main types of ensemble methods
How are weak learners generated?
According to how the base learners are generated, ensemble methods can be divided into two major categories: sequential ensemble methods and parallel ensemble methods. As the name suggests, in sequential ensemble methods the base learners are generated one after another and then combined to make predictions, as in boosting algorithms such as AdaBoost. In parallel ensemble methods the base learners are generated in parallel and then combined for prediction, as in bagging algorithms such as random forest, and in stacking. The figure below shows a simple architecture illustrating the parallel and sequential approaches.
Parallel and sequential ensemble methods
Sequential methods exploit the dependency between weak learners to improve overall performance by successively reducing the residual error, so that later learners pay more attention to the mistakes of earlier ones. Roughly speaking (for regression problems), the reduction in ensemble error obtained by boosting is achieved mainly by reducing the high bias of the weak learners, although a reduction in variance is sometimes observed as well. Parallel ensemble methods, on the other hand, reduce error by combining independent weak learners, that is, by exploiting the independence between them; this reduction comes mainly from a reduction in the variance of the model. In short, boosting mainly reduces error by reducing bias, while bagging mainly reduces error by reducing variance. This matters because the choice of ensemble method depends on whether the weak learners suffer from high variance or high bias.
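To make this distinction concrete, here is a minimal sketch (assuming scikit-learn is available) that trains a boosting ensemble of shallow stumps and a bagging ensemble of full decision trees on the same synthetic data; the dataset and settings are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score

# Toy dataset; the settings are illustrative, not tuned.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Sequential ensemble: by default AdaBoost grows depth-1 stumps (high bias),
# each new stump focusing on the mistakes of the previous ones.
boosting = AdaBoostClassifier(n_estimators=100, random_state=0)

# Parallel ensemble: by default bagging grows full decision trees (high variance)
# on bootstrap samples and averages their votes.
bagging = BaggingClassifier(n_estimators=100, random_state=0)

print("Boosting accuracy:", cross_val_score(boosting, X, y, cv=5).mean())
print("Bagging accuracy:", cross_val_score(bagging, X, y, cv=5).mean())
```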
How to combine weak learners?
After generating these so-called base learners, we do not simply pick the best one; instead, we combine them for better generalization, and how we combine them plays an important role in the ensemble method.
Averaging: When the output is numeric, the most common way to combine base learners is averaging, which can be simple or weighted. For regression problems, a simple average is the sum of the predictions of all base models divided by the number of learners. A weighted average gives each base learner a different weight: each learner's prediction is multiplied by its weight and the results are summed.
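As an illustration, the following sketch computes a simple and a weighted average over made-up predictions from three hypothetical regression learners; the weights are assumed, not derived from any real model.

```python
import numpy as np

# Hypothetical predictions from three regression base learners for three inputs.
preds = np.array([
    [2.9, 3.1, 3.0],   # learner 1
    [3.4, 2.8, 3.2],   # learner 2
    [2.7, 3.0, 2.9],   # learner 3
])

# Simple average: every learner contributes equally.
simple_avg = preds.mean(axis=0)

# Weighted average: weights (assumed here) reflect each learner's estimated reliability.
weights = np.array([0.5, 0.3, 0.2])
weighted_avg = np.average(preds, axis=0, weights=weights)

print("Simple average:  ", simple_avg)
print("Weighted average:", weighted_avg)
```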
Voting: For nominal (class-label) outputs, voting is the most common way to combine base learners. Voting comes in several forms: majority voting, plurality voting, weighted voting, and soft voting. In majority voting, each learner casts one vote for a class label, and whichever label receives more than 50% of the votes is the ensemble's prediction; if no label reaches 50%, a rejection option is used, meaning the ensemble makes no prediction. In plurality voting, the label with the most votes is the prediction, and more than 50% of the votes is not required: if there are three output labels receiving 40%, 30%, and 30% of the votes, the label with 40% is the ensemble's prediction. Weighted voting, like weighted averaging, assigns weights to the classifiers according to their importance and strength. Soft voting is used when learners output class probabilities (values between 0 and 1) rather than labels; it is further divided into simple soft voting (a simple average of the probabilities) and weighted soft voting (the probabilities are multiplied by learner-specific weights and then summed).
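The sketch below, using made-up votes and probabilities from hypothetical classifiers, shows plurality voting, majority voting with a reject option, and simple soft voting.

```python
import numpy as np

# Hypothetical class-label votes from five base classifiers for one sample.
votes = np.array(["cat", "dog", "cat", "cat", "bird"])
labels, counts = np.unique(votes, return_counts=True)

# Plurality voting: the label with the most votes wins.
plurality_winner = labels[np.argmax(counts)]

# Majority voting: the winner must exceed 50% of the votes, otherwise reject.
majority_winner = plurality_winner if counts.max() > len(votes) / 2 else None

# Soft voting: average the predicted class probabilities instead of the labels.
probas = np.array([
    [0.6, 0.3, 0.1],   # learner 1: P(cat), P(dog), P(bird)
    [0.2, 0.5, 0.3],   # learner 2
    [0.7, 0.2, 0.1],   # learner 3
])
soft_winner_index = np.argmax(probas.mean(axis=0))

print(plurality_winner, majority_winner, soft_winner_index)
```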
Learning: Another combination method is combining through learning, which is used by the stacking ensemble method. Here, a separate learner called a meta-learner is trained on a new dataset to combine the base/weak learners built from the original machine learning dataset.
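As a sketch of this idea (assuming scikit-learn is available), the example below stacks two heterogeneous base learners under a logistic-regression meta-learner; the choice of estimators and dataset is purely illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Heterogeneous base learners...
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("svm", SVC(probability=True, random_state=0)),
]

# ...combined by a meta-learner trained on their out-of-fold predictions.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

print("Stacking accuracy:", cross_val_score(stack, X, y, cv=5).mean())
```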
Note that boosting, bagging, and stacking can all be built from homogeneous or heterogeneous weak learners. The most common practice is to use homogeneous weak learners for bagging and boosting, and heterogeneous weak learners for stacking. The figure below gives a useful classification of the three main ensemble methods.
Classification of the main types of ensemble methods
Ensemble diversity
Ensemble diversity refers to how different the base learners are from one another, and it has important implications for building good ensemble models. It has been shown theoretically that, under different combination schemes, completely independent (fully diverse) base learners minimize the error, while completely (highly) correlated learners bring no improvement. Achieving diversity is challenging in practice, because we train all weak learners on the same dataset to solve the same problem, which leads to high correlation. On top of this, the weak learners must not be truly bad models, as that can even make the ensemble worse. On the other hand, combining only strong, accurate base learners may not be as effective as mixing some weak learners with some strong ones. A balance therefore needs to be struck between the accuracy of the base learners and the differences between them.
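One simple, illustrative way to quantify diversity is the average pairwise disagreement between the learners' predictions; the sketch below uses made-up predictions and is only one of several possible diversity measures.

```python
import numpy as np

def pairwise_disagreement(predictions):
    """Average, over all pairs of learners, of the fraction of samples they disagree on."""
    n_learners = len(predictions)
    scores = []
    for i in range(n_learners):
        for j in range(i + 1, n_learners):
            scores.append(np.mean(predictions[i] != predictions[j]))
    return float(np.mean(scores))

# Hypothetical label predictions from three classifiers on six samples.
preds = np.array([
    [0, 1, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 1],
    [1, 1, 1, 0, 0, 0],
])
print("Average pairwise disagreement:", pairwise_disagreement(preds))
```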
How to achieve ensemble diversity?
1. Data processing
We can split our dataset into subsets and give each subset to a base learner. If the machine learning dataset is large, we can simply split it into equal parts and feed them to the models. If the dataset is small, we can use random sampling with replacement to generate new datasets from the original one. The bagging method uses the bootstrapping technique to generate new datasets, which is essentially random sampling with replacement. Bootstrapping introduces some randomness, since each generated dataset contains somewhat different values. Note, however, that each bootstrap sample contains on average only about 63% of the distinct original examples (the rest of its entries being duplicates of those), so the generated datasets are not completely independent of one another.
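A minimal sketch of bootstrapping with NumPy (using a toy dataset): draw n examples with replacement from the original n and check what fraction of the original examples actually appears in the sample.

```python
import numpy as np

rng = np.random.default_rng(0)
dataset = np.arange(10)  # a toy dataset of ten examples

# One bootstrap sample: draw n examples with replacement from the original n.
bootstrap_sample = rng.choice(dataset, size=len(dataset), replace=True)

unique_fraction = len(np.unique(bootstrap_sample)) / len(dataset)
print("Bootstrap sample:", bootstrap_sample)
print("Fraction of original examples present:", unique_fraction)  # ~63% on average
```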
2. Input features
All datasets contain features that provide information about the data. Instead of using all features in a single model, we can create subsets of the features, generate different datasets from them, and feed those into the models. This approach is used by the random forest technique and is effective when the data contains a large number of redundant features; its effectiveness decreases when the dataset has only a few features.
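As a rough illustration (assuming scikit-learn), the sketch below builds a random forest on a synthetic dataset with many redundant features, where the max_features setting controls how many features each split may consider and thereby decorrelates the individual trees.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data with many redundant features, where feature subsampling tends to help.
X, y = make_classification(
    n_samples=1000, n_features=50, n_informative=10, n_redundant=30, random_state=0
)

# Each split considers only sqrt(50) randomly chosen features.
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=0)
print("Random forest accuracy:", cross_val_score(forest, X, y, cv=5).mean())
```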
3. Learning parameters
This technique introduces randomness into the base learners by applying different parameter settings to the base learning algorithm, i.e., hyperparameter tuning. For example, individual neural networks can be given different initial weights or different regularization terms.
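A small sketch of this idea (assuming scikit-learn): several neural networks that differ only in their regularization strength and weight initialization, combined with soft voting; the specific values are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Same algorithm, different regularization strengths (alpha) and weight initializations.
members = [
    (f"mlp_{i}", MLPClassifier(alpha=alpha, random_state=i, max_iter=1000))
    for i, alpha in enumerate([1e-4, 1e-3, 1e-2])
]
ensemble = VotingClassifier(estimators=members, voting="soft")
print("Ensemble accuracy:", cross_val_score(ensemble, X, y, cv=3).mean())
```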
Ensemble Pruning
Finally, ensemble pruning can, in some cases, help achieve better ensemble performance. Ensemble pruning means that we combine only a subset of the learners rather than all of the weak learners. Besides that, smaller ensembles save storage and computing resources, improving efficiency.
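As an illustration of the idea, the sketch below greedily selects a small subset of hypothetical learners based on the validation accuracy of their combined vote; it is not a standard library routine, just one simple pruning heuristic.

```python
import numpy as np

def greedy_prune(val_predictions, y_val, max_members):
    """Greedily pick the learners whose combined majority vote does best on validation data."""
    chosen = []
    for _ in range(max_members):
        best_i, best_acc = None, -1.0
        for i in range(len(val_predictions)):
            if i in chosen:
                continue
            # Majority vote of the currently chosen learners plus candidate i.
            votes = np.mean([val_predictions[j] for j in chosen + [i]], axis=0) >= 0.5
            acc = np.mean(votes == y_val)
            if acc > best_acc:
                best_i, best_acc = i, acc
        chosen.append(best_i)
    return chosen

# Hypothetical binary predictions from five learners on a small validation set.
val_preds = np.random.default_rng(0).integers(0, 2, size=(5, 20))
y_val = np.random.default_rng(1).integers(0, 2, size=20)
print("Selected learner indices:", greedy_prune(val_preds, y_val, max_members=3))
```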
Finally
This article is just an overview of machine learning ensemble methods. I hope everyone can conduct more in-depth research, and more importantly, be able to apply the research to real life.