The bootstrapping algorithm uses repeated sampling from limited sample data to build new samples large enough to represent the distribution of the parent population.
In statistics, bootstrapping can refer to any test or estimate that relies on random sampling with replacement. Bootstrapping can be used to assess the accuracy of sample estimates. From a single sample we can compute only one value of a given statistic (such as the mean) and cannot know that statistic's distribution. But through the bootstrap method we can simulate the approximate distribution of the mean statistic. Once we have a distribution, many things become possible; for example, you can use the simulated distribution to make inferences about the actual population.
The bootstrap method is very simple to implement. Assume the sample size is n: draw from the original sample with replacement, n times, to form one new sample, then repeat this operation to form many new samples, from which a distribution of the statistic can be computed. The number of new samples is usually 1,000 to 10,000; if the computational cost is small or the accuracy requirements are high, increase the number of new samples.
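The procedure above can be sketched in a few lines of Python. The data, the choice of 1,000 replicates, and the 95% percentile interval are illustrative assumptions, not values from the article:

```python
# Bootstrap sketch: approximate the sampling distribution of the mean.
import random
import statistics

random.seed(0)
sample = [2.1, 3.4, 2.9, 4.0, 3.3, 2.7, 3.8, 3.1, 2.5, 3.6]  # original sample, size n
n = len(sample)
B = 1000  # number of bootstrap replicates (the article suggests 1,000-10,000)

boot_means = []
for _ in range(B):
    resample = random.choices(sample, k=n)  # draw n times with replacement
    boot_means.append(statistics.mean(resample))

boot_means.sort()
# A simple percentile confidence interval for the mean (95%)
lo, hi = boot_means[int(0.025 * B)], boot_means[int(0.975 * B)]
print(f"bootstrap mean: {statistics.mean(boot_means):.3f}, 95% CI: ({lo:.3f}, {hi:.3f})")
```

Each replicate is one "new sample" of size n; the list `boot_means` is the simulated distribution of the mean described above.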
Advantages: simple and easy to implement.
Disadvantages: bootstrapping relies on several statistical assumptions, so whether those assumptions hold will affect the accuracy of the resampling.
In machine learning, the bootstrap method refers to random sampling with replacement: a form of resampling that lets a model or algorithm better understand the bias, variance, and features present in the data. Resampling produces new samples that include different biases, which can then be considered as a whole. As shown in Figure 1, each resampled population contains different parts and differs from the others, which in turn affects the mean, standard deviation, and other descriptive measures of each data set. This makes it possible to develop more robust models.
Bootstrapping is also well suited to small data sets, on which models tend to overfit.
One reason to use the bootstrap is that it can test the stability of a solution: testing multiple models on multiple resampled data sets improves robustness. One bootstrap sample may have a larger mean than the others, or a different standard deviation. This approach can identify models that would overfit because they were never tested against data sets with different variances.
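A minimal sketch of such a stability check, under assumed toy data (a noisy linear relationship) and a hand-rolled least-squares slope rather than any specific library model: refit the same estimator on many bootstrap samples and inspect how much its fitted coefficient varies.

```python
# Stability check: refit a least-squares slope on many bootstrap samples
# and inspect its spread. Data and model are illustrative assumptions.
import random
import statistics

random.seed(1)
# toy data set: y is roughly 2*x plus noise
data = [(x, 2.0 * x + random.gauss(0, 1)) for x in range(20)]

def slope(points):
    """Ordinary least-squares slope for a list of (x, y) pairs."""
    mx = statistics.mean(p[0] for p in points)
    my = statistics.mean(p[1] for p in points)
    num = sum((x - mx) * (y - my) for x, y in points)
    den = sum((x - mx) ** 2 for x, _ in points)
    return num / den

# One slope per bootstrap sample; their spread measures stability.
slopes = [slope(random.choices(data, k=len(data))) for _ in range(500)]
print(f"slope spread across resamples: sd={statistics.stdev(slopes):.3f}")
```

A large standard deviation across resamples would flag an unstable, possibly overfitted fit; a small one suggests the solution is robust to sampling variation.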
One of the reasons bootstrapping is becoming more and more common is the improvement in computing power, which allows far more rearrangements and resamplings than before. Both Bagging and Boosting make use of bootstrapping.
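To illustrate how Bagging builds on bootstrapping, here is a hypothetical sketch (not code from the article): each base model is a 1-nearest-neighbour classifier trained on its own bootstrap sample, and predictions are combined by majority vote. The data, model choice, and ensemble size are all assumptions for illustration.

```python
# Minimal bagging sketch: bootstrap samples + majority vote.
import random
from collections import Counter

random.seed(2)
# toy 1-D labelled data: values below 5 belong to class 0, the rest to class 1
train = [(x + random.random(), int(x >= 5)) for x in range(10)]

def predict_1nn(sample, x):
    """Label of the nearest training point in this bootstrap sample."""
    return min(sample, key=lambda p: abs(p[0] - x))[1]

def bagged_predict(x, n_models=25):
    """Train n_models 1-NN classifiers, each on a fresh bootstrap sample,
    and return the majority-vote prediction."""
    votes = []
    for _ in range(n_models):
        boot = random.choices(train, k=len(train))  # bootstrap sample
        votes.append(predict_1nn(boot, x))
    return Counter(votes).most_common(1)[0][0]

print(bagged_predict(2.0), bagged_predict(8.0))
```

Because each base model sees a slightly different resample, the vote averages out the quirks of any single sample, which is the variance-reduction idea behind Bagging.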