The bootstrap, also called the bootstrap method, is an important resampling technique in non-parametric statistics. It estimates the variability of a statistic (for example, its variance) and can also be used to construct an interval estimate for that statistic.
The core ideas and basic steps are as follows:
(1) Use resampling to draw a fixed number of observations from the original sample (you choose the number yourself; it is generally the same as the original sample size). The draws are made with replacement, so the same observation may be selected more than once.
(2) Calculate the statistic T to be estimated from the resampled observations.
(3) Repeat steps (1) and (2) N times (N is generally greater than 1000) to obtain N values of the statistic T.
(4) Calculate the sample variance of these N values of T and use it as the estimate of the variance of the statistic T.
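The four steps above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library; the function name `bootstrap_variance`, the example data, and the choice of the mean as the statistic T are assumptions made for the example, not part of the original text.

```python
import random
import statistics

def bootstrap_variance(sample, statistic, n_resamples=2000, seed=0):
    """Estimate the variance of `statistic` with the four bootstrap steps."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    size = len(sample)
    values = []
    # Step 3: repeat steps 1 and 2 n_resamples times.
    for _ in range(n_resamples):
        # Step 1: draw `size` observations with replacement.
        resample = rng.choices(sample, k=size)
        # Step 2: compute the statistic T on the resample.
        values.append(statistic(resample))
    # Step 4: the sample variance of the N values of T.
    return statistics.variance(values)

# Hypothetical data, used only to exercise the function.
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9, 3.1, 3.6]
var_of_mean = bootstrap_variance(data, statistics.mean)
print(var_of_mean)
```

Any function that maps a list of numbers to a single number (for example `statistics.median`) can be passed in place of `statistics.mean`.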
The bootstrap is a popular method in modern statistics and works well with small samples. Confidence intervals can be constructed from the variance estimate, which further extends its range of application.
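One way to turn the variance estimate into a confidence interval is a normal approximation: take the statistic computed on the original sample and add or subtract a normal quantile times the bootstrap standard error. The sketch below assumes this particular construction (the text does not say which one is meant), and the function name and data are hypothetical.

```python
import random
import statistics

def bootstrap_normal_ci(sample, statistic, n_resamples=2000, level=0.95, seed=0):
    """Normal-approximation interval built from the bootstrap standard error."""
    rng = random.Random(seed)
    values = [
        statistic(rng.choices(sample, k=len(sample)))  # resample with replacement
        for _ in range(n_resamples)
    ]
    std_error = statistics.stdev(values)  # square root of the bootstrap variance
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    estimate = statistic(sample)  # statistic on the original sample
    return estimate - z * std_error, estimate + z * std_error

# Hypothetical data, used only to exercise the function.
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9, 3.1, 3.6]
low, high = bootstrap_normal_ci(data, statistics.mean)
print(low, high)
```

This interval is symmetric around the original estimate by construction, so it is most reasonable when the bootstrap distribution of the statistic is itself roughly symmetric.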
An example of a specific sampling scheme: to find out how many fish are in a pond, first catch N fish, mark them, and put them back into the pond.
Then sample repeatedly: make M draws of N fish each, record the proportion of marked fish in each draw, and calculate the statistic of interest from the M proportions.
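The fish example can be simulated to see how the M proportions lead to an estimate. The text does not spell out the final calculation, so the sketch below assumes the usual mark-and-recapture reasoning: if N marked fish make up a proportion p of each catch on average, the pond holds roughly N / p fish. The pond size of 1000 is a hypothetical value known only to the simulation, and the function name is made up for illustration.

```python
import random
import statistics

def estimate_pond_size(pond_size, n_marked, n_draws, seed=0):
    """Simulate M catches of N fish and estimate the pond size as N / mean proportion."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(n_draws):
        # Fish are numbered 0..pond_size-1; numbers below n_marked are the marked fish.
        # A single catch cannot contain the same fish twice, so sample without replacement.
        catch = rng.sample(range(pond_size), n_marked)
        marked_in_catch = sum(1 for fish in catch if fish < n_marked)
        proportions.append(marked_in_catch / n_marked)
    mean_proportion = statistics.mean(proportions)
    if mean_proportion == 0:
        raise ValueError("no marked fish were recaptured; cannot estimate the pond size")
    # Assumed estimator: marked fish / total fish is about the mean proportion.
    return n_marked / mean_proportion

# Hypothetical pond: 1000 fish, N = 100 marked, M = 200 draws.
estimate = estimate_pond_size(pond_size=1000, n_marked=100, n_draws=200)
print(round(estimate))
```

Each catch is returned to the pond before the next one, which is why the M draws are independent of each other in the simulation.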
If the data have obvious strata (layers), stratified sampling can be used to improve the efficiency of the analysis. SPSS defaults to the non-parametric bootstrap with simple random sampling, so if stratified sampling is required you cannot rely on the default and need to set it yourself.
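To illustrate why stratification can help, the sketch below compares a simple bootstrap with a stratified one that resamples within each stratum and keeps the stratum sizes fixed. It is not a description of what SPSS does internally; the function names, the stratum labels, and the data are all hypothetical.

```python
import random
import statistics

def simple_bootstrap_variance(sample, statistic, n_resamples=2000, seed=0):
    """Bootstrap variance using simple random resampling of the pooled data."""
    rng = random.Random(seed)
    values = [
        statistic(rng.choices(sample, k=len(sample)))
        for _ in range(n_resamples)
    ]
    return statistics.variance(values)

def stratified_bootstrap_variance(strata, statistic, n_resamples=2000, seed=0):
    """Bootstrap variance where each stratum is resampled separately."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_resamples):
        resample = []
        for group in strata.values():
            # Resample with replacement inside the stratum, keeping its size.
            resample.extend(rng.choices(group, k=len(group)))
        values.append(statistic(resample))
    return statistics.variance(values)

# Hypothetical data with two clearly separated strata.
strata = {
    "shallow": [1.0, 1.2, 0.9, 1.1, 1.3],
    "deep": [9.8, 10.1, 10.4, 9.9, 10.2],
}
pooled = [x for group in strata.values() for x in group]

simple_var = simple_bootstrap_variance(pooled, statistics.mean)
stratified_var = stratified_bootstrap_variance(strata, statistics.mean)
print(simple_var, stratified_var)
```

With strata this far apart, simple resampling lets the mix of "shallow" and "deep" values change from one resample to the next, which inflates the variance; the stratified version removes that source of variation.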
Another point that needs attention is how many bootstrap samples (resampling repetitions) are most reasonable. The usual answer is 1,000. With fewer, the result will be inaccurate: the confidence interval is calculated by the percentile method, which reads its endpoints from the tails of the bootstrap distribution, so the number of repetitions cannot be too small. With more than 1,000, the improvement in accuracy is very limited in most cases, and system resources and computing time are wasted.
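The percentile method and its sensitivity to the number of repetitions can be illustrated as follows. The interval endpoints are simply taken from the sorted bootstrap values (one of several percentile conventions in use), and the spread of the lower endpoint across repeated runs is compared for 50 and 1,000 repetitions. Function names and data are hypothetical.

```python
import random
import statistics

def percentile_ci(sample, statistic, n_resamples, level=0.95, seed=0):
    """Percentile interval: endpoints are order statistics of the bootstrap values."""
    rng = random.Random(seed)
    values = sorted(
        statistic(rng.choices(sample, k=len(sample)))
        for _ in range(n_resamples)
    )
    alpha = 1.0 - level
    lower_index = round(alpha / 2 * n_resamples)
    upper_index = round((1 - alpha / 2) * n_resamples) - 1
    return values[lower_index], values[upper_index]

def lower_endpoint_spread(sample, n_resamples, n_runs=20):
    """Range of the lower endpoint over several runs with different seeds."""
    lowers = [
        percentile_ci(sample, statistics.mean, n_resamples, seed=s)[0]
        for s in range(n_runs)
    ]
    return max(lowers) - min(lowers)

# Hypothetical data, used only to exercise the functions.
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9, 3.1, 3.6]
low, high = percentile_ci(data, statistics.mean, n_resamples=1000)
spread_50 = lower_endpoint_spread(data, 50)
spread_1000 = lower_endpoint_spread(data, 1000)
print(low, high, spread_50, spread_1000)
```

With only 50 repetitions the lower endpoint rests on the second-smallest bootstrap value, so it moves noticeably from run to run; with 1,000 repetitions it is much more stable.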
Steps for running a bootstrap analysis in SPSS: "Analyze" > "Compare Means" > "Means" > select the independent and dependent variables > "Options" sub-dialog > "Cell Statistics" > "Bootstrap" sub-dialog > tick the "Perform bootstrapping" check box.
One more point: if the dependent variable follows, or approximately follows, a normal distribution, the bootstrap method is not needed, because the usual parametric methods already apply.