Bootstrap aggregating
==Description of the technique== Given a standard [[Training, validation, and test data sets|training set]] <math>D</math> of size <math>n</math>, bagging generates <math>m</math> new training sets <math>D_i</math>, each of size <math>n'</math>, by [[Sampling (statistics)|sampling]] from <math>D</math> [[Probability distribution#With finite support|uniformly]] and [[Sampling (statistics)#Replacement of selected units|with replacement]]. By sampling with replacement, some observations may be repeated in each <math>D_i</math>. If <math>n'=n</math>, then for large <math>n</math> the set <math>D_i</math> is expected to have the fraction (1 - 1/''[[e (mathematical constant)|e]]'') (~63.2%) of the unique samples of <math>D</math>, the rest being duplicates.<ref>Aslam, Javed A.; Popa, Raluca A.; and Rivest, Ronald L. (2007); [http://people.csail.mit.edu/rivest/pubs/APR07.pdf ''On Estimating the Size and Confidence of a Statistical Audit''], Proceedings of the Electronic Voting Technology Workshop (EVT '07), Boston, MA, August 6, 2007. More generally, when drawing with replacement <math>n'</math> values out of a set of <math>n</math> (different and equally likely), the expected number of unique draws is <math>n(1 - e^{-n'/n})</math>.</ref> This kind of sample is known as a [[Bootstrap (statistics)|bootstrap]] sample. Because every draw is made with replacement and without regard to earlier draws, the bootstrap samples are independent of one another. The <math>m</math> models are then fitted on these bootstrap samples and combined by averaging their outputs (for regression) or by voting (for classification).
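The procedure above can be sketched in plain Python. This is a minimal illustration, not a production implementation: the function names (<code>bootstrap_sample</code>, <code>bagging_fit</code>, <code>bagging_predict</code>) are illustrative, and a constant mean predictor stands in for a real base learner such as a regression tree.

```python
import random

def bootstrap_sample(data, rng):
    """Draw n' = n points from data uniformly and with replacement."""
    return [rng.choice(data) for _ in range(len(data))]

def bagging_fit(data, m, fit, rng):
    """Fit m base models, one per independent bootstrap sample."""
    return [fit(bootstrap_sample(data, rng)) for _ in range(m)]

def bagging_predict(models, x):
    """Regression: average the m individual predictions."""
    return sum(model(x) for model in models) / len(models)

rng = random.Random(0)
data = list(range(100_000))

# A size-n bootstrap sample keeps roughly 1 - 1/e (about 63.2%) unique points.
unique_fraction = len(set(bootstrap_sample(data, rng))) / len(data)

# Toy base learner: predicts the mean of its bootstrap sample regardless of x.
fit_mean = lambda sample: (lambda x, mu=sum(sample) / len(sample): mu)
models = bagging_fit(data, m=5, fit=fit_mean, rng=rng)
prediction = bagging_predict(models, x=None)
```

For classification the final combination step would instead take a majority vote over the models' predicted labels.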
[[File:Ensemble Bagging.svg|thumb|center|upright=2.0|An illustration for the concept of bootstrap aggregating]] Bagging leads to "improvements for unstable procedures",<ref name=":0">{{cite journal|last=Breiman|first=Leo|author-link=Leo Breiman|year=1996|title=Bagging predictors|journal=[[Machine Learning (journal)|Machine Learning]]|volume=24|issue=2|pages=123–140|citeseerx=10.1.1.32.9399|doi=10.1007/BF00058655|s2cid=47328136}}</ref> which include, for example, [[artificial neural networks]], [[classification and regression tree]]s, and subset selection in [[linear regression]].<ref name=":1" /> Bagging was shown to improve preimage learning.<ref>Sahu, A., Runger, G., Apley, D., [https://www.researchgate.net/profile/Anshuman_Sahu/publication/254023773_Image_denoising_with_a_multi-phase_kernel_principal_component_approach_and_an_ensemble_version/links/5427b5e40cf2e4ce940a4410/Image-denoising-with-a-multi-phase-kernel-principal-component-approach-and-an-ensemble-version.pdf Image denoising with a multi-phase kernel principal component approach and an ensemble version], IEEE Applied Imagery Pattern Recognition Workshop, pp. 1-7, 2011.</ref><ref>Shinde, Amit, Anshuman Sahu, Daniel Apley, and George Runger. "[https://www.researchgate.net/profile/Anshuman_Sahu/publication/263388433_Preimages_for_variation_patterns_from_kernel_PCA_and_bagging/links/5427b3930cf26120b7b35ebd/Preimages-for-variation-patterns-from-kernel-PCA-and-bagging.pdf Preimages for Variation Patterns from Kernel PCA and Bagging]." IIE Transactions, Vol. 46, Iss. 5, 2014.</ref> On the other hand, it can mildly degrade the performance of stable methods such as [[k-nearest neighbors algorithm|''k''-nearest neighbors]].<ref name=":0" />