Editing Boosting (machine learning) (section)
{{Short description|Method in machine learning}} {{Technical|date=September 2023}}<!-- A lot of technical jargon that less experienced ML users may not understand--> {{Machine learning|Supervised learning}} In [[machine learning]] (ML), '''boosting''' is an [[Ensemble learning|ensemble]] [[metaheuristic]] for primarily reducing [[Bias–variance tradeoff|bias (as opposed to variance)]].<ref>{{cite web|url=http://oz.berkeley.edu/~breiman/arcall96.pdf|archive-url=https://web.archive.org/web/20150119081741/http://oz.berkeley.edu/~breiman/arcall96.pdf|url-status=dead|archive-date=2015-01-19|title=BIAS, VARIANCE, AND ARCING CLASSIFIERS|last1=Leo Breiman|author-link=Leo Breiman|date=1996|publisher=TECHNICAL REPORT|quote=Arcing [Boosting] is more successful than bagging in variance reduction|access-date=19 January 2015}}</ref> It can also improve the [[Stability (learning theory)|stability]] and accuracy of ML [[Statistical classification|classification]] and [[Regression analysis|regression]] algorithms. 
Hence, it is prevalent in [[supervised learning]] for converting weak learners to strong learners.<ref>{{cite book |last=Zhou Zhi-Hua |author-link=Zhou Zhihua |date=2012 |title=Ensemble Methods: Foundations and Algorithms |publisher= Chapman and Hall/CRC |page=23 |isbn=978-1439830031 |quote=The term boosting refers to a family of algorithms that are able to convert weak learners to strong learners }}</ref> The concept of boosting is based on the question posed by [[Michael Kearns (computer scientist)|Kearns]] and [[Leslie Valiant|Valiant]] (1988, 1989)<!--Please do not cite only one, because "Kearns and Valiant" is used as a convention to denote this question.-->:<ref name="Kearns88">Michael Kearns (1988); [http://www.cis.upenn.edu/~mkearns/papers/boostnote.pdf ''Thoughts on Hypothesis Boosting''], Unpublished manuscript (Machine Learning class project, December 1988)</ref><ref>{{cite book |last1=Michael Kearns |author-link=Michael Kearns (computer scientist) |last2=Leslie Valiant |title=Proceedings of the twenty-first annual ACM symposium on Theory of computing - STOC '89 |chapter=Cryptographic limitations on learning Boolean formulae and finite automata |author2-link=Leslie Valiant |date=1989 |publisher=ACM |volume=21 |pages=433–444 |doi=10.1145/73007.73049 |isbn= 978-0897913072|s2cid=536357 }}</ref> "Can a set of weak learners create a single strong learner?" A weak learner is defined as a [[Statistical classification|classifier]] that is only slightly correlated with the true classification. A strong learner is a classifier that is arbitrarily well-correlated with the true classification. [[Robert Schapire]] answered the question in the affirmative in a paper published in 1990.<ref name="Schapire90">{{cite journal |last=Schapire |first=Robert E.
|year=1990 |title=The Strength of Weak Learnability |url=http://www.cs.princeton.edu/~schapire/papers/strengthofweak.pdf |url-status=dead |journal=Machine Learning |volume=5 |issue=2 |pages=197–227 |citeseerx=10.1.1.20.723 |doi=10.1007/bf00116037 |s2cid=53304535 |archive-url=https://web.archive.org/web/20121010030839/http://www.cs.princeton.edu/~schapire/papers/strengthofweak.pdf |archive-date=2012-10-10 |access-date=2012-08-23}}</ref><!--Please do not cite only one, because "Kearns and Valiant" is used as a convention to denote this question.--> This has had significant ramifications in machine learning and [[statistics]], most notably leading to the development of boosting.<ref>{{cite journal |last = Leo Breiman |author-link = Leo Breiman |date = 1998|title = Arcing classifier (with discussion and a rejoinder by the author)|journal = Ann. Stat.|volume = 26|issue = 3|pages = 801–849|doi = 10.1214/aos/1024691079|quote = Schapire (1990) proved that boosting is possible. (Page 823)|doi-access = free}}</ref><!--{{citation needed|date=July 2014}} Could use secondary source to back up this claim. -->

Initially, the ''hypothesis boosting problem'' simply referred to the process of turning a weak learner into a strong learner.<ref name="Kearns88" /> Algorithms that achieve this quickly became known as "boosting". [[Yoav Freund|Freund]] and Schapire's arcing (Adapt[at]ive Resampling and Combining),<ref>Yoav Freund and Robert E. Schapire (1997); [https://www.cis.upenn.edu/~mkearns/teaching/COLT/adaboost.pdf ''A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting''], Journal of Computer and System Sciences, 55(1):119-139</ref> as a general technique, is more or less synonymous with boosting.<ref>Leo Breiman (1998); [http://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.aos/1024691079 ''Arcing Classifier (with Discussion and a Rejoinder by the Author)''], Annals of Statistics, vol. 26, no. 3, pp.
801-849: "The concept of weak learning was introduced by Kearns and Valiant (1988<!-- Michael Kearns, Leslie G. Valiant (1988); ''Learning Boolean Formulae or Finite Automata is as Hard as Factoring'', Technical Report TR-14-88, Harvard University Aiken Computation Laboratory, August 1988 -->, 1989<!-- Michael Kearns, Leslie G. Valiant (1989) ''Cryptographic Limitations on Learning Boolean Formulae and Finite Automata'', Proceedings of the Twenty-First Annual ACM Symposium on Theory of Computing (pp. 433-444). New York, NY: ACM Press, later republished in the Journal of the Association for Computing Machinery, 41(1):67–95, January 1994 -->), who left open the question of whether weak and strong learnability are equivalent. The question was termed the ''boosting problem'' since a solution 'boosts' the low accuracy of a weak learner to the high accuracy of a strong learner. Schapire (1990) proved that boosting is possible. A ''boosting algorithm'' is a method that takes a weak learner and converts it into a strong one. Freund and Schapire (1997) proved that an algorithm similar to arc-fs is boosting.</ref>
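The weak-to-strong conversion described above can be illustrated with a minimal sketch of AdaBoost, one concrete boosting algorithm (the toy dataset, function names, and number of rounds here are illustrative assumptions, not drawn from the article):

```python
# Illustrative AdaBoost sketch: decision stumps (weak learners only
# slightly better than chance) are reweighted and combined into a
# strong ensemble. Toy 1-D data; names and settings are assumptions.
import numpy as np

def stump_predict(x, threshold, polarity):
    """Weak learner: threshold a single feature, predict +1 or -1."""
    return polarity * np.where(x > threshold, 1, -1)

def fit_stump(x, y, w):
    """Choose the (threshold, polarity) pair minimizing weighted error."""
    best = None
    for threshold in x:
        for polarity in (1, -1):
            err = np.sum(w * (stump_predict(x, threshold, polarity) != y))
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(x, y, rounds):
    n = len(x)
    w = np.full(n, 1.0 / n)        # start with uniform example weights
    ensemble = []                  # list of (alpha, threshold, polarity)
    for _ in range(rounds):
        err, threshold, polarity = fit_stump(x, y, w)
        err = max(err, 1e-10)      # guard against division by zero
        alpha = 0.5 * np.log((1 - err) / err)  # weak learner's vote weight
        pred = stump_predict(x, threshold, polarity)
        w *= np.exp(-alpha * y * pred)         # up-weight the mistakes
        w /= w.sum()
        ensemble.append((alpha, threshold, polarity))
    return ensemble

def predict(ensemble, x):
    """Strong learner: sign of the weighted vote of all weak learners."""
    return np.sign(sum(a * stump_predict(x, t, p) for a, t, p in ensemble))

# No single stump gets this labeling right, but three boosted stumps do.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([-1, -1, 1, 1, -1, 1])
model = adaboost(x, y, rounds=3)
```

In practice one would use a library implementation such as scikit-learn's `AdaBoostClassifier` rather than hand-rolling the reweighting loop; the sketch only makes the lead's idea concrete: each round, examples the current ensemble misclassifies gain weight, so the next weak learner focuses on them.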