
{{About|the machine learning technique|other kinds of random tree|Random tree}}
{{Short description|Tree-based ensemble machine learning method}}
{{Machine learning|Supervised learning}}

'''Random forests''' or '''random decision forests''' is an [[ensemble learning]] method for [[statistical classification|classification]], [[regression analysis|regression]] and other tasks that works by creating a multitude of [[decision tree learning|decision trees]] during training. For classification tasks, the output of the random forest is the class selected by most trees. For regression tasks, the output is the average of the predictions of the trees.<ref name="ho1995"/><ref name="ho1998"/> Random forests correct for decision trees' habit of [[overfitting]] to their [[Test set|training set]].{{r|elemstatlearn}}{{rp|pp=587–588}}
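The aggregation step can be illustrated with a minimal Python sketch. It assumes each tree has already produced a prediction for a single input. The function names <code>aggregate_classification</code> and <code>aggregate_regression</code> are hypothetical and not part of any library. Ties in the vote are broken by whichever tied class appears first, an arbitrary choice made here only so the result is deterministic.

```python
from collections import Counter
from statistics import mean


def aggregate_classification(tree_predictions):
    """Return the class predicted by the most trees (majority vote).

    Counter.most_common orders equal counts by first appearance,
    so a tie goes to the tied class that appears first in the list.
    """
    return Counter(tree_predictions).most_common(1)[0][0]


def aggregate_regression(tree_predictions):
    """Return the average of the trees' numeric predictions."""
    return mean(tree_predictions)


# Hypothetical per-tree outputs for one input sample.
print(aggregate_classification(["cat", "dog", "cat", "bird", "cat"]))  # cat
print(aggregate_regression([2.0, 3.0, 4.0]))  # 3.0
```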

The first algorithm for random decision forests was created in 1995 by [[Tin Kam Ho]]<ref name="ho1995">{{cite conference
 |first        = Tin Kam
 |last         = Ho
 |title        = Random Decision Forests
 |conference   = Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, 14–16 August 1995
 |year         = 1995
 |pages        = 278–282
 |url          = http://ect.bell-labs.com/who/tkh/publications/papers/odt.pdf
 |access-date  = 5 June 2016
 |archive-url  = https://web.archive.org/web/20160417030218/http://ect.bell-labs.com/who/tkh/publications/papers/odt.pdf
 |archive-date = 17 April 2016
 |df           = dmy-all
}}</ref> using the [[random subspace method]],<ref name="ho1998">{{cite journal | first = Tin Kam | last = Ho | name-list-style = vanc | title = The Random Subspace Method for Constructing Decision Forests | journal = IEEE Transactions on Pattern Analysis and Machine Intelligence | year = 1998 | volume = 20 | issue = 8 | pages = 832–844 | doi = 10.1109/34.709601 | s2cid = 206420153 | url = http://ect.bell-labs.com/who/tkh/publications/papers/df.pdf }}</ref> which, in Ho's formulation, is a way to implement the "stochastic discrimination" approach to classification proposed by Eugene Kleinberg.<ref name="kleinberg1990">{{cite journal |first=Eugene |last=Kleinberg | name-list-style = vanc |title=Stochastic Discrimination |journal=[[Annals of Mathematics and Artificial Intelligence]] |year=1990 |volume=1 |issue=1–4 |pages=207–239 |url=https://pdfs.semanticscholar.org/faa4/c502a824a9d64bf3dc26eb90a2c32367921f.pdf |archive-url=https://web.archive.org/web/20180118124007/https://pdfs.semanticscholar.org/faa4/c502a824a9d64bf3dc26eb90a2c32367921f.pdf |archive-date=2018-01-18 |doi=10.1007/BF01531079|citeseerx=10.1.1.25.6750 |s2cid=206795835 }}</ref><ref name="kleinberg1996">{{cite journal |first=Eugene |last=Kleinberg | name-list-style = vanc |title=An Overtraining-Resistant Stochastic Modeling Method for Pattern Recognition |journal=[[Annals of Statistics]] |year=1996 |volume=24 |issue=6 |pages=2319–2349 |doi=10.1214/aos/1032181157 |mr=1425956|doi-access=free }}</ref><ref name="kleinberg2000">{{cite journal|first=Eugene|last=Kleinberg| name-list-style = vanc |title=On the Algorithmic Implementation of Stochastic Discrimination|journal= IEEE Transactions on Pattern Analysis and Machine Intelligence|year=2000|volume=22|issue=5|pages=473–490|url=https://pdfs.semanticscholar.org/8956/845b0701ec57094c7a8b4ab1f41386899aea.pdf|archive-url=https://web.archive.org/web/20180118124006/https://pdfs.semanticscholar.org/8956/845b0701ec57094c7a8b4ab1f41386899aea.pdf|archive-date=2018-01-18|doi=10.1109/34.857004|citeseerx=10.1.1.33.4131|s2cid=3563126}}</ref>

An extension of the algorithm was developed by [[Leo Breiman]]<ref name="breiman2001">{{cite journal | first = Leo | last = Breiman | author-link = Leo Breiman | name-list-style = vanc | title = Random Forests | journal = [[Machine Learning (journal)|Machine Learning]] | year = 2001 | volume = 45 | issue = 1 | pages = 5–32 | doi = 10.1023/A:1010933404324 | bibcode = 2001MachL..45....5B | doi-access = free }}</ref> and [[Adele Cutler]],<ref name="rpackage"/> who registered<ref>U.S. trademark registration number 3185828, registered 2006/12/19.</ref> "Random Forests" as a [[trademark]] in 2006 ({{As of|lc=y|2019}}, owned by [[Minitab|Minitab, Inc.]]).<ref>{{cite web|url=https://trademarks.justia.com/786/42/random-78642027.html|title=RANDOM FORESTS Trademark of Health Care Productivity, Inc. - Registration Number 3185828 - Serial Number 78642027 :: Justia Trademarks}}</ref> The extension combines Breiman's "[[Bootstrap aggregating|bagging]]" idea with the random selection of features, introduced first by Ho<ref name="ho1995"/> and later independently by Amit and [[Donald Geman|Geman]],<ref name="amitgeman1997">{{cite journal | last1 = Amit | first1 = Yali | last2 = Geman | first2 = Donald | author-link2 = Donald Geman | name-list-style = vanc | title = Shape quantization and recognition with randomized trees | journal = [[Neural Computation (journal)|Neural Computation]] | year = 1997 | volume = 9 | issue = 7 | pages = 1545–1588 | doi = 10.1162/neco.1997.9.7.1545 | url = http://www.cis.jhu.edu/publications/papers_in_database/GEMAN/shape.pdf | citeseerx = 10.1.1.57.6069 | s2cid = 12470146 | access-date = 2008-04-01 | archive-date = 2018-02-05 | archive-url = https://web.archive.org/web/20180205094828/http://www.cis.jhu.edu/publications/papers_in_database/GEMAN/shape.pdf | url-status = dead }}</ref> to construct a collection of decision trees with controlled variance.
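The combination of bagging and random feature selection can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Breiman's or Cutler's reference implementation. Each "tree" here is a one-split decision stump scored by Gini impurity, and all function names and the <code>max_features</code> parameter are hypothetical. Each stump is fit on a bootstrap sample of the training rows and may split only on a randomly chosen subset of the features. In Breiman's formulation a fresh feature subset is drawn at every split of a deeper tree; for a single-split stump that coincides with drawing one subset per tree.

```python
import random
from collections import Counter


def bootstrap_sample(X, y, rng):
    """Bagging step: draw len(X) rows with replacement."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(X, y, feature_ids):
    """Fit a one-split tree that may only use the features in feature_ids."""
    majority = Counter(y).most_common(1)[0][0]
    # (weighted impurity, feature, threshold, left label, right label)
    best = (gini(y), None, None, majority, majority)
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # a split must send rows both ways
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    return best[1:]


def predict_stump(stump, row):
    f, t, left_label, right_label = stump
    if f is None:
        return left_label  # no split improved impurity: predict the majority
    return left_label if row[f] <= t else right_label


def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    """Train n_trees stumps, each on its own bootstrap sample and feature subset."""
    rng = random.Random(seed)
    n_features = len(X[0])
    forest = []
    for _ in range(n_trees):
        Xb, yb = bootstrap_sample(X, y, rng)
        feature_ids = rng.sample(range(n_features), max_features)
        forest.append(fit_stump(Xb, yb, feature_ids))
    return forest


def predict_forest(forest, row):
    """Majority vote over the stumps' predictions."""
    votes = [predict_stump(s, row) for s in forest]
    return Counter(votes).most_common(1)[0][0]


# Toy data: both features separate the two classes.
X = [[0, 0], [1, 1], [2, 2], [8, 8], [9, 9], [10, 10]]
y = [0, 0, 0, 1, 1, 1]
forest = fit_forest(X, y)
print(predict_forest(forest, [0, 0]), predict_forest(forest, [9, 9]))
```

Because each stump sees a different resample of the rows and a different subset of the features, the stumps tend to make different errors, and the vote over them is typically less variable than any single stump.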