== History ==

The general method of random decision forests was first proposed by Salzberg and Heath in 1993,<ref>Heath, D., Kasif, S. and Salzberg, S. (1993). ''k-DT: A multi-tree learning method.'' In ''Proceedings of the Second Intl. Workshop on Multistrategy Learning'', pp. 138–149.</ref> using a randomized decision tree algorithm to create multiple trees and then combine them by majority voting. The idea was developed further by Ho in 1995.<ref name="ho1995"/> Ho established that forests of trees splitting with oblique hyperplanes can gain accuracy as they grow without suffering from overtraining, as long as the forests are randomly restricted to be sensitive to only selected [[Feature (machine learning)|feature]] dimensions. A subsequent work along the same lines<ref name="ho1998"/> concluded that other splitting methods behave similarly, as long as they are randomly forced to be insensitive to some feature dimensions. The observation that a more complex classifier (a larger forest) becomes more accurate nearly monotonically stands in sharp contrast to the common belief that the complexity of a classifier can only grow to a certain level of accuracy before being hurt by overfitting. An explanation of the forest method's resistance to overtraining can be found in Kleinberg's theory of stochastic discrimination.<ref name="kleinberg1990"/><ref name="kleinberg1996"/><ref name="kleinberg2000"/>

The early development of Breiman's notion of random forests was influenced by the work of Amit and Geman,<ref name="amitgeman1997"/> who introduced the idea of searching over a random subset of the available decisions when splitting a node, in the context of growing a single [[Decision tree|tree]]. The idea of random subspace selection from Ho<ref name="ho1998"/> was also influential in the design of random forests: that method grows a forest of trees, and introduces variation among the trees by projecting the training data into a randomly chosen [[Linear subspace|subspace]] before fitting each tree or each node. Finally, the idea of randomized node optimization, in which the decision at each node is selected by a randomized procedure rather than a deterministic optimization, was first introduced by [[Thomas G. Dietterich]].<ref>{{cite journal | first = Thomas | last = Dietterich | title = An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization | journal = [[Machine Learning (journal)|Machine Learning]] | volume = 40 | issue = 2 | year = 2000 | pages = 139–157 | doi = 10.1023/A:1007607513941 | doi-access = free }}</ref>

Random forests proper were introduced in a paper by [[Leo Breiman]].<ref name="breiman2001"/> The paper describes a method of building a forest of uncorrelated trees using a [[Classification and regression tree|CART]]-like procedure, combined with randomized node optimization and [[Bootstrap aggregating|bagging]]. In addition, it combines several ingredients, some previously known and some novel, which form the basis of the modern practice of random forests, in particular:
# Using [[out-of-bag error]] as an estimate of the [[generalization error]].
# Measuring variable importance through permutation.
The paper also offers the first theoretical result for random forests, in the form of a bound on the [[generalization error]] that depends on the strength of the trees in the forest and their [[correlation]].
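These ingredients are implemented in common machine-learning libraries. The following is a minimal sketch, assuming [[scikit-learn (software)|scikit-learn]] is available (the dataset and parameter values are illustrative only, not from the papers cited above), showing bagging with randomized node optimization together with the out-of-bag error estimate and permutation-based variable importance:

<syntaxhighlight lang="python">
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Illustrative data; any feature matrix X and label vector y would do.
X, y = load_iris(return_X_y=True)

# bootstrap=True gives bagging; max_features="sqrt" is the randomized
# node optimization: each split considers only a random feature subset.
forest = RandomForestClassifier(
    n_estimators=100,
    max_features="sqrt",
    bootstrap=True,
    oob_score=True,   # out-of-bag error as a generalization estimate
    random_state=0,
).fit(X, y)

print("Out-of-bag accuracy:", forest.oob_score_)

# Variable importance through permutation: shuffle one feature at a
# time and measure the drop in the model's score.
result = permutation_importance(forest, X, y, n_repeats=10, random_state=0)
print("Permutation importances:", result.importances_mean)
</syntaxhighlight>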