Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Random forest
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
===From bagging to random forests=== {{main|Random subspace method}} The above procedure describes the original bagging algorithm for trees. Random forests also include another type of bagging scheme: they use a modified tree learning algorithm that selects, at each candidate split in the learning process, a [[Random subspace method|random subset of the features]]. This process is sometimes called "feature bagging". The reason for doing this is the correlation of the trees in an ordinary bootstrap sample: if one or a few [[Feature (machine learning)|features]] are very strong predictors for the response variable (target output), these features will be selected in many of the {{mvar|B}} trees, causing them to become correlated. An analysis of how bagging and random subspace projection contribute to accuracy gains under different conditions is given by Ho.<ref name="ho2002">{{cite journal | first = Tin Kam | last = Ho | title = A Data Complexity Analysis of Comparative Advantages of Decision Forest Constructors | journal = Pattern Analysis and Applications | volume = 5 | issue = 2 | year = 2002 | pages = 102β112 | url = http://ect.bell-labs.com/who/tkh/publications/papers/compare.pdf | doi = 10.1007/s100440200009 | s2cid = 7415435 | access-date = 2015-11-13 | archive-date = 2016-04-17 | archive-url = https://web.archive.org/web/20160417091232/http://ect.bell-labs.com/who/tkh/publications/papers/compare.pdf | url-status = dead }}</ref> Typically, for a classification problem with {{mvar|p}} features, {{sqrt|{{mvar|p}}}} (rounded down) features are used in each split.<ref name="elemstatlearn"/>{{rp|p=592}} For regression problems the inventors recommend {{math|''p''/3}} (rounded down) with a minimum node size of 5 as the default.<ref name="elemstatlearn"/>{{rp|p=592}} In practice, the best values for these parameters should be tuned on a case-to-case basis for every problem.<ref name="elemstatlearn"/>{{rp|592}}
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)