===Random Forests===
The next step of the algorithm introduces a further element of variability among the bootstrapped trees. In addition to each tree examining only a bootstrapped set of samples, only a small, fixed number of features is considered when selecting the best classifier. This means that each tree knows only about the data pertaining to a small, constant number of features, and a variable number of samples that is less than or equal to that of the original dataset. Consequently, the trees are more likely to return a wider array of answers, derived from more diverse knowledge. The result is a [[random forest]], which has numerous benefits over a single decision tree generated without randomness.

In a random forest, each tree "votes" on whether or not to classify a sample as positive based on the sample's features; the sample is then classified by majority vote. An example is given in the diagram below, where four trees in a random forest vote on whether a patient with mutations A, B, F, and G has cancer. Since three of the four trees vote yes, the patient is classified as cancer positive.

[[File:Random Forest Diagram Extra Wide.png|center|frameless|1035x1035px]]

Because of these properties, random forests are considered one of the most accurate data mining algorithms, are less likely to [[Overfitting|overfit]] their data, and run quickly and efficiently even on large datasets.<ref>{{Cite web|title=Random forests - classification description|url=https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm|access-date=2021-12-09|website=stat.berkeley.edu}}</ref> They are primarily useful for classification, as opposed to [[Regression analysis|regression]], which attempts to draw observed connections between statistical variables in a dataset.
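The majority-vote step can be sketched as follows. This is a minimal illustration, not a full implementation: the four "trees" are hypothetical stand-ins, each written as a function that inspects only the subset of features (here, mutations) it was trained on and votes on whether the patient is cancer positive.

```python
# Hypothetical trees: each votes using only a small subset of the features.
def tree_1(mutations):
    return "A" in mutations and "B" in mutations  # considers mutations A, B

def tree_2(mutations):
    return "F" in mutations                       # considers mutation F

def tree_3(mutations):
    return "G" in mutations and "A" in mutations  # considers mutations A, G

def tree_4(mutations):
    return "C" in mutations                       # considers mutation C

def random_forest_predict(forest, sample):
    """Classify a sample by majority vote over the trees in the forest."""
    votes = [tree(sample) for tree in forest]
    return sum(votes) > len(votes) / 2

forest = [tree_1, tree_2, tree_3, tree_4]
patient = {"A", "B", "F", "G"}
print(random_forest_predict(forest, patient))  # three of four trees vote yes -> True
```

Here trees 1, 2, and 3 vote yes and tree 4 votes no, so the majority classifies the patient as positive, mirroring the diagram above.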
This makes random forests particularly useful in fields such as banking, healthcare, the stock market, and [[e-commerce]], where it is important to predict future results from past data.<ref name=":4">{{Cite web|title=Introduction to Random Forest in Machine Learning|url=https://www.section.io/engineering-education/introduction-to-random-forest-in-machine-learning/|access-date=2021-12-09|website=Engineering Education (EngEd) Program {{!}} Section}}</ref> One application is predicting cancer from genetic factors, as in the example above.

There are several important factors to consider when designing a random forest. If the trees are too deep, overfitting can still occur due to over-specificity. If the forest is too large, the algorithm becomes less efficient because of the increased runtime. Random forests also do not generally perform well when given sparse data with little variability.<ref name=":4" /> However, they still hold numerous advantages over similar data classification algorithms such as [[neural network]]s, as they are much easier to interpret and generally require less training data.{{citation needed|date=June 2024}} As an integral component of random forests, bootstrap aggregating is very important to classification algorithms, providing a critical element of variability that allows for increased accuracy when analyzing new data, as discussed below.
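The design factors above correspond directly to parameters of common random forest implementations. As one sketch, assuming scikit-learn is available and using a synthetic dataset in place of real data, the forest size, tree depth, and per-split feature sampling can be set like this:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data standing in for a real dataset (e.g. genetic features).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,     # forest size: more trees raise runtime, reduce variance
    max_depth=5,          # limiting depth curbs overfitting from over-specific trees
    max_features="sqrt",  # each split considers only a random subset of features
    random_state=0,
)
forest.fit(X, y)
print(forest.score(X, y))  # training accuracy, between 0 and 1
```

The specific values chosen here (100 trees, depth 5) are illustrative; in practice they are tuned against held-out data to balance accuracy and runtime.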