=== Relationship to nearest neighbors ===
A relationship between random forests and the [[K-nearest neighbor algorithm|{{mvar|k}}-nearest neighbor algorithm]] ({{mvar|k}}-NN) was pointed out by Lin and Jeon in 2002.<ref name="linjeon02">{{Cite tech report |first1=Yi |last1=Lin |first2=Yongho |last2=Jeon |title=Random forests and adaptive nearest neighbors |series=Technical Report No. 1055 |year=2002 |institution=University of Wisconsin |citeseerx=10.1.1.153.9168}}</ref> Both can be viewed as so-called ''weighted neighborhood schemes''. These are models built from a training set <math>\{(x_i, y_i)\}_{i=1}^n</math> that make predictions <math>\hat{y}</math> for new points {{mvar|x'}} by looking at the "neighborhood" of the point, formalized by a weight function {{mvar|W}}:<math display="block">\hat{y} = \sum_{i=1}^n W(x_i, x') \, y_i.</math>Here, <math>W(x_i, x')</math> is the non-negative weight of the {{mvar|i}}'th training point relative to the new point {{mvar|x'}}. For any {{mvar|x'}}, the weights for points <math>x_i</math> must sum to 1. The weight functions are as follows:
* In {{mvar|k}}-NN, <math>W(x_i, x') = \frac{1}{k}</math> if {{mvar|x<sub>i</sub>}} is one of the {{mvar|k}} points closest to {{mvar|x'}}, and zero otherwise.
* In a tree, <math>W(x_i, x') = \frac{1}{k'}</math> if {{mvar|x<sub>i</sub>}} is one of the {{mvar|k'}} points in the same leaf as {{mvar|x'}}, and zero otherwise.
Since a forest averages the predictions of a set of {{mvar|m}} trees with individual weight functions <math>W_j</math>, its predictions are<math display="block">\hat{y} = \frac{1}{m}\sum_{j=1}^m\sum_{i=1}^n W_{j}(x_i, x') \, y_i = \sum_{i=1}^n\left(\frac{1}{m}\sum_{j=1}^m W_{j}(x_i, x')\right) \, y_i.</math> This shows that the whole forest is again a weighted neighborhood scheme, with weights that average those of the individual trees.
The neighbors of {{mvar|x'}} in this interpretation are the points <math>x_i</math> sharing the same leaf in any tree <math>j</math>. In this way, the neighborhood of {{mvar|x'}} depends in a complex way on the structure of the trees, and thus on the structure of the training set. Lin and Jeon show that the shape of the neighborhood used by a random forest adapts to the local importance of each feature.<ref name="linjeon02"/>
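The averaging of per-tree weights can be sketched numerically. The toy example below (all leaf assignments are invented for illustration; a real forest would derive them from fitted trees, e.g. via scikit-learn's <code>apply</code> method) represents each tree only by which leaf every training point and the query point fall into, then computes the forest weights and the weighted-sum prediction from the formula above:

```python
import numpy as np

# Toy illustration of the weighted-neighborhood view of a forest.
# Each "tree" is represented only by its leaf assignments; these are
# made-up values, not the output of a real tree-fitting algorithm.

y = np.array([1.0, 2.0, 3.0, 4.0])  # training targets y_i

# Leaf index of each training point in trees j = 1..m
train_leaves = np.array([
    [0, 0, 1, 1],   # tree 1: points 0,1 share a leaf; points 2,3 share another
    [0, 1, 1, 0],   # tree 2: points 0,3 share a leaf; points 1,2 share another
])
query_leaves = np.array([0, 1])  # leaf that x' falls into, per tree

def forest_weights(train_leaves, query_leaves):
    """Average the per-tree weights W_j(x_i, x') = 1/k'_j over all trees."""
    m, n = train_leaves.shape
    W = np.zeros(n)
    for j in range(m):
        same = train_leaves[j] == query_leaves[j]  # points in x''s leaf
        W += same / same.sum()                     # each co-leaf point gets 1/k'_j
    return W / m                                   # average over the m trees

W = forest_weights(train_leaves, query_leaves)
print(W)                  # → [0.25 0.5  0.25 0.  ]
print(np.isclose(W.sum(), 1.0))  # weights sum to 1 → True
y_hat = W @ y             # hat{y} = sum_i W(x_i, x') y_i → 2.0
```

In tree 1 the query shares a leaf with points 0 and 1 (weight 1/2 each); in tree 2 with points 1 and 2. Averaging gives point 1 the largest weight, and the prediction is the correspondingly weighted mean of the targets.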