===Bagging===
{{main|Bootstrap aggregating}}
[[File:Random Forest Bagging Illustration.png|thumb|Illustration of training a Random Forest model. The training dataset (in this case, of 250 rows and 100 columns) is randomly sampled with replacement ''n'' times. Then, a decision tree is trained on each sample. Finally, for prediction, the results of all ''n'' trees are aggregated to produce a final decision.]]
The training algorithm for random forests applies the general technique of [[bootstrap aggregating]], or bagging, to tree learners. Given a training set {{mvar|X}} = {{mvar|x<sub>1</sub>}}, ..., {{mvar|x<sub>n</sub>}} with responses {{mvar|Y}} = {{mvar|y<sub>1</sub>}}, ..., {{mvar|y<sub>n</sub>}}, bagging repeatedly (''B'' times) selects a [[Sampling (statistics)#Replacement of selected units|random sample with replacement]] of the training set and fits trees to these samples:

{{block indent | em = 1.5 | text = For {{mvar|b}} = 1, ..., {{mvar|B}}:
# Sample, with replacement, {{mvar|n}} training examples from {{mvar|X}}, {{mvar|Y}}; call these {{mvar|X<sub>b</sub>}}, {{mvar|Y<sub>b</sub>}}.
# Train a classification or regression tree {{mvar|f<sub>b</sub>}} on {{mvar|X<sub>b</sub>}}, {{mvar|Y<sub>b</sub>}}. }}

After training, predictions for unseen samples {{mvar|x'}} can be made by averaging the predictions from all the individual regression trees on {{mvar|x'}}:
<math display="block">\hat{f} = \frac{1}{B} \sum_{b=1}^B f_b(x')</math>
or by taking the plurality vote in the case of classification trees.

This bootstrapping procedure leads to better model performance because it decreases the [[Bias–variance dilemma|variance]] of the model without increasing the bias. This means that while the predictions of a single tree are highly sensitive to noise in its training set, the average of many trees is not, as long as the trees are not correlated. Simply training many trees on a single training set would give strongly correlated trees (or even the same tree many times, if the training algorithm is deterministic); bootstrap sampling is a way of de-correlating the trees by showing them different training sets.

Additionally, an estimate of the uncertainty of the prediction can be made as the standard deviation of the predictions from all the individual regression trees on {{mvar|x'}}:
<math display="block">\sigma = \sqrt{\frac{\sum_{b=1}^B (f_b(x') - \hat{f})^2}{B-1}}.</math>

The number {{mvar|B}} of samples (equivalently, of trees) is a free parameter. Typically, a few hundred to several thousand trees are used, depending on the size and nature of the training set. {{mvar|B}} can be optimized using [[Cross-validation (statistics)|cross-validation]], or by observing the ''[[out-of-bag error]]'': the mean prediction error on each training sample {{mvar|x<sub>i</sub>}}, using only the trees that did not have {{mvar|x<sub>i</sub>}} in their bootstrap sample.<ref name="islr">{{cite book |author1=Gareth James |author2=Daniela Witten |author3=Trevor Hastie |author4=Robert Tibshirani |title=An Introduction to Statistical Learning |publisher=Springer |year=2013 |url=http://www-bcf.usc.edu/~gareth/ISL/ |pages=316–321}}</ref> The training and test error tend to level off after some number of trees have been fit.
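The procedure above maps directly to code. The following is a minimal sketch in Python, assuming NumPy arrays as input and scikit-learn's <code>DecisionTreeRegressor</code> as the base tree learner; the helper names <code>bag_trees</code> and <code>bagged_predict</code> are illustrative, not part of any library.

<syntaxhighlight lang="python">
# Minimal sketch of bagging regression trees (illustrative only; assumes
# scikit-learn as the base learner and omits all tree hyperparameters).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bag_trees(X, Y, B, seed=None):
    """Fit B regression trees, each on a bootstrap sample of (X, Y)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    trees = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)              # sample n rows with replacement
        trees.append(DecisionTreeRegressor().fit(X[idx], Y[idx]))
    return trees

def bagged_predict(trees, x_new):
    """Average the per-tree predictions; also return their standard deviation."""
    preds = np.array([t.predict(x_new) for t in trees])   # shape (B, n_queries)
    return preds.mean(axis=0), preds.std(axis=0, ddof=1)  # mean and uncertainty estimate
</syntaxhighlight>

For classification trees, one would substitute <code>DecisionTreeClassifier</code> and take a plurality vote over the per-tree predictions instead of the mean. Note that this sketch covers bagging alone, not the per-split random feature selection that distinguishes random forests from bagged trees.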