==Regularization==
[[File:Overfitting on Training Set Data.pdf|thumb|An example of overfitting in machine learning. The red dots represent training set data. The green line represents the true functional relationship, while the blue line shows the learned function, which has been overfitted to the training set data.]]

A major problem that arises in machine learning is [[overfitting]]. Because learning is a prediction problem, the goal is not to find a function that most closely fits the (previously observed) data, but to find one that will most accurately predict output from future input. [[Empirical risk minimization]] runs this risk of overfitting: finding a function that matches the data exactly but does not predict future output well.

Overfitting is symptomatic of unstable solutions; a small perturbation in the training set data would cause a large variation in the learned function. It can be shown that if the stability of the solution can be guaranteed, generalization and consistency are guaranteed as well.<ref>Vapnik, V.N. and Chervonenkis, A.Y. 1971. [http://ai2-s2-pdfs.s3.amazonaws.com/a36b/028d024bf358c4af1a5e1dc3ca0aed23b553.pdf On the uniform convergence of relative frequencies of events to their probabilities]. ''Theory of Probability and Its Applications'' Vol 16, pp 264-280.</ref><ref>Mukherjee, S., Niyogi, P., Poggio, T., and Rifkin, R. 2006. [https://link.springer.com/article/10.1007/s10444-004-7634-z Learning theory: stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization]. ''Advances in Computational Mathematics''. Vol 25, pp 161-193.</ref> [[Regularization (mathematics)|Regularization]] can solve the overfitting problem and give the problem stability.

Regularization can be accomplished by restricting the hypothesis space <math>\mathcal{H}</math>. A common example is restricting <math>\mathcal{H}</math> to linear functions: this can be seen as a reduction to the standard problem of [[linear regression]]. <math>\mathcal{H}</math> could also be restricted to polynomials of degree <math>p</math>, exponentials, or bounded functions on [[Lp space|L1]]. Restricting the hypothesis space avoids overfitting because the form of the potential functions is limited, and so does not allow for the choice of a function that gives empirical risk arbitrarily close to zero.

One example of regularization is [[Tikhonov regularization]], which consists of minimizing
<math display="block">\frac{1}{n} \sum_{i=1}^n V(f(\mathbf{x}_i),y_i) + \gamma \left\|f\right\|_{\mathcal{H}}^2</math>
where <math>\gamma</math> is a fixed, positive parameter called the regularization parameter. Tikhonov regularization ensures existence, uniqueness, and stability of the solution.<ref>Tomaso Poggio, Lorenzo Rosasco, et al. ''Statistical Learning Theory and Applications'', 2012, [https://www.mit.edu/~9.520/spring12/slides/class02/class02.pdf Class 2]</ref>

{{clear}}
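To make the minimization above concrete, the following is a minimal sketch of Tikhonov regularization for a linear hypothesis space with the square loss, i.e. ridge regression; in that setting the minimizer has a closed form. The function name <code>fit_ridge</code> and the synthetic data are illustrative assumptions, not part of the sources cited above.

<syntaxhighlight lang="python">
# Minimal sketch of Tikhonov (ridge) regularization, assuming a linear
# hypothesis space f(x) = w . x and the square loss V(f(x), y) = (f(x) - y)^2.
import numpy as np

def fit_ridge(X, y, gamma):
    """Minimize (1/n) * sum_i (w . x_i - y_i)^2 + gamma * ||w||^2.

    Setting the gradient to zero gives the closed-form solution
    w = (X^T X / n + gamma * I)^{-1} (X^T y / n).
    """
    n, d = X.shape
    A = X.T @ X / n + gamma * np.eye(d)
    b = X.T @ y / n
    return np.linalg.solve(A, b)

# Illustrative usage on synthetic data (hypothetical): a larger gamma yields a
# smaller-norm, more stable solution at the cost of a higher empirical risk.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true + 0.1 * rng.normal(size=50)
w_hat = fit_ridge(X, y, gamma=0.1)
</syntaxhighlight>

In this sketch the penalty <math>\gamma \left\|w\right\|^2</math> plays the role of <math>\gamma \left\|f\right\|_{\mathcal{H}}^2</math>: it rules out functions of arbitrarily large norm, which is what stabilizes the solution against small perturbations of the training set.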