Statistical learning theory
==Formal description==

Take <math>X</math> to be the [[vector space]] of all possible inputs, and <math>Y</math> to be the vector space of all possible outputs. Statistical learning theory takes the perspective that there is some unknown [[probability distribution]] over the product space <math>Z = X \times Y</math>; that is, there exists some unknown <math>p(z) = p(\mathbf{x},y)</math>. The training set is made up of <math>n</math> samples drawn from this probability distribution, and is denoted
<math display="block">S = \{(\mathbf{x}_1,y_1), \dots ,(\mathbf{x}_n,y_n)\} = \{\mathbf{z}_1, \dots ,\mathbf{z}_n\}</math>
Every <math>\mathbf{x}_i</math> is an input vector from the training data, and <math>y_i</math> is the output that corresponds to it.

In this formalism, the inference problem consists of finding a function <math>f: X \to Y</math> such that <math>f(\mathbf{x}) \sim y</math>. Let <math>\mathcal{H}</math> be the space of functions <math>f: X \to Y</math>, called the hypothesis space, through which the algorithm will search. Let <math>V(f(\mathbf{x}),y)</math> be the [[loss function]], a measure of the difference between the predicted value <math>f(\mathbf{x})</math> and the actual value <math>y</math>. The [[expected risk]] is defined as
<math display="block">I[f] = \int_{X \times Y} V(f(\mathbf{x}),y)\, p(\mathbf{x},y) \,d\mathbf{x} \,dy</math>
The target function, the best possible function <math>f</math> that can be chosen, is the one that satisfies
<math display="block">f = \mathop{\operatorname{argmin}}_{h \in \mathcal{H}} I[h]</math>

Because the probability distribution <math>p(\mathbf{x},y)</math> is unknown, a proxy measure for the expected risk must be used. This measure is based on the training set, a sample from the unknown distribution, and is called the [[empirical risk]]:
<math display="block">I_S[f] = \frac{1}{n} \sum_{i=1}^n V( f(\mathbf{x}_i),y_i)</math>
A learning algorithm that chooses the function <math>f_S</math> minimizing the empirical risk is said to perform [[empirical risk minimization]].
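The definitions above can be made concrete with a minimal numerical sketch of empirical risk minimization. Here the setup is entirely illustrative and not part of the theory: squared loss <math>V(f(\mathbf{x}),y) = (f(\mathbf{x})-y)^2</math>, a hypothesis space of linear functions <math>f_w(x) = wx</math> restricted to a finite grid of slopes, and a synthetic data-generating distribution with true slope 2.

```python
import numpy as np

# Illustrative setup: X = R, Y = R, squared loss V(f(x), y) = (f(x) - y)^2.
# Hypothesis space H: linear functions f_w(x) = w*x over a grid of slopes w.

rng = np.random.default_rng(0)
n = 100
x = rng.uniform(-1, 1, size=n)
y = 2.0 * x + rng.normal(scale=0.1, size=n)  # n samples from an "unknown" p(x, y)

def empirical_risk(w, x, y):
    """I_S[f_w] = (1/n) * sum_i V(f_w(x_i), y_i) with squared loss."""
    return np.mean((w * x - y) ** 2)

# Empirical risk minimization: pick the hypothesis minimizing I_S over H.
candidates = np.linspace(-5, 5, 1001)          # finite hypothesis space
risks = [empirical_risk(w, x, y) for w in candidates]
w_star = candidates[int(np.argmin(risks))]      # the f_S chosen by ERM
```

With enough samples, the minimizer of the empirical risk lands close to the slope that minimizes the expected risk under the true distribution; how reliably this happens is exactly what generalization bounds in statistical learning theory quantify.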