==Loss functions==
The choice of loss function is a determining factor for the function <math>f_S</math> that will be chosen by the learning algorithm. The loss function also affects the convergence rate of an algorithm. It is important for the loss function to be [[Convex function|convex]].<ref>{{cite journal |last1=Rosasco |first1=Lorenzo |last2=De Vito |first2=Ernesto |last3=Caponnetto |first3=Andrea |last4=Piana |first4=Michele |last5=Verri |first5=Alessandro |date=2004-05-01 |title=Are Loss Functions All the Same? |url=https://direct.mit.edu/neco/article/16/5/1063-1076/6828 |journal=Neural Computation |language=en |volume=16 |issue=5 |pages=1063–1076 |doi=10.1162/089976604773135104 |pmid=15070510 |issn=0899-7667}}</ref>

Different loss functions are used depending on whether the problem is one of regression or one of classification.

===Regression===
The most common loss function for regression is the square loss function (also known as the [[L2-norm]]). This familiar loss function is used in [[Ordinary least squares regression|ordinary least squares regression]]. The form is:
<math display="block">V(f(\mathbf{x}),y) = (y - f(\mathbf{x}))^2</math>

The absolute value loss (also known as the [[L1-norm]]) is also sometimes used:
<math display="block">V(f(\mathbf{x}),y) = |y - f(\mathbf{x})|</math>

===Classification===
{{main|Statistical classification}}
In some sense the 0-1 [[indicator function]] is the most natural loss function for classification. It takes the value 0 if the predicted output is the same as the actual output, and the value 1 if the predicted output is different from the actual output. For binary classification with <math>Y = \{-1, 1\}</math>, this is:
<math display="block">V(f(\mathbf{x}),y) = \theta(- y f(\mathbf{x}))</math>
where <math>\theta</math> is the [[Heaviside step function]].
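As an illustration (a minimal sketch, not drawn from the article's sources), the three losses above can be written directly in Python with NumPy; the function names here are chosen for this sketch rather than taken from any library:

<syntaxhighlight lang="python">
import numpy as np

# Square loss (L2): V(f(x), y) = (y - f(x))^2
def square_loss(y, f_x):
    return (y - f_x) ** 2

# Absolute value loss (L1): V(f(x), y) = |y - f(x)|
def absolute_loss(y, f_x):
    return np.abs(y - f_x)

# 0-1 loss for binary labels y in {-1, 1}: V(f(x), y) = theta(-y * f(x)),
# where theta is the Heaviside step function. The loss is 0 when f(x) has
# the same sign as y, and 1 otherwise; theta(0) is set to 0 here by convention.
def zero_one_loss(y, f_x):
    return np.heaviside(-y * f_x, 0.0)

# Example: evaluate all three losses on a few raw scores f(x).
y_true = np.array([1.0, -1.0, 1.0])
f_x    = np.array([0.8, 0.5, -0.2])
print(square_loss(y_true, f_x))    # [0.04 2.25 1.44]
print(absolute_loss(y_true, f_x))  # [0.2  1.5  1.2 ]
print(zero_one_loss(y_true, f_x))  # [0.   1.   1.  ]
</syntaxhighlight>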
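Note that the square and absolute value losses are convex in <math>f(\mathbf{x})</math>, whereas the 0-1 loss is not, which connects to the remark above on the importance of convexity; the value of the 0-1 loss at the decision boundary <math>f(\mathbf{x}) = 0</math> depends on the convention chosen for <math>\theta(0)</math> (taken to be 0 in the sketch).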