==Selecting a loss function==
Sound statistical practice requires selecting an estimator consistent with the actual acceptable variation experienced in the context of a particular applied problem. Thus, in the applied use of loss functions, selecting which statistical method to use to model an applied problem depends on knowing the losses that will be experienced from being wrong under the problem's particular circumstances.<ref>{{cite book |last=Pfanzagl |first=J. |year=1994 |title=Parametric Statistical Theory |location=Berlin |publisher=Walter de Gruyter |isbn=978-3-11-013863-4 }}</ref>

A common example involves estimating "[[location parameter|location]]". Under typical statistical assumptions, the [[mean]] or average is the statistic for estimating location that minimizes the expected loss experienced under the [[least squares|squared-error]] loss function, while the [[median]] is the estimator that minimizes expected loss experienced under the absolute-difference loss function. Still different estimators would be optimal under other, less common circumstances.

In economics, when an agent is [[risk neutral]], the objective function is simply expressed as the expected value of a monetary quantity, such as profit, income, or end-of-period wealth. For [[Risk aversion|risk-averse]] or [[risk-loving]] agents, loss is measured as the negative of a [[utility|utility function]], and the objective function to be optimized is the expected value of utility.

Other measures of cost are possible, for example [[Mortality rate|mortality]] or [[morbidity]] in the fields of [[public health]] or [[safety engineering]].

For most [[optimization algorithm]]s, it is desirable to have a loss function that is globally [[Continuous function|continuous]] and [[Differentiable function|differentiable]].

Two very commonly used loss functions are the [[mean squared error|squared loss]], <math>L(a) = a^2</math>, and the [[absolute deviation|absolute loss]], <math>L(a)=|a|</math>.
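
The location example can be checked numerically. The following minimal Python sketch assumes a small hypothetical sample containing one outlier and searches a grid of candidate locations <math>c</math> for the one minimizing <math display="inline">\sum_{i=1}^n L(x_i - c)</math>. The function names <code>total_loss</code> and <code>best_location</code> and the grid step are illustrative choices, not a standard API. Under the squared loss the search should land on the sample mean, and under the absolute loss on the sample median.

```python
from statistics import mean, median


def squared_loss(a: float) -> float:
    return a * a


def absolute_loss(a: float) -> float:
    return abs(a)


def total_loss(loss, data, c):
    """Empirical loss of the location estimate c: sum of loss(x - c) over the sample."""
    return sum(loss(x - c) for x in data)


def best_location(loss, data, step=0.01):
    """Brute-force search for the grid point c in [min(data), max(data)] minimizing total_loss."""
    lo, hi = min(data), max(data)
    n_steps = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n_steps + 1)]
    return min(grid, key=lambda c: total_loss(loss, data, c))


# Hypothetical sample: four ordinary values and one outlier (100.0).
data = [1.0, 2.0, 3.0, 4.0, 100.0]

print("mean  ", mean(data), "squared-loss minimizer ", round(best_location(squared_loss, data), 2))
print("median", median(data), "absolute-loss minimizer", round(best_location(absolute_loss, data), 2))
```

With this sample the single outlier drags the squared-loss estimate (the mean, 22.0) far from the bulk of the data, while the absolute-loss estimate (the median, 3.0) stays with it. A grid search is used here only because it makes no assumption that the loss is differentiable; for an even number of observations, any point between the two middle values minimizes the absolute loss.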
However, the absolute loss has the disadvantage that it is not differentiable at <math>a=0</math>. The squared loss has the disadvantage that it tends to be dominated by [[outlier]]s: when summing over a set of <math>a</math>'s (as in <math display="inline">\sum_{i=1}^n L(a_i) </math>), the final sum tends to be the result of a few particularly large ''a''-values, rather than an expression of the average ''a''-value.

The choice of a loss function is not arbitrary. It is often highly constrained, and sometimes the loss function may be characterized by its desirable properties.<ref>Detailed information on mathematical principles of the loss function choice is given in Chapter 2 of the book {{cite book|title=Robust and Non-Robust Models in Statistics|first1=B.|last1=Klebanov|first2=Svetlozar T.|last2=Rachev|first3=Frank J.|last3=Fabozzi|publisher=Nova Science Publishers, Inc.|location=New York|year=2009}} (and references there).</ref> Among the choice principles are, for example, the requirement of completeness of the class of symmetric statistics in the case of [[i.i.d.]] observations, the principle of complete information, and some others.

[[W. Edwards Deming]] and [[Nassim Nicholas Taleb]] argue that empirical reality, not nice mathematical properties, should be the sole basis for selecting loss functions, and that real losses often are not mathematically nice: they are not necessarily differentiable, continuous, symmetric, etc. For example, a person who arrives before a plane gate closes can still make the plane, but a person who arrives after cannot; this discontinuity and asymmetry make arriving slightly late much more costly than arriving slightly early. In drug dosing, the cost of too little drug may be lack of efficacy, while the cost of too much may be tolerable toxicity, another example of asymmetry. Traffic, pipes, beams, ecologies, climates, etc. may tolerate increased load or stress with little noticeable change up to a point, then become backed up or break catastrophically. These situations, Deming and Taleb argue, are common in real-life problems, perhaps more common than the classical smooth, continuous, symmetric, differentiable cases.<ref>{{Cite book|title=Out of the Crisis|last=Deming|first=W. Edwards|publisher=The MIT Press|year=2000|isbn=9780262541152}}</ref>
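
The plane-gate example can be turned into a toy calculation. This Python sketch assumes hypothetical costs (one unit per minute of waiting, 300 units for missing the flight) and normally distributed travel-time noise with a standard deviation of 10 minutes; none of these numbers come from Deming or Taleb. It estimates the expected loss of each planned arrival time by simulation and compares the best plan under this discontinuous, asymmetric loss with the best plan under a symmetric squared loss. The names <code>gate_loss</code>, <code>squared_arrival_loss</code> and <code>expected_loss</code> are illustrative.

```python
import random

GATE_CLOSES = 0.0              # arrival times are in minutes relative to gate closure
COST_PER_MINUTE_EARLY = 1.0    # hypothetical cost of each minute spent waiting
COST_OF_MISSING = 300.0        # hypothetical cost of missing the flight


def gate_loss(arrival: float) -> float:
    """Discontinuous, asymmetric loss: linear waiting cost if early, a fixed large cost if late."""
    if arrival <= GATE_CLOSES:
        return COST_PER_MINUTE_EARLY * (GATE_CLOSES - arrival)
    return COST_OF_MISSING


def squared_arrival_loss(arrival: float) -> float:
    """Symmetric, smooth comparison loss: early and late arrivals are penalized equally."""
    return (arrival - GATE_CLOSES) ** 2


def expected_loss(loss, planned: float, delays: list[float]) -> float:
    """Monte Carlo estimate of E[loss(planned + delay)] over the simulated delays."""
    return sum(loss(planned + d) for d in delays) / len(delays)


rng = random.Random(0)  # fixed seed so the simulation is reproducible
delays = [rng.gauss(0.0, 10.0) for _ in range(10_000)]  # hypothetical travel-time noise

plans = range(-60, 11)  # candidate planned arrivals: 60 minutes early to 10 minutes late
best_gate = min(plans, key=lambda t: expected_loss(gate_loss, t, delays))
best_squared = min(plans, key=lambda t: expected_loss(squared_arrival_loss, t, delays))

print("best plan under gate loss:   ", best_gate)
print("best plan under squared loss:", best_squared)
```

Both losses are evaluated on the same simulated delays, so any difference between the two plans comes from the shape of the loss alone. Under the squared loss the best plan is to aim at the closure time itself, because the assumed noise has mean zero; the jump in cost for arriving late pushes the best plan under the gate loss roughly twenty minutes early in this setup.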