==Selecting a loss function==
Sound statistical practice requires selecting an estimator consistent with the actual acceptable variation experienced in the context of a particular applied problem. Thus, in the applied use of loss functions, selecting which statistical method to use to model an applied problem depends on knowing the losses that will be experienced from being wrong under the problem's particular circumstances.<ref>{{cite book |last=Pfanzagl |first=J. |year=1994 |title=Parametric Statistical Theory |location=Berlin |publisher=Walter de Gruyter |isbn=978-3-11-013863-4 }}</ref>

A common example involves estimating "[[location parameter|location]]". Under typical statistical assumptions, the [[mean]] or average is the statistic for estimating location that minimizes the expected loss experienced under the [[least squares|squared-error]] loss function, while the [[median]] is the estimator that minimizes expected loss experienced under the absolute-difference loss function. Still different estimators would be optimal under other, less common circumstances.
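The sample analogue of this statement can be checked numerically. The following is a minimal sketch, not a general proof: the data values and the candidate grid are illustrative assumptions, and the search simply evaluates the total loss at each candidate estimate.

```python
import statistics

def total_loss(estimate, data, loss):
    """Sum of loss(x - estimate) over the sample."""
    return sum(loss(x - estimate) for x in data)

def squared(a):
    return a * a

def absolute(a):
    return abs(a)

# Hypothetical sample with one large value.
data = [1.0, 2.0, 3.0, 4.0, 100.0]

# Candidate estimates 0.0, 0.1, ..., 100.0 (grid chosen for illustration).
candidates = [i / 10 for i in range(1001)]

best_squared = min(candidates, key=lambda c: total_loss(c, data, squared))
best_absolute = min(candidates, key=lambda c: total_loss(c, data, absolute))

print("squared-error minimizer:", best_squared, "mean:", statistics.mean(data))
print("absolute-loss minimizer:", best_absolute, "median:", statistics.median(data))
```

For this sample the grid minimizer of the squared-error loss coincides with the sample mean, and the grid minimizer of the absolute-difference loss coincides with the sample median.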

In economics, when an agent is [[risk neutral]], the objective function is simply expressed as the expected value of a monetary quantity, such as profit, income, or end-of-period wealth. For [[Risk aversion|risk-averse]] or [[risk-loving]] agents, loss is measured as the negative of a [[utility|utility function]], and the objective function to be optimized is the expected value of utility.
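The difference between the two objectives can be illustrated with a small sketch. The gamble, the sure amount, and the logarithmic utility function below are all assumptions chosen for illustration; logarithmic utility is only one possible concave (risk-averse) utility.

```python
import math

def expected(values, probs):
    """Expected value of a discrete random quantity."""
    return sum(p * v for v, p in zip(values, probs))

# Hypothetical choice: a gamble over end-of-period wealth vs. a sure amount.
outcomes = [50.0, 150.0]
probabilities = [0.5, 0.5]
sure_amount = 95.0

# Risk-neutral agent: the objective is expected wealth itself.
ev_gamble = expected(outcomes, probabilities)

# Risk-averse agent (assumed log utility): the objective is expected utility,
# so the loss is the negative of the utility.
eu_gamble = expected([math.log(w) for w in outcomes], probabilities)
eu_sure = math.log(sure_amount)

print("risk-neutral prefers gamble:", ev_gamble > sure_amount)
print("risk-averse prefers sure amount:", eu_sure > eu_gamble)
```

Under these assumptions the risk-neutral agent prefers the gamble, because its expected wealth exceeds the sure amount, while the risk-averse agent prefers the sure amount, because its utility exceeds the expected utility of the gamble.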

Other measures of cost are possible, for example [[Mortality rate|mortality]] or [[morbidity]] in the field of [[public health]] or [[safety engineering]].

For most [[optimization algorithm]]s, it is desirable to have a loss function that is globally [[Continuous function|continuous]] and [[Differentiable function|differentiable]].

Two very commonly used loss functions are the [[mean squared error|squared loss]], <math>L(a) = a^2</math>, and the [[absolute deviation|absolute loss]], <math>L(a)=|a|</math>. However, the absolute loss has the disadvantage that it is not differentiable at <math>a=0</math>. The squared loss has the disadvantage that it tends to be dominated by [[outlier]]s: when summing over a set of <math>a</math>'s (as in <math display="inline">\sum_{i=1}^n L(a_i) </math>), the final sum tends to be determined by a few particularly large ''a''-values, rather than being an expression of the average ''a''-value.
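The sensitivity of the squared loss to outliers can be seen in a short sketch. The residual values below are hypothetical and chosen so that one of them is much larger than the rest.

```python
# Hypothetical residuals a_i; the last one is an outlier.
residuals = [1.0, -1.0, 2.0, -2.0, 20.0]

squared_terms = [a * a for a in residuals]
absolute_terms = [abs(a) for a in residuals]

# Fraction of each total loss contributed by the single largest residual.
share_squared = max(squared_terms) / sum(squared_terms)
share_absolute = max(absolute_terms) / sum(absolute_terms)

print(f"outlier share of squared loss:  {share_squared:.3f}")
print(f"outlier share of absolute loss: {share_absolute:.3f}")
```

Here the single outlier accounts for 400 of the 410 units of total squared loss, but only 20 of the 26 units of total absolute loss.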

The choice of a loss function is not arbitrary. It is highly constrained, and sometimes the loss function may be characterized by its desirable properties.<ref>Detailed information on mathematical principles of the loss function choice is given in Chapter 2 of the book {{cite book|title=Robust and Non-Robust Models in Statistics|first1=B.|last1=Klebanov|first2=Svetlozar T.|last2=Rachev|first3=Frank J.|last3=Fabozzi|publisher=Nova Scientific Publishers, Inc.|location=New York|year=2009}} (and references there).</ref> Among the choice principles are, for example, the requirement of completeness of the class of symmetric statistics in the case of [[i.i.d.]] observations and the principle of complete information.

[[W. Edwards Deming]] and [[Nassim Nicholas Taleb]] argue that empirical reality, not nice mathematical properties, should be the sole basis for selecting loss functions, and that real losses often are not mathematically nice: they may fail to be differentiable, continuous, or symmetric. For example, a person who arrives before a plane gate closure can still make the plane, but a person who arrives after cannot, a discontinuity and asymmetry that makes arriving slightly late much more costly than arriving slightly early. In drug dosing, the cost of too little drug may be lack of efficacy, while the cost of too much may be tolerable toxicity, another example of asymmetry. Traffic, pipes, beams, ecologies, climates, etc. may tolerate increased load or stress with little noticeable change up to a point, then become backed up or break catastrophically. These situations, Deming and Taleb argue, are common in real-life problems, perhaps more common than the classical smooth, continuous, symmetric, differentiable cases.<ref>{{Cite book|title=Out of the Crisis|last=Deming|first=W. Edwards|publisher=The MIT Press|year=2000|isbn=9780262541152}}</ref>
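The gate-closure example can be written as a discontinuous, asymmetric loss function. The sketch below is purely illustrative: the function name <code>gate_loss</code>, the cost figures, the treatment of arrival exactly at closure as on time, and the list of travel-time deviations are all assumptions, not values from the cited sources.

```python
def gate_loss(minutes_late, wait_cost_per_minute=1.0, missed_flight_cost=500.0):
    """Hypothetical loss for arriving relative to gate closure.

    minutes_late <= 0: on time, pay only for the minutes spent waiting.
    minutes_late > 0: the flight is missed, pay a large fixed cost.
    """
    if minutes_late <= 0:
        return wait_cost_per_minute * (-minutes_late)
    return missed_flight_cost

# Assumed, equally likely deviations (in minutes) from the planned arrival time.
deviations = [-10, -5, 0, 5, 10, 30]

def expected_loss(buffer_minutes):
    """Average loss when planning to arrive buffer_minutes before closure."""
    losses = [gate_loss(d - buffer_minutes) for d in deviations]
    return sum(losses) / len(losses)

# Search planned buffers of 0 to 60 minutes for the smallest expected loss.
best_buffer = min(range(61), key=expected_loss)

print("loss one minute early:", gate_loss(-1))
print("loss one minute late: ", gate_loss(1))
print("best planned buffer (minutes):", best_buffer)
```

Under these assumptions the loss jumps from 1.0 to 500.0 across the closure time, and the planned buffer with the smallest expected loss is the one large enough to cover the worst assumed delay, rather than the average delay.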