Loss function
==Expected loss==
{{See also|Empirical risk minimization}}
In some contexts, the value of the loss function itself is a random quantity because it depends on the outcome of a random variable ''X''.

===Statistics===
Both [[Frequentist probability|frequentist]] and [[Bayesian probability|Bayesian]] statistical theory involve making a decision based on the [[expected value]] of the loss function; however, this quantity is defined differently under the two paradigms.

====Frequentist expected loss====
We first define the expected loss in the frequentist context. It is obtained by taking the expected value with respect to the [[probability distribution]] ''P''<sub>''θ''</sub> of the observed data ''X''. This is also referred to as the '''risk function'''<ref>{{SpringerEOM| title=Risk of a statistical procedure |id=R/r082490 |first=M.S. |last=Nikulin}}</ref><ref>{{cite book |title=Statistical Decision Theory and Bayesian Analysis |first=James O. |last=Berger |author-link=James Berger (statistician) |year=1985 |edition=2nd |publisher=Springer-Verlag |location=New York |isbn=978-0-387-96098-2 |mr=0804611 |url=https://books.google.com/books?id=oY_x7dE15_AC |bibcode=1985sdtb.book.....B }}</ref><ref>{{cite book |first=Morris |last=DeGroot |author-link=Morris H. DeGroot |title=Optimal Statistical Decisions |publisher=Wiley Classics Library |year=2004 |orig-year=1970 |isbn=978-0-471-68029-1 |mr=2288194 }}</ref><ref>{{cite book |last=Robert |first=Christian P. |title=The Bayesian Choice |publisher=Springer |location=New York |year=2007 |edition=2nd |doi=10.1007/0-387-71599-1 |isbn=978-0-387-95231-4 |mr=1835885 |series=Springer Texts in Statistics }}</ref> of the decision rule ''δ'' and the parameter ''θ''. Here the decision rule depends on the outcome of ''X''.
The risk function is given by:
:<math>R(\theta, \delta) = \operatorname{E}_\theta L\big( \theta, \delta(X) \big) = \int_X L\big( \theta, \delta(x) \big) \, \mathrm{d} P_\theta (x) .</math>
Here, ''θ'' is a fixed but possibly unknown state of nature, ''X'' is a vector of observations stochastically drawn from a [[Statistical population|population]], <math>\operatorname{E}_\theta</math> is the expectation over all population values of ''X'', ''dP''<sub>''θ''</sub> is a [[probability measure]] over the event space of ''X'' (parametrized by ''θ''), and the integral is evaluated over the entire [[Support (measure theory)|support]] of ''X''.

====Bayes risk====
In a Bayesian approach, the expectation is calculated using the [[prior distribution]] {{pi}}<sup>*</sup> of the parameter ''θ'':
:<math>\rho(\pi^*,a) = \int_\Theta \int _{\bold X} L(\theta, a(\bold x)) \, \mathrm{d} P(\bold x \vert \theta) \,\mathrm{d} \pi^* (\theta)= \int_{\bold X} \int_\Theta L(\theta,a(\bold x))\,\mathrm{d} \pi^*(\theta\vert \bold x)\,\mathrm{d}M(\bold x)</math>
where ''m''(''x''), the density of ''M'', is known as the ''predictive likelihood'', in which ''θ'' has been "integrated out"; {{pi}}<sup>*</sup>(''θ'' | ''x'') is the posterior distribution; and the order of integration has been changed. One should then choose the action ''a''<sup>*</sup> that minimises this expected loss, which is referred to as the ''Bayes risk''. In the latter equation, the inner integral (over ''Θ'') is known as the ''posterior risk'', and minimising it with respect to the decision ''a'' for each observed ''x'' also minimises the overall Bayes risk. This optimal decision rule, ''a''<sup>*</sup>, is known as the ''Bayes (decision) rule''; it minimises the average loss over all possible states of nature ''θ'' and all possible (probability-weighted) data outcomes.
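As a concrete illustration of the posterior-risk minimisation described above, the following Python sketch works through a discrete toy problem; the two states of nature, likelihoods, prior, and 0–1 loss are invented for illustration and are not taken from the article:

```python
# Toy Bayes-rule computation: two states of nature (0 and 1), two possible
# observations ('x1', 'x2'), actions identified with states, 0-1 loss.
prior = {0: 0.6, 1: 0.4}                     # pi*(theta)
lik = {(0, 'x1'): 0.8, (0, 'x2'): 0.2,       # P(x | theta)
       (1, 'x1'): 0.3, (1, 'x2'): 0.7}

def loss(theta, a):
    """0-1 loss: no penalty for the correct action, unit penalty otherwise."""
    return 0.0 if theta == a else 1.0

def posterior(x):
    """Posterior pi*(theta | x) and the predictive likelihood m(x)."""
    m = sum(prior[t] * lik[(t, x)] for t in prior)
    return {t: prior[t] * lik[(t, x)] / m for t in prior}, m

def bayes_rule(x):
    """Action minimising the posterior risk  sum_theta L(theta, a) pi*(theta | x)."""
    post, _ = posterior(x)
    return min(prior, key=lambda a: sum(loss(t, a) * post[t] for t in post))

def bayes_risk(rule):
    """Overall Bayes risk: posterior risk of the rule, averaged under m(x)."""
    total = 0.0
    for x in ('x1', 'x2'):
        post, m = posterior(x)
        total += m * sum(loss(t, rule(x)) * post[t] for t in post)
    return total
```

Minimising the posterior risk separately for each observation (here giving action 0 for 'x1' and action 1 for 'x2') yields the rule that also minimises the overall Bayes risk, as the change-of-integration-order identity above guarantees.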
One advantage of the Bayesian approach is that one need only choose the optimal action under the actually observed data to obtain a uniformly optimal one, whereas the frequentist optimal decision rule must be chosen as a function of all possible observations, a much more difficult problem. Of equal importance, though, the Bayes rule reflects consideration of loss outcomes under the different states of nature ''θ''.

====Examples in statistics====
* For a scalar parameter ''θ'', a decision function whose output <math>\hat\theta</math> is an estimate of ''θ'', and a quadratic loss function ([[squared error loss]]) <math display="block"> L(\theta,\hat\theta)=(\theta-\hat\theta)^2,</math> the risk function becomes the [[mean squared error]] of the estimate, <math display="block">R(\theta,\hat\theta)= \operatorname{E}_\theta \left [ (\theta-\hat\theta)^2 \right ].</math> Under squared-error loss, the [[estimator]] that minimises the posterior expected loss is the mean of the [[posterior distribution]].
* In [[density estimation]], the unknown parameter is the [[probability density function|probability density]] itself. The loss function is typically chosen to be a [[Norm (mathematics)|norm]] in an appropriate [[function space]]. For example, for the [[L2 norm|''L''<sup>2</sup> norm]], <math display="block">L(f,\hat f) = \|f-\hat f\|_2^2\,,</math> the risk function becomes the [[mean integrated squared error]] <math display="block">R(f,\hat f)=\operatorname{E} \left ( \|f-\hat f\|^2 \right ).\,</math>

===Economic choice under uncertainty===
In economics, decision-making under uncertainty is often modelled using the [[von Neumann–Morgenstern utility function]] of the uncertain variable of interest, such as end-of-period wealth. Since the value of this variable is uncertain, so is the value of the utility function; it is the expected value of utility that is maximized.
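The squared-error risk in the first statistical example above can be approximated by simulation. The following Python sketch assumes normally distributed observations and takes the sample mean as the decision rule; the particular parameter values are illustrative, not from the article:

```python
import random

def mc_risk(theta, n, sigma, trials=20000, seed=1):
    """Monte Carlo approximation of R(theta, delta) = E_theta[(theta - delta(X))^2]
    where delta(X) is the sample mean of n draws from Normal(theta, sigma^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(theta, sigma) for _ in range(n)]
        estimate = sum(xs) / n              # decision rule: sample mean
        total += (theta - estimate) ** 2    # squared-error loss
    return total / trials

# For the sample mean, theory gives R(theta, delta) = sigma^2 / n for every theta,
# so mc_risk(2.0, 10, 1.0) should come out near 0.1 up to Monte Carlo error.
```

The approximated risk is (up to simulation noise) ''σ''<sup>2</sup>/''n'' regardless of ''θ'', illustrating that the frequentist risk is a function of both the decision rule and the parameter, and here happens not to depend on the latter.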