==== Application of maximum-likelihood estimation in Bayes decision theory ====
In many practical applications in [[machine learning]], maximum-likelihood estimation is used as the model for parameter estimation.

Bayesian decision theory is about designing a classifier that minimizes the total expected risk. In particular, when the costs (the loss function) associated with different decisions are equal, the classifier minimizes the error over the whole distribution.<ref>{{cite web |last=Christensen |first=Henrik I. |title=Pattern Recognition |publisher=Georgia Tech |series=Bayesian Decision Theory - CS 7616 |url=https://www.cc.gatech.edu/~hic/CS7616/pdf/lecture2.pdf |type=lecture}}</ref>

Thus, the Bayes Decision Rule is stated as
:"decide <math>\;w_1\;</math> if <math>~\operatorname{\mathbb P}(w_1|x) \; > \; \operatorname{\mathbb P}(w_2|x)~;~</math> otherwise decide <math>\;w_2\;</math>"
where <math>\;w_1\,, w_2\;</math> are predictions of different classes. From the perspective of minimizing error, it can also be stated as
<math display="block">w = \underset{ w }{\operatorname{arg\;min}} \; \int_{-\infty}^\infty \operatorname{\mathbb P}(\text{ error}\mid x)\operatorname{\mathbb P}(x)\,\operatorname{d}x~</math>
where
<math display="block">\operatorname{\mathbb P}(\text{ error}\mid x) = \operatorname{\mathbb P}(w_1\mid x)~</math>
if we decide <math>\;w_2\;</math> and <math>\;\operatorname{\mathbb P}(\text{ error}\mid x) = \operatorname{\mathbb P}(w_2\mid x)\;</math> if we decide <math>\;w_1\;.</math>
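
As a numerical illustration of the rule and its conditional error, the following Python sketch uses made-up posterior values and replaces the integral over <math>x</math> with a sum over three hypothetical discrete observations. The function name <code>decide</code> and all numbers are assumptions chosen for the example.

```python
def decide(posterior_w1):
    """Apply the two-class Bayes decision rule to P(w1 | x).

    Returns the decision and the conditional error P(error | x),
    which is the posterior of the class that was NOT chosen.
    """
    posterior_w2 = 1.0 - posterior_w1
    if posterior_w1 > posterior_w2:
        return "w1", posterior_w2
    return "w2", posterior_w1  # "otherwise decide w2" (ties included)


# Hypothetical discrete observations: (P(x), P(w1 | x)) pairs.
observations = [(0.5, 0.9), (0.3, 0.4), (0.2, 0.5)]

decisions = [decide(post)[0] for _, post in observations]

# Discrete analogue of the integral of P(error | x) P(x) over x.
expected_error = sum(p_x * decide(post)[1] for p_x, post in observations)

print(decisions)
print(expected_error)
```

Because the rule always picks the class with the larger posterior, the conditional error at each observation is the smaller of the two posteriors, so no other decision rule can achieve a lower total in this sum.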

By applying [[Bayes' theorem]]
<math display="block">\operatorname{\mathbb P}(w_i \mid x) = \frac{\operatorname{\mathbb P}(x \mid w_i) \operatorname{\mathbb P}(w_i)}{\operatorname{\mathbb P}(x)},</math>
noting that the denominator <math>\operatorname{\mathbb P}(x)</math> does not depend on the class <math>w_i</math>, and further assuming the zero-or-one loss function, which assigns the same loss to all errors, the Bayes Decision Rule can be reformulated as:
<math display="block">h_\text{Bayes} = \underset{ w }{\operatorname{arg\;max}} \, \bigl[\, \operatorname{\mathbb P}(x\mid w)\,\operatorname{\mathbb P}(w) \,\bigr]\;,</math>
where <math>h_\text{Bayes}</math> is the prediction and <math>\;\operatorname{\mathbb P}(w)\;</math> is the [[prior probability]].
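
A minimal sketch of how maximum-likelihood estimation can supply the quantities in this rule, assuming one-dimensional Gaussian class-conditional densities: the mean and variance of each class are set to their maximum-likelihood estimates, the priors are estimated by class frequencies, and the prediction is the class that maximizes the product of likelihood and prior. The Gaussian assumption, the training data, and all function names are illustrative and are not taken from the cited source.

```python
import math


def fit_gaussian_mle(samples):
    """Maximum-likelihood mean and variance of a 1-D Gaussian."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n  # MLE divides by n, not n - 1
    return mean, var


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_classifier(data):
    """data maps a class label to its list of training samples."""
    total = sum(len(samples) for samples in data.values())
    model = {}
    for label, samples in data.items():
        mean, var = fit_gaussian_mle(samples)
        prior = len(samples) / total  # class frequency as the estimate of P(w)
        model[label] = (mean, var, prior)
    return model


def h_bayes(x, model):
    """Return the class w that maximizes P(x | w) * P(w)."""
    def score(label):
        mean, var, prior = model[label]
        return gaussian_pdf(x, mean, var) * prior
    return max(model, key=score)


# Hypothetical training data for two classes.
training = {"w1": [1.0, 2.0, 3.0], "w2": [6.0, 7.0, 8.0, 9.0]}
model = fit_classifier(training)

print(h_bayes(2.5, model))
print(h_bayes(7.0, model))
```

The evidence <math>\operatorname{\mathbb P}(x)</math> is never computed in this sketch, since it is the same for every class and does not change which class attains the maximum.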