===Formal explanation===
{| class="wikitable floatright" style="font-size:100%;"
|+ [[Contingency table]]
! {{diagonal split header|<br /><br />Evidence|Hypothesis}} !! Satisfies<br />hypothesis<br />{{mvar|H}} !! Violates<br />hypothesis<br />{{tmath|\neg H}} !! rowspan="5" style="padding:0;" | !! <br />Total
|-
! Has evidence<br />{{mvar|E}}
| <math>P(H|E)\cdot P(E)</math><br /><math>= P(E|H)\cdot P(H)</math>
| <math>P(\neg H|E)\cdot P(E)</math><br /><math>= P(E|\neg H)\cdot P(\neg H)</math>
| {{tmath|P(E)}}
|-
! No evidence<br />{{tmath|\neg E}}
| nowrap | <math>P(H|\neg E)\cdot P(\neg E)</math><br /><math>= P(\neg E|H)\cdot P(H)</math>
| nowrap | <math>P(\neg H|\neg E)\cdot P(\neg E)</math><br /><math>= P(\neg E|\neg H)\cdot P(\neg H)</math>
| nowrap | <math>P(\neg E) = 1-P(E)</math>
|-
| colspan="5" style="padding:0;" |
|-
! Total
| {{tmath|P(H)}}
| style="text-align:right;" nowrap | <math>P(\neg H) = 1-P(H)</math>
| style="text-align:center;" | 1
|}
Bayesian inference derives the [[posterior probability]] as a [[consequence relation|consequence]] of two [[Antecedent (logic)|antecedent]]s: a [[prior probability]] and a "[[likelihood function]]" derived from a [[statistical model]] for the observed data. Bayesian inference computes the posterior probability according to [[Bayes' theorem]]:
<math display="block">P(H \mid E) = \frac{P(E \mid H) \cdot P(H)}{P(E)},</math>
where
* {{mvar|H}} stands for any ''hypothesis'' whose probability may be affected by [[Experimental data|data]] (called ''evidence'' below). Often there are competing hypotheses, and the task is to determine which is the most probable.
* <math>P(H)</math>, the ''[[prior probability]]'', is the estimate of the probability of the hypothesis {{mvar|H}} ''before'' the data {{mvar|E}}, the current evidence, is observed.
* {{mvar|E}}, the ''evidence'', corresponds to new data that were not used in computing the prior probability.
* <math>P(H \mid E)</math>, the ''[[posterior probability]]'', is the probability of {{mvar|H}} ''given'' {{mvar|E}}, i.e., ''after'' {{mvar|E}} is observed. This is what we want to know: the probability of a hypothesis ''given'' the observed evidence.
* <math>P(E \mid H)</math> is the probability of observing {{mvar|E}} ''given'' {{mvar|H}} and is called the ''[[Likelihood function|likelihood]]''. As a function of {{mvar|E}} with {{mvar|H}} fixed, it indicates the compatibility of the evidence with the given hypothesis. The likelihood is a function of the evidence, {{mvar|E}}, while the posterior probability is a function of the hypothesis, {{mvar|H}}.
* <math>P(E)</math> is sometimes termed the [[marginal likelihood]] or "model evidence". This factor is the same for all hypotheses under consideration (the hypothesis {{mvar|H}} does not appear in it, unlike in all the other factors), so it plays no role in determining the relative probabilities of different hypotheses.
* <math>P(E) > 0</math> is assumed; otherwise the right-hand side reduces to the indeterminate form <math>0/0</math>.
For different values of {{mvar|H}}, only the factors <math>P(H)</math> and <math>P(E \mid H)</math>, both in the numerator, affect the value of <math>P(H \mid E)</math>{{snd}} the posterior probability of a hypothesis is proportional to its prior probability (its inherent likeliness) and the newly acquired likelihood (its compatibility with the new observed evidence).
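As a worked illustration of the formula (the numbers here are hypothetical, chosen only to make the arithmetic concrete), suppose a hypothesis has prior probability <math>P(H) = 0.01</math>, and the evidence is observed with probability <math>P(E \mid H) = 0.9</math> when the hypothesis holds and <math>P(E \mid \neg H) = 0.05</math> when it does not. Summing the first row of the contingency table gives
<math display="block">P(E) = P(E \mid H)\,P(H) + P(E \mid \neg H)\,P(\neg H) = 0.9 \cdot 0.01 + 0.05 \cdot 0.99 = 0.0585,</math>
so that
<math display="block">P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)} = \frac{0.009}{0.0585} \approx 0.15.</math>
Observing the evidence raises the probability of the hypothesis from 1% to about 15%; the hypothesis remains improbable because its prior probability was small.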
In cases where the likelihood of the evidence under <math>\neg H</math> ("not {{mvar|H}}"), the [[logical negation]] of {{mvar|H}}, is well defined, Bayes' rule can be rewritten as follows:
<math display="block">\begin{align}
P(H \mid E) &= \frac{P(E \mid H) P(H)}{P(E)} \\
&= \frac{P(E \mid H) P(H)}{P(E \mid H) P(H) + P(E \mid \neg H) P(\neg H)} \\
&= \frac{1}{1 + \left(\frac{1}{P(H)} - 1\right) \frac{P(E \mid \neg H)}{P(E \mid H)} }
\end{align}</math>
because
<math display="block"> P(E) = P(E \mid H) P(H) + P(E \mid \neg H) P(\neg H) </math>
and
<math display="block"> P(H) + P(\neg H) = 1 .</math>

This focuses attention on the term
<math display="block"> \left(\tfrac{1}{P(H)} - 1\right) \tfrac{P(E \mid \neg H)}{P(E \mid H)} .</math>
If that term is approximately 1, then <math>P(H \mid E)</math> is about <math>\tfrac{1}{2}</math>: given the evidence, the hypothesis is as likely to be true as not. If the term is very small, close to zero, then <math>P(H \mid E)</math> is close to 1: given the evidence, the hypothesis is very likely. If the term is very large, much larger than 1, then <math>P(H \mid E)</math> is close to 0: given the evidence, the hypothesis is very unlikely. If the hypothesis is unlikely before the evidence is considered, then <math>P(H)</math> is small (though not necessarily astronomically small), <math>\tfrac{1}{P(H)}</math> is much larger than 1, and the term can be approximated as <math>\tfrac{P(E \mid \neg H)}{P(E \mid H) \cdot P(H)}</math>, so the relevant probabilities can be compared directly to each other.

A quick way to remember the equation is the [[Conditional probability#As an axiom of probability|rule of multiplication]]:
<math display="block">P(E \cap H) = P(E \mid H) P(H) = P(H \mid E) P(E).</math>
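With the same hypothetical numbers as in the illustration above, the rewritten form yields the same posterior:
<math display="block">\left(\frac{1}{P(H)} - 1\right) \frac{P(E \mid \neg H)}{P(E \mid H)} = \left(\frac{1}{0.01} - 1\right)\frac{0.05}{0.9} = 99 \cdot \frac{1}{18} = 5.5,
\qquad P(H \mid E) = \frac{1}{1 + 5.5} \approx 0.15.</math>
Because <math>P(H) = 0.01</math> is small, the approximate term <math>\tfrac{P(E \mid \neg H)}{P(E \mid H)\cdot P(H)} = \tfrac{0.05}{0.009} \approx 5.6</math> is close to the exact value 5.5.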