==Inference over exclusive and exhaustive possibilities==
If evidence is simultaneously used to update belief over a set of exclusive and exhaustive propositions, Bayesian inference may be thought of as acting on this belief distribution as a whole.

===General formulation===
[[File:Bayesian inference event space.svg|thumb|Diagram illustrating the event space <math>\Omega</math> in the general formulation of Bayesian inference. Although this diagram shows discrete models and events, the continuous case may be visualized similarly using probability densities.]]

Suppose a process is generating [[independent and identically distributed]] events <math>E_n,\ n = 1, 2, 3, \ldots</math>, but the [[probability distribution]] is unknown. Let the event space <math>\Omega</math> represent the current state of belief for this process. Each model is represented by an event <math>M_m</math>; the models are taken to be mutually exclusive and exhaustive. The conditional probabilities <math>P(E_n \mid M_m)</math> are specified to define the models, and <math>P(M_m)</math> is the [[Credence (statistics)|degree of belief]] in <math>M_m</math>. Before the first inference step, <math>\{P(M_m)\}</math> is a set of ''initial prior probabilities''. These must sum to 1, but are otherwise arbitrary.

Suppose that the process is observed to generate <math>E \in \{E_n\}</math>. For each <math>M \in \{M_m\}</math>, the prior <math>P(M)</math> is updated to the posterior <math>P(M \mid E)</math>. From [[Bayes' theorem]]:<ref>Gelman, Andrew; Carlin, John B.; Stern, Hal S.; Dunson, David B.; Vehtari, Aki; Rubin, Donald B. (2013). ''Bayesian Data Analysis''. 3rd ed. Chapman and Hall/CRC. {{ISBN|978-1-4398-4095-5}}.</ref>
<math display="block">P(M \mid E) = \frac{P(E \mid M)}{\sum_m {P(E \mid M_m) P(M_m)}} \cdot P(M).</math>

Upon observation of further evidence, this procedure may be repeated.

===Multiple observations===
For a sequence of independent and identically distributed observations <math>\mathbf{E} = (e_1, \dots, e_n)</math>, it can be shown by induction that repeated application of the above update is equivalent to the single batch update
<math display="block">P(M \mid \mathbf{E}) = \frac{P(\mathbf{E} \mid M)}{\sum_m {P(\mathbf{E} \mid M_m) P(M_m)}} \cdot P(M),</math>
where
<math display="block">P(\mathbf{E} \mid M) = \prod_k{P(e_k \mid M)}.</math>
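For illustration, the following Python sketch carries out both the sequential and the batch update for three hypothetical models. The priors and likelihood values (e.g. <math>P(e_1 \mid M_1) = 0.05</math>) are made-up numbers chosen only to demonstrate the arithmetic, not data from any real problem.

<syntaxhighlight lang="python">
import numpy as np

# Hypothetical example with made-up numbers: three models M_1..M_3, uniform
# initial priors, and arbitrary likelihoods P(e_k | M_m) for two observations.
priors = np.array([1/3, 1/3, 1/3])            # initial P(M_m), summing to 1

likelihoods = np.array([                      # likelihoods[k, m] = P(e_k | M_m)
    [0.05, 0.20, 0.40],                       # P(e_1 | M_m)
    [0.06, 0.10, 0.30],                       # P(e_2 | M_m)
])

# Sequential updating: apply Bayes' theorem once per observation.
posterior = priors.copy()
for lik in likelihoods:
    posterior = lik * posterior / np.sum(lik * posterior)

# Batch updating: form P(E | M_m) = prod_k P(e_k | M_m), then update once.
batch_lik = likelihoods.prod(axis=0)
batch_posterior = batch_lik * priors / np.sum(batch_lik * priors)

assert np.allclose(posterior, batch_posterior)  # both routes agree
</syntaxhighlight>

Since multiplying the per-observation likelihoods commutes with the final normalization, the sequential and batch routes necessarily produce the same posterior, which the closing assertion checks numerically.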
===Parametric formulation: motivating the formal description===
By parameterizing the space of models, the belief in all models may be updated in a single step. The distribution of belief over the model space may then be thought of as a distribution of belief over the parameter space. The distributions in this section are expressed as continuous, represented by probability densities, as this is the usual situation. The technique is, however, equally applicable to discrete distributions.

Let the vector <math>\boldsymbol{\theta}</math> span the parameter space. Let the initial prior distribution over <math>\boldsymbol{\theta}</math> be <math>p(\boldsymbol{\theta} \mid \boldsymbol{\alpha})</math>, where <math>\boldsymbol{\alpha}</math> is a set of parameters to the prior itself, or ''[[Hyperparameter (Bayesian statistics)|hyperparameter]]s''. Let <math>\mathbf{E} = (e_1, \dots, e_n)</math> be a sequence of [[Independent and identically distributed random variables|independent and identically distributed]] event observations, where all <math>e_i</math> are distributed as <math>p(e \mid \boldsymbol{\theta})</math> for some <math>\boldsymbol{\theta}</math>. [[Bayes' theorem]] is applied to find the [[posterior distribution]] over <math>\boldsymbol{\theta}</math>:
<math display="block">\begin{align}
p(\boldsymbol{\theta} \mid \mathbf{E}, \boldsymbol{\alpha}) &= \frac{p(\mathbf{E} \mid \boldsymbol{\theta}, \boldsymbol{\alpha})}{p(\mathbf{E} \mid \boldsymbol{\alpha})} \cdot p(\boldsymbol{\theta} \mid \boldsymbol{\alpha}) \\
&= \frac{p(\mathbf{E} \mid \boldsymbol{\theta}, \boldsymbol{\alpha})}{\int p(\mathbf{E} \mid \boldsymbol{\theta}, \boldsymbol{\alpha}) \, p(\boldsymbol{\theta} \mid \boldsymbol{\alpha}) \, d\boldsymbol{\theta}} \cdot p(\boldsymbol{\theta} \mid \boldsymbol{\alpha}),
\end{align}</math>
where
<math display="block">p(\mathbf{E} \mid \boldsymbol{\theta}, \boldsymbol{\alpha}) = \prod_k p(e_k \mid \boldsymbol{\theta}).</math>
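As a concrete numerical illustration of this update (a sketch only; the model, prior, and data below are arbitrary choices, not part of the general formulation), the posterior density can be approximated on a grid of parameter values, replacing the integral in the denominator with a finite sum. The sketch assumes a Bernoulli model <math>p(e \mid \theta) = \theta^e (1 - \theta)^{1-e}</math> with a <math>\operatorname{Beta}(2, 2)</math> prior:

<syntaxhighlight lang="python">
import numpy as np

# Illustrative sketch (arbitrary example numbers): a Bernoulli model
# p(e | theta) = theta^e * (1 - theta)^(1 - e), with a Beta(2, 2) prior
# density evaluated on a grid so the integral in the denominator of
# Bayes' theorem becomes a finite sum.
theta = np.linspace(0.001, 0.999, 999)        # grid spanning the parameter space
dtheta = theta[1] - theta[0]

a1, a2 = 2.0, 2.0                             # hyperparameters alpha of the prior
prior = theta**(a1 - 1) * (1 - theta)**(a2 - 1)
prior /= prior.sum() * dtheta                 # normalize p(theta | alpha)

E = np.array([1, 0, 1, 1, 0, 1])              # observed events e_1, ..., e_n

# p(E | theta) = prod_k p(e_k | theta), evaluated at every grid point
likelihood = np.prod(np.where(E[:, None] == 1, theta, 1 - theta), axis=0)

# Posterior p(theta | E, alpha): likelihood times prior, normalized by the
# marginal p(E | alpha), here approximated by the sum over the grid.
posterior = likelihood * prior
posterior /= posterior.sum() * dtheta
</syntaxhighlight>

Because the Beta prior is conjugate to the Bernoulli likelihood, the exact posterior in this example is <math>\operatorname{Beta}\left(\alpha_1 + \sum_k e_k,\ \alpha_2 + n - \sum_k e_k\right)</math>, so the accuracy of the grid approximation can be checked directly.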