==Statistical introduction==
{{Main|Bayesian statistics|Multilevel model}}

Given data <math>x\,\!</math> and parameter <math>\theta</math>, a simple [[Bayesian statistics|Bayesian analysis]] starts with a [[prior probability]] (''prior'') <math>p(\theta)</math> and a [[likelihood function|likelihood]] <math>p(x\mid\theta)</math> to compute a [[posterior probability]] <math>p(\theta\mid x) \propto p(x\mid\theta)p(\theta)</math>.

Often the prior on <math>\theta</math> depends in turn on other parameters <math>\varphi</math> that are not mentioned in the likelihood. So the prior <math>p(\theta)</math> must be replaced by a likelihood <math>p(\theta\mid\varphi)</math>, and a prior <math>p(\varphi)</math> on the newly introduced parameters <math>\varphi</math> is required, resulting in a posterior probability

: <math>p(\theta,\varphi\mid x) \propto p(x\mid\theta)p(\theta\mid\varphi)p(\varphi).</math>

This is the simplest example of a [[Bayesian hierarchical modeling|''hierarchical Bayes model'']]. The process may be repeated; for example, the parameters <math>\varphi</math> may depend in turn on additional parameters <math>\psi\,\!</math>, which require their own prior. Eventually the process must terminate, with priors that do not depend on unmentioned parameters.

===Introductory examples===
{{Expand section|date=March 2009|reason=More examples needed}}

Given measured quantities <math>x_1,\dots,x_n\,\!</math>, each with [[Normal distribution|normally distributed]] errors of known [[standard deviation]] <math>\sigma\,\!</math>,

: <math>x_i \sim N(\theta_i, \sigma^2),</math>

suppose we are interested in estimating the <math>\theta_i</math>. One approach is to estimate the <math>\theta_i</math> by [[maximum likelihood]]; since the observations are independent, the likelihood factorizes, and the maximum likelihood estimate is simply

: <math>\theta_i = x_i.</math>

However, if the quantities are related, so that for example the individual <math>\theta_i</math> have themselves been drawn from an underlying distribution, then this relationship destroys the independence and suggests a more complex model, e.g.,

: <math>x_i \sim N(\theta_i, \sigma^2),</math>
: <math>\theta_i \sim N(\varphi, \tau^2),</math>

with [[improper prior]]s <math>\varphi \sim \text{flat}</math> and <math>\tau \sim \text{flat} \in (0,\infty)</math>. When <math>n \ge 3</math>, this is an ''identified model'' (i.e. there exists a unique solution for the model's parameters), and the posterior distributions of the individual <math>\theta_i</math> will tend to move, or ''[[Shrinkage estimator|shrink]]'', away from the maximum likelihood estimates towards their common mean. This ''shrinkage'' is a typical behavior in hierarchical Bayes models; a numerical sketch of it is given below.

===Restrictions on priors===
Some care is needed when choosing priors in a hierarchical model, particularly on scale variables at higher levels of the hierarchy, such as the variable <math>\tau\,\!</math> in the example. The usual priors such as the [[Jeffreys prior]] often do not work, because the posterior distribution will not be normalizable, and estimates made by minimizing the [[Loss function#Expected loss|expected loss]] will be [[admissible decision rule|inadmissible]].
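The following is a minimal numerical sketch of the shrinkage behavior in the introductory example above, not part of the article's exposition. To keep the posterior available in closed form, it assumes the hyperparameters <math>\varphi</math> and <math>\tau</math> are known constants rather than carrying the flat priors of the full model; all variable names are illustrative.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

sigma = 1.0            # known measurement standard deviation
phi, tau = 5.0, 0.5    # hyperparameters, treated as known for this sketch

# Simulate n group means theta_i drawn from N(phi, tau^2),
# and one noisy observation x_i ~ N(theta_i, sigma^2) for each.
n = 8
theta = rng.normal(phi, tau, size=n)
x = rng.normal(theta, sigma)

# With phi and tau fixed, each posterior p(theta_i | x_i) is normal:
#   mean     = B*phi + (1 - B)*x_i
#   variance = sigma**2 * tau**2 / (sigma**2 + tau**2)
# where B = sigma**2 / (sigma**2 + tau**2) is the shrinkage factor.
B = sigma**2 / (sigma**2 + tau**2)
post_mean = B * phi + (1 - B) * x
post_var = sigma**2 * tau**2 / (sigma**2 + tau**2)

for xi, m in zip(x, post_mean):
    print(f"MLE {xi:6.2f}  ->  posterior mean {m:6.2f}")
</syntaxhighlight>

Each posterior mean lies between the maximum likelihood estimate <math>x_i</math> and the common mean <math>\varphi</math>: the estimates are shrunk towards <math>\varphi</math>, and more strongly so the larger <math>\sigma</math> is relative to <math>\tau</math>. In the full hierarchical model, <math>\varphi</math> and <math>\tau</math> would themselves be inferred from the data rather than fixed.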