Empirical Bayes method
==Introduction==
Empirical Bayes methods can be seen as an approximation to a fully Bayesian treatment of a [[hierarchical Bayes model]]. In, for example, a two-stage hierarchical Bayes model, observed data <math>y = \{y_1, y_2, \dots, y_n\}</math> are assumed to be generated from an unobserved set of parameters <math>\theta = \{\theta_1, \theta_2, \dots, \theta_n\}</math> according to a probability distribution <math>p(y\mid\theta)\,</math>. In turn, the parameters <math>\theta</math> can be considered samples drawn from a population characterised by [[Hyperparameter (Bayesian statistics)|hyperparameters]] <math>\eta\,</math> according to a probability distribution <math>p(\theta\mid\eta)\,</math>. In the hierarchical Bayes model, though not in the empirical Bayes approximation, the hyperparameters <math>\eta\,</math> are considered to be drawn from an unparameterized distribution <math>p(\eta)\,</math>.

Information about a particular quantity of interest <math>\theta_i\;</math> therefore comes not only from the properties of those data <math>y</math> that directly depend on it, but also from the properties of the population of parameters <math>\theta\;</math> as a whole, inferred from the data as a whole, summarised by the hyperparameters <math>\eta\;</math>.

Using [[Bayes' theorem]],

:<math>
p(\theta\mid y) = \frac{p(y \mid \theta) p(\theta)}{p(y)} = \frac {p(y \mid \theta)}{p(y)} \int p(\theta \mid \eta) p(\eta) \, d\eta \,.
</math>

In general, this integral will not be tractable [[Integral#Analytical|analytically]] or [[Symbolic integration|symbolically]] and must be evaluated by [[Integral#Numerical|numerical]] methods. Stochastic (random) or deterministic approximations may be used. Example stochastic methods are [[Markov chain Monte Carlo]] and [[Monte Carlo integration|Monte Carlo]] sampling. Deterministic approximations are discussed in [[numerical integration|quadrature]].
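The Monte Carlo approach to the inner integral <math>\int p(\theta \mid \eta) p(\eta) \, d\eta</math> can be sketched numerically. The model below is purely illustrative (a two-stage normal model chosen here as an assumption, not taken from the text): <math>\theta \sim N(\eta, 1)</math> with hyperprior <math>\eta \sim N(0, 10^2)</math>, so the integral can be estimated by averaging <math>p(\theta \mid \eta)</math> over draws of <math>\eta</math> from <math>p(\eta)</math>.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-stage model (all distributional choices are illustrative):
#   theta ~ Normal(eta, 1),   hyperprior  eta ~ Normal(0, 10^2)
# The prior p(theta) = integral of p(theta | eta) p(eta) d(eta) is
# approximated by averaging p(theta | eta) over draws eta ~ p(eta).

def normal_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

eta_draws = rng.normal(0.0, 10.0, size=100_000)  # samples from p(eta)

def prior_theta(theta):
    """Monte Carlo estimate of p(theta) = E_eta[ p(theta | eta) ]."""
    return normal_pdf(theta, eta_draws, 1.0).mean()

# Sanity check: for this conjugate choice the exact marginal is
# Normal(0, 1 + 10^2), so the estimate can be compared directly.
exact = normal_pdf(0.0, 0.0, np.sqrt(101.0))
print(prior_theta(0.0), exact)
```

In this conjugate toy case the integral is available in closed form, which is exactly what the quoted passage says will ''not'' generally hold; the Monte Carlo average is the fallback when it does not.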
Alternatively, the expression can be written as

:<math>
p(\theta\mid y) = \int p(\theta\mid\eta, y) p(\eta \mid y) \; d \eta = \int \frac{p(y \mid \theta) p(\theta \mid \eta)}{p(y \mid \eta)} p(\eta \mid y) \; d \eta\,,
</math>

and the final factor in the integral can in turn be expressed as

:<math>
p(\eta \mid y) = \int p(\eta \mid \theta) p(\theta \mid y) \; d \theta .
</math>

These suggest an iterative scheme, qualitatively similar in structure to a [[Gibbs sampler]], to evolve successively improved approximations to <math>p(\theta\mid y)\;</math> and <math>p(\eta\mid y)\;</math>. First, calculate an initial approximation to <math>p(\theta\mid y)\;</math> ignoring the <math>\eta</math> dependence completely; then calculate an approximation to <math>p(\eta\mid y)\;</math> based upon the initial approximate distribution of <math>p(\theta\mid y)\;</math>; then use this <math>p(\eta\mid y)\;</math> to update the approximation for <math>p(\theta\mid y)\;</math>; then update <math>p(\eta\mid y)\;</math>; and so on.

When the true distribution <math>p(\eta\mid y)\;</math> is sharply peaked, the integral determining <math>p(\theta\mid y)\;</math> may be not much changed by replacing the probability distribution over <math>\eta\;</math> with a point estimate <math>\eta^{*}\;</math> representing the distribution's peak (or, alternatively, its mean),

:<math>
p(\theta\mid y) \simeq \frac{p(y \mid \theta) \; p(\theta \mid \eta^{*})}{p(y \mid \eta^{*})}\,.
</math>

With this approximation, the above iterative scheme becomes the [[EM algorithm]].

The term "empirical Bayes" can cover a wide variety of methods, but most can be regarded as an early truncation of either the above scheme or something quite like it. Point estimates, rather than the whole distribution, are typically used for the parameter(s) <math>\eta\;</math>. The estimates for <math>\eta^{*}\;</math> are typically made from the first approximation to <math>p(\theta\mid y)\;</math> without subsequent refinement.
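The alternating updates of <math>p(\theta\mid y)</math> and <math>p(\eta\mid y)</math> can be sketched for a conjugate toy model where both conditionals are exact Gaussians, so the scheme reduces to an actual Gibbs sampler. All model choices below (normal likelihood, known <math>\tau^2</math>, normal hyperprior on <math>\eta</math>) are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative conjugate model (choices are assumptions, not from the text):
#   y_i     ~ Normal(theta_i, 1)
#   theta_i ~ Normal(eta, tau^2),  tau^2 = 4 known
#   eta     ~ Normal(0, 10^2)      hyperprior
tau2, s2_eta = 4.0, 100.0
eta_true, n = 3.0, 500
theta_true = rng.normal(eta_true, np.sqrt(tau2), size=n)
y = rng.normal(theta_true, 1.0)

# Alternate between the two conditional updates, as in the iterative
# scheme above; for this model both conditionals are exact Gaussians.
eta = 0.0
eta_trace = []
for _ in range(2000):
    # theta | eta, y : conjugate normal update, vectorised over all i
    prec = 1.0 + 1.0 / tau2
    theta = rng.normal((y + eta / tau2) / prec, np.sqrt(1.0 / prec))
    # eta | theta : normal hyperprior times normal "likelihood" of theta
    prec_eta = n / tau2 + 1.0 / s2_eta
    eta = rng.normal((theta.sum() / tau2) / prec_eta, np.sqrt(1.0 / prec_eta))
    eta_trace.append(eta)

eta_hat = np.mean(eta_trace[500:])  # posterior mean of eta after burn-in
```

Truncating this loop after its first pass, and collapsing the draw of <math>\eta</math> to a point estimate, is the kind of "early truncation" the paragraph above describes.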
These estimates for <math>\eta^{*}\;</math> are usually made without considering an appropriate prior distribution for <math>\eta</math>.
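A minimal sketch of such a point estimate, under an assumed Normal–Normal model (not specified in the text): with <math>y_i \sim N(\theta_i, \sigma^2)</math>, <math>\sigma</math> known, and <math>\theta_i \sim N(\mu, \tau^2)</math>, the marginal distribution of each <math>y_i</math> is <math>N(\mu, \sigma^2 + \tau^2)</math>, so <math>\eta^{*} = (\hat\mu, \hat\tau^2)</math> can be read off the data by the method of moments, with no prior on <math>\eta</math> involved.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed Normal-Normal model for illustration:
#   y_i ~ Normal(theta_i, sigma^2) with sigma known,
#   theta_i ~ Normal(mu, tau^2), hyperparameters eta = (mu, tau^2).
sigma = 1.0
mu_true, tau_true = 5.0, 2.0
theta = rng.normal(mu_true, tau_true, size=2000)
y = rng.normal(theta, sigma)

# Point estimate eta* from the marginal distribution of y:
# marginally y_i ~ Normal(mu, sigma^2 + tau^2), so method of moments gives
mu_hat = y.mean()
tau2_hat = max(y.var() - sigma**2, 0.0)

# Plugging eta* back in, the posterior mean of each theta_i is a
# shrinkage of y_i towards the estimated population mean mu_hat:
shrink = tau2_hat / (tau2_hat + sigma**2)
theta_post_mean = mu_hat + shrink * (y - mu_hat)
```

The shrunken estimates `theta_post_mean` typically have smaller mean squared error than the raw observations `y`, which is the practical payoff of borrowing strength from the population of parameters.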