
===Parametric empirical Bayes===

If the likelihood and its prior take on simple parametric forms (such as 1- or 2-dimensional likelihood functions with simple [[conjugate prior]]s), then the empirical Bayes problem is only to estimate the marginal <math>m(y\mid\eta)</math> and the hyperparameters <math>\eta</math> using the complete set of empirical measurements. One common approach, called parametric empirical Bayes point estimation, is to approximate the marginal using the [[maximum likelihood estimate]] (MLE) or a [[Moment (mathematics)|moments]] expansion, which allows one to express the hyperparameters <math>\eta</math> in terms of the empirical mean and variance. This simplified marginal allows one to plug the empirical averages into a point estimate for the parameter <math>\theta</math>. The resulting equation for the point estimate of <math>\theta</math> is greatly simplified, as shown below.

There are several common parametric empirical Bayes models, including the [[Poisson–gamma model]] (below), the [[Beta-binomial model]], the [[Gaussian–Gaussian model]], the [[Dirichlet-multinomial distribution|Dirichlet-multinomial model]], as well as specific models for [[Bayesian linear regression]] (see below) and [[Bayesian multivariate linear regression]]. More advanced approaches include [[hierarchical Bayes model]]s and [[Bayesian mixture model]]s.

====Gaussian–Gaussian model====

For an example of empirical Bayes estimation using a Gaussian–Gaussian model, see [[Bayes_estimator#Empirical_Bayes_estimators|Empirical Bayes estimators]].

====Poisson–gamma model====

For example, let the likelihood in the example above be a [[Poisson distribution]], and let the prior be the [[conjugate prior]], which is a [[gamma distribution]] <math>G(\alpha,\beta)</math> with shape <math>\alpha</math> and scale <math>\beta</math> (where <math>\eta = (\alpha,\beta)</math>):

:<math> \rho(\theta\mid\alpha,\beta) \, d\theta = \frac{(\theta/\beta)^{\alpha-1} \, e^{-\theta / \beta} }{\Gamma(\alpha)} \, (d\theta/\beta) \text{ for } \theta > 0, \alpha > 0, \beta > 0 \,\! .</math>

It is straightforward to show the [[Posterior probability|posterior]] is also a gamma distribution.  Write

:<math> \rho(\theta\mid y) \propto \rho(y\mid \theta) \rho(\theta\mid\alpha, \beta) ,</math>

where the marginal distribution has been omitted since it does not depend explicitly on <math>\theta</math>.
Expanding terms which do depend on <math>\theta</math> gives the posterior as:

:<math> \rho(\theta\mid y) \propto  (\theta^y\, e^{-\theta}) (\theta^{\alpha-1}\, e^{-\theta / \beta}) = \theta^{y+ \alpha -1}\, e^{- \theta (1+1 / \beta)} . </math>

So the posterior density is also a [[gamma distribution]] <math>G(\alpha',\beta')</math>, where <math>\alpha' = y + \alpha</math>, and <math>\beta' = (1+1 / \beta)^{-1}</math>. Also notice that the marginal is simply the integral of the unnormalized posterior (the product of the likelihood and the prior) over all <math>\theta</math>, which turns out to be a [[negative binomial distribution]].
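The conjugate update and the negative binomial marginal can be checked numerically. The following is a minimal sketch in Python using only the standard library; the function names and the values <math>\alpha = 2</math>, <math>\beta = 1.5</math>, <math>y = 3</math> are assumptions chosen for illustration and do not come from any data set. It compares the closed-form negative binomial marginal with a direct numerical integral of the likelihood times the prior.

```python
import math


def gamma_pdf(theta, alpha, beta):
    """Gamma density with shape alpha and scale beta, as in the prior above."""
    log_pdf = ((alpha - 1) * math.log(theta) - theta / beta
               - math.lgamma(alpha) - alpha * math.log(beta))
    return math.exp(log_pdf)


def poisson_pmf(y, theta):
    """Poisson probability of the count y given the rate theta."""
    return math.exp(y * math.log(theta) - theta - math.lgamma(y + 1))


def posterior_params(y, alpha, beta):
    """Conjugate update for one observed count y: returns (alpha', beta')."""
    return y + alpha, 1.0 / (1.0 + 1.0 / beta)


def marginal_closed_form(y, alpha, beta):
    """Negative binomial marginal m(y | alpha, beta)."""
    log_m = (math.lgamma(y + alpha) - math.lgamma(alpha) - math.lgamma(y + 1)
             + y * math.log(beta / (1.0 + beta)) - alpha * math.log(1.0 + beta))
    return math.exp(log_m)


def marginal_numeric(y, alpha, beta, upper=100.0, steps=100_000):
    """Midpoint-rule integral of likelihood times prior over (0, upper)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h
        total += poisson_pmf(y, theta) * gamma_pdf(theta, alpha, beta)
    return h * total


# Hypothetical values, chosen only for illustration.
alpha, beta, y = 2.0, 1.5, 3
alpha_post, beta_post = posterior_params(y, alpha, beta)
print("posterior shape and scale:", alpha_post, beta_post)
print("marginal (closed form):", marginal_closed_form(y, alpha, beta))
print("marginal (numeric):    ", marginal_numeric(y, alpha, beta))
```

The two marginal values should agree to within the accuracy of the midpoint rule, and dividing the likelihood times the prior by the marginal should reproduce the density of <math>G(\alpha',\beta')</math> at any <math>\theta > 0</math>.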

To apply empirical Bayes, we will approximate the marginal using the [[maximum likelihood]] estimate (MLE). But since the posterior is a gamma distribution, the MLE of the marginal turns out to be just the mean of the posterior, which is the point estimate <math>\operatorname{E}(\theta\mid y)</math> we need. Recalling that the mean of a gamma distribution <math>G(\alpha', \beta')</math> is simply <math>\alpha' \beta'</math>, and writing <math>\bar{y}</math> for the sample mean (for a single observation, <math>\bar{y} = y</math>), we have

:<math> \operatorname{E}(\theta\mid y) = \alpha' \beta' = \frac{\bar{y}+\alpha}{1+1 / \beta} = \frac{\beta}{1+\beta}\bar{y} + \frac{1}{1+\beta} (\alpha \beta).  </math>
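The last equality follows from multiplying the numerator and denominator by <math>\beta</math>. As a worked illustration with assumed values (chosen here for illustration, not taken from data) <math>\alpha = 2</math>, <math>\beta = 1.5</math> and <math>\bar{y} = 5</math>:

:<math> \operatorname{E}(\theta\mid y) = \frac{5 + 2}{1 + 1/1.5} = \frac{1.5}{2.5}\cdot 5 + \frac{1}{2.5}\cdot(2 \cdot 1.5) = 3 + 1.2 = 4.2 , </math>

which lies between the prior mean <math>\alpha\beta = 3</math> and the observed value 5.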

To obtain the values of <math>\alpha</math> and <math>\beta</math>, empirical Bayes prescribes estimating the prior mean <math>\alpha\beta</math> and the prior variance <math>\alpha\beta^2</math> using the complete set of empirical data.

The resulting point estimate <math> \operatorname{E}(\theta\mid y) </math> is therefore a weighted average of the sample mean <math>\bar{y}</math> and the prior mean <math>\mu = \alpha\beta</math>. This turns out to be a general feature of empirical Bayes: point estimates (such as the mean) look like weighted averages of the sample estimate and the prior estimate (likewise for estimates of the variance).
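The estimation of <math>\alpha</math> and <math>\beta</math> and the resulting shrinkage can be sketched in Python. This sketch makes an assumption not spelled out above: each unit <math>i</math> contributes a single count <math>y_i</math> with its own rate <math>\theta_i</math> drawn from the common gamma prior. Under that assumption the negative binomial marginal has mean <math>\alpha\beta</math> and variance <math>\alpha\beta(1+\beta)</math> (by the law of total variance), so matching the sample mean and variance gives moment estimates of the hyperparameters. The function names and the example counts are hypothetical.

```python
def fit_gamma_prior_by_moments(counts):
    """Moment estimates (alpha, beta) of a gamma prior from Poisson counts.

    Matches the sample mean to alpha*beta and the sample variance to
    alpha*beta*(1 + beta), the moments of the negative binomial marginal.
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two counts to estimate a variance")
    m = sum(counts) / n
    v = sum((c - m) ** 2 for c in counts) / (n - 1)  # unbiased sample variance
    if v <= m:
        # Without overdispersion the moment estimate of beta is not positive.
        raise ValueError("sample variance must exceed the sample mean")
    beta = (v - m) / m
    alpha = m / beta
    return alpha, beta


def posterior_mean(y, alpha, beta):
    """Weighted average of the observed count y and the prior mean alpha*beta."""
    weight = beta / (1.0 + beta)
    return weight * y + (1.0 - weight) * alpha * beta


# Hypothetical counts, one per unit, used only for illustration.
counts = [0, 1, 1, 2, 3, 5, 8, 12]
alpha_hat, beta_hat = fit_gamma_prior_by_moments(counts)
shrunk = [posterior_mean(y, alpha_hat, beta_hat) for y in counts]
for y, estimate in zip(counts, shrunk):
    print(y, round(estimate, 3))
```

Each estimate is pulled from the raw count toward the estimated prior mean, with the amount of shrinkage controlled by <math>\beta/(1+\beta)</math>. The sketch raises an error when the data are not overdispersed, since the moment equations then have no solution with <math>\beta > 0</math>; other ways of handling that case are possible.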