Empirical Bayes method
==Introduction==
Empirical Bayes methods can be seen as an approximation to a fully Bayesian treatment of a [[hierarchical Bayes model]]. In, for example, a two-stage hierarchical Bayes model, observed data <math>y = \{y_1, y_2, \dots, y_n\}</math> are assumed to be generated from an unobserved set of parameters <math>\theta = \{\theta_1, \theta_2, \dots, \theta_n\}</math> according to a probability distribution <math>p(y\mid\theta)\,</math>. In turn, the parameters <math>\theta</math> can be considered samples drawn from a population characterised by [[Hyperparameter (Bayesian statistics)|hyperparameters]] <math>\eta\,</math> according to a probability distribution <math>p(\theta\mid\eta)\,</math>. In the hierarchical Bayes model, though not in the empirical Bayes approximation, the hyperparameters <math>\eta\,</math> are considered to be drawn from an unparameterized distribution <math>p(\eta)\,</math>.

Information about a particular quantity of interest <math>\theta_i\;</math> therefore comes not only from the properties of those data <math>y</math> that directly depend on it, but also from the properties of the population of parameters <math>\theta\;</math> as a whole, inferred from the data as a whole, summarised by the hyperparameters <math>\eta\;</math>.

Using [[Bayes' theorem]],

:<math>
p(\theta\mid y) = \frac{p(y \mid \theta) p(\theta)}{p(y)} = \frac {p(y \mid \theta)}{p(y)} \int p(\theta \mid \eta) p(\eta) \, d\eta \,.
</math>

In general, this integral will not be tractable [[Integral#Analytical|analytically]] or [[Symbolic integration|symbolically]] and must be evaluated by [[Integral#Numerical|numerical]] methods. Stochastic (random) or deterministic approximations may be used. Example stochastic methods are [[Markov chain Monte Carlo]] and [[Monte Carlo integration|Monte Carlo]] sampling. Deterministic approximations are discussed in [[numerical integration|quadrature]].
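The Monte Carlo approach to the inner integral <math>\int p(\theta \mid \eta) p(\eta) \, d\eta</math> can be sketched numerically. The model below is purely illustrative (a two-stage normal model chosen here as an assumption, not taken from the text): <math>\theta \sim N(\eta, 1)</math> with hyperprior <math>\eta \sim N(0, 10^2)</math>, so the integral can be estimated by averaging <math>p(\theta \mid \eta)</math> over draws of <math>\eta</math> from <math>p(\eta)</math>.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-stage model (all distributional choices are illustrative):
#   theta ~ Normal(eta, 1),   hyperprior  eta ~ Normal(0, 10^2)
# The prior p(theta) = integral of p(theta | eta) p(eta) d(eta) is
# approximated by averaging p(theta | eta) over draws eta ~ p(eta).

def normal_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

eta_draws = rng.normal(0.0, 10.0, size=100_000)  # samples from p(eta)

def prior_theta(theta):
    """Monte Carlo estimate of p(theta) = E_eta[ p(theta | eta) ]."""
    return normal_pdf(theta, eta_draws, 1.0).mean()

# Sanity check: for this conjugate choice the exact marginal is
# Normal(0, 1 + 10^2), so the estimate can be compared directly.
exact = normal_pdf(0.0, 0.0, np.sqrt(101.0))
print(prior_theta(0.0), exact)
```

In this conjugate toy case the integral is available in closed form, which is exactly what the quoted passage says will ''not'' generally hold; the Monte Carlo average is the fallback when it does not.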
Alternatively, the expression can be written as

:<math>
p(\theta\mid y) = \int p(\theta\mid\eta, y) p(\eta \mid y) \; d \eta = \int \frac{p(y \mid \theta) p(\theta \mid \eta)}{p(y \mid \eta)} p(\eta \mid y) \; d \eta\,,
</math>

and the final factor in the integral can in turn be expressed as

:<math>
p(\eta \mid y) = \int p(\eta \mid \theta) p(\theta \mid y) \; d \theta .
</math>

These suggest an iterative scheme, qualitatively similar in structure to a [[Gibbs sampler]], to evolve successively improved approximations to <math>p(\theta\mid y)\;</math> and <math>p(\eta\mid y)\;</math>. First, calculate an initial approximation to <math>p(\theta\mid y)\;</math> ignoring the <math>\eta</math> dependence completely; then calculate an approximation to <math>p(\eta\mid y)\;</math> based upon the initial approximate distribution of <math>p(\theta\mid y)\;</math>; then use this <math>p(\eta\mid y)\;</math> to update the approximation for <math>p(\theta\mid y)\;</math>; then update <math>p(\eta\mid y)\;</math>; and so on.

When the true distribution <math>p(\eta\mid y)\;</math> is sharply peaked, the integral determining <math>p(\theta\mid y)\;</math> may be not much changed by replacing the probability distribution over <math>\eta\;</math> with a point estimate <math>\eta^{*}\;</math> representing the distribution's peak (or, alternatively, its mean),

:<math>
p(\theta\mid y) \simeq \frac{p(y \mid \theta) \; p(\theta \mid \eta^{*})}{p(y \mid \eta^{*})}\,.
</math>

With this approximation, the above iterative scheme becomes the [[EM algorithm]].

The term "empirical Bayes" can cover a wide variety of methods, but most can be regarded as an early truncation of either the above scheme or something quite like it. Point estimates, rather than the whole distribution, are typically used for the parameter(s) <math>\eta\;</math>. The estimates for <math>\eta^{*}\;</math> are typically made from the first approximation to <math>p(\theta\mid y)\;</math> without subsequent refinement.
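The alternating updates of <math>p(\theta\mid y)</math> and <math>p(\eta\mid y)</math> can be sketched for a conjugate toy model where both conditionals are exact Gaussians, so the scheme reduces to an actual Gibbs sampler. All model choices below (normal likelihood, known <math>\tau^2</math>, normal hyperprior on <math>\eta</math>) are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative conjugate model (choices are assumptions, not from the text):
#   y_i     ~ Normal(theta_i, 1)
#   theta_i ~ Normal(eta, tau^2),  tau^2 = 4 known
#   eta     ~ Normal(0, 10^2)      hyperprior
tau2, s2_eta = 4.0, 100.0
eta_true, n = 3.0, 500
theta_true = rng.normal(eta_true, np.sqrt(tau2), size=n)
y = rng.normal(theta_true, 1.0)

# Alternate between the two conditional updates, as in the iterative
# scheme above; for this model both conditionals are exact Gaussians.
eta = 0.0
eta_trace = []
for _ in range(2000):
    # theta | eta, y : conjugate normal update, vectorised over all i
    prec = 1.0 + 1.0 / tau2
    theta = rng.normal((y + eta / tau2) / prec, np.sqrt(1.0 / prec))
    # eta | theta : normal hyperprior times normal "likelihood" of theta
    prec_eta = n / tau2 + 1.0 / s2_eta
    eta = rng.normal((theta.sum() / tau2) / prec_eta, np.sqrt(1.0 / prec_eta))
    eta_trace.append(eta)

eta_hat = np.mean(eta_trace[500:])  # posterior mean of eta after burn-in
```

Truncating this loop after its first pass, and collapsing the draw of <math>\eta</math> to a point estimate, is the kind of "early truncation" the paragraph above describes.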
These estimates for <math>\eta^{*}\;</math> are usually made without considering an appropriate prior distribution for <math>\eta</math>.
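A minimal sketch of such a point estimate, under an assumed Normal–Normal model (not specified in the text): with <math>y_i \sim N(\theta_i, \sigma^2)</math>, <math>\sigma</math> known, and <math>\theta_i \sim N(\mu, \tau^2)</math>, the marginal distribution of each <math>y_i</math> is <math>N(\mu, \sigma^2 + \tau^2)</math>, so <math>\eta^{*} = (\hat\mu, \hat\tau^2)</math> can be read off the data by the method of moments, with no prior on <math>\eta</math> involved.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed Normal-Normal model for illustration:
#   y_i ~ Normal(theta_i, sigma^2) with sigma known,
#   theta_i ~ Normal(mu, tau^2), hyperparameters eta = (mu, tau^2).
sigma = 1.0
mu_true, tau_true = 5.0, 2.0
theta = rng.normal(mu_true, tau_true, size=2000)
y = rng.normal(theta, sigma)

# Point estimate eta* from the marginal distribution of y:
# marginally y_i ~ Normal(mu, sigma^2 + tau^2), so method of moments gives
mu_hat = y.mean()
tau2_hat = max(y.var() - sigma**2, 0.0)

# Plugging eta* back in, the posterior mean of each theta_i is a
# shrinkage of y_i towards the estimated population mean mu_hat:
shrink = tau2_hat / (tau2_hat + sigma**2)
theta_post_mean = mu_hat + shrink * (y - mu_hat)
```

The shrunken estimates `theta_post_mean` typically have smaller mean squared error than the raw observations `y`, which is the practical payoff of borrowing strength from the population of parameters.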