== Gibbs sampler in Bayesian inference and its relation to information theory ==

Let <math>y</math> denote observations generated from the sampling distribution <math>f(y|\theta)</math> and <math>\pi(\theta)</math> be a prior supported on the parameter space <math>\Theta</math>. Then one of the central goals of [[Bayesian statistics]] is to approximate the posterior density

:<math>\pi(\theta|y) = \frac{f(y|\theta) \cdot \pi(\theta)}{m(y)},</math>

where the marginal likelihood <math> m(y) = \int_{\Theta} f(y|\theta) \cdot \pi(\theta) d\theta </math> is assumed to be finite for all <math>y</math>.

To explain the Gibbs sampler, we additionally assume that the parameter space <math>\Theta</math> is decomposed as

:<math>\Theta = \prod_{i=1}^{K}\Theta_{i} = \Theta_1 \times \cdots \times \Theta_i \times \cdots \times \Theta_K, \quad\quad (K>1),</math>

where <math>\times</math> represents the [[Cartesian product]]. Each component parameter space <math>\Theta_i</math> can be a set of scalar components, subvectors, or matrices. Write <math>\theta_{-i} \in \Theta_{-i}</math> for the collection of all components of <math>\theta</math> other than <math>\theta_i</math>. The essential ingredients of the Gibbs sampler are the <math>i</math>-th full conditional posterior distributions for each <math>i=1,\cdots, K</math>:

:<math>\pi(\theta_i|\theta_{-i},y)=\pi(\theta_i|\theta_1, \cdots, \theta_{i-1},\theta_{i+1},\cdots, \theta_{K},y)</math>.

[[File:Gibbs sampler picture.jpg|thumb|500px|A pictorial description of the Gibbs sampling algorithm<ref name="Lee2008">{{Cite journal |last=Lee |first=Se Yoon |title=Gibbs sampler and coordinate ascent variational inference: A set-theoretical review |journal=Communications in Statistics - Theory and Methods |year=2021 |volume=51 |issue=6 |pages=1549–1568 |doi=10.1080/03610926.2021.1921214 |arxiv=2008.01006 |s2cid=220935477}}</ref>]]
[[File:Gibbs sampling info eq.jpg|thumb|500px|Schematic description of the information equality associated with the Gibbs sampler at the i-th step within a cycle<ref name="Lee2008" />]]

The following algorithm details a generic Gibbs sampler:

:<math>\text{Initialize: pick an arbitrary starting value}\,\, \theta^{(1)} = (\theta_1^{(1)},\theta_2^{(1)},\cdots,\theta_i^{(1)},\theta_{i+1}^{(1)},\cdots,\theta_K^{(1)}) </math>
:<math>\text{Iterate a Cycle:}\,</math>
:<math> \quad\quad \text{Step 1. draw}\,\, \theta_1^{(s+1)}\sim \pi(\theta_1|\theta_2^{(s)},\theta_3^{(s)},\cdots,\theta_K^{(s)},y )</math>
:<math> \quad\quad \text{Step 2. draw}\,\, \theta_2^{(s+1)}\sim \pi(\theta_2|\theta_1^{(s+1)},\theta_3^{(s)},\cdots,\theta_K^{(s)},y )</math>
:<math> \quad\quad\quad \vdots</math>
:<math> \quad\quad \text{Step i. draw}\,\, \theta_i^{(s+1)}\sim \pi(\theta_i|\theta_1^{(s+1)},\theta_2^{(s+1)},\cdots, \theta_{i-1}^{(s+1)},\theta_{i+1}^{(s)} ,\cdots, \theta_K^{(s)},y )</math>
:<math> \quad\quad \text{Step i+1. draw}\,\, \theta_{i+1}^{(s+1)}\sim \pi(\theta_{i+1}|\theta_1^{(s+1)},\theta_2^{(s+1)},\cdots, \theta_{i}^{(s+1)},\theta_{i+2}^{(s)} ,\cdots, \theta_K^{(s)},y )</math>
:<math> \quad\quad\quad \vdots</math>
:<math> \quad\quad \text{Step K. draw}\,\, \theta_K^{(s+1)}\sim \pi(\theta_K|\theta_1^{(s+1)},\theta_2^{(s+1)},\cdots,\theta_{K-1}^{(s+1)},y )</math>
:<math>\text{end Iterate}</math>

Note that the Gibbs sampler operates by an iterative Monte Carlo scheme within each cycle. The <math>S</math> samples <math>\{\theta^{(s)} \}_{s=1}^{S}</math> drawn by the above algorithm form a Markov chain whose invariant distribution is the target density <math>\pi(\theta|y)</math>.
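As a minimal illustrative sketch (not taken from the cited reference; the function name and parameter values are hypothetical), the cycle can be written out for the simple case <math>K=2</math> in which the posterior is a zero-mean bivariate normal with unit variances and correlation <math>\rho</math>, so that both full conditionals are univariate normal, <math>\pi(\theta_1|\theta_2,y)=N(\rho\theta_2,\,1-\rho^2)</math> and <math>\pi(\theta_2|\theta_1,y)=N(\rho\theta_1,\,1-\rho^2)</math>:

<syntaxhighlight lang="python">
import numpy as np

def gibbs_bivariate_normal(rho, S=5000, seed=0):
    """Gibbs cycle for a zero-mean, unit-variance bivariate normal posterior
    with correlation rho (illustrative example with K = 2)."""
    rng = np.random.default_rng(seed)
    theta1, theta2 = 0.0, 0.0            # Initialize: arbitrary starting value
    cond_sd = np.sqrt(1.0 - rho ** 2)    # standard deviation of each full conditional
    samples = np.empty((S, 2))
    for s in range(S):
        # Step 1. draw theta1 ~ pi(theta1 | theta2, y) = N(rho * theta2, 1 - rho^2)
        theta1 = rng.normal(rho * theta2, cond_sd)
        # Step 2. draw theta2 ~ pi(theta2 | theta1, y) = N(rho * theta1, 1 - rho^2)
        theta2 = rng.normal(rho * theta1, cond_sd)
        samples[s] = (theta1, theta2)
    return samples

draws = gibbs_bivariate_normal(rho=0.8)
print(np.corrcoef(draws.T))              # empirical correlation approaches 0.8
</syntaxhighlight>

After discarding an initial burn-in, the empirical distribution of the draws approximates the target bivariate normal, consistent with the invariance property stated above.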
Now, for each <math>i=1,\cdots,K</math>, define the following information-theoretic quantities:

:<math>I(\theta_i ; \theta_{-i}) = \text{KL}(\pi(\theta|y)||\pi(\theta_i|y) \cdot \pi(\theta_{-i}|y) ) =\int_{\Theta} \pi(\theta|y) \log \bigg(\frac{\pi(\theta|y)}{\pi(\theta_i|y) \cdot \pi(\theta_{-i}|y)}\bigg) d\theta, </math>
:<math>H(\theta_{-i}) = -\int_{\Theta_{-i}} \pi(\theta_{-i}|y) \log \pi(\theta_{-i}|y) d\theta_{-i}, </math>
:<math>H(\theta_{-i}|\theta_{i}) = -\int_{\Theta} \pi(\theta|y) \log \pi(\theta_{-i}|\theta_{i},y) d\theta, </math>

namely, the posterior mutual information, the posterior differential entropy, and the posterior conditional differential entropy, respectively. The quantities <math>I(\theta_{-i} ; \theta_{i})</math>, <math>H(\theta_{i})</math>, and <math>H(\theta_{i}|\theta_{-i})</math> are defined analogously by interchanging <math>i</math> and <math>-i</math>. Then, the following <math>K</math> equations hold:<ref name="Lee2008" />

:<math>I(\theta_i ; \theta_{-i}) = H(\theta_{-i}) - H(\theta_{-i}|\theta_{i}) = H(\theta_{i}) - H(\theta_{i}|\theta_{-i}) = I(\theta_{-i} ; \theta_{i}), \quad (i=1,\cdots,K). </math>

The mutual information <math>I(\theta_i ; \theta_{-i}) </math> quantifies the reduction in uncertainty about the random quantity <math>\theta_{i}</math> once <math>\theta_{-i}</math> is known, a posteriori. It vanishes if and only if <math>\theta_{i}</math> and <math>\theta_{-i}</math> are marginally independent, a posteriori. The mutual information <math>I(\theta_i ; \theta_{-i})</math> can be interpreted as the quantity that is transmitted from the <math>i</math>-th step to the <math>(i+1)</math>-th step within a single cycle of the Gibbs sampler.
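As a worked example (a standard closed-form case, not drawn from the cited reference), consider again the hypothetical bivariate normal posterior with unit variances and correlation <math>\rho</math>, so that <math>K=2</math> and <math>\theta_{-1}=\theta_2</math>. The quantities above are then

:<math>H(\theta_{2}) = \tfrac{1}{2}\log(2\pi e), \qquad H(\theta_{2}|\theta_{1}) = \tfrac{1}{2}\log\big(2\pi e \,(1-\rho^2)\big),</math>
:<math>I(\theta_1 ; \theta_2) = H(\theta_{2}) - H(\theta_{2}|\theta_{1}) = -\tfrac{1}{2}\log(1-\rho^2),</math>

which verifies the information equality directly: the mutual information vanishes when <math>\rho = 0</math>, i.e., when <math>\theta_1</math> and <math>\theta_2</math> are independent a posteriori, and grows without bound as <math>|\rho| \to 1</math>.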