Editing Markov chain Monte Carlo (section)

== Mathematical Setting ==
Suppose ''(X<sub>n</sub>)'' is a [[Markov Chain]] in the general state space <math>\mathcal{X}</math> with specific properties. We are interested in the limiting behavior of the partial sums:

:<math>S_n(h) = \dfrac{1}{n} \sum_{i=1}^{n} h(X_i)</math>

as ''n'' goes to infinity. Particularly, we hope to establish the [[Law of Large Numbers]] and the [[Central Limit Theorem]] for MCMC. In the following, we state some definitions and theorems necessary for the important convergence results. In short, we need the existence of invariant measure and Harris recurrent to establish the Law of Large Numbers of MCMC (Ergodic Theorem). And we need aperiodicity, irreducibility and extra conditions such as reversibility to ensure the Central Limit Theorem holds in MCMC.<ref name="RC2004">Robert and Casella (2004), pp. 205–246</ref>

=== Irreducibility and Aperiodicity ===
Recall that in the discrete setting, a [[Markov chain]] is said to be ''irreducible'' if it is possible to reach any state from any other state in a finite number of steps with positive probability. However, in the continuous setting, point-to-point transitions have zero probability. In this case, '''&phi;-irreducibility''' generalizes [[irreducibility]] by using a reference measure &phi; on the measurable space <math>(\mathcal{X},\mathcal{B}(\mathcal{X}))</math>.

;Definition (φ-irreducibility)
Given a measure <math>\varphi</math> defined on <math>(\mathcal{X},\mathcal{B}(\mathcal{X}))</math>, the Markov chain <math>(X_n)</math> with transition kernel <math>K(x, y)</math> is '''φ-irreducible''' if, for every <math>A \in \mathcal{B}(\mathcal{X})</math> with <math>\varphi(A) > 0</math>, there exists <math>n</math> such that <math>K^n(x, A) > 0</math> for all <math>x \in \mathcal{X}</math> (Equivalently, <math>P_x(\tau_A < \infty) > 0</math>, here <math>\tau_A = \inf\{n \geq 1 ; X_n \in A\}</math> is the first <math>n</math> for which the chain enters the set <math>A</math>). 

This is a more general definition for [[irreducibility]] of a [[Markov chain]] in non-discrete state space. In the discrete case, an irreducible Markov chain is said to be ''aperiodic'' if it has period 1. Formally, the period of a state <math>\omega \in \mathcal{X}</math> is defined as:

:<math>
d(\omega) := \mathrm{gcd}\{m \geq 1 \,;\, K^m(\omega, \omega) > 0\}
</math>

For the general (non-discrete) case, we define aperiodicity in terms of small sets:

;Definition (Cycle length and small sets)
A '''&phi;-irreducible''' Markov chain <math>(X_n)</math> has a ''cycle of length d'' if there exists a small set <math>C</math>, an associated integer <math>M</math>, and a probability distribution <math>\nu_M</math> such that ''d'' is the [[greatest common divisor]] of:

:<math>
\{ m \geq 1 \,;\, \exists\, \delta_m > 0 \text{ such that } C \text{ is small for } \nu_m \geq \delta_m \nu_M \}.
</math>

A set <math>C</math> is called '''small''' if there exists <math>m \in \mathbb{N}^*</math> and a nonzero measure <math>\nu_m</math> such that:

:<math>
K^m(x, A) \geq \nu_m(A), \quad \forall x \in C,\, \forall A \in \mathcal{B}(\mathcal{X}).
</math>

=== Harris recurrent ===

;Definition (Harris recurrence)
A set <math>A</math> is '''Harris recurrent''' if <math>P_x(\eta_A = \infty) = 1</math> for all <math>x \in A</math>, where <math>\eta_A = \sum_{n=1}^\infty \mathbb{I}_A(X_n)</math> is the number of visits of the chain <math>(X_n)</math> to the set <math>A</math>.

The chain <math>(X_n)</math> is said to be '''Harris recurrent''' if there exists a measure <math>\psi</math> such that the chain is <math>\psi</math>-irreducible and every measurable set <math>A</math> with <math>\psi(A) > 0</math> is Harris recurrent.

A useful criterion for verifying Harris recurrence is the following:

;Proposition
If for every <math>A \in \mathcal{B}(\mathcal{X})</math>, we have <math>P_x(\tau_A < \infty) = 1</math> for every <math>x \in A</math>, then <math>P_x(\eta_A = \infty) = 1</math> for all <math>x \in \mathcal{X}</math>, and the chain <math>(X_n)</math> is Harris recurrent. 

This definition is only needed when the state space <math>\mathcal{X}</math> is uncountable. In the countable case, recurrence corresponds to <math>\mathbb{E}_x[\eta_x] = \infty</math>, which is equivalent to <math>P_x(\tau_x < \infty) = 1</math> for all <math>x \in \mathcal{X}</math>.

;Definition (Invariant measure)
A <math>\sigma</math>-finite measure <math>\pi</math> is said to be '''invariant''' for the transition kernel <math>K(\cdot, \cdot)</math> (and the associated chain) if:

:<math>\pi(B) = \int_{\mathcal{X}} K(x, B) \, \pi(dx), \qquad \forall B \in \mathcal{B}(\mathcal{X}).</math>
When there exists an ''invariant probability measure'' for a '''&psi;-irreducible''' (hence recurrent) chain, the chain is said to be '''positive recurrent'''.  
Recurrent chains that do not allow for a finite invariant measure are called '''null recurrent'''.

In applications of Markov Chain Monte Carlo (MCMC), a very useful criterion for Harris recurrence involves the use of bounded harmonic functions.

;Definition (Harmonic function)
A measurable function <math>h</math> is said to be '''harmonic''' for the chain <math>(X_n)</math> if:

:<math>\mathbb{E}[h(X_{n+1}) \mid x_n] = h(x_n)</math>

These functions are ''invariant'' under the transition kernel in the functional sense, and they help characterize Harris recurrence.

;Proposition
''For a positive Markov chain, if the only bounded harmonic functions are the constant functions, then the chain is Harris recurrent.''

=== Law of Large Numbers for MCMC ===

;Theorem (Ergodic Theorem for MCMC)
If <math>(X_n)</math> has a <math>\sigma</math>-finite invariant measure <math>\pi</math>, then the following two statements are equivalent:
# The Markov chain <math>(X_n)</math> is '''Harris recurrent'''.
# If <math>f, g \in L^1(\pi)</math> with <math>\int g(x) \, d\pi(x) \ne 0</math>, then<math>
\lim_{n \to \infty} \frac{S_n(f)}{S_n(g)} = \frac{\int f(x) \, d\pi(x)}{\int g(x) \, d\pi(x)}.</math>

This theorem provides a fundamental justification for the use of Markov Chain Monte Carlo (MCMC) methods, and it serves as the counterpart of the [[Law of Large Numbers]] (LLN) in classical Monte Carlo. 

An important aspect of this result is that <math>\pi</math> does not need to be a probability measure. Therefore, there can be some type of strong stability even if the chain is null recurrent. Moreover, the Markov chain can be started from arbitrary state.

If <math>\pi</math> is a probability measure, we can let <math>g \equiv 1</math> and get
:<math>
\lim_{n \to \infty} S_n(f) = \int f(x) \,  d\pi(x).
</math>
This is the Ergodic Theorem that we are more familiar with.

=== Central Limit Theorem for MCMC ===

There are several conditions under which the [[Central Limit Theorem]] (CLT) holds for Markov chain Monte Carlo (MCMC) methods. One of the most commonly used is the condition of '''reversibility'''.

;Definition (Reversibility)
A stationary Markov chain <math>(X_n)</math> is said to be '''reversible''' if the distribution of <math>X_{n+1}</math> given <math>X_{n+2} = x</math> is the same as the distribution of <math>X_{n+1}</math> given <math>X_n = x</math>.

This is equivalent to the ''detailed balance condition'', which is defined as follows:

;Definition (Detailed balance)
A Markov chain with transition kernel <math>K</math> satisfies the '''detailed balance condition''' if there exists a function <math>f</math> such that:
:<math>K(y, x) f(y) = K(x, y) f(x)</math>  
for every pair <math>(x, y)</math> in the state space.

;Theorem (CLT under reversibility)
If <math>(X_n)</math> is aperiodic, irreducible, and reversible with invariant distribution <math>\pi</math>, then:

:<math>
\frac{1}{\sqrt{N}} \left( \sum_{n=1}^N \left( h(X_n) - \mathbb{E}^\pi[h] \right) \right)
\overset{\mathcal{L}}{\longrightarrow} \mathcal{N}(0, \gamma_h^2)
</math>

where

:<math>
0 < \gamma_h^2 = \mathbb{E}_\pi \left[ \bar{h}^2(X_0) \right] + 2 \sum_{k=1}^{\infty} \mathbb{E}_\pi \left[ \bar{h}(X_0) \bar{h}(X_k) \right] < +\infty
</math>

and 
:<math>
\bar{h}(\cdot) = h(\cdot) - E[h(\cdot)]
</math>.

Even though reversibility is a restrictive assumption in theory, it is often easily satisfied in practical MCMC algorithms by introducing auxiliary variables or using symmetric proposal mechanisms. There are many other conditions that can be used to establish CLT for MCMC such as geometirc ergodicity  and the discrete state space.