==Formal derivation==
The purpose of the Metropolis–Hastings algorithm is to generate a collection of states according to a desired distribution <math>P(x)</math>. To accomplish this, the algorithm uses a [[Markov process]], which asymptotically reaches a unique [[Markov chain#Steady-state analysis and limiting distributions|stationary distribution]] <math>\pi(x)</math> such that <math>\pi(x) = P(x)</math>.<ref name=Roberts_Casella/>

A Markov process is uniquely defined by its transition probabilities <math>P(x' \mid x)</math>, the probability of transitioning from any given state <math>x</math> to any other given state <math>x'</math>. It has a unique stationary distribution <math>\pi(x)</math> when the following two conditions are met:<ref name=Roberts_Casella/>
# ''Existence of a stationary distribution'': there must exist a stationary distribution <math>\pi(x)</math>. A sufficient but not necessary condition is [[detailed balance]], which requires that each transition <math>x \to x'</math> be reversible: for every pair of states <math>x, x'</math>, the probability of being in state <math>x</math> and transitioning to state <math>x'</math> must equal the probability of being in state <math>x'</math> and transitioning to state <math>x</math>, that is, <math>\pi(x) P(x' \mid x) = \pi(x') P(x \mid x')</math>.
# ''Uniqueness of the stationary distribution'': the stationary distribution <math>\pi(x)</math> must be unique. This is guaranteed by [[Markov chain#Ergodicity|ergodicity]] of the Markov process, which requires that every state (1) be aperiodic, meaning the system does not return to the same state at fixed intervals; and (2) be positive recurrent, meaning the expected number of steps for returning to the same state is finite.

The Metropolis–Hastings algorithm involves designing a Markov process (by constructing transition probabilities) that fulfills the two conditions above, such that its stationary distribution <math>\pi(x)</math> is the desired distribution <math>P(x)</math>. The derivation of the algorithm starts with the condition of [[detailed balance]]:
: <math>P(x' \mid x) P(x) = P(x \mid x') P(x'),</math>
which is re-written as
: <math>\frac{P(x' \mid x)}{P(x \mid x')} = \frac{P(x')}{P(x)}.</math>

The approach is to separate the transition into two sub-steps: the proposal and the acceptance-rejection. The proposal distribution <math>g(x' \mid x)</math> is the conditional probability of proposing a state <math>x'</math> given <math>x</math>, and the acceptance distribution <math>A(x', x)</math> is the probability of accepting the proposed state <math>x'</math>. The transition probability can be written as their product:
: <math>P(x' \mid x) = g(x' \mid x) A(x', x).</math>
Inserting this relation into the previous equation, we have
: <math>\frac{A(x', x)}{A(x, x')} = \frac{P(x')}{P(x)}\frac{g(x \mid x')}{g(x' \mid x)}.</math>

The next step in the derivation is to choose an acceptance ratio that fulfills the condition above. One common choice is the Metropolis choice:
: <math>A(x', x) = \min\left(1, \frac{P(x')}{P(x)} \frac{g(x \mid x')}{g(x' \mid x)}\right).</math>
For this Metropolis acceptance ratio <math>A</math>, either <math>A(x', x) = 1</math> or <math>A(x, x') = 1</math>, and in either case the ratio <math>A(x', x)/A(x, x')</math> equals <math>\frac{P(x')}{P(x)}\frac{g(x \mid x')}{g(x' \mid x)}</math>, so the condition is satisfied.

The Metropolis–Hastings algorithm can thus be written as follows (an illustrative sketch in code is given at the end of this section):
# Initialise:
## Pick an initial state <math>x_0</math>.
## Set <math>t = 0</math>.
# Iterate:
## ''Generate'' a random candidate state <math>x'</math> according to <math>g(x' \mid x_t)</math>.
## ''Calculate'' the acceptance probability <math>A(x', x_t) = \min\left(1, \frac{P(x')}{P(x_t)} \frac{g(x_t \mid x')}{g(x' \mid x_t)}\right)</math>.
## ''Accept or reject'':
### generate a uniform random number <math>u \in [0, 1]</math>;
### if <math>u \le A(x', x_t)</math>, then ''accept'' the new state and set <math>x_{t+1} = x'</math>;
### if <math>u > A(x', x_t)</math>, then ''reject'' the new state and copy the old state forward, <math>x_{t+1} = x_t</math>.
## ''Increment'': set <math>t = t + 1</math>.

Provided that specified conditions are met, the empirical distribution of saved states <math>x_0, \ldots, x_T</math> will approach <math>P(x)</math>. The number of iterations (<math>T</math>) required to estimate <math>P(x)</math> effectively depends on a number of factors, including the relationship between <math>P(x)</math> and the proposal distribution and the desired accuracy of estimation.<ref>Raftery, Adrian E., and Steven Lewis. "How Many Iterations in the Gibbs Sampler?" In ''Bayesian Statistics 4''. 1992.</ref> For distributions on discrete state spaces, it has to be of the order of the [[autocorrelation]] time of the Markov process.<ref name=Newman_Barkema/>

It is important to note that, for a general problem, it is not clear which proposal distribution <math>g(x' \mid x)</math> one should use or how many iterations are necessary for proper estimation; both are free parameters of the method, which must be adjusted to the particular problem at hand.
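In the special case of a symmetric proposal distribution, <math>g(x' \mid x) = g(x \mid x')</math> (for example, a Gaussian random walk centred on the current state), the proposal ratio cancels and the acceptance probability reduces to
: <math>A(x', x) = \min\left(1, \frac{P(x')}{P(x)}\right),</math>
which is the acceptance rule of the original Metropolis algorithm.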
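For illustration, the steps above can be sketched in Python. This is a minimal sketch, not a reference implementation: it assumes a one-dimensional state, an unnormalized target density passed in as <code>p</code>, and a symmetric Gaussian random-walk proposal with an illustrative width parameter <code>step_size</code>, so the proposal ratio equals one and the Metropolis form of the acceptance probability applies.

<syntaxhighlight lang="python">
import numpy as np

def metropolis_hastings(p, x0, n_steps, step_size=1.0, rng=None):
    """Sample from an unnormalized 1-D target density p(x) using
    Metropolis-Hastings with a symmetric Gaussian random-walk proposal
    g(x' | x) = Normal(x, step_size).  Because the proposal is symmetric,
    g(x | x') / g(x' | x) = 1 and A(x', x) = min(1, p(x') / p(x)).
    p(x0) must be positive."""
    rng = np.random.default_rng() if rng is None else rng
    samples = np.empty(n_steps)
    x = x0
    for t in range(n_steps):
        # Generate a candidate state x' from the proposal distribution.
        x_new = x + step_size * rng.standard_normal()
        # Metropolis acceptance probability A(x', x) = min(1, p(x') / p(x)).
        a = min(1.0, p(x_new) / p(x))
        # Accept or reject: draw u ~ Uniform[0, 1] and compare with A.
        if rng.uniform() <= a:
            x = x_new      # accept: x_{t+1} = x'
        # otherwise reject: keep the old state, x_{t+1} = x_t
        samples[t] = x
    return samples

# Example usage: sample from an unnormalized standard normal density.
if __name__ == "__main__":
    target = lambda x: np.exp(-0.5 * x**2)   # normalizing constant not needed
    chain = metropolis_hastings(target, x0=0.0, n_steps=50_000, step_size=1.0)
    print(chain.mean(), chain.std())         # should be close to 0 and 1
</syntaxhighlight>

The proposal width (<code>step_size</code> in this sketch) is one of the free parameters mentioned above: a very small step gives high acceptance but slow exploration of the state space, while a very large step leads to frequent rejections.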