Editing Metropolis–Hastings algorithm (section)

==Step-by-step instructions==
[[File:3dRosenbrock.png|thumb|300px|Three [[Markov chain]]s running on the 3D [[Rosenbrock function]] using the Metropolis–Hastings algorithm. The chains converge and mix in the region where the function is high. The approximate position of the maximum has been illuminated. The red points are the ones that remain after the burn-in process. The earlier ones have been discarded.]]

Suppose that the most recent value sampled is <math>x_t</math>. To follow the Metropolis–Hastings algorithm, we next draw a new proposal state <math>x'</math> with probability density <math>g(x' \mid x_t)</math> and calculate a value

: <math>a = a_1 a_2,</math>

where

: <math>a_1 = \frac{P(x')}{P(x_t)}</math>

is the probability (e.g., Bayesian posterior) ratio between the proposed sample <math>x'</math> and the previous sample <math>x_t</math>, and

: <math>a_2 = \frac{g(x_t \mid x')}{g(x' \mid x_t)}</math>

is the ratio of the proposal density in two directions (from <math>x_t</math> to <math>x'</math> and conversely).
This is equal to 1 if the proposal density is symmetric.
Then the new state <math>x_{t+1}</math> is chosen according to the following rules.

: If <math>a \geq 1{:}</math>
:: <math>x_{t+1} = x',</math>
: else:
:: <math>x_{t+1} =
 \begin{cases}
  x' & \text{with probability } a, \\
  x_t & \text{with probability } 1-a.
 \end{cases}
</math>

The Markov chain is started from an arbitrary initial value <math>x_0</math>, and the algorithm is run for many iterations until this initial state is "forgotten". These samples, which are discarded, are known as ''burn-in''. The remaining set of accepted values of <math>x</math> represent a [[Sample (statistics)|sample]] from the distribution <math>P(x)</math>.

The algorithm works best if the proposal density matches the shape of the target distribution <math>P(x)</math>, from which direct sampling is difficult, that is <math>g(x' \mid x_t) \approx P(x')</math>.
If a Gaussian proposal density <math>g</math> is used, the variance parameter <math>\sigma^2</math> has to be tuned during the burn-in period.
This is usually done by calculating the ''acceptance rate'', which is the fraction of proposed samples that is accepted in a window of the last <math>N</math> samples.
The desired acceptance rate depends on the target distribution, however it has been shown theoretically that the ideal acceptance rate for a one-dimensional Gaussian distribution is about 50%, decreasing to about 23% for an <math>N</math>-dimensional Gaussian target distribution.<ref name=Roberts/> These guidelines can work well when sampling from sufficiently regular Bayesian posteriors as they often follow a multivariate normal distribution as can be established using the [[Bernstein–von Mises theorem]].<ref>{{Cite journal |last1=Schmon |first1=Sebastian M. |last2=Gagnon |first2=Philippe |date=2022-04-15 |title=Optimal scaling of random walk Metropolis algorithms using Bayesian large-sample asymptotics |journal=Statistics and Computing |language=en |volume=32 |issue=2 |pages=28 |doi=10.1007/s11222-022-10080-8 |issn=0960-3174 |pmc=8924149 |pmid=35310543}}</ref>

If <math>\sigma^2</math> is too small, the chain will ''mix slowly'' (i.e., the acceptance rate will be high, but successive samples will move around the space slowly, and the chain will converge only slowly to <math>P(x)</math>). On the other hand,
if <math>\sigma^2</math> is too large, the acceptance rate will be very low because the proposals are likely to land in regions of much lower probability density, so <math>a_1</math> will be very small, and again the chain will converge very slowly. One typically tunes the proposal distribution so that the algorithms accepts on the order of 30% of all samples – in line with the theoretical estimates mentioned in the previous paragraph.