==Formal derivation==
The purpose of the Metropolis–Hastings algorithm is to generate a collection of states according to a desired distribution <math>P(x)</math>. To accomplish this, the algorithm uses a [[Markov process]], which asymptotically reaches a unique [[Markov chain#Steady-state analysis and limiting distributions|stationary distribution]] <math>\pi(x)</math> such that <math>\pi(x) = P(x)</math>.<ref name=Roberts_Casella/>

A Markov process is uniquely defined by its transition probabilities <math>P(x' \mid x)</math>, the probability of transitioning from any given state <math>x</math> to any other given state <math>x'</math>. It has a unique stationary distribution <math>\pi(x)</math> when the following two conditions are met:<ref name=Roberts_Casella/>
# ''Existence of a stationary distribution'': there must exist a stationary distribution <math>\pi(x)</math>. A sufficient but not necessary condition is [[detailed balance]], which requires that each transition <math>x \to x'</math> be reversible: for every pair of states <math>x, x'</math>, the probability of being in state <math>x</math> and transitioning to state <math>x'</math> must equal the probability of being in state <math>x'</math> and transitioning to state <math>x</math>, that is, <math>\pi(x) P(x' \mid x) = \pi(x') P(x \mid x')</math>.
# ''Uniqueness of the stationary distribution'': the stationary distribution <math>\pi(x)</math> must be unique. This is guaranteed by [[Markov chain#Ergodicity|ergodicity]] of the Markov process, which requires that every state (1) be aperiodic, meaning the system does not return to the same state at fixed intervals; and (2) be positive recurrent, meaning the expected number of steps for returning to the same state is finite.

The Metropolis–Hastings algorithm involves designing a Markov process (by constructing transition probabilities) that fulfills the two conditions above, such that its stationary distribution <math>\pi(x)</math> is the desired distribution <math>P(x)</math>. The derivation of the algorithm starts with the condition of [[detailed balance]]:
: <math>P(x' \mid x) P(x) = P(x \mid x') P(x'),</math>
which is re-written as
: <math>\frac{P(x' \mid x)}{P(x \mid x')} = \frac{P(x')}{P(x)}.</math>

The approach is to separate the transition into two sub-steps: the proposal and the acceptance-rejection. The proposal distribution <math>g(x' \mid x)</math> is the conditional probability of proposing a state <math>x'</math> given <math>x</math>, and the acceptance distribution <math>A(x', x)</math> is the probability of accepting the proposed state <math>x'</math>. The transition probability can be written as their product:
: <math>P(x' \mid x) = g(x' \mid x) A(x', x).</math>
Inserting this relation into the previous equation, we have
: <math>\frac{A(x', x)}{A(x, x')} = \frac{P(x')}{P(x)}\frac{g(x \mid x')}{g(x' \mid x)}.</math>

The next step in the derivation is to choose an acceptance ratio that fulfills the condition above. One common choice is the Metropolis choice:
: <math>A(x', x) = \min\left(1, \frac{P(x')}{P(x)} \frac{g(x \mid x')}{g(x' \mid x)}\right).</math>
For this Metropolis acceptance ratio <math>A</math>, either <math>A(x', x) = 1</math> or <math>A(x, x') = 1</math>, and in either case the ratio <math>A(x', x)/A(x, x')</math> equals <math>\frac{P(x')}{P(x)}\frac{g(x \mid x')}{g(x' \mid x)}</math>, so the condition is satisfied.

The Metropolis–Hastings algorithm can thus be written as follows (an illustrative sketch in code is given at the end of this section):
# Initialise:
## Pick an initial state <math>x_0</math>.
## Set <math>t = 0</math>.
# Iterate:
## ''Generate'' a random candidate state <math>x'</math> according to <math>g(x' \mid x_t)</math>.
## ''Calculate'' the acceptance probability <math>A(x', x_t) = \min\left(1, \frac{P(x')}{P(x_t)} \frac{g(x_t \mid x')}{g(x' \mid x_t)}\right)</math>.
## ''Accept or reject'':
### generate a uniform random number <math>u \in [0, 1]</math>;
### if <math>u \le A(x', x_t)</math>, then ''accept'' the new state and set <math>x_{t+1} = x'</math>;
### if <math>u > A(x', x_t)</math>, then ''reject'' the new state and copy the old state forward, <math>x_{t+1} = x_t</math>.
## ''Increment'': set <math>t = t + 1</math>.

Provided that specified conditions are met, the empirical distribution of saved states <math>x_0, \ldots, x_T</math> will approach <math>P(x)</math>. The number of iterations (<math>T</math>) required to estimate <math>P(x)</math> effectively depends on a number of factors, including the relationship between <math>P(x)</math> and the proposal distribution and the desired accuracy of estimation.<ref>Raftery, Adrian E., and Steven Lewis. "How Many Iterations in the Gibbs Sampler?" In ''Bayesian Statistics 4''. 1992.</ref> For distributions on discrete state spaces, it has to be of the order of the [[autocorrelation]] time of the Markov process.<ref name=Newman_Barkema/>

It is important to note that, for a general problem, it is not clear which proposal distribution <math>g(x' \mid x)</math> one should use or how many iterations are necessary for proper estimation; both are free parameters of the method, which must be adjusted to the particular problem at hand.
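In the special case of a symmetric proposal distribution, <math>g(x' \mid x) = g(x \mid x')</math> (for example, a Gaussian random walk centred on the current state), the proposal ratio cancels and the acceptance probability reduces to
: <math>A(x', x) = \min\left(1, \frac{P(x')}{P(x)}\right),</math>
which is the acceptance rule of the original Metropolis algorithm.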
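For illustration, the steps above can be sketched in Python. This is a minimal sketch, not a reference implementation: it assumes a one-dimensional state, an unnormalized target density passed in as <code>p</code>, and a symmetric Gaussian random-walk proposal with an illustrative width parameter <code>step_size</code>, so the proposal ratio equals one and the Metropolis form of the acceptance probability applies.

<syntaxhighlight lang="python">
import numpy as np

def metropolis_hastings(p, x0, n_steps, step_size=1.0, rng=None):
    """Sample from an unnormalized 1-D target density p(x) using
    Metropolis-Hastings with a symmetric Gaussian random-walk proposal
    g(x' | x) = Normal(x, step_size).  Because the proposal is symmetric,
    g(x | x') / g(x' | x) = 1 and A(x', x) = min(1, p(x') / p(x)).
    p(x0) must be positive."""
    rng = np.random.default_rng() if rng is None else rng
    samples = np.empty(n_steps)
    x = x0
    for t in range(n_steps):
        # Generate a candidate state x' from the proposal distribution.
        x_new = x + step_size * rng.standard_normal()
        # Metropolis acceptance probability A(x', x) = min(1, p(x') / p(x)).
        a = min(1.0, p(x_new) / p(x))
        # Accept or reject: draw u ~ Uniform[0, 1] and compare with A.
        if rng.uniform() <= a:
            x = x_new      # accept: x_{t+1} = x'
        # otherwise reject: keep the old state, x_{t+1} = x_t
        samples[t] = x
    return samples

# Example usage: sample from an unnormalized standard normal density.
if __name__ == "__main__":
    target = lambda x: np.exp(-0.5 * x**2)   # normalizing constant not needed
    chain = metropolis_hastings(target, x0=0.0, n_steps=50_000, step_size=1.0)
    print(chain.mean(), chain.std())         # should be close to 0 and 1
</syntaxhighlight>

The proposal width (<code>step_size</code> in this sketch) is one of the free parameters mentioned above: a very small step gives high acceptance but slow exploration of the state space, while a very large step leads to frequent rejections.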