=== Heidelberger-Welch Diagnostics ===

The Heidelberger-Welch diagnostic is grounded in [[Spectral theory|spectral analysis]] and [[Brownian motion|Brownian motion theory]], and is particularly useful in the early stages of a simulation for choosing an appropriate burn-in period and stopping time.<ref>{{Cite journal |last1=Heidelberger |first1=Philip |last2=Welch |first2=Peter D. |date=1981-04-01 |title=A spectral method for confidence interval generation and run length control in simulations |url=https://dl.acm.org/doi/10.1145/358598.358630 |journal=Commun. ACM |volume=24 |issue=4 |pages=233–245 |doi=10.1145/358598.358630 |issn=0001-0782}}</ref><ref>{{Cite journal |last1=Heidelberger |first1=Philip |last2=Welch |first2=Peter D. |date=1983-12-01 |title=Simulation Run Length Control in the Presence of an Initial Transient |url=https://pubsonline.informs.org/doi/10.1287/opre.31.6.1109 |journal=Operations Research |volume=31 |issue=6 |pages=1109–1144 |doi=10.1287/opre.31.6.1109 |issn=0030-364X}}</ref> The diagnostic consists of two components: a '''stationarity test''', which assesses whether the Markov chain has reached a steady state, and a '''half-width test''', which determines whether the expectation has been estimated to within a user-specified precision.

==== Stationarity Test ====

Let <math>\{X_t\}_{t=1}^n</math> be the output of an MCMC simulation, let <math>g</math> be a scalar function of interest, and let <math>g_i = g(X_i)</math>, <math>i = 1, 2, \dots, n</math>, be the evaluations of <math>g</math> along the chain. Define the standardized cumulative sum process:

:<math>
B_n(t) = \dfrac{\sum_{i=1}^{\text{round}(nt)} g_i - \text{round}(nt) \bar{g}_n}{\sqrt{n\hat{S}(0)}},\;\;\; t\in[0,1]
</math> 
where <math>\bar{g}_n = \frac{1}{n}\sum_{i=1}^n g_i</math> is the sample mean and <math>\hat{S}(0)</math> is an estimate of the spectral density at frequency zero.

Under the null hypothesis that the chain is stationary, the process <math>B_n(t)</math> converges in distribution to a [[Brownian bridge]] as <math>n \to \infty</math>. The following [[Cramér–von Mises criterion|Cramér-von Mises statistic]] is used to test for stationarity:

:<math>
C_n = \int_0^1 B_n(t)^2 \, dt.
</math>

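The statistic is compared against known critical values of the limiting Brownian bridge distribution. If the null hypothesis is rejected, the first 10% of the samples is discarded and the test is repeated on the remaining chain. This continues, discarding a further 10% each time, until either stationarity is accepted or 50% of the chain has been discarded; in the latter case the chain is judged not to have reached stationarity and a longer run is needed.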
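
The procedure can be illustrated with a minimal Python sketch. It is not a reference implementation: the function names are hypothetical, <math>\hat{S}(0)</math> is replaced by a simple batch-means estimate of the long-run variance computed from the second half of the chain (an assumption made for brevity, in place of a spectral estimator), the integral is approximated by a Riemann sum, and the critical value 0.461 is the asymptotic 5% value for the integral of a squared Brownian bridge.

```python
import math
import random

# Asymptotic 5% critical value for the integral of a squared Brownian bridge
# (the limiting Cramer-von Mises distribution).
CRITICAL_VALUE_5PCT = 0.461


def long_run_variance(g, n_batches=20):
    """Batch-means stand-in for S_hat(0); assumes len(g) is well above n_batches."""
    batch_len = len(g) // n_batches
    used = g[: batch_len * n_batches]
    grand_mean = sum(used) / len(used)
    batch_means = [
        sum(used[k * batch_len:(k + 1) * batch_len]) / batch_len
        for k in range(n_batches)
    ]
    return batch_len * sum((m - grand_mean) ** 2 for m in batch_means) / (n_batches - 1)


def cumsum_bridge(g, s0):
    """Evaluate B_n(k/n) for k = 1..n, given an estimate s0 of S(0)."""
    n = len(g)
    mean = sum(g) / n
    scale = math.sqrt(n * s0)
    bridge, partial = [], 0.0
    for k, value in enumerate(g, start=1):
        partial += value
        bridge.append((partial - k * mean) / scale)
    return bridge


def cramer_von_mises(bridge):
    """Riemann-sum approximation of C_n, the integral of B_n(t)^2 over [0, 1]."""
    return sum(b * b for b in bridge) / len(bridge)


def stationarity_test(g, critical_value=CRITICAL_VALUE_5PCT):
    """Return (passed, samples_discarded, last_statistic)."""
    n = len(g)
    # Assumption: estimate S(0) once, from the second half of the chain,
    # which is the part least likely to contain the initial transient.
    s0 = long_run_variance(g[n // 2:])
    step = max(1, n // 10)
    for start in range(0, n // 2 + 1, step):  # discard 0%, 10%, ..., 50%
        c_n = cramer_von_mises(cumsum_bridge(g[start:], s0))
        if c_n < critical_value:
            return True, start, c_n
    return False, start, c_n


# Toy chain: 200 samples with a shifted mean, then 1800 stationary samples.
random.seed(1)
chain = [5.0 + random.gauss(0, 1) for _ in range(200)]
chain += [random.gauss(0, 1) for _ in range(1800)]
print(stationarity_test(chain))
```

In this toy example the shifted initial segment makes the statistic very large on the full chain, so at least the first 10% of the samples is discarded before the test can pass.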

==== Half-Width Test (Precision Check) ====

Once stationarity is accepted, the second part of the diagnostic checks whether the Monte Carlo estimator is accurate enough for practical use. Assuming the central limit theorem holds, the confidence interval for the mean <math>\mathbb{E}_\pi[g(X)]</math> is given by
:<math>
\bar{g}_n \pm t_{\alpha/2,\nu} \cdot \dfrac{\hat{\sigma}_n}{\sqrt{n}}
</math>
where <math>\hat{\sigma}_n^2</math> is an estimate of the asymptotic variance of <math>g</math> along the chain (which, unlike the marginal variance of <math>g(X)</math>, accounts for autocorrelation between successive samples), <math>t_{\alpha/2,\nu}</math> is the [[Student's t-distribution|Student's <math>t</math>]] critical value at confidence level <math>1 - \alpha</math> with <math>\nu</math> degrees of freedom, and <math>n</math> is the number of samples used.

The '''half-width''' of this interval is defined as
:<math>
t_{\alpha/2,\nu} \cdot \dfrac{\hat{\sigma}_n}{\sqrt{n}}
</math>
If the half-width is smaller than a user-defined tolerance (e.g., 0.05), the chain is considered long enough to estimate the expectation reliably. Otherwise, the simulation should be extended.
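
The half-width check can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than a definitive implementation: the function name <code>half_width_test</code> and its parameters are hypothetical, <math>\hat{\sigma}_n^2</math> is obtained by batch means, and the Student's <math>t</math> critical value is approximated by a normal quantile (reasonable only when <math>\nu</math> is large) unless the caller supplies one.

```python
import math
from statistics import NormalDist, fmean


def half_width_test(g, tolerance=0.05, alpha=0.05, n_batches=20, critical_value=None):
    """Return (mean, half_width, precise_enough) for the samples in g.

    g should be the part of the chain retained after the stationarity test.
    """
    batch_len = len(g) // n_batches
    used = g[: batch_len * n_batches]       # drop a trailing partial batch
    n = len(used)
    mean = fmean(used)

    # Batch-means estimate of the asymptotic variance (assumption: this
    # replaces a spectral estimate at frequency zero).
    batch_means = [
        fmean(used[k * batch_len:(k + 1) * batch_len]) for k in range(n_batches)
    ]
    sigma2 = batch_len * sum((m - mean) ** 2 for m in batch_means) / (n_batches - 1)

    # Assumption: normal quantile in place of the Student's t critical value.
    if critical_value is None:
        critical_value = NormalDist().inv_cdf(1 - alpha / 2)

    half_width = critical_value * math.sqrt(sigma2 / n)
    return mean, half_width, half_width < tolerance
```

For example, <code>half_width_test(samples, tolerance=0.05)</code> returns the estimated mean, the half-width, and a flag indicating whether the run is long enough; if the flag is false, the simulation is extended and the check repeated.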