====Convergence speed to the stationary distribution====
As stated earlier, from the equation <math>\boldsymbol{\pi} = \boldsymbol{\pi} \mathbf{P},</math> the stationary (or steady-state) distribution '''{{pi}}''' (if it exists) is a left eigenvector of the row [[stochastic matrix]] '''P'''. Assuming that '''P''' is diagonalizable, or equivalently that '''P''' has ''n'' linearly independent eigenvectors, the speed of convergence can be analyzed as follows. (For non-diagonalizable, that is, [[defective matrix|defective matrices]], one may start with the [[Jordan normal form]] of '''P''' and proceed with a somewhat more involved set of arguments in a similar way.<ref>{{cite journal |last1=Schmitt |first1=Florian |last2=Rothlauf |first2=Franz |title=On the Importance of the Second Largest Eigenvalue on the Convergence Rate of Genetic Algorithms |journal=Proceedings of the 14th Symposium on Reliable Distributed Systems |date=2001 |citeseerx=10.1.1.28.6191 }}</ref>)

Let '''U''' be the matrix of eigenvectors (each normalized to unit L2 norm), where each column is a left eigenvector of '''P''', and let '''Σ''' be the diagonal matrix of eigenvalues of '''P''', that is, '''Σ''' = diag(''λ''<sub>1</sub>, ''λ''<sub>2</sub>, ''λ''<sub>3</sub>, ..., ''λ''<sub>''n''</sub>). Then by [[eigendecomposition]]

:<math> \mathbf{P} = \mathbf{U\Sigma U}^{-1} .</math>

Let the eigenvalues be enumerated so that

:<math> 1 = |\lambda_1| > |\lambda_2| \geq |\lambda_3| \geq \cdots \geq |\lambda_n|.</math>

Since '''P''' is a row stochastic matrix, its largest eigenvalue is 1. If there is a unique stationary distribution, then the largest eigenvalue and the corresponding eigenvector are unique as well (because no other '''{{pi}}''' solves the stationary distribution equation above). Let '''u'''<sub>''i''</sub> be the ''i''-th column of '''U''', that is, the left eigenvector of '''P''' corresponding to ''λ''<sub>''i''</sub>. Also let '''x''' be a length-''n'' row vector that represents a valid probability distribution; since the eigenvectors '''u'''<sub>''i''</sub> span <math>\R^n,</math> we can write

:<math> \mathbf{x}^\mathsf{T} = \sum_{i=1}^n a_i \mathbf{u}_i, \qquad a_i \in \R.</math>

If we multiply '''x''' by '''P''' from the right and repeat this operation, in the limit we obtain the stationary distribution '''{{pi}}'''. In other words, '''xP'''<sup>''k''</sup> → '''{{pi}}''' = ''a''<sub>1</sub>'''u'''<sub>1</sub> as ''k'' → ∞.
That means

:<math>\begin{align} \boldsymbol{\pi}^{(k)} &= \mathbf{x} \left (\mathbf{U\Sigma U}^{-1} \right ) \left (\mathbf{U\Sigma U}^{-1} \right )\cdots \left (\mathbf{U\Sigma U}^{-1} \right ) \\ &= \mathbf{xU\Sigma}^k \mathbf{U}^{-1} \\ &= \left (a_1\mathbf{u}_1^\mathsf{T} + a_2\mathbf{u}_2^\mathsf{T} + \cdots + a_n\mathbf{u}_n^\mathsf{T} \right )\mathbf{U\Sigma}^k\mathbf{U}^{-1} \\ &= a_1\lambda_1^k\mathbf{u}_1^\mathsf{T} + a_2\lambda_2^k\mathbf{u}_2^\mathsf{T} + \cdots + a_n\lambda_n^k\mathbf{u}_n^\mathsf{T} && u_i \bot u_j \text{ for } i\neq j \\ & = \lambda_1^k\left\{a_1\mathbf{u}_1^\mathsf{T} + a_2\left(\frac{\lambda_2}{\lambda_1}\right)^k\mathbf{u}_2^\mathsf{T} + a_3\left(\frac{\lambda_3}{\lambda_1}\right)^k\mathbf{u}_3^\mathsf{T} + \cdots + a_n\left(\frac{\lambda_n}{\lambda_1}\right)^k\mathbf{u}_n^\mathsf{T}\right\} \end{align}</math>

Since '''{{pi}}''' is parallel to '''u'''<sub>1</sub> (normalized by the L2 norm) and '''{{pi}}'''<sup>(''k'')</sup> is a probability vector, '''{{pi}}'''<sup>(''k'')</sup> approaches ''a''<sub>1</sub>'''u'''<sub>1</sub> = '''{{pi}}''' as ''k'' → ∞ at a geometric rate on the order of (''λ''<sub>2</sub>/''λ''<sub>1</sub>)<sup>''k''</sup>. This follows because <math> |\lambda_2| \geq \cdots \geq |\lambda_n|,</math> so ''λ''<sub>2</sub>/''λ''<sub>1</sub> is the dominant term; since ''λ''<sub>1</sub> = 1, the error decays as |''λ''<sub>2</sub>|<sup>''k''</sup>. The smaller this ratio, the faster the convergence.<ref>{{Cite journal | volume = 37 | issue = 3| pages = 387–405| last = Rosenthal| first = Jeffrey S.| title = Convergence Rates for Markov Chains| journal = SIAM Review| accessdate = 2021-05-31| date = 1995| doi = 10.1137/1037083| url = https://www.jstor.org/stable/2132659| jstor = 2132659}}</ref> Random noise in the state distribution '''{{pi}}''' can also speed up this convergence to the stationary distribution.<ref>{{cite journal|last=Franzke|first=Brandon|author2=Kosko, Bart|date=1 October 2011|title=Noise can speed convergence in Markov chains|journal=Physical Review E|volume=84|issue=4|pages=041112|bibcode=2011PhRvE..84d1112F|doi=10.1103/PhysRevE.84.041112|pmid=22181092}}</ref>
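As a concrete illustration (this worked example is not from the cited sources), consider the row-stochastic matrix

:<math>\mathbf{P} = \begin{pmatrix} 0.9 & 0.1 \\ 0.5 & 0.5 \end{pmatrix},</math>

with eigenvalues ''λ''<sub>1</sub> = 1 and ''λ''<sub>2</sub> = 0.4 and stationary distribution '''{{pi}}''' = (5/6, 1/6): the distance between '''{{pi}}'''<sup>(''k'')</sup> and '''{{pi}}''' shrinks by a factor of |''λ''<sub>2</sub>| = 0.4 at each step. The following minimal Python sketch (assuming [[NumPy]] is available; the matrix and the starting distribution are hypothetical choices for illustration) computes the stationary distribution by eigendecomposition and compares the per-step error with |''λ''<sub>2</sub>|<sup>''k''</sup>:

<syntaxhighlight lang="python">
import numpy as np

# Hypothetical 2-state row-stochastic transition matrix (eigenvalues 1 and 0.4).
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

# Left eigenvectors of P are the (column) eigenvectors of P transposed.
eigvals, eigvecs = np.linalg.eig(P.T)

# Order eigenvalues by decreasing modulus; for a stochastic matrix the largest is 1.
order = np.argsort(-np.abs(eigvals))
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Stationary distribution: the eigenvector for eigenvalue 1, rescaled to sum to 1.
pi = np.real(eigvecs[:, 0] / eigvecs[:, 0].sum())  # -> [5/6, 1/6]

# Iterate an arbitrary starting distribution x; the error ||x P^k - pi||
# shrinks at the geometric rate |lambda_2|^k derived above.
x = np.array([1.0, 0.0])
for k in range(1, 6):
    x = x @ P
    print(k, np.linalg.norm(x - pi), np.abs(eigvals[1]) ** k)
</syntaxhighlight>

Each printed error is 0.4 times the previous one, matching the rate predicted by the second-largest eigenvalue.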