==Properties==
Two states are said to ''communicate'' with each other if both are reachable from one another by a sequence of transitions that have positive probability. This is an equivalence relation which yields a set of communicating classes. A class is ''closed'' if the probability of leaving the class is zero. A Markov chain is ''irreducible'' if there is only one communicating class, the whole state space.

A state {{Math|''i''}} has period {{Math|''k''}} if {{Math|''k''}} is the [[greatest common divisor]] of the numbers of transitions by which {{Math|''i''}} can be reached, starting from {{Math|''i''}}. That is:
:<math> k = \gcd\{ n > 0: \Pr(X_n = i \mid X_0 = i) > 0\}</math>
The state is ''periodic'' if <math>k > 1</math>; otherwise <math>k = 1</math> and the state is ''aperiodic''.

A state ''i'' is said to be ''transient'' if, starting from ''i'', there is a non-zero probability that the chain will never return to ''i''. It is called ''recurrent'' (or ''persistent'') otherwise.<ref name="Heyman">{{cite book |last1=Heyman |first1=Daniel P. |last2=Sobel |first2=Mathew J. |title=Stochastic Models in Operations Research, Volume 1 |date=1982 |publisher=McGraw-Hill |location=New York |isbn=0-07-028631-0 |page=230}}</ref> For a recurrent state ''i'', the mean ''hitting time'' is defined as:
:<math> M_i = E[T_i]=\sum_{n=1}^\infty n\cdot f_{ii}^{(n)}.</math>
State ''i'' is ''positive recurrent'' if <math>M_i</math> is finite and ''null recurrent'' otherwise.

Periodicity, transience, recurrence, and positive and null recurrence are class properties; that is, if one state has the property then all states in its communicating class have the property.<ref>{{Cite web |last=Peres |first=Yuval |author-link=Yuval Peres |title=Show that positive recurrence is a class property |url=https://math.stackexchange.com/questions/4572155/show-that-positive-recurrence-is-a-class-property |access-date=2024-02-01 |website=Mathematics Stack Exchange |language=en}}</ref>

A state ''i'' is called ''absorbing'' if there are no outgoing transitions from the state.

=== Irreducibility ===
Since periodicity is a class property, if a Markov chain is irreducible, then all its states have the same period. In particular, if one state is aperiodic, then the whole Markov chain is aperiodic.<ref>{{Cite web |last=Lalley |first=Steve |author-link=Steven Lalley |year=2016 |title=Markov Chains: Basic Theory |url=http://galton.uchicago.edu/~lalley/Courses/312/MarkovChains.pdf |access-date=22 June 2024}}</ref>

If a finite Markov chain is irreducible, then all states are positive recurrent, and it has a unique stationary distribution given by <math>\pi_i = 1/E[T_i]</math>.

===Ergodicity===
A state ''i'' is said to be ''ergodic'' if it is aperiodic and positive recurrent. In other words, a state ''i'' is ergodic if it is recurrent, has a period of 1, and has finite mean recurrence time.

If all states in an irreducible Markov chain are ergodic, then the chain is said to be ergodic. Equivalently, there exists some integer <math>k</math> such that all entries of <math>M^k</math> are positive, where <math>M</math> is the transition matrix. It can be shown that a finite-state irreducible Markov chain is ergodic if it has an aperiodic state. More generally, a Markov chain is ergodic if there is a number ''N'' such that any state can be reached from any other state in a number of steps less than or equal to ''N''. In the case of a fully connected transition matrix, where all transitions have a non-zero probability, this condition is fulfilled with ''N'' = 1.
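For a finite chain these conditions can be checked directly from the transition matrix. The following is a minimal illustrative sketch (not part of any standard library): it computes the period of a state of an irreducible chain as the greatest common divisor of <math>d(u) + 1 - d(v)</math> over all positive-probability transitions <math>u \to v</math>, where <math>d</math> is the breadth-first-search distance from the chosen state; an irreducible finite chain is then ergodic exactly when this period is 1. The function name and the example matrices are assumptions made for the illustration.

<syntaxhighlight lang="python">
import numpy as np
from math import gcd

def period_of(P, root=0):
    # Period of state `root` for an irreducible chain with row-stochastic
    # transition matrix P: gcd of d(u) + 1 - d(v) over all edges u -> v with
    # positive probability, where d is the BFS distance from `root`.
    n = P.shape[0]
    d = {root: 0}
    queue = [root]
    while queue:                                   # breadth-first search
        u = queue.pop(0)
        for v in np.nonzero(P[u] > 0)[0]:
            v = int(v)
            if v not in d:
                d[v] = d[u] + 1
                queue.append(v)
    g = 0
    for u in range(n):
        for v in np.nonzero(P[u] > 0)[0]:
            g = gcd(g, d[u] + 1 - d[int(v)])
    return g

# A two-state chain that always swaps states is irreducible but has period 2,
# hence not ergodic; giving one state a self-loop makes the chain aperiodic.
P_swap = np.array([[0.0, 1.0], [1.0, 0.0]])
P_lazy = np.array([[0.5, 0.5], [1.0, 0.0]])
print(period_of(P_swap), period_of(P_lazy))        # 2 1
</syntaxhighlight>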
A Markov chain with more than one state and just one outgoing transition per state is either not irreducible or not aperiodic, hence cannot be ergodic.

==== Terminology ====
Some authors call any irreducible, positive recurrent Markov chain ergodic, even a periodic one.<ref>{{cite book |last1=Parzen |first1=Emanuel |title=Stochastic Processes |date=1962 |publisher=Holden-Day |isbn=0-8162-6664-6 |location=San Francisco |page=145}}</ref> In fact, merely irreducible Markov chains correspond to [[ergodicity|ergodic processes]], defined according to [[ergodic theory]].<ref name=":2" />

Some authors call a matrix ''primitive'' if there exists some integer <math>k</math> such that all entries of <math>M^k</math> are positive.<ref>{{Cite book |last=Seneta |first=E. (Eugene) |url=http://archive.org/details/nonnegativematri00esen_0 |title=Non-negative matrices; an introduction to theory and applications |date=1973 |publisher=New York, Wiley |others=Internet Archive |isbn=978-0-470-77605-6}}</ref> Some authors call it ''regular''.<ref>{{Cite web |date=2020-03-22 |title=10.3: Regular Markov Chains |url=https://math.libretexts.org/Bookshelves/Applied_Mathematics/Applied_Finite_Mathematics_(Sekhon_and_Bloom)/10%3A_Markov_Chains/10.03%3A_Regular_Markov_Chains |access-date=2024-02-01 |website=Mathematics LibreTexts |language=en}}</ref>

==== Index of primitivity ====
The ''index of primitivity'', or ''exponent'', of a regular matrix is the smallest <math>k</math> such that all entries of <math>M^k</math> are positive. The exponent is purely a graph-theoretic property, since it depends only on whether each entry of <math>M</math> is zero or positive, and therefore can be found on a directed graph with <math>\mathrm{sign}(M)</math> as its adjacency matrix.

There are several combinatorial results about the exponent when there are finitely many states. Let <math>n</math> be the number of states; then<ref>{{Cite book |last=Seneta |first=E. (Eugene) |url=http://archive.org/details/nonnegativematri00esen_0 |title=Non-negative matrices; an introduction to theory and applications |date=1973 |publisher=New York, Wiley |others=Internet Archive |isbn=978-0-470-77605-6 |chapter=2.4. Combinatorial properties}}</ref>
* The exponent is <math> \leq (n-1)^2 + 1 </math>. Equality holds only when the graph of <math>M</math> consists of the cycle <math>1 \to 2 \to \dots \to n \to 1</math> together with the single extra edge <math>n \to 2</math>.
* If <math>M</math> has <math>k \geq 1</math> positive diagonal entries, then its exponent is <math>\leq 2n-k-1</math>.
* If <math>\mathrm{sign}(M)</math> is symmetric, then <math>M^2</math> has positive diagonal entries, which by the previous proposition means its exponent is <math>\leq 2n-2</math>.
* (Dulmage–Mendelsohn theorem) The exponent is <math>\leq n+s(n-2)</math>, where <math>s</math> is the [[Girth (graph theory)|girth of the graph]]. It can be improved to <math>\leq (d+1)+s(d+1-2)</math>, where <math>d</math> is the [[Diameter (graph theory)|diameter of the graph]].<ref>{{Cite journal |last=Shen |first=Jian |date=1996-10-15 |title=An improvement of the Dulmage-Mendelsohn theorem |journal=Discrete Mathematics |volume=158 |issue=1 |pages=295–297 |doi=10.1016/0012-365X(95)00060-A |doi-access=free }}</ref>
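Since only the sign pattern of <math>M</math> matters, the exponent can be computed by brute force on the 0/1 adjacency matrix, stopping at the <math>(n-1)^2 + 1</math> bound quoted above. The sketch below is illustrative only; the function name is an assumption, and the example is the extremal cycle-plus-one-edge graph from the first bullet.

<syntaxhighlight lang="python">
import numpy as np

def primitivity_index(M):
    # Smallest k such that every entry of M^k is positive, or None if M is
    # not primitive.  Only the sign pattern matters, so the computation uses
    # the 0/1 adjacency matrix and stops at the bound (n-1)^2 + 1.
    n = M.shape[0]
    A = (M > 0).astype(int)
    power = A.copy()
    for k in range(1, (n - 1) ** 2 + 2):
        if np.all(power > 0):
            return k
        power = (power @ A > 0).astype(int)
    return None

# Extremal example: the cycle 1 -> 2 -> ... -> n -> 1 with the extra edge
# n -> 2 (states renumbered from 0 here) attains the bound (n-1)^2 + 1.
n = 5
M = np.zeros((n, n), dtype=int)
for i in range(n - 1):
    M[i, i + 1] = 1
M[n - 1, 0] = 1
M[n - 1, 1] = 1
print(primitivity_index(M))   # 17 == (5 - 1)**2 + 1
</syntaxhighlight>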
=== Measure-preserving dynamical system ===
If a Markov chain has a stationary distribution, then it can be converted to a [[measure-preserving dynamical system]]: Let the probability space be <math>\Omega = \Sigma^\N</math>, where <math>\Sigma</math> is the set of all states for the Markov chain. Let the sigma-algebra on the probability space be generated by the cylinder sets. Let the probability measure be generated by the stationary distribution and the Markov chain transition. Let <math>T: \Omega \to \Omega</math> be the shift operator: <math>T(X_0, X_1, \dots) = (X_1, \dots) </math>. Similarly, we can construct such a dynamical system with <math>\Omega = \Sigma^\Z</math> instead.<ref>{{Cite book |last=Kallenberg |first=Olav |title=Foundations of modern probability |date=2002 |publisher=Springer |isbn=978-0-387-95313-7 |edition=2. ed., [Nachdr.] |series=Probability and its applications |location=New York, NY Berlin Heidelberg |at=Proposition 8.6 (page 145)}}</ref>

Since ''irreducible'' Markov chains with finite state spaces have a unique stationary distribution, the above construction is unambiguous for irreducible Markov chains.

In [[ergodic theory]], a measure-preserving dynamical system is called ''ergodic'' if any measurable subset <math>S</math> such that <math>T^{-1}(S) = S</math> implies <math>S = \emptyset</math> or <math>\Omega</math> (up to a null set).

The terminology is inconsistent. Given a Markov chain with a stationary distribution that is strictly positive on all states, the Markov chain is ''irreducible'' if and only if its corresponding measure-preserving dynamical system is ''ergodic''.<ref name=":2">{{Cite web |last=Shalizi |first=Cosma |author-link=Cosma Shalizi |date=1 Dec 2023 |title=Ergodic Theory |url=http://bactra.org/notebooks/ergodic-theory.html |access-date=2024-02-01 |website=bactra.org}}</ref>

===Markovian representations===
In some cases, apparently non-Markovian processes may still have Markovian representations, constructed by expanding the concept of the "current" and "future" states. For example, let ''X'' be a non-Markovian process. Then define a process ''Y'', such that each state of ''Y'' represents a time-interval of states of ''X''. Mathematically, this takes the form:
:<math>Y(t) = \big\{ X(s): s \in [a(t), b(t)] \, \big\}.</math>
If ''Y'' has the Markov property, then it is a Markovian representation of ''X''.

An example of a non-Markovian process with a Markovian representation is an [[autoregressive model|autoregressive]] [[time series]] of order greater than one.<ref>{{cite journal |last1=Doblinger |first1=G. |title=Smoothing of noisy AR signals using an adaptive Kalman filter |journal=9th European Signal Processing Conference (EUSIPCO 1998) |date=September 1998 |pages=781–784 |url=https://publik.tuwien.ac.at/files/pub-et_3285.pdf}}</ref>

===Hitting times===
{{Main|Phase-type distribution}}
The ''hitting time'' is the time, starting in a given set of states, until the chain arrives in a given state or set of states. The distribution of such a time period has a phase-type distribution. The simplest such distribution is that of a single exponentially distributed transition.

====Expected hitting times====
For a subset of states ''A'' ⊆ ''S'', the vector ''k''<sup>''A''</sup> of hitting times (where element <math> k_i^A </math> represents the [[expected value]] of the time, starting in state ''i'', until the chain enters one of the states in the set ''A'') is the minimal non-negative solution to<ref name="norris2">{{cite book |title=Markov Chains |year=1997 |isbn=9780511810633 |pages=108–127 |chapter=Continuous-time Markov chains II |doi=10.1017/CBO9780511810633.005 |last1=Norris |first1=J. R. |author-link1=James R. Norris}}</ref>
:<math>\begin{align}
k_i^A = 0 & \text{ for } i \in A\\
-\sum_{j \in S} q_{ij} k_j^A = 1&\text{ for } i \notin A.
\end{align}</math>
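With the convention that <math> k_i^A = 0 </math> on ''A'', the second set of equations is a finite linear system over the states outside ''A'' and can be solved directly, provided the hitting times are finite. The following is a minimal sketch under that assumption; the function name, the generator matrix and the target set are invented for the illustration.

<syntaxhighlight lang="python">
import numpy as np

def expected_hitting_times(Q, A):
    # Expected hitting times of the set A for a finite CTMC with generator Q:
    # k_i^A = 0 for i in A, and -sum_j q_ij k_j^A = 1 for i not in A, which
    # reduces to Q_BB k_B = -1 on the complement B of A (assuming A is
    # reachable from every state, so the hitting times are finite).
    n = Q.shape[0]
    A = set(A)
    B = [i for i in range(n) if i not in A]
    k = np.zeros(n)
    if B:
        Q_BB = Q[np.ix_(B, B)]
        k[B] = np.linalg.solve(Q_BB, -np.ones(len(B)))
    return k

# Example: three states, state 2 absorbing; expected times to reach state 2.
Q = np.array([[-2.0,  1.0, 1.0],
              [ 1.0, -3.0, 2.0],
              [ 0.0,  0.0, 0.0]])
print(expected_hitting_times(Q, {2}))   # [0.8 0.6 0. ]
</syntaxhighlight>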
===Time reversal===
For a CTMC ''X''<sub>''t''</sub>, the time-reversed process is defined to be <math> \hat X_t = X_{T-t}</math>. By [[Kelly's lemma]] this process has the same stationary distribution as the forward process.

A chain is said to be ''reversible'' if the reversed process is the same as the forward process. [[Kolmogorov's criterion]] states that the necessary and sufficient condition for a process to be reversible is that the product of transition rates around a closed loop must be the same in both directions.

=== Embedded Markov chain <!-- Embedded Markov chain redirects here --> ===
One method of finding the [[stationary probability distribution]], {{pi}}, of an [[ergodic]] continuous-time Markov chain, ''Q'', is by first finding its '''embedded Markov chain (EMC)'''. Strictly speaking, the EMC is a regular discrete-time Markov chain, sometimes referred to as a '''[[jump process]]'''. Each element of the one-step transition probability matrix of the EMC, ''S'', is denoted by ''s''<sub>''ij''</sub>, and represents the [[conditional probability]] of transitioning from state ''i'' into state ''j''. These conditional probabilities may be found by
:<math> s_{ij} = \begin{cases} \frac{q_{ij}}{\sum_{k \neq i} q_{ik}} & \text{if } i \neq j \\ 0 & \text{otherwise}. \end{cases} </math>
From this, ''S'' may be written as
:<math>S = I - \left( \operatorname{diag}(Q) \right)^{-1} Q</math>
where ''I'' is the [[identity matrix]] and diag(''Q'') is the [[diagonal matrix]] formed by selecting the [[main diagonal]] from the matrix ''Q'' and setting all other elements to zero.

To find the stationary probability distribution vector, we must next find <math>\varphi</math> such that
:<math>\varphi S = \varphi, </math>
with <math>\varphi</math> being a row vector, such that all elements in <math>\varphi</math> are greater than 0 and [[norm (mathematics)|<math>\|\varphi\|_1</math>]] = 1. From this, {{pi}} may be found as
:<math>\pi = {-\varphi (\operatorname{diag}(Q))^{-1} \over \left\| \varphi (\operatorname{diag}(Q))^{-1} \right\|_1}.</math>
(''S'' may be periodic, even if ''Q'' is not. Once {{pi}} is found, it must be normalized to a [[unit vector]].)

Another discrete-time process that may be derived from a continuous-time Markov chain is a δ-skeleton: the (discrete-time) Markov chain formed by observing ''X''(''t'') at intervals of δ units of time. The random variables ''X''(0), ''X''(δ), ''X''(2δ), ... give the sequence of states visited by the δ-skeleton.
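The embedded-chain construction above translates directly into a small numerical routine: build ''S'' from ''Q'', take a left eigenvector of ''S'' for eigenvalue 1 as <math>\varphi</math>, and rescale by <math>-(\operatorname{diag}(Q))^{-1}</math>. The following is a minimal sketch, with an invented function name and an arbitrary small generator as the example; it assumes the chain is ergodic and every diagonal entry of ''Q'' is non-zero.

<syntaxhighlight lang="python">
import numpy as np

def stationary_from_emc(Q):
    # Stationary distribution of an ergodic CTMC with generator Q via its
    # embedded Markov chain: S = I - diag(Q)^{-1} Q, phi S = phi with
    # ||phi||_1 = 1, then pi = -phi diag(Q)^{-1}, renormalised.
    n = Q.shape[0]
    D_inv = np.diag(1.0 / np.diag(Q))              # diag(Q)^{-1}
    S = np.eye(n) - D_inv @ Q                      # EMC transition matrix

    w, V = np.linalg.eig(S.T)                      # left eigenvectors of S
    phi = np.real(V[:, np.argmin(np.abs(w - 1.0))])
    phi = phi / phi.sum()                          # ||phi||_1 = 1, entries > 0

    x = -phi @ D_inv
    return x / np.abs(x).sum()

# Example generator (rows sum to zero); pi @ Q should be numerically zero.
Q = np.array([[-3.0,  2.0,  1.0],
              [ 1.0, -2.0,  1.0],
              [ 2.0,  2.0, -4.0]])
pi = stationary_from_emc(Q)
print(pi, pi @ Q)
</syntaxhighlight>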