Forward algorithm
==Algorithm==

The goal of the forward algorithm is to compute the [[Joint probability distribution|joint probability]] <math>p(x_t,y_{1:t})</math>, where for notational convenience we have abbreviated <math>x(t)</math> as <math>x_t</math> and <math>(y(1), y(2), ..., y(t))</math> as <math>y_{1:t}</math>. Once the joint probability <math>p(x_t,y_{1:t})</math> is computed, the other probabilities <math>p(x_t|y_{1:t})</math> and <math>p(y_{1:t})</math> are easily obtained. Both the state <math>x_t</math> and the observation <math>y_t</math> are assumed to be discrete, finite random variables. The hidden Markov model's state transition probabilities <math>p(x_t|x_{t-1})</math>, observation/emission probabilities <math>p(y_t|x_t)</math>, and initial prior probability <math>p(x_0)</math> are assumed to be known, and the sequence of observations <math>y_{1:t}</math> is assumed to be given.

Computing <math>p(x_t,y_{1:t})</math> naively would require [[Marginal distribution|marginalizing]] over all possible state sequences <math>\{x_{1:t-1}\}</math>, the number of which grows exponentially with <math>t</math>. Instead, the forward algorithm takes advantage of the [[conditional independence]] rules of the [[hidden Markov model]] (HMM) to perform the calculation recursively. To demonstrate the recursion, let

::<math>\alpha(x_t) = p(x_t,y_{1:t}) = \sum_{x_{t-1}}p(x_t,x_{t-1},y_{1:t}).</math>

Using the [[Chain rule (probability)|chain rule]] to expand <math>p(x_t,x_{t-1},y_{1:t})</math>, we can then write

::<math>\alpha(x_t) = \sum_{x_{t-1}}p(y_t|x_t,x_{t-1},y_{1:t-1})p(x_t|x_{t-1},y_{1:t-1})p(x_{t-1},y_{1:t-1}).</math>

Because <math>y_t</math> is conditionally independent of everything but <math>x_t</math>, and <math>x_t</math> is conditionally independent of everything but <math>x_{t-1}</math>, this simplifies to

::<math>\alpha(x_t) = p(y_t|x_t)\sum_{x_{t-1}}p(x_t|x_{t-1})\alpha(x_{t-1}).</math>
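The recursion above translates directly into code. The following is a minimal sketch in plain Python; the two-state, two-symbol HMM and its probabilities are hypothetical placeholders, not from the article, and for simplicity the initialization treats the given prior as the distribution of the first state.

```python
def forward(obs, trans, emit, prior):
    """Return the list of alpha vectors, alpha[t][i] = p(x_t = i, y_{1:t}).

    trans[i][j] = p(x_t = i | x_{t-1} = j)
    emit[i][y]  = p(y_t = y | x_t = i)
    prior[i]    = p(x_1 = i)   (prior over the initial state)
    """
    n = len(prior)
    # Initialization: alpha_1(i) = p(y_1 | x_1 = i) * p(x_1 = i)
    alpha = [[emit[i][obs[0]] * prior[i] for i in range(n)]]
    # Recursion: alpha_t(i) = p(y_t | x_t = i) * sum_j p(x_t = i | x_{t-1} = j) * alpha_{t-1}(j)
    for y in obs[1:]:
        prev = alpha[-1]
        alpha.append([
            emit[i][y] * sum(trans[i][j] * prev[j] for j in range(n))
            for i in range(n)
        ])
    return alpha

# Hypothetical two-state HMM with binary observations.
trans = [[0.7, 0.4],
         [0.3, 0.6]]
emit  = [[0.9, 0.1],
         [0.2, 0.8]]
prior = [0.5, 0.5]

alpha = forward([0, 1, 0], trans, emit, prior)
p_obs = sum(alpha[-1])   # p(y_{1:t}) = sum over x_t of alpha(x_t)
```

Each step costs <math>O(n^2)</math> for <math>n</math> hidden states, so the whole run is linear in the sequence length rather than exponential.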
Thus, since <math>p(y_t|x_t)</math> and <math>p(x_t|x_{t-1})</math> are given by the model's [[Hidden Markov model#Structural_architecture|emission distributions]] and [[Hidden Markov model#Structural_architecture|transition probabilities]], which are assumed to be known, one can quickly calculate <math>\alpha(x_t)</math> from <math>\alpha(x_{t-1})</math> and avoid incurring exponential computation time.

The recursion formula given above can be written in a more compact form. Let <math>a_{ij}=p(x_t=i|x_{t-1}=j)</math> be the transition probabilities and <math>b_{ij}=p(y_t=i|x_t=j)</math> be the emission probabilities; then

::<math>\mathbf{\alpha}_t = \mathbf{b}_t^T \odot \mathbf{A} \mathbf{\alpha}_{t-1},</math>

where <math>\mathbf{A} = [a_{ij}]</math> is the transition probability matrix, <math>\mathbf{b}_t</math> is the <math>i</math>-th row of the emission probability matrix <math>\mathbf{B} = [b_{ij}]</math>, corresponding to the actual observation <math>y_t = i</math> at time <math>t</math>, and <math>\mathbf{\alpha}_t = [\alpha(x_t=1),\ldots, \alpha(x_t=n)]^T</math> is the alpha vector. Here <math>\odot</math> denotes the [[Hadamard product (matrices)|Hadamard product]] between the transpose of <math>\mathbf{b}_t</math> and <math>\mathbf{A} \mathbf{\alpha}_{t-1}</math>. The initial condition is set in accordance with the prior probability over <math>x_0</math> as

::<math>\alpha(x_0) = p(y_0|x_0)p(x_0).</math>

Once the joint probability <math>\alpha(x_t) = p(x_t,y_{1:t})</math> has been computed using the forward algorithm, we can easily obtain the related joint probability <math>p(y_{1:t})</math> as

::<math>p(y_{1:t}) = \sum_{x_t} p(x_t, y_{1:t}) = \sum_{x_t} \alpha(x_t)</math>

and the required conditional probability <math>p(x_t|y_{1:t})</math> as

::<math>p(x_t|y_{1:t}) = \frac{p(x_t,y_{1:t})}{p(y_{1:t})} = \frac{\alpha(x_t)}{\sum_{x_t} \alpha(x_t)}.</math>

Once the conditional probability has been calculated, we can also find a point estimate of <math>x_t</math>.
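The compact matrix form <math>\mathbf{\alpha}_t = \mathbf{b}_t^T \odot \mathbf{A} \mathbf{\alpha}_{t-1}</math> can be sketched in NumPy as below; the matrices are illustrative placeholders, not from the article, and the prior again stands in for the first-state distribution.

```python
import numpy as np

A = np.array([[0.7, 0.4],
              [0.3, 0.6]])       # A[i, j] = p(x_t = i | x_{t-1} = j)
B = np.array([[0.9, 0.2],
              [0.1, 0.8]])       # B[i, j] = p(y_t = i | x_t = j); columns sum to 1
prior = np.array([0.5, 0.5])

obs = [0, 1, 0]
alpha = B[obs[0]] * prior        # initial alpha: emission row for y_1 times the prior
for y in obs[1:]:
    alpha = B[y] * (A @ alpha)   # Hadamard product of the y-th row of B with A @ alpha

p_y = alpha.sum()                # p(y_{1:t}) = sum of the final alpha vector
posterior = alpha / p_y          # p(x_t | y_{1:t}), the filtered distribution
```

Indexing `B[y]` selects the row of the emission matrix matching the observation at time <math>t</math>, which is exactly the role of <math>\mathbf{b}_t</math> in the formula.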
For instance, the MAP estimate of <math>x_t</math> is given by

::<math>\widehat{x}_t^{MAP} = \arg \max_{x_t} \; p(x_t|y_{1:t}) = \arg \max_{x_t} \; \alpha(x_t),</math>

while the MMSE estimate of <math>x_t</math> is given by

::<math>\widehat{x}_t^{MMSE} = \mathbb{E}[x_t|y_{1:t}] = \sum_{x_t} x_t p(x_t|y_{1:t}) = \frac{\sum_{x_t} x_t \alpha(x_t)}{\sum_{x_t} \alpha(x_t)}.</math>

The forward algorithm is easily modified to account for observations from variants of the hidden Markov model as well, such as the [[Linear–quadratic_regulator#Finite-horizon,_discrete-time_LQR|Markov jump linear system]].
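Both point estimates fall out of the alpha vector with a few lines of NumPy. The alpha values below are a hypothetical example for a three-state model, not taken from the article; note the MMSE estimate only makes sense when the state labels are numeric.

```python
import numpy as np

alpha = np.array([0.12, 0.03, 0.05])   # hypothetical alpha(x_t) for states 0, 1, 2
posterior = alpha / alpha.sum()        # p(x_t | y_{1:t})

x_map = int(np.argmax(posterior))      # MAP: mode of the filtered posterior
x_mmse = float(np.arange(3) @ posterior)  # MMSE: posterior mean of x_t
```

Since normalization does not change the argmax, the MAP estimate can equally be read off the unnormalized <math>\alpha(x_t)</math> directly.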