Forward algorithm
==Algorithm==

The goal of the forward algorithm is to compute the [[Joint probability distribution|joint probability]] <math>p(x_t,y_{1:t})</math>, where for notational convenience we have abbreviated <math>x(t)</math> as <math>x_t</math> and <math>(y(1), y(2), ..., y(t))</math> as <math>y_{1:t}</math>. Once the joint probability <math>p(x_t,y_{1:t})</math> is computed, the other probabilities <math>p(x_t|y_{1:t})</math> and <math>p(y_{1:t})</math> are easily obtained. Both the state <math>x_t</math> and the observation <math>y_t</math> are assumed to be discrete, finite random variables. The hidden Markov model's state transition probabilities <math>p(x_t|x_{t-1})</math>, observation/emission probabilities <math>p(y_t|x_t)</math>, and initial prior probability <math>p(x_0)</math> are assumed to be known, and the sequence of observations <math>y_{1:t}</math> is assumed to be given.

Computing <math>p(x_t,y_{1:t})</math> naively would require [[Marginal distribution|marginalizing]] over all possible state sequences <math>\{x_{1:t-1}\}</math>, the number of which grows exponentially with <math>t</math>. Instead, the forward algorithm takes advantage of the [[conditional independence]] rules of the [[hidden Markov model]] (HMM) to perform the calculation recursively. To demonstrate the recursion, let

::<math>\alpha(x_t) = p(x_t,y_{1:t}) = \sum_{x_{t-1}}p(x_t,x_{t-1},y_{1:t}).</math>

Using the [[Chain rule (probability)|chain rule]] to expand <math>p(x_t,x_{t-1},y_{1:t})</math>, we can then write

::<math>\alpha(x_t) = \sum_{x_{t-1}}p(y_t|x_t,x_{t-1},y_{1:t-1})p(x_t|x_{t-1},y_{1:t-1})p(x_{t-1},y_{1:t-1}).</math>

Because <math>y_t</math> is conditionally independent of everything but <math>x_t</math>, and <math>x_t</math> is conditionally independent of everything but <math>x_{t-1}</math>, this simplifies to

::<math>\alpha(x_t) = p(y_t|x_t)\sum_{x_{t-1}}p(x_t|x_{t-1})\alpha(x_{t-1}).</math>
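The recursion above translates directly into code. The following is a minimal sketch in plain Python; the two-state, two-symbol HMM and its probabilities are hypothetical placeholders, not from the article, and for simplicity the initialization treats the given prior as the distribution of the first state.

```python
def forward(obs, trans, emit, prior):
    """Return the list of alpha vectors, alpha[t][i] = p(x_t = i, y_{1:t}).

    trans[i][j] = p(x_t = i | x_{t-1} = j)
    emit[i][y]  = p(y_t = y | x_t = i)
    prior[i]    = p(x_1 = i)   (prior over the initial state)
    """
    n = len(prior)
    # Initialization: alpha_1(i) = p(y_1 | x_1 = i) * p(x_1 = i)
    alpha = [[emit[i][obs[0]] * prior[i] for i in range(n)]]
    # Recursion: alpha_t(i) = p(y_t | x_t = i) * sum_j p(x_t = i | x_{t-1} = j) * alpha_{t-1}(j)
    for y in obs[1:]:
        prev = alpha[-1]
        alpha.append([
            emit[i][y] * sum(trans[i][j] * prev[j] for j in range(n))
            for i in range(n)
        ])
    return alpha

# Hypothetical two-state HMM with binary observations.
trans = [[0.7, 0.4],
         [0.3, 0.6]]
emit  = [[0.9, 0.1],
         [0.2, 0.8]]
prior = [0.5, 0.5]

alpha = forward([0, 1, 0], trans, emit, prior)
p_obs = sum(alpha[-1])   # p(y_{1:t}) = sum over x_t of alpha(x_t)
```

Each step costs <math>O(n^2)</math> for <math>n</math> hidden states, so the whole run is linear in the sequence length rather than exponential.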
Thus, since <math>p(y_t|x_t)</math> and <math>p(x_t|x_{t-1})</math> are given by the model's [[Hidden Markov model#Structural_architecture|emission distributions]] and [[Hidden Markov model#Structural_architecture|transition probabilities]], which are assumed to be known, one can quickly calculate <math>\alpha(x_t)</math> from <math>\alpha(x_{t-1})</math> and avoid incurring exponential computation time.

The recursion formula given above can be written in a more compact form. Let <math>a_{ij}=p(x_t=i|x_{t-1}=j)</math> be the transition probabilities and <math>b_{ij}=p(y_t=i|x_t=j)</math> be the emission probabilities; then

::<math>\mathbf{\alpha}_t = \mathbf{b}_t^T \odot \mathbf{A} \mathbf{\alpha}_{t-1},</math>

where <math>\mathbf{A} = [a_{ij}]</math> is the transition probability matrix, <math>\mathbf{b}_t</math> is the <math>i</math>-th row of the emission probability matrix <math>\mathbf{B} = [b_{ij}]</math>, corresponding to the actual observation <math>y_t = i</math> at time <math>t</math>, and <math>\mathbf{\alpha}_t = [\alpha(x_t=1),\ldots, \alpha(x_t=n)]^T</math> is the alpha vector. Here <math>\odot</math> denotes the [[Hadamard product (matrices)|Hadamard product]] between the transpose of <math>\mathbf{b}_t</math> and <math>\mathbf{A} \mathbf{\alpha}_{t-1}</math>. The initial condition is set in accordance with the prior probability over <math>x_0</math> as

::<math>\alpha(x_0) = p(y_0|x_0)p(x_0).</math>

Once the joint probability <math>\alpha(x_t) = p(x_t,y_{1:t})</math> has been computed using the forward algorithm, we can easily obtain the related joint probability <math>p(y_{1:t})</math> as

::<math>p(y_{1:t}) = \sum_{x_t} p(x_t, y_{1:t}) = \sum_{x_t} \alpha(x_t)</math>

and the required conditional probability <math>p(x_t|y_{1:t})</math> as

::<math>p(x_t|y_{1:t}) = \frac{p(x_t,y_{1:t})}{p(y_{1:t})} = \frac{\alpha(x_t)}{\sum_{x_t} \alpha(x_t)}.</math>

Once the conditional probability has been calculated, we can also find a point estimate of <math>x_t</math>.
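The compact matrix form <math>\mathbf{\alpha}_t = \mathbf{b}_t^T \odot \mathbf{A} \mathbf{\alpha}_{t-1}</math> can be sketched in NumPy as below; the matrices are illustrative placeholders, not from the article, and the prior again stands in for the first-state distribution.

```python
import numpy as np

A = np.array([[0.7, 0.4],
              [0.3, 0.6]])       # A[i, j] = p(x_t = i | x_{t-1} = j)
B = np.array([[0.9, 0.2],
              [0.1, 0.8]])       # B[i, j] = p(y_t = i | x_t = j); columns sum to 1
prior = np.array([0.5, 0.5])

obs = [0, 1, 0]
alpha = B[obs[0]] * prior        # initial alpha: emission row for y_1 times the prior
for y in obs[1:]:
    alpha = B[y] * (A @ alpha)   # Hadamard product of the y-th row of B with A @ alpha

p_y = alpha.sum()                # p(y_{1:t}) = sum of the final alpha vector
posterior = alpha / p_y          # p(x_t | y_{1:t}), the filtered distribution
```

Indexing `B[y]` selects the row of the emission matrix matching the observation at time <math>t</math>, which is exactly the role of <math>\mathbf{b}_t</math> in the formula.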
For instance, the MAP estimate of <math>x_t</math> is given by

::<math>\widehat{x}_t^{MAP} = \arg \max_{x_t} \; p(x_t|y_{1:t}) = \arg \max_{x_t} \; \alpha(x_t),</math>

while the MMSE estimate of <math>x_t</math> is given by

::<math>\widehat{x}_t^{MMSE} = \mathbb{E}[x_t|y_{1:t}] = \sum_{x_t} x_t p(x_t|y_{1:t}) = \frac{\sum_{x_t} x_t \alpha(x_t)}{\sum_{x_t} \alpha(x_t)}.</math>

The forward algorithm is easily modified to account for observations from variants of the hidden Markov model as well, such as the [[Linear–quadratic_regulator#Finite-horizon,_discrete-time_LQR|Markov jump linear system]].
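Both point estimates fall out of the alpha vector with a few lines of NumPy. The alpha values below are a hypothetical example for a three-state model, not taken from the article; note the MMSE estimate only makes sense when the state labels are numeric.

```python
import numpy as np

alpha = np.array([0.12, 0.03, 0.05])   # hypothetical alpha(x_t) for states 0, 1, 2
posterior = alpha / alpha.sum()        # p(x_t | y_{1:t})

x_map = int(np.argmax(posterior))      # MAP: mode of the filtered posterior
x_mmse = float(np.arange(3) @ posterior)  # MMSE: posterior mean of x_t
```

Since normalization does not change the argmax, the MAP estimate can equally be read off the unnormalized <math>\alpha(x_t)</math> directly.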