Baum–Welch algorithm
====Update====
We can now calculate the temporary variables, according to Bayes' theorem:

:<math>\gamma_i(t)=P(X_t=i\mid Y,\theta) = \frac{P(X_t=i,Y\mid\theta)}{P(Y\mid\theta)} = \frac{\alpha_i(t)\beta_i(t)}{\sum_{j=1}^N \alpha_j(t)\beta_j(t)},</math>

which is the probability of being in state <math>i</math> at time <math>t</math> given the observed sequence <math>Y</math> and the parameters <math>\theta</math>, and

:<math>\xi_{ij}(t)=P(X_t=i,X_{t+1}=j\mid Y,\theta) = \frac{P(X_t=i,X_{t+1}=j,Y\mid\theta)}{P(Y\mid\theta)} = \frac{\alpha_i(t) a_{ij} \beta_j(t+1) b_j(y_{t+1})}{\sum_{k=1}^N \sum_{w=1}^N \alpha_k(t) a_{kw} \beta_w(t+1) b_w(y_{t+1}) },</math>

which is the probability of being in states <math>i</math> and <math>j</math> at times <math>t</math> and <math>t+1</math> respectively, given the observed sequence <math>Y</math> and the parameters <math>\theta</math>.

The denominators of <math>\gamma_i(t)</math> and <math>\xi_{ij}(t)</math> are the same; both equal <math>P(Y\mid\theta)</math>, the probability of making the observation <math>Y</math> given the parameters <math>\theta</math>.

The parameters of the hidden Markov model <math>\theta</math> can now be updated:

*<math>\pi_i^* = \gamma_i(1),</math> which is the expected frequency spent in state <math>i</math> at time <math>1</math>.
*<math>a_{ij}^*=\frac{\sum^{T-1}_{t=1}\xi_{ij}(t)}{\sum^{T-1}_{t=1}\gamma_i(t)},</math> which is the expected number of transitions from state ''i'' to state ''j'' divided by the expected total number of transitions away from state ''i''. Here, a transition away from state ''i'' means a transition to any state, including ''i'' itself, not only to a different state ''j''. The denominator is therefore the expected number of times the process is in state ''i'' from ''t'' = 1 to ''t'' = ''T'' − 1.
*<math>b_i^*(v_k)=\frac{\sum^T_{t=1} 1_{y_t=v_k} \gamma_i(t)}{\sum^T_{t=1} \gamma_i(t)},</math> where
:<math> 1_{y_t=v_k}= \begin{cases} 1 & \text{if } y_t=v_k,\\ 0 & \text{otherwise} \end{cases} </math>
is an indicator function, and <math>b_i^*(v_k)</math> is the expected number of times the output observations have been equal to <math>v_k</math> while in state <math>i</math>, divided by the expected total number of times in state <math>i</math>.

These steps are repeated iteratively until a desired level of convergence is reached.

'''Note:''' It is possible to over-fit a particular data set. That is, <math>P(Y\mid\theta_\text{final}) > P(Y \mid \theta_\text{true}) </math>. The algorithm also does '''not''' guarantee a global maximum.
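The update step above can be illustrated with a minimal Python sketch of one iteration on a single observation sequence. It is an illustration under stated assumptions rather than a reference implementation: the function names (<code>forward</code>, <code>backward</code>, <code>baum_welch_update</code>) are hypothetical, time is indexed from 0 rather than 1, observations are integers indexing the columns of <code>B</code>, and no scaling is applied, so the probabilities will underflow on long sequences.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def forward(pi: Sequence[float], A: Matrix, B: Matrix, y: Sequence[int]) -> Matrix:
    """Unscaled forward variables: alpha[t][i] = P(y_0..y_t, X_t = i | theta)."""
    N, T = len(pi), len(y)
    alpha = [[0.0] * N for _ in range(T)]
    for i in range(N):
        alpha[0][i] = pi[i] * B[i][y[0]]
    for t in range(1, T):
        for j in range(N):
            alpha[t][j] = B[j][y[t]] * sum(alpha[t - 1][i] * A[i][j] for i in range(N))
    return alpha


def backward(A: Matrix, B: Matrix, y: Sequence[int]) -> Matrix:
    """Unscaled backward variables: beta[t][i] = P(y_{t+1}..y_{T-1} | X_t = i, theta)."""
    N, T = len(A), len(y)
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][y[t + 1]] * beta[t + 1][j] for j in range(N))
    return beta


def baum_welch_update(
    pi: Sequence[float], A: Matrix, B: Matrix, y: Sequence[int]
) -> Tuple[List[float], Matrix, Matrix]:
    """One re-estimation of (pi, A, B) from a single observation sequence y."""
    N, T, K = len(pi), len(y), len(B[0])
    alpha = forward(pi, A, B, y)
    beta = backward(A, B, y)

    # gamma[t][i]: probability of state i at time t given Y and theta.
    gamma = []
    for t in range(T):
        norm = sum(alpha[t][j] * beta[t][j] for j in range(N))
        gamma.append([alpha[t][i] * beta[t][i] / norm for i in range(N)])

    # xi[t][i][j]: probability of states i, j at times t, t+1 given Y and theta.
    xi = []
    for t in range(T - 1):
        raw = [
            [alpha[t][i] * A[i][j] * beta[t + 1][j] * B[j][y[t + 1]] for j in range(N)]
            for i in range(N)
        ]
        norm = sum(sum(row) for row in raw)
        xi.append([[v / norm for v in row] for row in raw])

    # Re-estimated parameters, following the three update formulas.
    new_pi = list(gamma[0])
    new_A = [
        [
            sum(xi[t][i][j] for t in range(T - 1)) / sum(gamma[t][i] for t in range(T - 1))
            for j in range(N)
        ]
        for i in range(N)
    ]
    new_B = [
        [
            sum(gamma[t][i] for t in range(T) if y[t] == k) / sum(gamma[t][i] for t in range(T))
            for k in range(K)
        ]
        for i in range(N)
    ]
    return new_pi, new_A, new_B
```

Calling <code>baum_welch_update</code> repeatedly, feeding each result back in as the next starting point, corresponds to iterating until convergence. The likelihood <math>P(Y\mid\theta)</math> can be read off as the sum of the last row returned by <code>forward</code>, and it should not decrease from one iteration to the next. A practical implementation would typically rescale <math>\alpha</math> and <math>\beta</math> at each time step, or work in log space, to avoid underflow.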