Editing Baum–Welch algorithm (section)

==Example==
Suppose we have a chicken from which we collect eggs at noon every day. Now whether or not the chicken has laid eggs for collection depends on some unknown factors that are hidden. We can however (for simplicity) assume that the chicken is always in one of two states that influence whether the chicken lays eggs, and that this state only depends on the state on the previous day. Now we don't know the state at the initial starting point, we don't know the transition probabilities between the two states and we don't know the probability that the chicken lays an egg given a particular state.<ref>{{cite web |title=Baum-Welch and HMM applications |url=http://www.biostat.jhsph.edu/bstcourse/bio638/notes/HMMs_BaumWelch.pdf |publisher=Johns Hopkins Bloomberg School of Public Health |archive-url=https://web.archive.org/web/20210414224816/http://www.biostat.jhsph.edu/bstcourse/bio638/notes/HMMs_BaumWelch.pdf |access-date=11 October 2019 |archive-date=2021-04-14 }}</ref><ref>{{cite web |last=Frazzoli |first=Emilio |title=Intro to Hidden Markov Models: the Baum-Welch Algorithm |url=http://ocw.mit.edu/courses/aeronautics-and-astronautics/16-410-principles-of-autonomy-and-decision-making-fall-2010/lecture-notes/MIT16_410F10_lec21.pdf |publisher=Aeronautics and Astronautics, Massachusetts Institute of Technology |access-date=2 October 2013 }}</ref> To start we first guess the transition and emission matrices.

{| border="0" style="background:transparent; margin: 1em auto 1em auto;"
|-valign="bottom"
|
{| class="wikitable"
|+Transition
|-
! !! State 1 !! State 2
|-
! State 1
|0.5 || 0.5
|-
! State 2
|0.3 || 0.7
|}
||
{| class="wikitable"
|+ Emission
|-
! !! No Eggs !! Eggs
|-
! State 1
|0.3 || 0.7
|-
! State 2
|0.8 || 0.2
|}
||
{| class="wikitable"
|+ Initial
|-
! State 1
|0.2
|-
! State 2
|0.8
|}
|}
We then take a set of observations (E = eggs, N = no eggs): N, N, N, N, N, E, E, N, N, N

This gives us a set of observed transitions between days: NN, NN, NN, NN, NE, EE, EN, NN, NN

The next step is to estimate a new transition matrix. For example, the probability of the sequence NN and the state being {{tmath|S_1}} then {{tmath|S_2}} is given by the following,
<math>P(S_1) \cdot P(N|S_1) \cdot P(S_1 \rightarrow S_2) \cdot P(N|S_2).</math>
{| class="wikitable" style="margin: 1em auto 1em auto;"
|-
! Observed sequence !! Highest probability of observing that sequence if state is {{tmath|S_1}} then {{tmath|S_2}} !! colspan=2 | Highest Probability of observing that sequence
|-
| NN || 0.024 = 0.2 &times; 0.3 &times; 0.5 &times; 0.8 || 0.3584 || {{tmath|S_2}},{{tmath|S_2}}
|-
| NN || 0.024 = 0.2 &times; 0.3 &times; 0.5 &times; 0.8 || 0.3584 || {{tmath|S_2}},{{tmath|S_2}}
|-
| NN || 0.024 = 0.2 &times; 0.3 &times; 0.5 &times; 0.8 || 0.3584 || {{tmath|S_2}},{{tmath|S_2}}
|-
| NN || 0.024 = 0.2 &times; 0.3 &times; 0.5 &times; 0.8 || 0.3584 || {{tmath|S_2}},{{tmath|S_2}}
|-
| NE || 0.006 = 0.2 &times; 0.3 &times; 0.5 &times; 0.2 || 0.1344 || {{tmath|S_2}},{{tmath|S_1}}
|-
| EE || 0.014 = 0.2 &times; 0.7 &times; 0.5 &times; 0.2 || 0.0490 || {{tmath|S_1}},{{tmath|S_1}}
|-
| EN || 0.056 = 0.2 &times; 0.7 &times; 0.5 &times; 0.8 || 0.0896 || {{tmath|S_2}},{{tmath|S_2}}
|-
| NN || 0.024 = 0.2 &times; 0.3 &times; 0.5 &times; 0.8 || 0.3584 || {{tmath|S_2}},{{tmath|S_2}}
|-
| NN || 0.024 = 0.2 &times; 0.3 &times; 0.5 &times; 0.8 || 0.3584 || {{tmath|S_2}},{{tmath|S_2}}
|-
! Total
| 0.22 || 2.4234 ||
|}

Thus the new estimate for the {{tmath|S_1}} to {{tmath|S_2}} transition is now <math>\frac{0.22}{2.4234}=0.0908</math> (referred to as "Pseudo probabilities" in the following tables). We then calculate the {{tmath|S_2}} to {{tmath|S_1}}, {{tmath|S_2}} to {{tmath|S_2}} and {{tmath|S_1}} to {{tmath|S_1}} transition probabilities and normalize so they add to 1. This gives us the updated transition matrix:

{| border="0" style="background:transparent; margin: 1em auto 1em auto;"
|-valign="bottom"
|
{| class="wikitable"
|+Old Transition Matrix
|-
! !! State 1 !! State 2
|-
! State 1
|0.5 || 0.5
|-
! State 2
|0.3 || 0.7
|}
||
{| class="wikitable"
|+New Transition Matrix (Pseudo Probabilities)
|-
! !! State 1 !! State 2
|-
! State 1
|0.0598 || 0.0908
|-
! State 2
|0.2179 || 0.9705
|}
||
{| class="wikitable"
|+New Transition Matrix (After Normalization)
|-
! !! State 1 !! State 2
|-
! State 1
|0.3973 || 0.6027
|-
! State 2
|0.1833 || 0.8167
|}
|}

Next, we want to estimate a new emission matrix,
{| class="wikitable" style="margin: 1em auto 1em auto;"
|-
! Observed Sequence !! colspan=2 | Highest probability of observing that sequence<br/> if E is assumed to come from {{tmath|S_1}} !! colspan=2 |Highest Probability of observing that sequence
|-
| NE || 0.1344 || {{tmath|S_2}},{{tmath|S_1}} || 0.1344 || {{tmath|S_2}},{{tmath|S_1}}
|-
| EE || 0.0490 || {{tmath|S_1}},{{tmath|S_1}} || 0.0490 || {{tmath|S_1}},{{tmath|S_1}}
|-
| EN || 0.0560 || {{tmath|S_1}},{{tmath|S_2}} || 0.0896 ||{{tmath|S_2}},{{tmath|S_2}}
|-
! Total
| 0.2394 || || 0.2730 ||
|}

The new estimate for the E coming from {{tmath|S_1}} emission is now <math>\frac{0.2394}{0.2730}=0.8769</math>.

This allows us to calculate the emission matrix as described above in the algorithm, by adding up the probabilities for the respective observed sequences. We then repeat for if N came from {{tmath|S_1}} and for if N and E came from {{tmath|S_2}} and normalize.

{| border="0" style="background:transparent; margin: 1em auto 1em auto;"
|-valign="bottom"
|
{| class="wikitable"
|+Old Emission Matrix
|-
! !! No Eggs !! Eggs
|-
! State 1
|0.3 || 0.7
|-
! State 2
|0.8 || 0.2
|}
||
{| class="wikitable"
|+New Emission Matrix (Estimates)
|-
! !! No Eggs !! Eggs
|-
! State 1
|0.0404 || 0.8769
|-
! State 2
|1.0000 || 0.7385
|}
||
{| class="wikitable"
|+New Emission Matrix (After Normalization)
|-
! !! No Eggs !! Eggs
|-
! State 1
|0.0441 || 0.9559
|-
! State 2
|0.5752 || 0.4248
|}
|}

To estimate the initial probabilities we assume all sequences start with the hidden state {{tmath|S_1}} and calculate the highest probability and then repeat for {{tmath|S_2}}. Again we then normalize to give an updated initial vector.

Finally we repeat these steps until the resulting probabilities converge satisfactorily.