== Learning ==

The parameter learning task in HMMs is to find, given an output sequence or a set of such sequences, the best set of state transition and emission probabilities. The task is usually to derive the [[maximum likelihood]] estimate of the parameters of the HMM given the set of output sequences. No tractable algorithm is known for solving this problem exactly, but a local maximum likelihood can be derived efficiently using the [[Baum–Welch algorithm]] or the Baldi–Chauvin algorithm. The Baum–Welch algorithm is a special case of the [[expectation–maximization algorithm]].

If HMMs are used for time-series prediction, more sophisticated Bayesian inference methods, such as [[Markov chain Monte Carlo]] (MCMC) sampling, have proven favorable over finding a single maximum-likelihood model, both in terms of accuracy and stability.<ref>Sipos, I. Róbert. ''Parallel stratified MCMC sampling of AR-HMMs for stochastic time series prediction''. In: Proceedings, 4th Stochastic Modeling Techniques and Data Analysis International Conference with Demographics Workshop (SMTDA2016), pp. 295–306. Valletta, 2016. [http://1drv.ms/b/s!ApL_0Av0YGDLglwEOv1aYAGbmQeL PDF]</ref> Since MCMC imposes a significant computational burden, in cases where computational scalability is also of interest, one may alternatively resort to variational approximations to Bayesian inference.<ref>{{cite journal |url=http://users.iit.demokritos.gr/~dkosmo/downloads/patrec10/vbb10.pdf |doi=10.1016/j.patcog.2010.09.001 |volume=44 |issue=2 |title=A variational Bayesian methodology for hidden Markov models utilizing Student's-t mixtures |year=2011 |journal=Pattern Recognition |pages=295–306 |last1=Chatzis |first1=Sotirios P. |last2=Kosmopoulos |first2=Dimitrios I. |bibcode=2011PatRe..44..295C |citeseerx=10.1.1.629.6275 |access-date=2018-03-11 |archive-date=2011-04-01 |archive-url=https://web.archive.org/web/20110401184517/http://users.iit.demokritos.gr/~dkosmo/downloads/patrec10/vbb10.pdf |url-status=dead}}</ref> Indeed, approximate variational inference offers computational efficiency comparable to expectation–maximization, while yielding an accuracy profile only slightly inferior to exact MCMC-type Bayesian inference.
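The following is a minimal NumPy sketch of the Baum–Welch re-estimation described above, for a discrete-emission HMM with a single observation sequence and random initialization. The function name <code>baum_welch</code> and its signature are illustrative, not a standard library interface; per-step scaling of the forward and backward variables is used to avoid numerical underflow on long sequences.

<syntaxhighlight lang="python">
import numpy as np

def baum_welch(obs, n_states, n_symbols, n_iter=100, seed=0):
    """Local maximum-likelihood estimation of a discrete-emission HMM.

    obs : sequence of observation symbols in {0, ..., n_symbols - 1}.
    Returns (pi, A, B): initial-state, transition, and emission estimates.
    """
    obs = np.asarray(obs)
    T = len(obs)
    rng = np.random.default_rng(seed)

    # Random row-stochastic initialization; Baum-Welch converges only to
    # a local maximum, so the result depends on this starting point.
    A = rng.random((n_states, n_states))
    A /= A.sum(axis=1, keepdims=True)
    B = rng.random((n_states, n_symbols))
    B /= B.sum(axis=1, keepdims=True)
    pi = rng.random(n_states)
    pi /= pi.sum()

    for _ in range(n_iter):
        # E-step: scaled forward recursion.
        alpha = np.zeros((T, n_states))
        scale = np.zeros(T)
        alpha[0] = pi * B[:, obs[0]]
        scale[0] = alpha[0].sum()
        alpha[0] /= scale[0]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
            scale[t] = alpha[t].sum()
            alpha[t] /= scale[t]

        # E-step: scaled backward recursion.
        beta = np.zeros((T, n_states))
        beta[-1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / scale[t + 1]

        # gamma[t, i] = P(state i at time t | observations)
        gamma = alpha * beta
        gamma /= gamma.sum(axis=1, keepdims=True)

        # xi[t, i, j] = P(state i at t, state j at t+1 | observations)
        xi = (alpha[:-1, :, None] * A[None, :, :]
              * (B[:, obs[1:]].T * beta[1:])[:, None, :])
        xi /= xi.sum(axis=(1, 2), keepdims=True)

        # M-step: re-estimate parameters from expected counts.
        pi = gamma[0]
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        for k in range(n_symbols):
            B[:, k] = gamma[obs == k].sum(axis=0)
        B /= gamma.sum(axis=0)[:, None]

    return pi, A, B

# Example: fit a 2-state, 3-symbol HMM to a toy observation sequence.
pi, A, B = baum_welch([0, 1, 2, 2, 1, 0, 0, 1, 2, 1], n_states=2, n_symbols=3)
</syntaxhighlight>

Because the likelihood surface is multimodal, such a fit is typically repeated from several random initializations and the run with the highest likelihood is kept.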