Editing Hidden Markov model (section)

=== Discriminative approach ===
A different type of extension uses a [[discriminative model]] in place of the [[generative model]] of standard HMMs. This type of model directly models the conditional distribution of the hidden states given the observations, rather than modeling the joint distribution. An example of this model is the so-called ''[[maximum entropy Markov model]]'' (MEMM), which models the conditional distribution of the states using [[logistic regression]] (also known as a "[[Maximum entropy probability distribution|maximum entropy]] model"). The advantage of this type of model is that arbitrary features (i.e. functions) of the observations can be modeled, allowing domain-specific knowledge of the problem at hand to be injected into the model. Models of this sort are not limited to modeling direct dependencies between a hidden state and its associated observation; rather, features of nearby observations, of combinations of the associated observation and nearby observations, or in fact of arbitrary observations at any distance from a given hidden state can be included in the process used to determine the value of a hidden state. Furthermore, there is no need for these features to be [[statistically independent]] of each other, as would be the case if such features were used in a generative model. Finally, arbitrary features over pairs of adjacent hidden states can be used rather than simple transition probabilities. The disadvantages of such models are: (1) The types of prior distributions that can be placed on hidden states are severely limited; (2) It is not possible to predict the probability of seeing an arbitrary observation. This second limitation is often not an issue in practice, since many common usages of HMM's do not require such predictive probabilities.

A variant of the previously described discriminative model is the linear-chain [[conditional random field]]. This uses an undirected graphical model (aka [[Markov random field]]) rather than the directed graphical models of MEMM's and similar models. The advantage of this type of model is that it does not suffer from the so-called ''label bias'' problem of MEMM's, and thus may make more accurate predictions. The disadvantage is that training can be slower than for MEMM's.