===Topics in a document===
Assume that a document is composed of ''N'' different words from a total vocabulary of size ''V'', where each word corresponds to one of ''K'' possible topics. The distribution of such words could be modelled as a mixture of ''K'' different ''V''-dimensional [[categorical distribution]]s. A model of this sort is commonly termed a [[topic model]]. Note that [[expectation maximization]] applied to such a model will typically fail to produce realistic results, due (among other things) to the [[overfitting|excessive number of parameters]]. Additional assumptions are typically necessary to get good results; in particular, two sorts of additional components are commonly added to the model:
#A [[prior distribution]] is placed over the parameters describing the topic distributions, using a [[Dirichlet distribution]] with a [[concentration parameter]] that is set significantly below 1, so as to encourage sparse distributions (where only a small number of words have significantly non-zero probabilities).
#Some sort of additional constraint is placed over the topic identities of words, to take advantage of natural clustering.
#*For example, a [[Markov chain]] could be placed on the topic identities (i.e., the latent variables specifying the mixture component of each observation), corresponding to the fact that nearby words belong to similar topics. (This results in a [[hidden Markov model]], specifically one where a [[prior distribution]] is placed over state transitions that favors transitions that stay in the same state.)
#*Another possibility is the [[latent Dirichlet allocation]] model, which divides up the words into ''D'' different documents and assumes that in each document only a small number of topics occur with any frequency.
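
As a minimal sketch of the latent Dirichlet allocation variant described above, the following Python example fits an LDA topic model with scikit-learn, setting both Dirichlet concentration parameters well below 1 to encourage sparse topic and word distributions. The toy corpus, the choice of ''K''&nbsp;=&nbsp;2, and the prior values 0.1 are illustrative assumptions, not part of the model definition.

<syntaxhighlight lang="python">
# Illustrative sketch: LDA as a sparse mixture of categorical word
# distributions. The corpus and parameter values below are made up
# for demonstration purposes.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the cat sat on the mat",
    "dogs and cats are popular pets",
    "stock markets fell sharply today",
    "investors traded shares on the market",
]

# Bag-of-words counts: each document is a sequence of words drawn
# from a vocabulary of size V.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

# K = 2 topics; both Dirichlet concentration parameters are set
# significantly below 1 so that each document uses few topics and
# each topic uses few words, as described in the list above.
lda = LatentDirichletAllocation(
    n_components=2,        # K: number of topics
    doc_topic_prior=0.1,   # sparse per-document topic mixture
    topic_word_prior=0.1,  # sparse per-topic word distribution
    random_state=0,
)
doc_topics = lda.fit_transform(X)  # D x K matrix of topic weights
print(doc_topics.round(2))
</syntaxhighlight>

With concentration parameters this small, each row of the resulting ''D''&nbsp;×&nbsp;''K'' matrix tends to put most of its weight on a single topic, reflecting the assumption that only a small number of topics occur with any frequency in each document.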