===Topics in a document===
Assume that a document is composed of ''N'' different words from a total vocabulary of size ''V'', where each word corresponds to one of ''K'' possible topics. The distribution of such words could be modelled as a mixture of ''K'' different ''V''-dimensional [[categorical distribution]]s. A model of this sort is commonly termed a [[topic model]]. Note that [[expectation maximization]] applied to such a model will typically fail to produce realistic results, due (among other things) to the [[overfitting|excessive number of parameters]]. Additional assumptions are typically necessary to get good results; in particular, two sorts of additional components are commonly added to the model:
#A [[prior distribution]] is placed over the parameters describing the topic distributions, using a [[Dirichlet distribution]] with a [[concentration parameter]] that is set significantly below 1, so as to encourage sparse distributions (where only a small number of words have significantly non-zero probabilities).
#Some sort of additional constraint is placed over the topic identities of words, to take advantage of natural clustering.
#*For example, a [[Markov chain]] could be placed on the topic identities (i.e., the latent variables specifying the mixture component of each observation), corresponding to the fact that nearby words belong to similar topics. (This results in a [[hidden Markov model]], specifically one where a [[prior distribution]] is placed over state transitions that favors transitions that stay in the same state.)
#*Another possibility is the [[latent Dirichlet allocation]] model, which divides up the words into ''D'' different documents and assumes that in each document only a small number of topics occur with any frequency.
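
As a minimal sketch of the latent Dirichlet allocation variant described above, the following Python example fits an LDA topic model with scikit-learn, setting both Dirichlet concentration parameters well below 1 to encourage sparse topic and word distributions. The toy corpus, the choice of ''K''&nbsp;=&nbsp;2, and the prior values 0.1 are illustrative assumptions, not part of the model definition.

<syntaxhighlight lang="python">
# Illustrative sketch: LDA as a sparse mixture of categorical word
# distributions. The corpus and parameter values below are made up
# for demonstration purposes.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the cat sat on the mat",
    "dogs and cats are popular pets",
    "stock markets fell sharply today",
    "investors traded shares on the market",
]

# Bag-of-words counts: each document is a sequence of words drawn
# from a vocabulary of size V.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

# K = 2 topics; both Dirichlet concentration parameters are set
# significantly below 1 so that each document uses few topics and
# each topic uses few words, as described in the list above.
lda = LatentDirichletAllocation(
    n_components=2,        # K: number of topics
    doc_topic_prior=0.1,   # sparse per-document topic mixture
    topic_word_prior=0.1,  # sparse per-topic word distribution
    random_state=0,
)
doc_topics = lda.fit_transform(X)  # D x K matrix of topic weights
print(doc_topics.round(2))
</syntaxhighlight>

With concentration parameters this small, each row of the resulting ''D''&nbsp;×&nbsp;''K'' matrix tends to put most of its weight on a single topic, reflecting the assumption that only a small number of topics occur with any frequency in each document.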