===Conjugate to categorical or multinomial===
The Dirichlet distribution is the [[conjugate prior]] distribution of the [[categorical distribution]] (a generic [[discrete probability distribution]] with a given number of possible outcomes) and [[multinomial distribution]] (the distribution over observed counts of each possible category in a set of categorically distributed observations). This means that if a data point has either a categorical or multinomial distribution, and the [[prior distribution]] of the distribution's parameter (the vector of probabilities that generates the data point) is distributed as a Dirichlet, then the [[posterior distribution]] of the parameter is also a Dirichlet. Intuitively, in such a case, starting from what we know about the parameter prior to observing the data point, we then can update our knowledge based on the data point and end up with a new distribution of the same form as the old one. This means that we can successively update our knowledge of a parameter by incorporating new observations one at a time, without running into mathematical difficulties.

Formally, this can be expressed as follows.
Given a model
<math display=block>\begin{array}{rcccl} \boldsymbol\alpha &=& \left(\alpha_1, \ldots, \alpha_K \right) &=& \text{concentration hyperparameter} \\ \mathbf{p}\mid\boldsymbol\alpha &=& \left(p_1, \ldots, p_K \right ) &\sim& \operatorname{Dir}(K, \boldsymbol\alpha) \\ \mathbb{X}\mid\mathbf{p} &=& \left(\mathbf{x}_1, \ldots, \mathbf{x}_N \right ) &\sim& \operatorname{Cat}(K,\mathbf{p}) \end{array}</math>
then the following holds:
<math display=block>\begin{array}{rcccl} \mathbf{c} &=& \left(c_1, \ldots, c_K \right ) &=& \text{number of occurrences of each category } i \text{ in } \mathbb{X} \\ \mathbf{p} \mid \mathbb{X},\boldsymbol\alpha &\sim& \operatorname{Dir}(K,\mathbf{c}+\boldsymbol\alpha) &=& \operatorname{Dir} \left (K,c_1+\alpha_1,\ldots,c_K+\alpha_K \right) \end{array}</math>

This relationship is used in [[Bayesian statistics]] to estimate the underlying parameter {{math|'''p'''}} of a [[categorical distribution]] given a collection of {{mvar|N}} samples. Intuitively, we can view the hyperparameter vector {{math|'''α'''}} as [[pseudocount]]s, i.e. as representing the number of observations in each category that we have already seen. Then we simply add in the counts for all the new observations (the vector {{math|'''c'''}}) in order to derive the posterior distribution.

In Bayesian [[mixture model]]s and other [[hierarchical Bayesian model]]s with mixture components, Dirichlet distributions are commonly used as the prior distributions for the [[categorical distribution|categorical variable]]s appearing in the models. See the section on [[#Occurrence and applications|applications]] below for more information.
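
As an illustration of the pseudocount update described in this section, the following minimal Python sketch adds observed category counts to the prior concentration parameters. It also checks that incorporating observations one at a time gives the same posterior as incorporating them all at once. The function names <code>dirichlet_posterior</code> and <code>posterior_mean</code>, the uniform prior, and the sample data are hypothetical choices made for this example. Observations are assumed to be encoded as integer category indices from 0 to {{math|''K'' − 1}}. The posterior mean uses the standard result that the mean of a Dirichlet distribution is each concentration parameter divided by their sum.

```python
from collections import Counter


def dirichlet_posterior(alpha, observations):
    """Return the posterior concentration parameters c + alpha.

    alpha: list of K positive prior concentration parameters (pseudocounts).
    observations: iterable of category indices, each assumed to lie in range(K).
    """
    counts = Counter(observations)  # c_i = number of occurrences of category i
    return [a + counts.get(i, 0) for i, a in enumerate(alpha)]


def posterior_mean(alpha):
    """Mean of Dir(alpha): each alpha_i divided by the sum of all alpha_i."""
    total = sum(alpha)
    return [a / total for a in alpha]


# Hypothetical example: K = 3 categories, uniform prior, N = 6 observations.
prior = [1.0, 1.0, 1.0]
data = [0, 2, 2, 1, 2, 0]

# Batch update: add all counts at once.
batch = dirichlet_posterior(prior, data)

# Sequential update: fold in one observation at a time.
sequential = prior
for x in data:
    sequential = dirichlet_posterior(sequential, [x])

print(batch)                  # [3.0, 2.0, 4.0]
print(sequential == batch)    # True
print(posterior_mean(batch))  # point estimate of p under the posterior
```

Because the update is a simple addition of counts, the order in which observations arrive does not affect the result. This is why the batch and sequential posteriors agree in the sketch.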