===Conjugate to categorical or multinomial===
The Dirichlet distribution is the [[conjugate prior]] distribution of the [[categorical distribution]] (a generic [[discrete probability distribution]] with a given number of possible outcomes) and of the [[multinomial distribution]] (the distribution over observed counts of each possible category in a set of categorically distributed observations).  This means that if a data point has either a categorical or multinomial distribution, and the [[prior distribution]] of the distribution's parameter (the vector of probabilities that generates the data point) is a Dirichlet distribution, then the [[posterior distribution]] of the parameter is also a Dirichlet distribution.  Intuitively, in such a case, starting from what we know about the parameter prior to observing the data point, we can update our knowledge based on the data point and end up with a new distribution of the same form as the old one.  As a result, we can successively update our knowledge of a parameter by incorporating new observations one at a time, with each posterior serving as the prior for the next observation and remaining available in closed form.

Formally, this can be expressed as follows.  Given a model

<math display=block>\begin{array}{rcccl}
\boldsymbol\alpha &=& \left(\alpha_1, \ldots, \alpha_K \right) &=& \text{concentration hyperparameter} \\
\mathbf{p}\mid\boldsymbol\alpha &=& \left(p_1, \ldots, p_K \right ) &\sim& \operatorname{Dir}(K, \boldsymbol\alpha) \\
\mathbb{X}\mid\mathbf{p} &=& \left(\mathbf{x}_1, \ldots, \mathbf{x}_N \right ) &\sim& \operatorname{Cat}(K,\mathbf{p})
\end{array}</math>

then the following holds:

<math display=block>\begin{array}{rcccl}
\mathbf{c} &=& \left(c_1, \ldots, c_K \right ) &=& \text{vector of counts, where } c_i \text{ is the number of occurrences of category } i \\
\mathbf{p} \mid \mathbb{X},\boldsymbol\alpha &\sim& \operatorname{Dir}(K,\mathbf{c}+\boldsymbol\alpha) &=& \operatorname{Dir} \left (K,c_1+\alpha_1,\ldots,c_K+\alpha_K \right)
\end{array}</math>
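
A brief sketch of why this holds, assuming the observations are conditionally independent given {{math|'''p'''}} and dropping all normalizing factors that do not depend on {{math|'''p'''}}: multiplying the categorical likelihood by the Dirichlet prior density gives

<math display=block>P(\mathbf{p} \mid \mathbb{X},\boldsymbol\alpha) \;\propto\; P(\mathbb{X}\mid\mathbf{p})\,P(\mathbf{p}\mid\boldsymbol\alpha) \;\propto\; \prod_{i=1}^K p_i^{c_i} \,\prod_{i=1}^K p_i^{\alpha_i-1} \;=\; \prod_{i=1}^K p_i^{c_i+\alpha_i-1},</math>

which has the form of a Dirichlet density with parameter vector {{math|'''c''' + '''α'''}}.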

This relationship is used in [[Bayesian statistics]] to estimate the underlying parameter {{math|'''p'''}} of a [[categorical distribution]] given a collection of {{mvar|N}} samples. Intuitively, we can view the hyperparameter vector {{math|'''α'''}} as [[pseudocount]]s, i.e. as representing the number of observations in each category that we have already seen.  Then we simply add in the counts for all the new observations (the vector {{math|'''c'''}}) in order to derive the posterior distribution.
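
The pseudocount interpretation can be illustrated with a minimal Python sketch. The function names, the uniform prior and the sample data are hypothetical choices made for illustration only; categories are assumed to be coded as integers from 0 to {{math|''K'' − 1}}. The sketch also checks that updating on all observations at once gives the same posterior parameters as updating one observation at a time.

```python
from collections import Counter


def dirichlet_posterior(alpha, observations):
    """Return the posterior concentration parameters alpha + c.

    alpha: list of K prior concentration parameters (pseudocounts).
    observations: iterable of category indices in range(K).
    """
    counts = Counter(observations)  # c_i = occurrences of category i
    return [a + counts.get(i, 0) for i, a in enumerate(alpha)]


def dirichlet_mean(alpha):
    """Mean of Dir(alpha): each parameter divided by their sum."""
    total = sum(alpha)
    return [a / total for a in alpha]


# Illustrative prior and data (assumptions, not taken from any source).
alpha_prior = [1.0, 1.0, 1.0]   # K = 3, one pseudocount per category
data = [0, 2, 2, 1, 2]          # N = 5 categorical observations

# Batch update: add the whole count vector c to alpha.
alpha_post = dirichlet_posterior(alpha_prior, data)

# Sequential update: each posterior becomes the prior for the next point.
alpha_seq = alpha_prior
for x in data:
    alpha_seq = dirichlet_posterior(alpha_seq, [x])

print(alpha_post)                  # [2.0, 2.0, 4.0]
print(alpha_seq == alpha_post)     # True
print(dirichlet_mean(alpha_post))  # [0.25, 0.25, 0.5]
```

Here the count vector is {{math|1='''c''' = (1, 1, 3)}}, so the posterior parameters are {{math|(2, 2, 4)}}, and the posterior mean gives a smoothed estimate of {{math|'''p'''}}.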

In Bayesian [[mixture model]]s and other [[hierarchical Bayesian model]]s with mixture components, Dirichlet distributions are commonly used as the prior distributions for the [[categorical distribution|categorical variable]]s appearing in the models.  See the section on [[#Occurrence and applications|applications]] below for more information.