===Generalization by scaling and translation of log-probabilities===
As noted above, Dirichlet variates can be generated by normalizing independent [[Gamma distribution|gamma]] variates. If one instead normalizes [[Generalized gamma distribution|generalized gamma]] variates, one obtains variates from the [[simplicial generalized beta distribution]] (SGB).<ref name="sgb">{{cite web |last1=Graf |first1=Monique |date=2019 |title=The Simplicial Generalized Beta distribution - R-package SGB and applications |url=https://libra.unine.ch/server/api/core/bitstreams/dd593778-b1fd-4856-855b-7b21e005ee77/content |website=Libra |access-date=26 May 2025}}</ref> Conversely, SGB variates can also be obtained by applying the [[softmax function]] to scaled and translated logarithms of Dirichlet variates. Specifically, let <math>\mathbf x = (x_1, \ldots, x_K)\sim\operatorname{Dir}(\boldsymbol\alpha)</math> and define <math>\mathbf y = (y_1, \ldots, y_K)</math> by (with the logarithm applied elementwise):
<math display=block>
\mathbf y = \operatorname{softmax}(a^{-1}\log\mathbf x + \log\mathbf b)\;\iff\;\mathbf x = \operatorname{softmax}(a\log\mathbf y - a\log\mathbf b)
</math>
or, componentwise,
<math display=block>
y_k = \frac{b_k x_k^{1/a}}{\sum_{i=1}^K b_i x_i^{1/a}}\; \iff\; x_k = \frac{(y_k/b_k)^a}{\sum_{i=1}^K(y_i/b_i)^a}
</math>
where <math>a>0</math> and <math>\mathbf b = (b_1, \ldots, b_K)</math> with all <math>b_k>0</math>. Then <math>\mathbf y\sim\operatorname{SGB}(a, \mathbf b, \boldsymbol\alpha)</math>.

The SGB density function can be derived by noting that the transformation <math>\mathbf x\mapsto\mathbf y</math>, which is a [[bijection]] from the simplex to itself, induces a differential volume change factor<ref name='manifold_flow'>{{cite web |last1=Sorrenson |first1=Peter |display-authors=etal |date=2024 |title=Learning Distributions on Manifolds with Free-Form Flows |url=https://arxiv.org/abs/2312.09852 |website=arXiv}}</ref> of:
<math display=block>
R(\mathbf y, a,\mathbf b) = a^{1-K}\prod_{k=1}^K\frac{y_k}{x_k}
</math>
where it is understood that <math>\mathbf x</math> is recovered as a function of <math>\mathbf y</math>, as shown above. This allows the SGB density to be written in terms of the Dirichlet density as:
<math display=block>
f_{\text{SGB}}(\mathbf y\mid a, \mathbf b, \boldsymbol\alpha) = \frac{f_{\text{Dir}}(\mathbf x\mid\boldsymbol\alpha)}{R(\mathbf y,a,\mathbf b)}
</math>
This generalization of the Dirichlet density, via a [[change of variables]], is closely related to a [[normalizing flow]]. Note, however, that the differential volume change is not given by the [[Jacobian determinant]] of <math>\mathbf x\mapsto\mathbf y:\mathbb R^K\to\mathbb R^K</math>, which is zero, but by the Jacobian determinant of <math>(x_1,\ldots,x_{K-1})\mapsto(y_1,\ldots,y_{K-1})</math>.

For further insight into the interaction between the Dirichlet shape parameters <math>\boldsymbol\alpha</math> and the transformation parameters <math>a, \mathbf b</math>, it may be helpful to consider the logarithmic marginals, <math>\log\frac{x_k}{1-x_k}</math>, which follow the [[logistic-beta distribution]] <math>B_\sigma(\alpha_k,\sum_{i\ne k} \alpha_i)</math>. See in particular the sections on [[Generalized_logistic_distribution#Tail_behaviour|tail behaviour]] and [[Generalized_logistic_distribution#Generalization_with_location_and_scale_parameters|generalization with location and scale parameters]].
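The relationship above can be illustrated numerically. The following is a minimal Python sketch (assuming NumPy; the parameter values are arbitrary illustrations, not taken from the text): it draws Dirichlet variates, applies the forward softmax transform to obtain SGB variates, recovers the Dirichlet variates with the inverse transform, and evaluates the volume change factor <math>R</math>.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameter choices (hypothetical, for demonstration only).
alpha = np.array([2.0, 3.0, 5.0])   # Dirichlet shape parameters
a = 0.7                             # scale parameter, a > 0
b = np.array([1.0, 0.5, 2.0])       # translation parameters, all b_k > 0

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Dirichlet variates, one per row.
x = rng.dirichlet(alpha, size=10000)

# Forward transform: y = softmax(a^{-1} log x + log b), giving SGB(a, b, alpha) variates.
y = softmax(np.log(x) / a + np.log(b))

# Inverse transform: x = softmax(a log y - a log b) recovers the Dirichlet variates.
x_back = softmax(a * np.log(y) - a * np.log(b))
assert np.allclose(x, x_back)

# Differential volume change factor R(y, a, b) = a^{1-K} * prod_k (y_k / x_k),
# which relates the densities via f_SGB(y) = f_Dir(x) / R(y, a, b).
K = alpha.size
R = a ** (1 - K) * np.prod(y / x, axis=-1)
</syntaxhighlight>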
====Application====
When <math>b_1=b_2=\cdots=b_K</math>, the transformation simplifies to <math>\mathbf x\mapsto\operatorname{softmax}(a^{-1}\log\mathbf x)</math>, which is known as [[Platt_scaling#Analysis|temperature scaling]] in [[machine learning]], where it is used as a calibration transform for multiclass probabilistic classifiers.<ref>{{cite journal |last1=Ferrer |first1=Luciana |last2=Ramos |first2=Daniel |title=Evaluating Posterior Probabilities: Decision Theory, Proper Scoring Rules, and Calibration |journal=Transactions on Machine Learning Research |date=2025 |url=https://openreview.net/forum?id=qbrE0LR7fF}}</ref> Traditionally, the temperature parameter (<math>a</math> here) is learnt [[Discriminative_model|discriminatively]] by minimizing multiclass [[cross-entropy]] over a supervised calibration data set with known class labels. But the above PDF transformation mechanism can also be used to facilitate the design of [[Generative_model|generatively trained]] calibration models with a temperature scaling component.
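As a brief illustration, the following Python sketch (assuming NumPy; the probability vector is hypothetical) applies the temperature-scaling special case <math>\mathbf x\mapsto\operatorname{softmax}(a^{-1}\log\mathbf x)</math> to a vector of classifier output probabilities.

<syntaxhighlight lang="python">
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def temperature_scale(p, a):
    # Temperature scaling: x -> softmax(a^{-1} log x),
    # i.e. the b_1 = ... = b_K special case of the transformation above.
    return softmax(np.log(p) / a)

# Hypothetical classifier output probabilities for 3 classes.
p = np.array([0.7, 0.2, 0.1])
print(temperature_scale(p, a=2.0))   # a > 1 flattens the distribution (less confident)
print(temperature_scale(p, a=0.5))   # a < 1 sharpens the distribution (more confident)
</syntaxhighlight>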