===Generalization by scaling and translation of log-probabilities===
As noted above, Dirichlet variates can be generated by normalizing independent [[Gamma distribution|gamma]] variates. If one instead normalizes [[Generalized gamma distribution|generalized gamma]] variates, one obtains variates from the [[simplicial generalized beta distribution]] (SGB).<ref name="sgb">{{cite web |last1=Graf |first1=Monique |date=2019 |title=The Simplicial Generalized Beta distribution - R-package SGB and applications |url=https://libra.unine.ch/server/api/core/bitstreams/dd593778-b1fd-4856-855b-7b21e005ee77/content |website=Libra |access-date=26 May 2025}}</ref> Conversely, SGB variates can also be obtained by applying the [[softmax function]] to scaled and translated logarithms of Dirichlet variates. Specifically, let <math>\mathbf x = (x_1, \ldots, x_K)\sim\operatorname{Dir}(\boldsymbol\alpha)</math> and define <math>\mathbf y = (y_1, \ldots, y_K)</math> by (with the logarithm applied elementwise):
<math display=block>
\mathbf y = \operatorname{softmax}(a^{-1}\log\mathbf x + \log\mathbf b)\;\iff\;\mathbf x = \operatorname{softmax}(a\log\mathbf y - a\log\mathbf b)
</math>
or, componentwise,
<math display=block>
y_k = \frac{b_k x_k^{1/a}}{\sum_{i=1}^K b_i x_i^{1/a}}\; \iff\; x_k = \frac{(y_k/b_k)^a}{\sum_{i=1}^K(y_i/b_i)^a}
</math>
where <math>a>0</math> and <math>\mathbf b = (b_1, \ldots, b_K)</math> with all <math>b_k>0</math>. Then <math>\mathbf y\sim\operatorname{SGB}(a, \mathbf b, \boldsymbol\alpha)</math>.

The SGB density function can be derived by noting that the transformation <math>\mathbf x\mapsto\mathbf y</math>, which is a [[bijection]] from the simplex to itself, induces a differential volume change factor<ref name='manifold_flow'>{{cite web |last1=Sorrenson |first1=Peter |display-authors=etal |date=2024 |title=Learning Distributions on Manifolds with Free-Form Flows |url=https://arxiv.org/abs/2312.09852 |website=arXiv}}</ref> of:
<math display=block>
R(\mathbf y, a,\mathbf b) = a^{1-K}\prod_{k=1}^K\frac{y_k}{x_k}
</math>
where it is understood that <math>\mathbf x</math> is recovered as a function of <math>\mathbf y</math>, as shown above. This allows the SGB density to be written in terms of the Dirichlet density as:
<math display=block>
f_{\text{SGB}}(\mathbf y\mid a, \mathbf b, \boldsymbol\alpha) = \frac{f_{\text{Dir}}(\mathbf x\mid\boldsymbol\alpha)}{R(\mathbf y,a,\mathbf b)}
</math>
This generalization of the Dirichlet density, via a [[change of variables]], is closely related to a [[normalizing flow]]. Note, however, that the differential volume change is not given by the [[Jacobian determinant]] of <math>\mathbf x\mapsto\mathbf y:\mathbb R^K\to\mathbb R^K</math>, which is zero, but by the Jacobian determinant of <math>(x_1,\ldots,x_{K-1})\mapsto(y_1,\ldots,y_{K-1})</math>.

For further insight into the interaction between the Dirichlet shape parameters <math>\boldsymbol\alpha</math> and the transformation parameters <math>a, \mathbf b</math>, it may be helpful to consider the logarithmic marginals, <math>\log\frac{x_k}{1-x_k}</math>, which follow the [[logistic-beta distribution]] <math>B_\sigma(\alpha_k,\sum_{i\ne k} \alpha_i)</math>. See in particular the sections on [[Generalized_logistic_distribution#Tail_behaviour|tail behaviour]] and [[Generalized_logistic_distribution#Generalization_with_location_and_scale_parameters|generalization with location and scale parameters]].
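The relationship above can be illustrated numerically. The following is a minimal Python sketch (assuming NumPy; the parameter values are arbitrary illustrations, not taken from the text): it draws Dirichlet variates, applies the forward softmax transform to obtain SGB variates, recovers the Dirichlet variates with the inverse transform, and evaluates the volume change factor <math>R</math>.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameter choices (hypothetical, for demonstration only).
alpha = np.array([2.0, 3.0, 5.0])   # Dirichlet shape parameters
a = 0.7                             # scale parameter, a > 0
b = np.array([1.0, 0.5, 2.0])       # translation parameters, all b_k > 0

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Dirichlet variates, one per row.
x = rng.dirichlet(alpha, size=10000)

# Forward transform: y = softmax(a^{-1} log x + log b), giving SGB(a, b, alpha) variates.
y = softmax(np.log(x) / a + np.log(b))

# Inverse transform: x = softmax(a log y - a log b) recovers the Dirichlet variates.
x_back = softmax(a * np.log(y) - a * np.log(b))
assert np.allclose(x, x_back)

# Differential volume change factor R(y, a, b) = a^{1-K} * prod_k (y_k / x_k),
# which relates the densities via f_SGB(y) = f_Dir(x) / R(y, a, b).
K = alpha.size
R = a ** (1 - K) * np.prod(y / x, axis=-1)
</syntaxhighlight>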
====Application====
When <math>b_1=b_2=\cdots=b_K</math>, the transformation simplifies to <math>\mathbf x\mapsto\operatorname{softmax}(a^{-1}\log\mathbf x)</math>, which is known as [[Platt_scaling#Analysis|temperature scaling]] in [[machine learning]], where it is used as a calibration transform for multiclass probabilistic classifiers.<ref>{{cite journal |last1=Ferrer |first1=Luciana |last2=Ramos |first2=Daniel |title=Evaluating Posterior Probabilities: Decision Theory, Proper Scoring Rules, and Calibration |journal=Transactions on Machine Learning Research |date=2025 |url=https://openreview.net/forum?id=qbrE0LR7fF}}</ref> Traditionally, the temperature parameter (<math>a</math> here) is learnt [[Discriminative_model|discriminatively]] by minimizing multiclass [[cross-entropy]] over a supervised calibration data set with known class labels. But the above PDF transformation mechanism can also be used to facilitate the design of [[Generative_model|generatively trained]] calibration models with a temperature scaling component.
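As a brief illustration, the following Python sketch (assuming NumPy; the probability vector is hypothetical) applies the temperature-scaling special case <math>\mathbf x\mapsto\operatorname{softmax}(a^{-1}\log\mathbf x)</math> to a vector of classifier output probabilities.

<syntaxhighlight lang="python">
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def temperature_scale(p, a):
    # Temperature scaling: x -> softmax(a^{-1} log x),
    # i.e. the b_1 = ... = b_K special case of the transformation above.
    return softmax(np.log(p) / a)

# Hypothetical classifier output probabilities for 3 classes.
p = np.array([0.7, 0.2, 0.1])
print(temperature_scale(p, a=2.0))   # a > 1 flattens the distribution (less confident)
print(temperature_scale(p, a=0.5))   # a < 1 sharpens the distribution (more confident)
</syntaxhighlight>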