===Alternative parameterizations===

====Two parameters====

=====Mean and sample size=====
The beta distribution may also be reparameterized in terms of its mean ''μ'' {{nowrap|1=(0 < ''μ'' < 1)}} and the sum of the two shape parameters {{nowrap|1=''ν'' = ''α'' + ''β'' > 0}}.<ref name=Kruschke2011>{{cite book|last=Kruschke|first=John K.|author-link=John K. Kruschke|title=Doing Bayesian data analysis: A tutorial with R and BUGS|year=2011|publisher=Academic Press / Elsevier|page=83|isbn=978-0123814852}}</ref> Denoting by ''α''<sub>Posterior</sub> and ''β''<sub>Posterior</sub> the shape parameters of the posterior beta distribution resulting from applying Bayes' theorem to a binomial likelihood function and a prior probability, the interpretation of the sum of both shape parameters as a sample size, ''ν'' = ''α''<sub>Posterior</sub> + ''β''<sub>Posterior</sub>, is only correct for the Haldane prior probability Beta(0,0). Specifically, for the Bayes (uniform) prior Beta(1,1) the correct interpretation would be sample size = ''α''<sub>Posterior</sub> + ''β''<sub>Posterior</sub> − 2, or ''ν'' = (sample size) + 2. For sample sizes much larger than 2, the difference between these two priors becomes negligible. (See section [[#Bayesian inference|Bayesian inference]] for further details.) ''ν'' = ''α'' + ''β'' is referred to as the "sample size" of a beta distribution, but one should remember that it is, strictly speaking, the "sample size" of a binomial likelihood function only when using a Haldane Beta(0,0) prior in Bayes' theorem.

This parametrization may be useful in Bayesian parameter estimation. For example, one may administer a test to a number of individuals. If it is assumed that each person's score (0 ≤ ''θ'' ≤ 1) is drawn from a population-level beta distribution, then an important statistic is the mean of this population-level distribution. The mean and sample size parameters are related to the shape parameters ''α'' and ''β'' via<ref name=Kruschke2011/>

: ''α'' = ''μν'', ''β'' = (1 − ''μ'')''ν''

Under this [[Statistical parameter|parametrization]], one may place an [[uninformative prior]] probability over the mean, and a vague prior probability (such as an [[exponential distribution|exponential]] or [[gamma distribution]]) over the positive reals for the sample size, if they are independent, and prior data and/or beliefs justify it.

=====Mode and concentration=====
[[Concave function|Concave]] beta distributions, which have <math>\alpha,\beta>1</math>, can be parametrized in terms of mode and "concentration". The mode, <math>\omega=\frac{\alpha-1}{\alpha+\beta-2}</math>, and concentration, <math>\kappa = \alpha + \beta</math>, can be used to define the usual shape parameters as follows:<ref name="Kruschke2015">{{cite book|last=Kruschke|first=John K.|author-link=John K. Kruschke|title=Doing Bayesian Data Analysis: A Tutorial with R, JAGS and Stan|year=2015|publisher=Academic Press / Elsevier|isbn=978-0-12-405888-0}}</ref>

:<math>\begin{align}
\alpha &= \omega (\kappa - 2) + 1\\
\beta &= (1 - \omega)(\kappa - 2) + 1
\end{align}</math>

For the mode, <math>0<\omega<1</math>, to be well-defined, we need <math>\alpha,\beta>1</math>, or equivalently <math>\kappa>2</math>. If instead we define the concentration as <math>c=\alpha+\beta-2</math>, the condition simplifies to <math>c>0</math>, and the beta density at <math>\alpha=1+c\omega</math> and <math>\beta=1+c(1-\omega)</math> can be written as:

:<math> f(x;\omega,c) = \frac{x^{c\omega}(1-x)^{c(1-\omega)}}{\Beta\bigl(1+c\omega,1+c(1-\omega)\bigr)} </math>

where <math>c</math> directly scales the [[sufficient statistics]], <math>\log(x)</math> and <math>\log(1-x)</math>. Note also that in the limit, <math>c\to0</math>, the distribution becomes flat.
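As an illustration, the two-parameter conversions above can be computed directly. The following is a minimal Python sketch, not drawn from the cited references; the function names are hypothetical:

<syntaxhighlight lang="python">
def shape_from_mean_and_sample_size(mu, nu):
    """(alpha, beta) from the mean mu (0 < mu < 1) and "sample size" nu = alpha + beta > 0."""
    if not (0.0 < mu < 1.0) or nu <= 0.0:
        raise ValueError("require 0 < mu < 1 and nu > 0")
    return mu * nu, (1.0 - mu) * nu


def shape_from_mode_and_concentration(omega, kappa):
    """(alpha, beta) from the mode omega (0 < omega < 1) and concentration kappa = alpha + beta > 2."""
    if not (0.0 < omega < 1.0) or kappa <= 2.0:
        raise ValueError("require 0 < omega < 1 and kappa > 2")
    return omega * (kappa - 2.0) + 1.0, (1.0 - omega) * (kappa - 2.0) + 1.0


# Example: mean 0.66 with nu = 10 gives Beta(6.6, 3.4), the same pair as
# mode 0.7 with concentration kappa = 10, since (6.6 - 1)/(10 - 2) = 0.7.
print(shape_from_mean_and_sample_size(0.66, 10.0))    # approximately (6.6, 3.4)
print(shape_from_mode_and_concentration(0.7, 10.0))   # approximately (6.6, 3.4)
</syntaxhighlight>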
=====Mean and variance=====
Solving the (coupled) system formed by the equations for the mean and the variance of the beta distribution in terms of the original parameters ''α'' and ''β'', one can express ''α'' and ''β'' in terms of the mean (''μ'') and the variance (var):

:<math>
\begin{align}
\nu &= \alpha + \beta = \frac{\mu(1-\mu)}{\mathrm{var}}-1, \text{ where }\nu =(\alpha + \beta) >0,\text{ therefore: }\text{var}< \mu(1-\mu)\\
\alpha&= \mu \nu =\mu \left(\frac{\mu(1-\mu)}{\text{var}}-1\right), \text{ if } \text{var}< \mu(1-\mu)\\
\beta &= (1 - \mu) \nu = (1 - \mu)\left(\frac{\mu(1-\mu)}{\text{var}}-1\right), \text{ if }\text{var}< \mu(1-\mu).
\end{align}</math>

This [[Statistical parameter|parametrization]] of the beta distribution may lead to a more intuitive understanding than the one based on the original parameters ''α'' and ''β'', for example by expressing the mode, skewness, excess kurtosis and differential entropy in terms of the mean and the variance:

[[File:Mode Beta Distribution for both alpha and beta greater than 1 - J. Rodal.jpg|325px]]
[[File:Mode Beta Distribution for both alpha and beta greater than 1 - another view - J. Rodal.jpg|325px]]
[[File:Skewness Beta Distribution for mean full range and variance between 0.05 and 0.25 - Dr. J. Rodal.jpg|325px]]
[[File:Skewness Beta Distribution for mean and variance both full range - J. Rodal.jpg|325px]]
[[File:Excess Kurtosis Beta Distribution with mean for full range and variance from 0.05 to 0.25 - J. Rodal.jpg|325px]]
[[File:Excess Kurtosis Beta Distribution with mean and variance for full range - J. Rodal.jpg|325px]]
[[File:Differential Entropy Beta Distribution with mean from 0.2 to 0.8 and variance from 0.01 to 0.09 - J. Rodal.jpg|325px]]
[[File:Differential Entropy Beta Distribution with mean from 0.3 to 0.7 and variance from 0 to 0.2 - J. Rodal.jpg|325px]]
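The mean/variance equations above can also be inverted numerically. The following minimal Python sketch (illustrative only; the function name is hypothetical) recovers ''α'' and ''β'' from ''μ'' and var:

<syntaxhighlight lang="python">
def shape_from_mean_and_variance(mu, var):
    """(alpha, beta) from the mean mu and the variance var, requiring var < mu*(1 - mu)."""
    if not (0.0 < mu < 1.0):
        raise ValueError("require 0 < mu < 1")
    if not (0.0 < var < mu * (1.0 - mu)):
        raise ValueError("require 0 < var < mu*(1 - mu)")
    nu = mu * (1.0 - mu) / var - 1.0   # nu = alpha + beta
    return mu * nu, (1.0 - mu) * nu


# Round trip against the moments of Beta(2, 5):
# mean = 2/7 and variance = (2*5) / ((2+5)**2 * (2+5+1)) = 10/392.
print(shape_from_mean_and_variance(2/7, 10/392))   # approximately (2.0, 5.0)
</syntaxhighlight>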
====Four parameters====
A beta distribution with the two shape parameters ''α'' and ''β'' is supported on the range [0,1] or (0,1). It is possible to alter the location and scale of the distribution by introducing two further parameters representing the minimum, ''a'', and maximum, ''c'' (''c'' > ''a''), values of the distribution,<ref name=JKB/> by a linear transformation substituting the non-dimensional variable ''x'' in terms of the new variable ''y'' (with support [''a'',''c''] or (''a'',''c'')) and the parameters ''a'' and ''c'':

:<math>y = x(c-a) + a, \text{ therefore } x = \frac{y-a}{c-a}.</math>

The [[probability density function]] of the four-parameter beta distribution is equal to the two-parameter distribution, scaled by the range (''c'' − ''a'') (so that the total area under the density curve equals a probability of one), and with the "y" variable shifted and scaled as follows:

::<math>f(y; \alpha, \beta, a, c) = \frac{f(x;\alpha,\beta)}{c-a} =\frac{\left(\frac{y-a}{c-a}\right)^{\alpha-1} \left (\frac{c-y}{c-a} \right)^{\beta-1} }{(c-a)B(\alpha, \beta)}=\frac{ (y-a)^{\alpha-1} (c-y)^{\beta-1} }{(c-a)^{\alpha+\beta-1}B(\alpha, \beta)}.</math>

That a random variable ''Y'' is beta-distributed with four parameters ''α'', ''β'', ''a'', and ''c'' will be denoted by:

:<math>Y \sim \operatorname{Beta}(\alpha, \beta, a, c).</math>

Some measures of central location are scaled (by (''c'' − ''a'')) and shifted (by ''a''), as follows:

:<math>
\begin{align}
\mu_Y &= \mu_X(c-a) + a \\
& = \left(\frac{\alpha}{\alpha+\beta}\right)(c-a) + a = \frac{\alpha c+ \beta a}{\alpha+\beta} \\[8pt]
\text{mode}(Y) &=\text{mode}(X)(c-a) + a \\
& = \left(\frac{\alpha - 1}{\alpha+\beta - 2}\right)(c-a) + a = \frac{(\alpha-1) c+(\beta-1) a}{\alpha+\beta-2}\ ,\qquad \text{ if } \alpha, \beta>1 \\[8pt]
\text{median}(Y) &= \text{median}(X)(c-a) + a \\
& = \left (I_{\frac{1}{2}}^{[-1]}(\alpha,\beta) \right )(c-a)+a
\end{align}
</math>

Note: the geometric mean and harmonic mean cannot be transformed by a linear transformation in the way that the mean, median and mode can.

The shape parameters of ''Y'' can be written in terms of its mean and variance as

:<math>
\begin{align}
\alpha &= \frac{(a - \mu_Y)(a \, c - a \, \mu_Y - c \, \mu_Y + \mu_Y^2 + \sigma_Y^2)}{\sigma_Y^2(c-a)} \\
\beta &= -\frac{(c - \mu_Y)(a \, c - a \, \mu_Y - c \, \mu_Y + \mu_Y^2 + \sigma_Y^2)}{\sigma_Y^2(c-a)}
\end{align}
</math>

The statistical dispersion measures are scaled (they do not need to be shifted because they are already centered on the mean) by the range (''c'' − ''a''), linearly for the mean deviation and nonlinearly for the variance:

::<math>\text{(mean deviation around mean)}(Y) =(\text{(mean deviation around mean)}(X))(c-a) =\frac{2 \alpha^\alpha \beta^\beta}{\Beta(\alpha,\beta)(\alpha + \beta)^{\alpha + \beta + 1}}(c-a)</math>

::<math> \text{var}(Y) =\text{var}(X)(c-a)^2 =\frac{\alpha\beta (c-a)^2}{(\alpha+\beta)^2(\alpha+\beta+1)}.</math>

Since the [[skewness]] and [[excess kurtosis]] are non-dimensional quantities (as [[Moment (mathematics)|moments]] centered on the mean and normalized by the [[standard deviation]]), they are independent of the parameters ''a'' and ''c'', and therefore equal to the expressions given above in terms of ''X'' (with support [0,1] or (0,1)):

::<math> \text{skewness}(Y) =\text{skewness}(X) = \frac{2 (\beta - \alpha) \sqrt{\alpha + \beta + 1} }{(\alpha + \beta + 2) \sqrt{\alpha \beta}}.</math>

::<math> \text{kurtosis excess}(Y) =\text{kurtosis excess}(X)=\frac{6[(\alpha - \beta)^2 (\alpha +\beta + 1) - \alpha \beta (\alpha + \beta + 2)]} {\alpha \beta (\alpha + \beta + 2) (\alpha + \beta + 3)} </math>
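The moment-matching formulas for the four-parameter case can likewise be applied directly. The following minimal Python sketch (illustrative only; it assumes SciPy is installed and uses SciPy's loc/scale form of the shifted and scaled beta, with a hypothetical function name) recovers ''α'' and ''β'' from the mean and variance of ''Y'' together with the bounds ''a'' and ''c'':

<syntaxhighlight lang="python">
from scipy.stats import beta as beta_dist  # assumes SciPy is available


def shape_from_scaled_moments(mu_y, var_y, a, c):
    """(alpha, beta) for a beta distribution on [a, c] with mean mu_y and variance var_y."""
    if not (a < mu_y < c) or var_y <= 0.0:
        raise ValueError("require a < mu_y < c and var_y > 0")
    common = a * c - a * mu_y - c * mu_y + mu_y**2 + var_y
    alpha = (a - mu_y) * common / (var_y * (c - a))
    beta_shape = -(c - mu_y) * common / (var_y * (c - a))
    return alpha, beta_shape


# Example: Beta(2, 3) stretched onto [10, 20] has mean 14 and variance 4.
alpha, beta_shape = shape_from_scaled_moments(14.0, 4.0, 10.0, 20.0)   # -> (2.0, 3.0)
y = beta_dist(alpha, beta_shape, loc=10.0, scale=10.0)  # shifted/scaled beta on [10, 20]
print(alpha, beta_shape, y.mean(), y.var())             # 2.0 3.0 14.0 4.0
</syntaxhighlight>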