==Properties== ===Measures of central tendency=== ====Mode==== The [[Mode (statistics)|mode]] of a beta distributed [[random variable]] ''X'' with ''α'', ''β'' > 1 is the most likely value of the distribution (corresponding to the peak in the PDF), and is given by the following expression:<ref name=JKB>{{cite book|last1=Johnson|first1= Norman L. |first2= Samuel|last2= Kotz |first3= N. |last3= Balakrishnan| year=1995 |title=Continuous Univariate Distributions Vol. 2 |edition=2nd |publisher= Wiley |isbn= 978-0-471-58494-0 |chapter= Chapter 25: Beta Distributions}}</ref> :<math>\frac{\alpha - 1} {\alpha + \beta - 2} .</math> When both parameters are less than one (''α'', ''β'' < 1), this is the anti-mode: the lowest point of the probability density curve.<ref name=Wadsworth>{{cite book|last=Wadsworth |first=George P. and Joseph Bryan |title=Introduction to Probability and Random Variables|url=https://archive.org/details/introductiontopr0000wads |url-access=registration |year=1960|publisher=McGraw-Hill}}</ref> Letting ''α'' = ''β'', the expression for the mode simplifies to 1/2, showing that for ''α'' = ''β'' > 1 the mode (resp. anti-mode when {{nowrap|''α'', ''β'' < 1}}) is at the center of the distribution: it is symmetric in those cases. See the [[Beta distribution#Shapes|Shapes]] section of this article for a full list of mode cases, for arbitrary values of ''α'' and ''β''. For several of these cases, the maximum value of the density function occurs at one or both ends. In some cases the (maximum) value of the density function occurring at the end is finite. For example, in the case of ''α'' = 2, ''β'' = 1 (or ''α'' = 1, ''β'' = 2), the density function becomes a [[Triangular distribution|right-triangle distribution]] which is finite at both ends. In several other cases there is a [[Mathematical singularity|singularity]] at one end, where the value of the density function approaches infinity. For example, in the case ''α'' = ''β'' = 1/2, the beta distribution simplifies to become the [[arcsine distribution]]. There is debate among mathematicians about some of these cases and whether the ends (''x'' = 0, and ''x'' = 1) can be called ''modes'' or not:<ref name="Handbook of Beta Distribution" /><ref name="Mathematical Statistics with MATHEMATICA">{{cite book |last1=Rose |first1=Colin |last2=Smith |first2=Murray D. |title=Mathematical Statistics with MATHEMATICA |year=2002 |publisher=Springer |isbn=978-0387952345}}</ref> [[File:Mode Beta Distribution for alpha and beta from 1 to 5 - J. Rodal.jpg|325px|thumb|Mode for beta distribution for 1 ≤ ''α'' ≤ 5 and 1 ≤ ''β'' ≤ 5]] * Whether the ends are part of the [[Domain of a function|domain]] of the density function * Whether a [[Mathematical singularity|singularity]] can ever be called a ''mode'' * Whether cases with two maxima should be called ''bimodal'' ====Median==== [[File:Median Beta Distribution for alpha and beta from 0 to 5 - J. Rodal.jpg|325px|thumb|Median for beta distribution for 0 ≤ ''α'' ≤ 5 and 0 ≤ ''β'' ≤ 5]] [[File:(Mean - Median) for Beta distribution versus alpha and beta from 0 to 2 - J. Rodal.jpg|thumb|(Mean–median) for beta distribution versus alpha and beta from 0 to 2]] The median of the beta distribution is the unique real number <math>x = I_{1/2}^{[-1]}(\alpha,\beta)</math> for which the [[regularized incomplete beta function]] <math>I_x(\alpha,\beta) = \tfrac{1}{2} </math>. There is no general [[closed-form expression]] for the [[median]] of the beta distribution for arbitrary values of ''α'' and ''β''.
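Although the median has no general closed form, it can be evaluated numerically by inverting the regularized incomplete beta function. The following Python sketch is an illustrative aside (it assumes the SciPy library is available; the helper function names are ad hoc), computing the mode from the expression above and the median numerically:

<syntaxhighlight lang="python">
# Illustrative sketch: mode from the closed-form expression, and median via
# numerical inversion of the regularized incomplete beta function I_x(a, b).
from scipy.special import betaincinv
from scipy.stats import beta

def beta_mode(a, b):
    """Mode (alpha - 1)/(alpha + beta - 2), valid for alpha, beta > 1."""
    if a <= 1 or b <= 1:
        raise ValueError("closed-form mode requires alpha, beta > 1")
    return (a - 1) / (a + b - 2)

def beta_median(a, b):
    """Median as the solution of I_x(a, b) = 1/2 (no general closed form)."""
    return betaincinv(a, b, 0.5)

a, b = 3.0, 2.0
print(beta_mode(a, b))      # 0.666...
print(beta_median(a, b))    # ~0.61427..., the quartic-equation root quoted below
print(beta(a, b).median())  # cross-check with scipy.stats
</syntaxhighlight>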
[[Closed-form expression]]s for particular values of the parameters ''α'' and ''β'' follow:{{citation needed|date=February 2013}} * For symmetric cases ''α'' = ''β'', median = 1/2. * For ''α'' = 1 and ''β'' > 0, median <math> =1-2^{-1/\beta}</math> (this case is the [[mirror image|mirror-image]] of the power function [0,1] distribution) * For ''α'' > 0 and ''β'' = 1, median = <math>2^{-1/\alpha}</math> (this case is the power function [0,1] distribution<ref name="Handbook of Beta Distribution" />) * For ''α'' = 3 and ''β'' = 2, median = 0.6142724318676105..., the real solution to the [[Quartic function|quartic equation]] 1 − 8''x''<sup>3</sup> + 6''x''<sup>4</sup> = 0, which lies in [0,1]. * For ''α'' = 2 and ''β'' = 3, median = 0.38572756813238945... = 1−median(Beta(3, 2)) The following are the limits with one parameter finite (non-zero) and the other approaching these limits:{{citation needed|date=February 2013}} :<math> \begin{align} \lim_{\beta \to 0} \text{median}= \lim_{\alpha \to \infty} \text{median} = 1,\\ \lim_{\alpha\to 0} \text{median}= \lim_{\beta \to \infty} \text{median} = 0. \end{align}</math> A reasonable approximation of the value of the median of the beta distribution, for both ''α'' and ''β'' greater than or equal to one, is given by the formula<ref name=Kerman2011/> :<math>\text{median} \approx \frac{\alpha - \tfrac{1}{3}}{\alpha + \beta - \tfrac{2}{3}} \text{ for } \alpha, \beta \ge 1.</math> When ''α'', ''β'' ≥ 1, the [[relative error]] (the [[approximation error|absolute error]] divided by the median) in this approximation is less than 4%, and for both ''α'' ≥ 2 and ''β'' ≥ 2 it is less than 1%. The [[approximation error|absolute error]] divided by the difference between the mean and the mode is similarly small: [[File:Relative Error for Approximation to Median of Beta Distribution for alpha and beta from 1 to 5 - J. Rodal.jpg|325px|Abs[(Median-Appr.)/Median] for beta distribution for 1 ≤ ''α'' ≤ 5 and 1 ≤ ''β'' ≤ 5]][[File:Error in Median Apprx. relative to Mean-Mode distance for Beta Distribution with alpha and beta from 1 to 5 - J. Rodal.jpg|325px|Abs[(Median-Appr.)/(Mean-Mode)] for beta distribution for 1 ≤ ''α'' ≤ 5 and 1 ≤ ''β'' ≤ 5]] ====Mean==== [[File:Mean Beta Distribution for alpha and beta from 0 to 5 - J. Rodal.jpg|325px|thumb|Mean for beta distribution for {{nowrap|0 ≤ ''α'' ≤ 5}} and {{nowrap|0 ≤ ''β'' ≤ 5}}]] The [[expected value]] (mean) (''μ'') of a beta distribution [[random variable]] ''X'' with two parameters ''α'' and ''β'' is a function of only the ratio ''β''/''α'' of these parameters:<ref name=JKB /> :<math> \begin{align} \mu = \operatorname{E}[X] &= \int_0^1 x f(x;\alpha,\beta)\,dx \\ &= \int_0^1 x \,\frac{x^{\alpha-1}(1-x)^{\beta-1}}{\Beta(\alpha,\beta)}\,dx \\ &= \frac{\alpha}{\alpha + \beta} \\ &= \frac{1}{1 + \frac{\beta}{\alpha}} \end{align}</math> Letting {{nowrap|1=''α'' = ''β''}} in the above expression one obtains {{nowrap|1=''μ'' = 1/2}}, showing that for {{nowrap|1=''α'' = ''β''}} the mean is at the center of the distribution: it is symmetric. Also, the following limits can be obtained from the above expression: :<math> \begin{align} \lim_{\frac{\beta}{\alpha} \to 0} \mu = 1\\ \lim_{\frac{\beta}{\alpha} \to \infty} \mu = 0 \end{align}</math> Therefore, for ''β''/''α'' → 0, or for ''α''/''β'' → ∞, the mean is located at the right end, {{nowrap|1=''x'' = 1}}.
For these limit ratios, the beta distribution becomes a one-point [[degenerate distribution]] with a [[Dirac delta function]] spike at the right end, {{nowrap|1=''x'' = 1}}, with probability 1, and zero probability everywhere else. There is 100% probability (absolute certainty) concentrated at the right end, {{nowrap|1=''x'' = 1}}. Similarly, for ''β''/''α'' → ∞, or for ''α''/''β'' → 0, the mean is located at the left end, {{nowrap|1=''x'' = 0}}. The beta distribution becomes a 1-point [[Degenerate distribution]] with a [[Dirac delta function]] spike at the left end, ''x'' = 0, with probability 1, and zero probability everywhere else. There is 100% probability (absolute certainty) concentrated at the left end, ''x'' = 0. Following are the limits with one parameter finite (non-zero) and the other approaching these limits: :<math> \begin{align} \lim_{\beta \to 0} \mu = \lim_{\alpha \to \infty} \mu = 1\\ \lim_{\alpha\to 0} \mu = \lim_{\beta \to \infty} \mu = 0 \end{align}</math> While for typical unimodal distributions (with centrally located modes, inflexion points at both sides of the mode, and longer tails) (with Beta(''α'', ''β'') such that {{nowrap|''α'', ''β'' > 2}}) it is known that the sample mean (as an estimate of location) is not as [[Robust statistics|robust]] as the sample median, the opposite is the case for uniform or "U-shaped" bimodal distributions (with Beta(''α'', ''β'') such that {{nowrap|''α'', ''β'' ≤ 1}}), with the modes located at the ends of the distribution. As Mosteller and Tukey remark (<ref name=MostellerTukey>{{cite book|last=Mosteller|first=Frederick and John Tukey|title=Data Analysis and Regression: A Second Course in Statistics|url=https://archive.org/details/dataanalysisregr0000most|url-access=registration|year=1977|publisher=Addison-Wesley Pub. Co.|isbn=978-0201048544|bibcode=1977dars.book.....M}}</ref> p. 207) "the average of the two extreme observations uses all the sample information. This illustrates how, for short-tailed distributions, the extreme observations should get more weight." By contrast, it follows that the median of "U-shaped" bimodal distributions with modes at the edge of the distribution (with Beta(''α'', ''β'') such that {{nowrap|''α'', ''β'' ≤ 1}}) is not robust, as the sample median drops the extreme sample observations from consideration. A practical application of this occurs for example for [[random walk]]s, since the probability for the time of the last visit to the origin in a random walk is distributed as the [[arcsine distribution]] Beta(1/2, 1/2):<ref name=Feller/><ref name=WillyFeller1>{{cite book |last=Feller |first=William |title=An Introduction to Probability Theory and Its Applications |volume=1 |edition=3rd |year=1968 |publisher=Wiley |isbn=978-0471257080}}</ref> the mean of a number of [[realization (probability)|realizations]] of a random walk is a much more robust estimator than the median (which is an inappropriate sample measure estimate in this case). ====Geometric mean==== [[File:(Mean - GeometricMean) for Beta Distribution versus alpha and beta from 0 to 2 - J. Rodal.jpg|thumb|(Mean − GeometricMean) for beta distribution versus ''α'' and ''β'' from 0 to 2, showing the asymmetry between ''α'' and ''β'' for the geometric mean]] [[File:Geometric Means for Beta distribution Purple=G(X), Yellow=G(1-X), smaller values alpha and beta in front - J. 
Rodal.jpg|thumb|Geometric means for beta distribution Purple = ''G''(''x''), Yellow = ''G''(1 − ''x''), smaller values ''α'' and ''β'' in front]] [[File:Geometric Means for Beta distribution Purple=G(X), Yellow=G(1-X), larger values alpha and beta in front - J. Rodal.jpg|thumb|Geometric means for beta distribution. purple = ''G''(''x''), yellow = ''G''(1 − ''x''), larger values ''α'' and ''β'' in front]] The logarithm of the [[geometric mean]] ''G<sub>X</sub>'' of a distribution with [[random variable]] ''X'' is the arithmetic mean of ln(''X''), or, equivalently, its expected value: :<math>\ln G_X = \operatorname{E}[\ln X]</math> For a beta distribution, the expected value integral gives: :<math>\begin{align} \operatorname{E}[\ln X] &= \int_0^1 \ln x\, f(x;\alpha,\beta)\,dx \\[4pt] &= \int_0^1 \ln x \,\frac{ x^{\alpha-1}(1-x)^{\beta-1}}{\Beta(\alpha,\beta)}\,dx \\[4pt] &= \frac{1}{\Beta(\alpha,\beta)} \, \int_0^1 \frac{\partial x^{\alpha-1}(1-x)^{\beta-1}}{\partial \alpha}\,dx \\[4pt] &= \frac{1}{\Beta(\alpha,\beta)} \frac{\partial}{\partial \alpha} \int_0^1 x^{\alpha-1}(1-x)^{\beta-1}\,dx \\[4pt] &= \frac{1}{\Beta(\alpha,\beta)} \frac{\partial \Beta(\alpha,\beta)}{\partial \alpha} \\[4pt] &= \frac{\partial \ln \Beta(\alpha,\beta)}{\partial \alpha} \\[4pt] &= \frac{\partial \ln \Gamma(\alpha)}{\partial \alpha} - \frac{\partial \ln \Gamma(\alpha + \beta)}{\partial \alpha} \\[4pt] &= \psi(\alpha) - \psi(\alpha + \beta) \end{align}</math> where ''ψ'' is the [[digamma function]]. Therefore, the geometric mean of a beta distribution with shape parameters ''α'' and ''β'' is the exponential of the digamma functions of ''α'' and ''β'' as follows: :<math>G_X =e^{\operatorname{E}[\ln X]}= e^{\psi(\alpha) - \psi(\alpha + \beta)}</math> While for a beta distribution with equal shape parameters ''α'' = ''β'', it follows that skewness = 0 and mode = mean = median = 1/2, the geometric mean is less than 1/2: {{nowrap|0 < ''G''<sub>''X''</sub> < 1/2}}. The reason for this is that the logarithmic transformation strongly weights the values of ''X'' close to zero, as ln(''X'') strongly tends towards negative infinity as ''X'' approaches zero, while ln(''X'') flattens towards zero as {{nowrap|''X'' → 1}}. Along a line {{nowrap|1=''α'' = ''β''}}, the following limits apply: :<math> \begin{align} &\lim_{\alpha = \beta \to 0} G_X = 0 \\ &\lim_{\alpha = \beta \to \infty} G_X =\tfrac{1}{2} \end{align}</math> Following are the limits with one parameter finite (non-zero) and the other approaching these limits: :<math> \begin{align} \lim_{\beta \to 0} G_X = \lim_{\alpha \to \infty} G_X = 1\\ \lim_{\alpha\to 0} G_X = \lim_{\beta \to \infty} G_X = 0 \end{align}</math> The accompanying plot shows the difference between the mean and the geometric mean for shape parameters ''α'' and ''β'' from zero to 2. Besides the fact that the difference between them approaches zero as ''α'' and ''β'' approach infinity and that the difference becomes large for values of ''α'' and ''β'' approaching zero, one can observe an evident asymmetry of the geometric mean with respect to the shape parameters ''α'' and ''β''. The difference between the geometric mean and the mean is larger for small values of ''α'' in relation to ''β'' than when exchanging the magnitudes of ''β'' and ''α''. [[Norman Lloyd Johnson|N. L.Johnson]] and [[Samuel Kotz|S. 
Kotz]]<ref name=JKB /> suggest the logarithmic approximation to the digamma function ''ψ''(''α'') ≈ ln(''α'' − 1/2), which results in the following approximation to the geometric mean: :<math>G_X \approx \frac{\alpha \, - \frac{1}{2}}{\alpha +\beta - \frac{1}{2}}\text{ if } \alpha, \beta > 1.</math> Numerical values for the [[relative error]] in this approximation follow: [{{nowrap|1=(''α'' = ''β'' = 1): 9.39%}}]; [{{nowrap|1=(''α'' = ''β'' = 2): 1.29%}}]; [{{nowrap|1=(''α'' = 2, ''β'' = 3): 1.51%}}]; [{{nowrap|1=(''α'' = 3, ''β'' = 2): 0.44%}}]; [{{nowrap|1=(''α'' = ''β'' = 3): 0.51%}}]; [{{nowrap|1=(''α'' = ''β'' = 4): 0.26%}}]; [{{nowrap|1=(''α'' = 3, ''β'' = 4): 0.55%}}]; [{{nowrap|1=(''α'' = 4, ''β'' = 3): 0.24%}}]. Similarly, one can calculate the value of shape parameters required for the geometric mean to equal 1/2. Given the value of the parameter ''β'', what would be the value of the other parameter, ''α'', required for the geometric mean to equal 1/2? The answer is that (for {{nowrap|''β'' > 1}}) the value of ''α'' required tends towards {{nowrap|''β'' + 1/2}} as {{nowrap|''β'' → ∞}}. For example, all these couples have the same geometric mean of 1/2: [{{nowrap|1=''β'' = 1, ''α'' = 1.4427}}], [{{nowrap|1=''β'' = 2, ''α'' = 2.46958}}], [{{nowrap|1=''β'' = 3, ''α'' = 3.47943}}], [{{nowrap|1=''β'' = 4, ''α'' = 4.48449}}], [{{nowrap|1=''β'' = 5, ''α'' = 5.48756}}], [{{nowrap|1=''β'' = 10, ''α'' = 10.4938}}], [{{nowrap|1=''β'' = 100, ''α'' = 100.499}}]. The fundamental property of the geometric mean, which can be proven to be false for any other mean, is :<math>G\left(\frac{X_i}{Y_i}\right) = \frac{G(X_i)}{G(Y_i)}</math> This makes the geometric mean the only correct mean when averaging ''normalized'' results, that is, results that are presented as ratios to reference values.<ref>Philip J. Fleming and John J. Wallace. ''How not to lie with statistics: the correct way to summarize benchmark results''. Communications of the ACM, 29(3):218–221, March 1986.</ref> This is relevant because the beta distribution is a suitable model for the random behavior of percentages and it is particularly suitable for the statistical modelling of proportions. The geometric mean plays a central role in maximum likelihood estimation; see the section "Parameter estimation, maximum likelihood".
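As an illustrative numerical aside (assuming the SciPy library; not part of the cited derivations), the exact geometric mean ''e''<sup>''ψ''(''α'') − ''ψ''(''α'' + ''β'')</sup> can be compared with the Johnson–Kotz approximation above:

<syntaxhighlight lang="python">
# Illustrative sketch: exact geometric mean of Beta(a, b) via the digamma
# function, compared with the approximation (a - 1/2)/(a + b - 1/2) for a, b > 1.
import numpy as np
from scipy.special import digamma

def geometric_mean(a, b):
    """G_X = exp(E[ln X]) = exp(psi(a) - psi(a + b))."""
    return np.exp(digamma(a) - digamma(a + b))

def geometric_mean_approx(a, b):
    """Logarithmic approximation suggested for a, b > 1."""
    return (a - 0.5) / (a + b - 0.5)

for a, b in [(2, 2), (2, 3), (3, 2), (4, 4)]:
    exact = geometric_mean(a, b)
    approx = geometric_mean_approx(a, b)
    print(a, b, exact, approx, abs(exact - approx) / exact)  # relative error
</syntaxhighlight>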
When performing maximum likelihood estimation, besides the [[geometric mean]] ''G<sub>X</sub>'' based on the random variable ''X'', another geometric mean also appears naturally: the [[geometric mean]] based on the linear transformation {{nowrap|(1 − ''X'')}}, the mirror-image of ''X'', denoted by ''G''<sub>(1−''X'')</sub>: :<math>G_{(1-X)} = e^{\operatorname{E}[\ln(1-X)] } = e^{\psi(\beta) - \psi(\alpha + \beta)}</math> Along a line {{nowrap|1=''α'' = ''β''}}, the following limits apply: :<math> \begin{align} &\lim_{\alpha = \beta \to 0} G_{(1-X)} =0 \\ &\lim_{\alpha = \beta \to \infty} G_{(1-X)} =\tfrac{1}{2} \end{align}</math> Following are the limits with one parameter finite (non-zero) and the other approaching these limits: :<math> \begin{align} \lim_{\beta \to 0} G_{(1-X)} = \lim_{\alpha \to \infty} G_{(1-X)} = 0\\ \lim_{\alpha\to 0} G_{(1-X)} = \lim_{\beta \to \infty} G_{(1-X)} = 1 \end{align}</math> It has the following approximate value: :<math>G_{(1-X)} \approx \frac{\beta - \frac{1}{2}}{\alpha+\beta-\frac{1}{2}}\text{ if } \alpha, \beta > 1.</math> Although both ''G''<sub>''X''</sub> and ''G''<sub>(1−''X'')</sub> are asymmetric, in the case that both shape parameters are equal {{nowrap|1=''α'' = ''β''}}, the geometric means are equal: ''G''<sub>''X''</sub> = ''G''<sub>(1−''X'')</sub>. This equality follows from the following symmetry displayed between both geometric means: :<math>G_X (\Beta(\alpha, \beta) )=G_{(1-X)}(\Beta(\beta, \alpha) ). </math> ====Harmonic mean==== [[File:Harmonic mean for Beta distribution for alpha and beta ranging from 0 to 5 - J. Rodal.jpg|thumb|Harmonic mean for beta distribution for 0 < ''α'' < 5 and 0 < ''β'' < 5]] [[File:(Mean - HarmonicMean) for Beta distribution versus alpha and beta from 0 to 2 - J. Rodal.jpg|thumb|Harmonic mean for beta distribution versus ''α'' and ''β'' from 0 to 2]] [[File:Harmonic Means for Beta distribution Purple=H(X), Yellow=H(1-X), smaller values alpha and beta in front - J. Rodal.jpg|thumb|Harmonic means for beta distribution Purple = ''H''(''X''), Yellow = ''H''(1 − ''X''), smaller values ''α'' and ''β'' in front]] [[File:Harmonic Means for Beta distribution Purple=H(X), Yellow=H(1-X), larger values alpha and beta in front - J. Rodal.jpg|thumb|Harmonic means for beta distribution: purple = ''H''(''X''), yellow = ''H''(1 − ''X''), larger values ''α'' and ''β'' in front]] The inverse of the [[harmonic mean]] (''H<sub>X</sub>'') of a distribution with [[random variable]] ''X'' is the arithmetic mean of 1/''X'', or, equivalently, its expected value. Therefore, the [[harmonic mean]] (''H<sub>X</sub>'') of a beta distribution with shape parameters ''α'' and ''β'' is: :<math> \begin{align} H_X &= \frac{1}{\operatorname{E}\left[\frac{1}{X}\right]} \\ &=\frac{1}{\int_0^1 \frac{f(x;\alpha,\beta)}{x}\,dx} \\ &=\frac{1}{\int_0^1 \frac{x^{\alpha-1}(1-x)^{\beta-1}}{x \Beta(\alpha,\beta)}\,dx} \\ &= \frac{\alpha - 1}{\alpha + \beta - 1}\text{ if } \alpha > 1 \text{ and } \beta > 0 \\ \end{align}</math> The [[harmonic mean]] (''H<sub>X</sub>'') of a beta distribution with ''α'' < 1 is undefined, because its defining expression is not bounded in [0, 1] for shape parameter ''α'' less than unity. Letting ''α'' = ''β'' in the above expression one obtains :<math>H_X = \frac{\alpha-1}{2\alpha-1},</math> showing that for ''α'' = ''β'' the harmonic mean ranges from 0, for ''α'' = ''β'' = 1, to 1/2, for ''α'' = ''β'' → ∞.
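A brief numerical check of the harmonic mean formula (an illustrative sketch assuming SciPy; the quadrature is used only for verification) compares the closed form (''α'' − 1)/(''α'' + ''β'' − 1) with 1/E[1/''X''] computed directly:

<syntaxhighlight lang="python">
# Illustrative sketch: harmonic mean of Beta(a, b) from the closed form
# (a - 1)/(a + b - 1), valid for a > 1, checked against 1 / E[1/X] by quadrature.
from scipy.integrate import quad
from scipy.stats import beta

def harmonic_mean_closed_form(a, b):
    if a <= 1:
        raise ValueError("harmonic mean requires alpha > 1")
    return (a - 1) / (a + b - 1)

def harmonic_mean_numeric(a, b):
    ev_inv_x, _ = quad(lambda x: beta(a, b).pdf(x) / x, 0, 1)
    return 1.0 / ev_inv_x

a, b = 3.0, 2.0
print(harmonic_mean_closed_form(a, b))  # 0.5
print(harmonic_mean_numeric(a, b))      # ~0.5
</syntaxhighlight>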
Following are the limits with one parameter finite (non-zero) and the other approaching these limits: :<math> \begin{align} &\lim_{\alpha\to 0} H_X \text{ is undefined} \\ &\lim_{\alpha\to 1} H_X = \lim_{\beta \to \infty} H_X = 0 \\ &\lim_{\beta \to 0} H_X = \lim_{\alpha \to \infty} H_X = 1 \end{align}</math> The harmonic mean plays a role in maximum likelihood estimation for the four parameter case, in addition to the geometric mean. Actually, when performing maximum likelihood estimation for the four parameter case, besides the harmonic mean ''H<sub>X</sub>'' based on the random variable ''X'', also another harmonic mean appears naturally: the harmonic mean based on the linear transformation (1 − ''X''), the mirror-image of ''X'', denoted by ''H''<sub>1 − ''X''</sub>: :<math>H_{1-X} = \frac{1}{\operatorname{E} \left[\frac 1 {1-X}\right]} = \frac{\beta - 1}{\alpha + \beta-1} \text{ if } \beta > 1, \text{ and } \alpha> 0. </math> The [[harmonic mean]] (''H''<sub>(1 − ''X'')</sub>) of a beta distribution with ''β'' < 1 is undefined, because its defining expression is not bounded in [0, 1] for shape parameter ''β'' less than unity. Letting ''α'' = ''β'' in the above expression one obtains :<math>H_{(1-X)} = \frac{\beta-1}{2\beta-1},</math> showing that for ''α'' = ''β'' the harmonic mean ranges from 0, for ''α'' = ''β'' = 1, to 1/2, for ''α'' = ''β'' → ∞. Following are the limits with one parameter finite (non-zero) and the other approaching these limits: :<math> \begin{align} &\lim_{\beta\to 0} H_{1-X} \text{ is undefined} \\ &\lim_{\beta\to 1} H_{1-X} = \lim_{\alpha\to \infty} H_{1-X} = 0 \\ &\lim_{\alpha\to 0} H_{1-X} = \lim_{\beta\to \infty} H_{1-X} = 1 \end{align}</math> Although both ''H''<sub>''X''</sub> and ''H''<sub>1−''X''</sub> are asymmetric, in the case that both shape parameters are equal ''α'' = ''β'', the harmonic means are equal: ''H''<sub>''X''</sub> = ''H''<sub>1−''X''</sub>. This equality follows from the following symmetry displayed between both harmonic means: :<math>H_X (\Beta(\alpha, \beta) )=H_{1-X}(\Beta(\beta, \alpha) ) \text{ if } \alpha, \beta> 1.</math> ===Measures of statistical dispersion=== ====Variance==== The [[variance]] (the second moment centered on the mean) of a beta distribution [[random variable]] ''X'' with parameters ''α'' and ''β'' is:<ref name=JKB /><ref>{{cite web | url = http://www.itl.nist.gov/div898/handbook/eda/section3/eda366h.htm | title = NIST/SEMATECH e-Handbook of Statistical Methods 1.3.6.6.17. Beta Distribution | website = [[National Institute of Standards and Technology]] Information Technology Laboratory | access-date = May 31, 2016 |date = April 2012 }}</ref> :<math>\operatorname{var}(X) = \operatorname{E}[(X - \mu)^2] = \frac{\alpha \beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}</math> Letting ''α'' = ''β'' in the above expression one obtains :<math>\operatorname{var}(X) = \frac{1}{4(2\beta + 1)},</math> showing that for ''α'' = ''β'' the variance decreases monotonically as {{nowrap|1=''α'' = ''β''}} increases. Setting {{nowrap|1=''α'' = ''β'' = 0}} in this expression, one finds the maximum variance var(''X'') = 1/4<ref name=JKB /> which only occurs approaching the limit, at {{nowrap|1=''α'' = ''β'' = 0}}. 
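The variance formula above can be cross-checked numerically; the following Python sketch is an illustrative aside assuming SciPy is available:

<syntaxhighlight lang="python">
# Illustrative sketch: variance of Beta(a, b) from the closed form
# a*b / ((a + b)**2 * (a + b + 1)), cross-checked against scipy.stats.
from scipy.stats import beta

def beta_variance(a, b):
    return a * b / ((a + b) ** 2 * (a + b + 1))

for a, b in [(2.0, 2.0), (2.0, 5.0), (0.5, 0.5)]:
    print(a, b, beta_variance(a, b), beta(a, b).var())
</syntaxhighlight>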
The beta distribution may also be [[Statistical parameter|parametrized]] in terms of its mean ''μ'' {{nowrap|1=(0 < ''μ'' < 1)}} and sample size {{nowrap|1=''ν'' = ''α'' + ''β''}} ({{nowrap|''ν'' > 0}}) (see subsection [[#Mean and sample size|Mean and sample size]]): :<math> \begin{align} \alpha &= \mu \nu, \text{ where }\nu =(\alpha + \beta) >0\\ \beta &= (1 - \mu) \nu, \text{ where }\nu =(\alpha + \beta) >0. \end{align}</math> Using this [[Statistical parameter|parametrization]], one can express the variance in terms of the mean ''μ'' and the sample size ''ν'' as follows: :<math>\operatorname{var}(X) = \frac{\mu (1-\mu)}{1 + \nu}</math> Since {{nowrap|1=''ν'' = ''α'' + ''β'' > 0}}, it follows that {{nowrap|var(''X'') < ''μ''(1 − ''μ'')}}. For a symmetric distribution, the mean is at the middle of the distribution, {{nowrap|1=''μ'' = 1/2 }}, and therefore: :<math>\operatorname{var}(X) = \frac{1}{4 (1 + \nu)} \text{ if } \mu = \tfrac{1}{2}</math> Also, the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions: :<math> \begin{align} &\lim_{\beta\to 0} \operatorname{var}(X) =\lim_{\alpha \to 0} \operatorname{var}(X) =\lim_{\beta\to \infty} \operatorname{var}(X) =\lim_{\alpha \to \infty} \operatorname{var}(X) = \lim_{\nu \to \infty} \operatorname{var}(X) =\lim_{\mu \to 0} \operatorname{var}(X) =\lim_{\mu \to 1} \operatorname{var}(X) = 0\\ &\lim_{\nu \to 0} \operatorname{var}(X) = \mu (1-\mu) \end{align}</math> [[File:Variance for Beta Distribution for alpha and beta ranging from 0 to 5 - J. Rodal.jpg|325px]] ====Geometric variance and covariance==== [[File:Beta distribution log geometric variances front view - J. Rodal.png|thumb|log geometric variances vs. ''α'' and ''β'']] [[File:Beta distribution log geometric variances back view - J. Rodal.png|thumb|log geometric variances vs. 
''α'' and ''β'']] The logarithm of the geometric variance, ln(var<sub>''GX''</sub>), of a distribution with [[random variable]] ''X'' is the second moment of the logarithm of ''X'' centered on the geometric mean of ''X'', ln(''G<sub>X</sub>''): :<math>\begin{align} \ln \operatorname{var}_{GX} &= \operatorname{E} \left [(\ln X - \ln G_X)^2 \right ] \\ &= \operatorname{E}[(\ln X - \operatorname{E}\left [\ln X])^2 \right] \\ &= \operatorname{E}\left[(\ln X)^2 \right] - (\operatorname{E}[\ln X])^2\\ &= \operatorname{var}[\ln X] \end{align}</math> and therefore, the geometric variance is: :<math>\operatorname{var}_{GX} = e^{\operatorname{var}[\ln X]}</math> In the [[Fisher information]] matrix, and the curvature of the log [[likelihood function]], the logarithm of the geometric variance of the [[reflection formula|reflected]] variable 1 − ''X'' and the logarithm of the geometric covariance between ''X'' and 1 − ''X'' appear: :<math>\begin{align} \ln \operatorname{var_{G(1-X)}} &= \operatorname{E}[(\ln (1-X) - \ln G_{1-X})^2] \\ &= \operatorname{E}[(\ln (1-X) - \operatorname{E}[\ln (1-X)])^2] \\ &= \operatorname{E}[(\ln (1-X))^2] - (\operatorname{E}[\ln (1-X)])^2\\ &= \operatorname{var}[\ln (1-X)] \\ & \\ \operatorname{var_{G(1-X)}} &= e^{\operatorname{var}[\ln (1-X)]} \\ & \\ \ln \operatorname{cov_{G{X,1-X}}} &= \operatorname{E}[(\ln X - \ln G_X)(\ln (1-X) - \ln G_{1-X})] \\ &= \operatorname{E}[(\ln X - \operatorname{E}[\ln X])(\ln (1-X) - \operatorname{E}[\ln (1-X)])] \\ &= \operatorname{E}\left[\ln X \ln(1-X)\right] - \operatorname{E}[\ln X]\operatorname{E}[\ln(1-X)]\\ &= \operatorname{cov}[\ln X, \ln(1-X)] \\ & \\ \operatorname{cov}_{G{X,(1-X)}} &= e^{\operatorname{cov}[\ln X, \ln(1-X)]} \end{align}</math> For a beta distribution, higher order logarithmic moments can be derived by using the representation of a beta distribution as a proportion of two gamma distributions and differentiating through the integral. They can be expressed in terms of higher order poly-gamma functions. See the section {{section link||Moments of logarithmically transformed random variables}}. The [[variance]] of the logarithmic variables and [[covariance]] of ln ''X'' and ln(1−''X'') are: : <math>\operatorname{var}[\ln X]= \psi_1(\alpha) - \psi_1(\alpha + \beta)</math> : <math>\operatorname{var}[\ln (1-X)] = \psi_1(\beta) - \psi_1(\alpha + \beta)</math> : <math>\operatorname{cov}[\ln X, \ln(1-X)] = -\psi_1(\alpha+\beta)</math> where the '''[[trigamma function]]''', denoted ''ψ''<sub>1</sub>(''α''), is the second of the [[polygamma function]]s, and is defined as the derivative of the [[digamma function]]: :<math>\psi_1(\alpha) = \frac{d^2\ln\Gamma(\alpha)}{d\alpha^2}= \frac{d \, \psi(\alpha)}{d\alpha}.</math> Therefore, :<math> \ln \operatorname{var}_{GX}=\operatorname{var}[\ln X]= \psi_1(\alpha) - \psi_1(\alpha + \beta) </math> :<math> \ln \operatorname{var}_{G(1-X)} =\operatorname{var}[\ln (1-X)] = \psi_1(\beta) - \psi_1(\alpha + \beta)</math> :<math> \ln \operatorname{cov}_{GX,1-X} =\operatorname{cov}[\ln X, \ln(1-X)] = -\psi_1(\alpha+\beta)</math> The accompanying plots show the log geometric variances and log geometric covariance versus the shape parameters ''α'' and ''β''. The plots show that the log geometric variances and log geometric covariance are close to zero for shape parameters ''α'' and ''β'' greater than 2, and that the log geometric variances rapidly rise in value for shape parameter values ''α'' and ''β'' less than unity. 
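The trigamma-based expressions above can be verified numerically; the following Python sketch is an illustrative aside (assuming NumPy and SciPy, with <code>polygamma(1, ·)</code> as the trigamma function and a Monte Carlo sample used only for the cross-check):

<syntaxhighlight lang="python">
# Illustrative sketch: log geometric variances and covariance via the trigamma
# function psi_1 (= polygamma(1, .)), with a Monte Carlo cross-check.
import numpy as np
from scipy.special import polygamma
from scipy.stats import beta

a, b = 2.0, 3.0
ln_var_gx   = polygamma(1, a) - polygamma(1, a + b)  # var[ln X]
ln_var_g1x  = polygamma(1, b) - polygamma(1, a + b)  # var[ln(1 - X)]
ln_cov_gx1x = -polygamma(1, a + b)                   # cov[ln X, ln(1 - X)]

x = beta(a, b).rvs(size=500_000, random_state=0)
sample_cov = np.cov(np.log(x), np.log1p(-x))         # 2x2 sample covariance matrix

print(ln_var_gx, sample_cov[0, 0])
print(ln_var_g1x, sample_cov[1, 1])
print(ln_cov_gx1x, sample_cov[0, 1])
</syntaxhighlight>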
The log geometric variances are positive for all values of the shape parameters. The log geometric covariance is negative for all values of the shape parameters, and it reaches large negative values for ''α'' and ''β'' less than unity. Following are the limits with one parameter finite (non-zero) and the other approaching these limits: :<math> \begin{align} &\lim_{\alpha\to 0} \ln \operatorname{var}_{GX} = \lim_{\beta\to 0} \ln \operatorname{var}_{G(1-X)} =\infty \\ &\lim_{\beta \to 0} \ln \operatorname{var}_{GX} = \lim_{\alpha \to \infty} \ln \operatorname{var}_{GX} = \lim_{\alpha \to 0} \ln \operatorname{var}_{G(1-X)} = \lim_{\beta\to \infty} \ln \operatorname{var}_{G(1-X)} = \lim_{\alpha\to \infty} \ln \operatorname{cov}_{GX,(1-X)} = \lim_{\beta\to \infty} \ln \operatorname{cov}_{GX,(1-X)} = 0\\ &\lim_{\beta \to \infty} \ln \operatorname{var}_{GX} = \psi_1(\alpha)\\ &\lim_{\alpha\to \infty} \ln \operatorname{var}_{G(1-X)} = \psi_1(\beta)\\ &\lim_{\alpha\to 0} \ln \operatorname{cov}_{GX,(1-X)} = - \psi_1(\beta)\\ &\lim_{\beta\to 0} \ln \operatorname{cov}_{GX,(1-X)} = - \psi_1(\alpha) \end{align}</math> Limits with two parameters varying: :<math> \begin{align} &\lim_{\alpha\to \infty}( \lim_{\beta \to \infty} \ln \operatorname{var}_{GX}) = \lim_{\beta \to \infty}( \lim_{\alpha\to \infty} \ln \operatorname{var}_{G(1-X)}) = \lim_{\alpha\to \infty} (\lim_{\beta \to 0} \ln \operatorname{cov}_{GX,(1-X)}) = \lim_{\beta\to \infty}( \lim_{\alpha\to 0} \ln \operatorname{cov}_{GX,(1-X)}) =0\\ &\lim_{\alpha\to \infty} (\lim_{\beta \to 0} \ln \operatorname{var}_{GX}) = \lim_{\beta\to \infty} (\lim_{\alpha\to 0} \ln \operatorname{var}_{G(1-X)}) = \infty\\ &\lim_{\alpha\to 0} (\lim_{\beta \to 0} \ln \operatorname{cov}_{GX,(1-X)}) = \lim_{\beta\to 0} (\lim_{\alpha\to 0} \ln \operatorname{cov}_{GX,(1-X)}) = - \infty \end{align}</math> Although both ln(var<sub>''GX''</sub>) and ln(var<sub>''G''(1 − ''X'')</sub>) are asymmetric, when the shape parameters are equal, ''α'' = ''β'', one has: ln(var<sub>''GX''</sub>) = ln(var<sub>''G(1−X)''</sub>). This equality follows from the following symmetry displayed between both log geometric variances: :<math>\ln \operatorname{var}_{GX}(\Beta(\alpha, \beta))=\ln \operatorname{var}_{G(1-X)}(\Beta(\beta, \alpha)).</math> The log geometric covariance is symmetric: :<math>\ln \operatorname{cov}_{GX,(1-X)}(\Beta(\alpha, \beta) )=\ln \operatorname{cov}_{GX,(1-X)}(\Beta(\beta, \alpha))</math> ====Mean absolute deviation around the mean==== [[File:Ratio of Mean Abs. Dev. to Std.Dev. Beta distribution with alpha and beta from 0 to 5 - J. Rodal.jpg|thumb|Ratio of mean abs.dev. to std.dev. for beta distribution with ''α'' and ''β'' ranging from 0 to 5]] [[File:Ratio of Mean Abs. Dev. to Std.Dev. Beta distribution vs. nu from 0 to 10 and vs. mean - J. Rodal.jpg|thumb|Ratio of mean abs.dev. to std.dev.
for beta distribution with mean 0 ≤ ''μ'' ≤ 1 and sample size 0 < ''ν'' ≤ 10]] The [[mean absolute deviation]] around the mean for the beta distribution with shape parameters ''α'' and ''β'' is:<ref name="Handbook of Beta Distribution" /> :<math>\operatorname{E}[|X - E[X]|] = \frac{2 \alpha^\alpha \beta^\beta}{\Beta(\alpha,\beta)(\alpha + \beta)^{\alpha + \beta + 1}} </math> The mean absolute deviation around the mean is a more [[Robust statistics|robust]] [[estimator]] of [[statistical dispersion]] than the standard deviation for beta distributions with tails and inflection points at each side of the mode, Beta(''α'', ''β'') distributions with ''α'', ''β'' > 2, as it depends on the linear (absolute) deviations rather than the square deviations from the mean. Therefore, the effect of very large deviations from the mean is not as heavily weighted. Using [[Stirling's approximation]] to the [[Gamma function]], [[Norman Lloyd Johnson|N. L. Johnson]] and [[Samuel Kotz|S. Kotz]]<ref name=JKB /> derived the following approximation for values of the shape parameters greater than unity (the relative error for this approximation is only −3.5% for ''α'' = ''β'' = 1, and it decreases to zero as ''α'' → ∞, ''β'' → ∞): :<math> \begin{align} \frac{\text{mean abs. dev. from mean}}{\text{standard deviation}} &=\frac{\operatorname{E}[|X - E[X]|]}{\sqrt{\operatorname{var}(X)}}\\ &\approx \sqrt{\frac{2}{\pi}} \left(1+\frac{7}{12 (\alpha+\beta)}-\frac{1}{12 \alpha}-\frac{1}{12 \beta} \right), \text{ if } \alpha, \beta > 1. \end{align}</math> At the limit ''α'' → ∞, ''β'' → ∞, the ratio of the mean absolute deviation to the standard deviation (for the beta distribution) becomes equal to the ratio of the same measures for the normal distribution: <math>\sqrt{\frac{2}{\pi}}</math>. For ''α'' = ''β'' = 1 this ratio equals <math>\frac{\sqrt{3}}{2}</math>, so that from ''α'' = ''β'' = 1 to ''α'', ''β'' → ∞ the ratio decreases by 8.5%. For ''α'' = ''β'' = 0 the standard deviation is exactly equal to the mean absolute deviation around the mean. Therefore, this ratio decreases by 15% from ''α'' = ''β'' = 0 to ''α'' = ''β'' = 1, and by 25% from ''α'' = ''β'' = 0 to ''α'', ''β'' → ∞. However, for skewed beta distributions such that ''α'' → 0 or ''β'' → 0, the ratio of the standard deviation to the mean absolute deviation approaches infinity (although each of them, individually, approaches zero) because the mean absolute deviation approaches zero faster than the standard deviation.
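As an illustrative aside (assuming NumPy and SciPy), the mean absolute deviation can be evaluated in logarithmic space, which avoids overflow of ''α''<sup>''α''</sup> and ''β''<sup>''β''</sup> for larger shape parameters, and the resulting ratio to the standard deviation can be compared with the Johnson–Kotz approximation above:

<syntaxhighlight lang="python">
# Illustrative sketch: mean absolute deviation around the mean for Beta(a, b),
# computed in log space, and the approximate ratio (mean abs. dev.)/(std. dev.)
# stated above for a, b > 1.
import numpy as np
from scipy.special import gammaln

def beta_mad(a, b):
    # ln MAD = ln 2 + a ln a + b ln b - ln B(a, b) - (a + b + 1) ln(a + b)
    ln_beta_fn = gammaln(a) + gammaln(b) - gammaln(a + b)
    return np.exp(np.log(2) + a * np.log(a) + b * np.log(b)
                  - ln_beta_fn - (a + b + 1) * np.log(a + b))

def ratio_approx(a, b):
    return np.sqrt(2 / np.pi) * (1 + 7 / (12 * (a + b)) - 1 / (12 * a) - 1 / (12 * b))

a, b = 2.0, 3.0
std = np.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
print(beta_mad(a, b) / std, ratio_approx(a, b))
</syntaxhighlight>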
Using the [[Statistical parameter|parametrization]] in terms of mean ''μ'' and sample size ''ν'' = ''α'' + ''β'' > 0: :''α'' = ''μν'', ''β'' = (1 − ''μ'')''ν'' one can express the mean [[absolute deviation]] around the mean in terms of the mean ''μ'' and the sample size ''ν'' as follows: :<math>\operatorname{E}[| X - E[X]|] = \frac{2 \mu^{\mu\nu} (1-\mu)^{(1-\mu)\nu}}{\nu \Beta(\mu \nu,(1-\mu)\nu)}</math> For a symmetric distribution, the mean is at the middle of the distribution, ''μ'' = 1/2, and therefore: :<math> \begin{align} \operatorname{E}[|X - E[X]|] = \frac{2^{1-\nu}}{\nu \Beta(\tfrac{\nu}{2} ,\tfrac{\nu}{2})} &= \frac{2^{1-\nu}\Gamma(\nu)}{\nu (\Gamma(\tfrac{\nu}{2}))^2 } \\ \lim_{\nu \to 0} \left (\lim_{\mu \to \frac{1}{2}} \operatorname{E}[|X - E[X]|] \right ) &= \tfrac{1}{2}\\ \lim_{\nu \to \infty} \left (\lim_{\mu \to \frac{1}{2}} \operatorname{E}[| X - E[X]|] \right ) &= 0 \end{align}</math> Also, the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions: :<math> \begin{align} \lim_{\beta\to 0} \operatorname{E}[|X - E[X]|] &=\lim_{\alpha \to 0} \operatorname{E}[|X - E[X]|]= 0 \\ \lim_{\beta\to \infty} \operatorname{E}[|X - E[X]|] &=\lim_{\alpha \to \infty} \operatorname{E}[|X - E[X]|] = 0\\ \lim_{\mu \to 0} \operatorname{E}[|X - E[X]|]&=\lim_{\mu \to 1} \operatorname{E}[|X - E[X]|] = 0\\ \lim_{\nu \to 0} \operatorname{E}[|X - E[X]|] &= \sqrt{\mu (1-\mu)} \\ \lim_{\nu \to \infty} \operatorname{E}[|X - E[X]|] &= 0 \end{align}</math> ====Mean absolute difference==== The [[mean absolute difference]] for the beta distribution is: :<math>\mathrm{MD} = \int_0^1 \int_0^1 f(x;\alpha,\beta)\,f(y;\alpha,\beta)\,|x-y|\,dx\,dy = \left(\frac{4}{\alpha+\beta}\right)\frac{B(\alpha+\beta,\alpha+\beta)}{B(\alpha,\alpha)B(\beta,\beta)}</math> The [[Gini coefficient]] for the beta distribution is half of the relative mean absolute difference: :<math>\mathrm{G} = \left(\frac{2}{\alpha}\right)\frac{B(\alpha+\beta,\alpha+\beta)}{B(\alpha,\alpha)B(\beta,\beta)}</math> ===Skewness=== [[File:Skewness for Beta Distribution as a function of the variance and the mean - J. Rodal.jpg|325px|thumb|Skewness for beta distribution as a function of variance and mean]] The [[skewness]] (the third moment centered on the mean, normalized by the 3/2 power of the variance) of the beta distribution is<ref name=JKB /> :<math>\gamma_1 =\frac{\operatorname{E}[(X - \mu)^3]}{(\operatorname{var}(X))^{3/2}} = \frac{2(\beta - \alpha)\sqrt{\alpha + \beta + 1}}{(\alpha + \beta + 2) \sqrt{\alpha \beta}} .</math> Letting ''α'' = ''β'' in the above expression one obtains ''γ''<sub>1</sub> = 0, showing once again that for ''α'' = ''β'' the distribution is symmetric and hence the skewness is zero. Positive skew (right-tailed) for ''α'' < ''β'', negative skew (left-tailed) for ''α'' > ''β''. Using the [[Statistical parameter|parametrization]] in terms of mean ''μ'' and sample size ''ν'' = ''α'' + ''β'': :<math> \begin{align} \alpha & {} = \mu \nu ,\text{ where }\nu =(\alpha + \beta) >0\\ \beta & {} = (1 - \mu) \nu , \text{ where }\nu =(\alpha + \beta) >0. 
\end{align}</math> one can express the skewness in terms of the mean ''μ'' and the sample size ν as follows: :<math>\gamma_1 =\frac{\operatorname{E}[(X - \mu)^3]}{(\operatorname{var}(X))^{3/2}} = \frac{2(1-2\mu)\sqrt{1+\nu}}{(2+\nu)\sqrt{\mu (1 - \mu)}}.</math> The skewness can also be expressed just in terms of the variance ''var'' and the mean ''μ'' as follows: :<math>\gamma_1 =\frac{\operatorname{E}[(X - \mu)^3]}{(\operatorname{var}(X))^{3/2}} = \frac{2(1-2\mu)\sqrt{\operatorname{var}}}{ \mu(1-\mu) + \operatorname{var}}\text{ if } \operatorname{var} < \mu(1-\mu)</math> The accompanying plot of skewness as a function of variance and mean shows that maximum variance (1/4) is coupled with zero skewness and the symmetry condition (''μ'' = 1/2), and that maximum skewness (positive or negative infinity) occurs when the mean is located at one end or the other, so that the "mass" of the probability distribution is concentrated at the ends (minimum variance). The following expression for the square of the skewness, in terms of the sample size ''ν'' = ''α'' + ''β'' and the variance var, is useful for the method of moments estimation of four parameters: :<math>(\gamma_1)^2 =\frac{(\operatorname{E}[(X - \mu)^3])^2}{(\operatorname{var}(X))^3} = \frac{4}{(2+\nu)^2}\bigg(\frac{1}{\operatorname{var}}-4(1+\nu)\bigg)</math> This expression correctly gives a skewness of zero for ''α'' = ''β'', since in that case (see {{section link||Variance}}): <math>\operatorname{var} = \frac{1}{4 (1 + \nu)}</math>. For the symmetric case (''α'' = ''β''), skewness = 0 over the whole range, and the following limits apply: :<math>\lim_{\alpha = \beta \to 0} \gamma_1 = \lim_{\alpha = \beta \to \infty} \gamma_1 =\lim_{\nu \to 0} \gamma_1=\lim_{\nu \to \infty} \gamma_1=\lim_{\mu \to \frac{1}{2}} \gamma_1 = 0</math> For the asymmetric cases (''α'' ≠ ''β'') the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions: :<math> \begin{align} &\lim_{\alpha\to 0} \gamma_1 =\lim_{\mu\to 0} \gamma_1 = \infty\\ &\lim_{\beta \to 0} \gamma_1 = \lim_{\mu\to 1} \gamma_1= - \infty\\ &\lim_{\alpha\to \infty} \gamma_1 = -\frac{2}{\sqrt\beta},\quad \lim_{\beta \to 0}(\lim_{\alpha\to \infty} \gamma_1) = -\infty,\quad \lim_{\beta \to \infty}(\lim_{\alpha\to \infty} \gamma_1) = 0\\ &\lim_{\beta\to \infty} \gamma_1 = \frac{2}{\sqrt\alpha},\quad \lim_{\alpha \to 0}(\lim_{\beta \to \infty} \gamma_1) = \infty,\quad \lim_{\alpha \to \infty}(\lim_{\beta \to \infty} \gamma_1) = 0\\ &\lim_{\nu \to 0} \gamma_1 = \frac{1 - 2 \mu}{\sqrt{\mu (1-\mu)}},\quad \lim_{\mu \to 0}(\lim_{\nu \to 0} \gamma_1) = \infty,\quad \lim_{\mu \to 1}(\lim_{\nu \to 0} \gamma_1) = - \infty \end{align}</math> [[File:Skewness Beta Distribution for alpha and beta from 1 to 5 - J. Rodal.jpg|325px]] [[File:Skewness Beta Distribution for alpha and beta from .1 to 5 - J. Rodal.jpg|325px]] ===Kurtosis=== [[File:Excess Kurtosis for Beta Distribution as a function of variance and mean - J. Rodal.jpg|325px|thumb|Excess Kurtosis for Beta Distribution as a function of variance and mean]] The beta distribution has been applied in acoustic analysis to assess damage to gears, as the kurtosis of the beta distribution has been reported to be a good indicator of the condition of a gear.<ref name=Oguamanam>{{cite journal |last1=Oguamanam |first1=D.C.D. |last2=Martin |first2=H. R. |last3=Huissoon |first3=J. P. 
|title=On the application of the beta distribution to gear damage analysis |journal=Applied Acoustics |year=1995 |volume=45 |issue=3 |pages=247–261 |doi=10.1016/0003-682X(95)00001-P}}</ref> Kurtosis has also been used to distinguish the seismic signal generated by a person's footsteps from other signals. As persons or other targets moving on the ground generate continuous signals in the form of seismic waves, one can separate different targets based on the seismic waves they generate. Kurtosis is sensitive to impulsive signals, so it's much more sensitive to the signal generated by human footsteps than other signals generated by vehicles, winds, noise, etc.<ref name=Liang>{{cite journal|author1=Zhiqiang Liang |author2=Jianming Wei |author3=Junyu Zhao |author4=Haitao Liu |author5=Baoqing Li |author6=Jie Shen |author7=Chunlei Zheng |title=The Statistical Meaning of Kurtosis and Its New Application to Identification of Persons Based on Seismic Signals |journal=Sensors |date=27 August 2008 |volume=8 |issue=8 |pages=5106–5119 |doi=10.3390/s8085106|pmid=27873804 |pmc=3705491 |bibcode=2008Senso...8.5106L |doi-access=free }}</ref> Unfortunately, the notation for kurtosis has not been standardized. Kenney and Keeping<ref name="Kenney and Keeping">{{cite book|last=Kenney|first=J. F., and E. S. Keeping|title=Mathematics of Statistics Part Two, 2nd edition|year=1951|publisher=D. Van Nostrand Company Inc.}}</ref> use the symbol γ<sub>2</sub> for the [[excess kurtosis]], but [[Abramowitz and Stegun]]<ref name=Abramowitz>{{cite book|last=Abramowitz|first=Milton and Irene A. Stegun|title=Handbook Of Mathematical Functions With Formulas, Graphs, And Mathematical Tables|year=1965|publisher=Dover|isbn=978-0-486-61272-0|url=https://archive.org/details/handbookofmathe000abra}}</ref> use different terminology. To prevent confusion<ref name=Weisstein.Kurtosi>{{cite web|last=Weisstein.|first=Eric W.|title=Kurtosis|url=http://mathworld.wolfram.com/Kurtosis.html|publisher=MathWorld--A Wolfram Web Resource|access-date=13 August 2012}}</ref> between kurtosis (the fourth moment centered on the mean, normalized by the square of the variance) and excess kurtosis, when using symbols, they will be spelled out as follows:<ref name="Handbook of Beta Distribution">{{cite book|editor-last=Gupta|editor-first=Arjun K.|title=Handbook of Beta Distribution and Its Applications|year=2004|publisher=CRC Press|isbn=978-0824753962}}</ref><ref name=Panik>{{cite book|last=Panik|first=Michael J|title=Advanced Statistics from an Elementary Point of View|year=2005|publisher=Academic Press|isbn=978-0120884940}}</ref> :<math>\begin{align} \text{excess kurtosis} &=\text{kurtosis} - 3\\ &=\frac{\operatorname{E}[(X - \mu)^4]}{{(\operatorname{var}(X))^{2}}}-3\\ &=\frac{6[\alpha^3-\alpha^2(2\beta - 1) + \beta^2(\beta + 1) - 2\alpha\beta(\beta + 2)]}{\alpha \beta (\alpha + \beta + 2)(\alpha + \beta + 3)}\\ &=\frac{6[(\alpha - \beta)^2 (\alpha +\beta + 1) - \alpha \beta (\alpha + \beta + 2)]} {\alpha \beta (\alpha + \beta + 2) (\alpha + \beta + 3)} . \end{align}</math> Letting ''α'' = ''β'' in the above expression one obtains :<math>\text{excess kurtosis} =- \frac{6}{3+2\alpha} \text{ if }\alpha=\beta </math>. Therefore, for symmetric beta distributions, the excess kurtosis is negative, increasing from a minimum value of −2 at the limit as {''α'' = ''β''} → 0, and approaching a maximum value of zero as {''α'' = ''β''} → ∞. 
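The closed-form excess kurtosis can be cross-checked numerically; the following Python sketch is an illustrative aside assuming SciPy (whose <code>stats</code> method also reports excess, i.e. Fisher, kurtosis):

<syntaxhighlight lang="python">
# Illustrative sketch: excess kurtosis of Beta(a, b) from the closed form,
# cross-checked against scipy.stats.
from scipy.stats import beta

def excess_kurtosis(a, b):
    num = 6 * ((a - b) ** 2 * (a + b + 1) - a * b * (a + b + 2))
    den = a * b * (a + b + 2) * (a + b + 3)
    return num / den

for a, b in [(2.0, 2.0), (0.5, 0.5), (2.0, 5.0)]:
    print(a, b, excess_kurtosis(a, b), beta(a, b).stats(moments='k'))
</syntaxhighlight>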
The value of −2 is the minimum value of excess kurtosis that any distribution (not just beta distributions, but any distribution of any possible kind) can ever achieve. This minimum value is reached when all the probability density is entirely concentrated at each end ''x'' = 0 and ''x'' = 1, with nothing in between: a 2-point [[Bernoulli distribution]] with equal probability 1/2 at each end (a coin toss: see section below "Kurtosis bounded by the square of the skewness" for further discussion). The description of [[kurtosis]] as a measure of the "potential outliers" (or "potential rare, extreme values") of the probability distribution is correct for all distributions including the beta distribution. The more that rare, extreme values can occur in the beta distribution, the higher its kurtosis; otherwise, the kurtosis is lower. For skewed beta distributions with ''α'' ≠ ''β'', the excess kurtosis can reach unlimited positive values (particularly for ''α'' → 0 for finite ''β'', or for ''β'' → 0 for finite ''α'') because the side away from the mode will produce occasional extreme values. Minimum kurtosis takes place when the mass density is concentrated equally at each end (and therefore the mean is at the center), and there is no probability mass density in between the ends. Using the [[Statistical parameter|parametrization]] in terms of mean ''μ'' and sample size ''ν'' = ''α'' + ''β'': :<math> \begin{align} \alpha & {} = \mu \nu ,\text{ where }\nu =(\alpha + \beta) >0\\ \beta & {} = (1 - \mu) \nu , \text{ where }\nu =(\alpha + \beta) >0. \end{align}</math> one can express the excess kurtosis in terms of the mean ''μ'' and the sample size ''ν'' as follows: :<math>\text{excess kurtosis} =\frac{6}{3 + \nu}\bigg (\frac{(1 - 2 \mu)^2 (1 + \nu)}{\mu (1 - \mu) (2 + \nu)} - 1 \bigg )</math> The excess kurtosis can also be expressed in terms of just the following two parameters: the variance var, and the sample size ''ν'' as follows: :<math>\text{excess kurtosis} =\frac{6}{(3 + \nu)(2 + \nu)}\left(\frac{1}{\text{ var }} - 6 - 5 \nu \right)\text{ if }\text{var}< \mu(1-\mu)</math> and, in terms of the variance ''var'' and the mean ''μ'' as follows: :<math>\text{excess kurtosis} =\frac{6 \text{ var } (1 - \text{ var } - 5 \mu (1 - \mu) )}{(\text{var } + \mu (1 - \mu))(2\text{ var } + \mu (1 - \mu) )}\text{ if }\text{var}< \mu(1-\mu)</math> The plot of excess kurtosis as a function of the variance and the mean shows that the minimum value of the excess kurtosis (−2, which is the minimum possible value for excess kurtosis for any distribution) is intimately coupled with the maximum value of variance (1/4) and the symmetry condition: the mean occurring at the midpoint (''μ'' = 1/2). This occurs for the symmetric case of ''α'' = ''β'' = 0, with zero skewness. At the limit, this is the 2-point [[Bernoulli distribution]] with equal probability 1/2 at each [[Dirac delta function]] end ''x'' = 0 and ''x'' = 1 and zero probability everywhere else. (A coin toss: one face of the coin being ''x'' = 0 and the other face being ''x'' = 1.) Variance is maximum because the distribution is bimodal with nothing in between the two modes (spikes) at each end. Excess kurtosis is minimum: the probability density "mass" is zero at the mean and it is concentrated at the two peaks at each end. Excess kurtosis reaches the minimum possible value (for any distribution) when the probability density function has two spikes at each end: it is bi-"peaky" with nothing in between them.
On the other hand, the plot shows that for extreme skewed cases, where the mean is located near one or the other end (''μ'' = 0 or ''μ'' = 1), the variance is close to zero, and the excess kurtosis rapidly approaches infinity when the mean of the distribution approaches either end. Alternatively, the excess kurtosis can also be expressed in terms of just the following two parameters: the square of the skewness, and the sample size ν as follows: :<math>\text{excess kurtosis} =\frac{6}{3 + \nu}\bigg(\frac{(2 + \nu)}{4} (\text{skewness})^2 - 1\bigg)\text{ if (skewness)}^2-2< \text{excess kurtosis}< \frac{3}{2} (\text{skewness})^2</math> From this last expression, one can obtain the same limits published over a century ago by [[Karl Pearson]]<ref name=Pearson /> for the beta distribution (see section below titled "Kurtosis bounded by the square of the skewness"). Setting ''α'' + ''β'' = ''ν'' = 0 in the above expression, one obtains Pearson's lower boundary (values for the skewness and excess kurtosis below the boundary (excess kurtosis + 2 − skewness<sup>2</sup> = 0) cannot occur for any distribution, and hence [[Karl Pearson]] appropriately called the region below this boundary the "impossible region"). The limit of ''α'' + ''β'' = ''ν'' → ∞ determines Pearson's upper boundary. :<math> \begin{align} &\lim_{\nu \to 0}\text{excess kurtosis} = (\text{skewness})^2 - 2\\ &\lim_{\nu \to \infty}\text{excess kurtosis} = \tfrac{3}{2} (\text{skewness})^2 \end{align}</math> therefore: :<math>(\text{skewness})^2-2< \text{excess kurtosis}< \tfrac{3}{2} (\text{skewness})^2</math> Values of ''ν'' = ''α'' + ''β'' such that ''ν'' ranges from zero to infinity, 0 < ''ν'' < ∞, span the whole region of the beta distribution in the plane of excess kurtosis versus squared skewness. For the symmetric case (''α'' = ''β''), the following limits apply: :<math> \begin{align} &\lim_{\alpha = \beta \to 0} \text{excess kurtosis} = - 2 \\ &\lim_{\alpha = \beta \to \infty} \text{excess kurtosis} = 0 \\ &\lim_{\mu \to \frac{1}{2}} \text{excess kurtosis} = - \frac{6}{3 + \nu} \end{align}</math> For the unsymmetric cases (''α'' ≠ ''β'') the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions: :<math> \begin{align} &\lim_{\alpha\to 0}\text{excess kurtosis} =\lim_{\beta \to 0} \text{excess kurtosis} = \lim_{\mu \to 0}\text{excess kurtosis} = \lim_{\mu \to 1}\text{excess kurtosis} =\infty\\ &\lim_{\alpha \to \infty}\text{excess kurtosis} = \frac{6}{\beta},\text{ } \lim_{\beta \to 0}(\lim_{\alpha\to \infty} \text{excess kurtosis}) = \infty,\text{ } \lim_{\beta \to \infty}(\lim_{\alpha\to \infty} \text{excess kurtosis}) = 0\\ &\lim_{\beta \to \infty}\text{excess kurtosis} = \frac{6}{\alpha},\text{ } \lim_{\alpha \to 0}(\lim_{\beta \to \infty} \text{excess kurtosis}) = \infty,\text{ } \lim_{\alpha \to \infty}(\lim_{\beta \to \infty} \text{excess kurtosis}) = 0\\ &\lim_{\nu \to 0} \text{excess kurtosis} = - 6 + \frac{1}{\mu (1 - \mu)},\text{ } \lim_{\mu \to 0}(\lim_{\nu \to 0} \text{excess kurtosis}) = \infty,\text{ } \lim_{\mu \to 1}(\lim_{\nu \to 0} \text{excess kurtosis}) = \infty \end{align}</math> [[File:Excess Kurtosis for Beta Distribution with alpha and beta ranging from 1 to 5 - J. Rodal.jpg|325px]][[File:Excess Kurtosis for Beta Distribution with alpha and beta ranging from 0.1 to 5 - J. Rodal.jpg|325px]] ===Characteristic function=== [[File:Re(CharacteristicFunction) Beta Distr alpha=beta from 0 to 25 Back - J. 
Rodal.jpg|325px|thumb|[[Characteristic function (probability theory)|Re(characteristic function)]] symmetric case ''α'' = ''β'' ranging from 25 to 0]][[File:Re(CharacteristicFunc) Beta Distr alpha=beta from 0 to 25 Front- J. Rodal.jpg|325px|thumb|[[Characteristic function (probability theory)|Re(characteristic function)]] symmetric case ''α'' = ''β'' ranging from 0 to 25]][[File:Re(CharacteristFunc) Beta Distr alpha from 0 to 25 and beta=alpha+0.5 Back - J. Rodal.jpg|325px|thumb|[[Characteristic function (probability theory)|Re(characteristic function)]] ''β'' = ''α'' + 1/2; ''α'' ranging from 25 to 0]][[File:Re(CharacterFunc) Beta Distrib. beta from 0 to 25, alpha=beta+0.5 Back - J. Rodal.jpg|325px|thumb|[[Characteristic function (probability theory)|Re(characteristic function)]] ''α'' = ''β'' + 1/2; ''β'' ranging from 25 to 0]][[File:Re(CharacterFunc) Beta Distr. beta from 0 to 25, alpha=beta+0.5 Front - J. Rodal.jpg|325px|thumb|[[Characteristic function (probability theory)|Re(characteristic function)]] ''α'' = ''β'' + 1/2; ''β'' ranging from 0 to 25]] The [[Characteristic function (probability theory)|characteristic function]] is the [[Fourier transform]] of the probability density function. The characteristic function of the beta distribution is [[confluent hypergeometric function|Kummer's confluent hypergeometric function]] (of the first kind):<ref name=JKB /><ref name=Abramowitz /><ref name="Zwillinger_2014">{{cite book |author-first1=Izrail Solomonovich |author-last1=Gradshteyn |author-link1=Izrail Solomonovich Gradshteyn |author-first2=Iosif Moiseevich |author-last2=Ryzhik |author-link2=Iosif Moiseevich Ryzhik |author-first3=Yuri Veniaminovich |author-last3=Geronimus |author-link3=Yuri Veniaminovich Geronimus |author-first4=Michail Yulyevich |author-last4=Tseytlin |author-link4=Michail Yulyevich Tseytlin |author-first5=Alan |author-last5=Jeffrey |editor1-first=Daniel |editor1-last=Zwillinger |editor2-first=Victor Hugo |editor2-last=Moll |editor-link2=Victor Hugo Moll |translator=Scripta Technica, Inc. |title=Table of Integrals, Series, and Products |publisher=[[Academic Press, Inc.]] |date=2015 |orig-year=October 2014 |edition=8 |language=en |isbn=978-0-12-384933-5 |lccn=2014010276 <!-- |url=https://books.google.com/books?id=NjnLAwAAQBAJ |access-date=2016-02-21-->|title-link=Gradshteyn and Ryzhik}}</ref> :<math>\begin{align} \varphi_X(\alpha;\beta;t) &= \operatorname{E}\left[e^{itX}\right]\\ &= \int_0^1 e^{itx} f(x;\alpha,\beta) \, dx \\ &={}_1F_1(\alpha; \alpha+\beta; it)\!\\ &=\sum_{n=0}^\infty \frac {\alpha^{(n)} (it)^n} {(\alpha+\beta)^{(n)} n!}\\ &= 1 +\sum_{k=1}^{\infty} \left( \prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r} \right) \frac{(it)^k}{k!} \end{align}</math> where : <math>x^{(n)}=x(x+1)(x+2)\cdots(x+n-1)</math> is the [[rising factorial]], also called the "Pochhammer symbol". 
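The series representation above is straightforward to evaluate; the following Python sketch is an illustrative aside (assuming NumPy and SciPy) that compares a truncated rising-factorial series for <sub>1</sub>''F''<sub>1</sub>(''α''; ''α'' + ''β''; ''it'') with direct numerical integration of E[e<sup>''itX''</sup>]:

<syntaxhighlight lang="python">
# Illustrative sketch: characteristic function of Beta(a, b) as the confluent
# hypergeometric series 1F1(a; a + b; it), compared with direct numerical
# integration of E[exp(itX)].
import numpy as np
from scipy.integrate import quad
from scipy.stats import beta

def cf_series(a, b, t, terms=200):
    total, term = 1.0 + 0.0j, 1.0 + 0.0j
    for k in range(1, terms):
        # term_k = term_{k-1} * (a + k - 1)/(a + b + k - 1) * (it)/k
        term *= (a + k - 1) / (a + b + k - 1) * (1j * t) / k
        total += term
    return total

def cf_numeric(a, b, t):
    pdf = beta(a, b).pdf
    re, _ = quad(lambda x: np.cos(t * x) * pdf(x), 0, 1)
    im, _ = quad(lambda x: np.sin(t * x) * pdf(x), 0, 1)
    return re + 1j * im

a, b, t = 2.0, 3.0, 5.0
print(cf_series(a, b, t))
print(cf_numeric(a, b, t))
</syntaxhighlight>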
The value of the characteristic function for ''t'' = 0, is one: :<math> \varphi_X(\alpha;\beta;0)={}_1F_1(\alpha; \alpha+\beta; 0) = 1.</math> Also, the real and imaginary parts of the characteristic function enjoy the following symmetries with respect to the origin of variable ''t'': :<math> \operatorname{Re} \left [ {}_1F_1(\alpha; \alpha+\beta; it) \right ] = \operatorname{Re} \left [ {}_1F_1(\alpha; \alpha+\beta; - it) \right ]</math> :<math> \operatorname{Im} \left [ {}_1F_1(\alpha; \alpha+\beta; it) \right ] = - \operatorname{Im} \left [ {}_1F_1(\alpha; \alpha+\beta; - it) \right ]</math> The symmetric case ''α'' = ''β'' simplifies the characteristic function of the beta distribution to a [[Bessel function]], since in the special case ''α'' + ''β'' = 2''α'' the [[confluent hypergeometric function]] (of the first kind) reduces to a [[Bessel function]] (the modified Bessel function of the first kind <math>I_{\alpha-\frac 1 2}</math> ) using [[Ernst Kummer|Kummer's]] second transformation as follows: :<math>\begin{align} {}_1F_1(\alpha;2\alpha; it) &= e^{\frac{it}{2}} {}_0F_1 \left(; \alpha+\tfrac{1}{2}; \frac{(it)^2}{16} \right) \\ &= e^{\frac{it}{2}} \left(\frac{it}{4}\right)^{\frac{1}{2}-\alpha} \Gamma\left(\alpha+\tfrac{1}{2}\right) I_{\alpha-\frac 1 2} \left(\frac{it}{2}\right).\end{align}</math> In the accompanying plots, the [[Complex number|real part]] (Re) of the [[Characteristic function (probability theory)|characteristic function]] of the beta distribution is displayed for symmetric (''α'' = ''β'') and skewed (''α'' ≠ ''β'') cases. ===Other moments=== ====Moment generating function==== It also follows<ref name=JKB /><ref name="Handbook of Beta Distribution" /> that the [[moment generating function]] is :<math>\begin{align} M_X(\alpha; \beta; t) &= \operatorname{E}\left[e^{tX}\right] \\[4pt] &= \int_0^1 e^{tx} f(x;\alpha,\beta)\,dx \\[4pt] &= {}_1F_1(\alpha; \alpha+\beta; t) \\[4pt] &= \sum_{n=0}^\infty \frac {\alpha^{(n)}} {(\alpha+\beta)^{(n)}}\frac {t^n}{n!}\\[4pt] &= 1 +\sum_{k=1}^{\infty} \left( \prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r} \right) \frac{t^k}{k!}. \end{align}</math> In particular ''M''<sub>''X''</sub>(''α''; ''β''; 0) = 1. ====Higher moments==== Using the [[moment generating function]], the ''k''-th [[raw moment]] is given by<ref name=JKB/> the factor :<math>\prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r} </math> multiplying the (exponential series) term <math>\left(\frac{t^k}{k!}\right)</math> in the series of the [[moment generating function]] :<math>\operatorname{E}[X^k]= \frac{\alpha^{(k)}}{(\alpha + \beta)^{(k)}} = \prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r}</math> where (''x'')<sup>(''k'')</sup> is a [[Pochhammer symbol]] representing rising factorial. 
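The product formula for the raw moments is easy to evaluate directly; the following Python sketch is an illustrative aside assuming SciPy:

<syntaxhighlight lang="python">
# Illustrative sketch: k-th raw moment of Beta(a, b) as a product of ratios of
# rising factorials, cross-checked against scipy.stats' moment method.
from scipy.stats import beta

def raw_moment(a, b, k):
    m = 1.0
    for r in range(k):
        m *= (a + r) / (a + b + r)
    return m

a, b = 2.0, 3.0
for k in range(1, 5):
    print(k, raw_moment(a, b, k), beta(a, b).moment(k))
</syntaxhighlight>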
It can also be written in a recursive form as :<math>\operatorname{E}[X^k] = \frac{\alpha + k - 1}{\alpha + \beta + k - 1}\operatorname{E}[X^{k - 1}].</math> Since the moment generating function <math>M_X(\alpha; \beta; \cdot)</math> has a positive radius of convergence,{{cn|reason=proof that the radius of convergence is positive, not in Billingsley Section 30?|date=December 2024}} the beta distribution is [[Moment problem|determined by its moments]].<ref>{{cite book|last1=Billingsley|first1=Patrick|title=Probability and measure|date=1995|publisher=Wiley-Interscience|isbn=978-0-471-00710-4|edition=3rd|chapter=Section 30: The Method of Moments}}</ref> ====Moments of transformed random variables==== =====Moments of linearly transformed, product and inverted random variables===== One can also show the following expectations for a transformed random variable,<ref name=JKB/> where the random variable ''X'' is Beta-distributed with parameters ''α'' and ''β'': ''X'' ~ Beta(''α'', ''β''). The expected value of the variable 1 − ''X'' is the mirror-image of the expected value based on ''X'': :<math>\begin{align} \operatorname{E}[1-X] &= \frac{\beta}{\alpha + \beta} \\ \operatorname{E}[X(1-X)] &= \operatorname{E}[(1-X)X] = \frac{\alpha\beta}{(\alpha+\beta)(\alpha+\beta+1)} \end{align}</math> Due to the mirror-symmetry of the probability density function of the beta distribution, the variances based on variables ''X'' and 1 − ''X'' are identical, and the covariance of ''X'' and (1 − ''X'') is the negative of the variance: :<math>\operatorname{var}[(1-X)]=\operatorname{var}[X] = -\operatorname{cov}[X,(1-X)]= \frac{\alpha \beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}</math> These are the expected values for inverted variables (these are related to the harmonic means, see {{section link||Harmonic mean}}): :<math>\begin{align} \operatorname{E} \left [\frac{1}{X} \right ] &= \frac{\alpha+\beta-1 }{\alpha -1 } && \text{ if } \alpha > 1\\ \operatorname{E}\left [\frac{1}{1-X} \right ] &=\frac{\alpha+\beta-1 }{\beta-1 } && \text{ if } \beta > 1 \end{align}</math> Dividing the variable ''X'' by its mirror-image, giving ''X''/(1 − ''X''), results in the expected value of the "inverted beta distribution" or [[beta prime distribution]] (also known as beta distribution of the second kind or [[Pearson distribution|Pearson's Type VI]]):<ref name=JKB/> :<math> \begin{align} \operatorname{E}\left[\frac{X}{1-X}\right] &=\frac{\alpha}{\beta - 1 } && \text{ if }\beta > 1\\ \operatorname{E}\left[\frac{1-X}{X}\right] &=\frac{\beta}{\alpha- 1 } && \text{ if }\alpha > 1 \end{align} </math> Variances of these transformed variables can be obtained by integration, as the expected values of the second moments centered on the corresponding variables: :<math>\operatorname{var} \left[\frac{1}{X} \right] =\operatorname{E}\left[\left(\frac{1}{X} - \operatorname{E}\left[\frac{1}{X} \right ] \right )^2\right]= \operatorname{var}\left [\frac{1-X}{X} \right ] =\operatorname{E} \left [\left (\frac{1-X}{X} - \operatorname{E}\left [\frac{1-X}{X} \right ] \right )^2 \right ]= \frac{\beta (\alpha+\beta-1)}{(\alpha -2)(\alpha-1)^2 } \text{ if }\alpha > 2</math> The variance of the variable ''X'' divided by its mirror-image, ''X''/(1 − ''X''), gives the variance of the "inverted beta distribution" or [[beta prime distribution]] (also known as beta distribution of the second kind or [[Pearson distribution|Pearson's Type VI]]):<ref name=JKB/> :<math>\operatorname{var} \left [\frac{1}{1-X} \right ]
=\operatorname{E} \left [\left(\frac{1}{1-X} - \operatorname{E} \left [\frac{1}{1-X} \right ] \right)^2 \right ]=\operatorname{var} \left [\frac{X}{1-X} \right ] = \operatorname{E} \left [\left (\frac{X}{1-X} - \operatorname{E} \left [\frac{X}{1-X} \right ] \right )^2 \right ]= \frac{\alpha(\alpha+\beta-1)}{(\beta-2)(\beta-1)^2 } \text{ if }\beta > 2</math> The covariances are: :<math>\operatorname{cov}\left [\frac{1}{X},\frac{1}{1-X} \right ] = \operatorname{cov}\left[\frac{1-X}{X},\frac{X}{1-X} \right] =\operatorname{cov}\left[\frac{1}{X},\frac{X}{1-X}\right ] = \operatorname{cov}\left[\frac{1-X}{X},\frac{1}{1-X} \right] =\frac{\alpha+\beta-1}{(\alpha-1)(\beta-1) } \text{ if } \alpha, \beta > 1</math> These expectations and variances appear in the four-parameter Fisher information matrix ({{section link||Fisher information}}.) =====Moments of logarithmically transformed random variables===== [[File:Logit.svg|thumbnail|right|350px|Plot of logit(''X'') = ln(''X''/(1 −''X'')) (vertical axis) vs. ''X'' in the domain of 0 to 1 (horizontal axis). Logit transformations are interesting, as they usually transform various shapes (including J-shapes) into (usually skewed) bell-shaped densities over the logit variable, and they may remove the end singularities over the original variable]] Expected values for [[Logarithm transformation|logarithmic transformations]] (useful for [[maximum likelihood]] estimates, see {{section link||Parameter estimation, Maximum likelihood}}) are discussed in this section. The following logarithmic linear transformations are related to the geometric means ''G<sub>X</sub>'' and ''G''<sub>(1−''X'')</sub> (see {{section link||Geometric Mean}}): :<math>\begin{align} \operatorname{E}[\ln(X)] &= \psi(\alpha) - \psi(\alpha + \beta)= - \operatorname{E}\left[\ln \left (\frac{1}{X} \right )\right],\\ \operatorname{E}[\ln(1-X)] &=\psi(\beta) - \psi(\alpha + \beta)= - \operatorname{E} \left[\ln \left (\frac{1}{1-X} \right )\right]. \end{align}</math> Where the '''[[digamma function]]''' ''ψ''(''α'') is defined as the [[logarithmic derivative]] of the [[gamma function]]:<ref name=Abramowitz/> :<math>\psi(\alpha) = \frac{d \ln\Gamma(\alpha)}{d\alpha}</math> [[Logit]] transformations are interesting,<ref name=MacKay>{{cite book|last=MacKay|first=David|title=Information Theory, Inference and Learning Algorithms|year=2003| publisher=Cambridge University Press; First Edition |isbn=978-0521642989|bibcode=2003itil.book.....M}}</ref> as they usually transform various shapes (including J-shapes) into (usually skewed) bell-shaped densities over the logit variable, and they may remove the end singularities over the original variable: :<math>\begin{align} \operatorname{E}\left[\ln \left (\frac{X}{1-X} \right ) \right] &=\psi(\alpha) - \psi(\beta)= \operatorname{E}[\ln(X)] +\operatorname{E} \left[\ln \left (\frac{1}{1-X} \right) \right],\\ \operatorname{E}\left [\ln \left (\frac{1-X}{X} \right ) \right ] &=\psi(\beta) - \psi(\alpha)= - \operatorname{E} \left[\ln \left (\frac{X}{1-X} \right) \right] . 
\end{align}</math> Johnson<ref name=JohnsonLogInv>{{cite journal|last=Johnson|first=N.L.|title=Systems of frequency curves generated by methods of translation| journal=Biometrika|year=1949 |volume=36 |issue=1–2|pages=149–176|doi=10.1093/biomet/36.1-2.149|pmid=18132090|hdl=10338.dmlcz/135506|url=http://dml.cz/bitstream/handle/10338.dmlcz/135506/Kybernetika_39-2003-1_3.pdf}}</ref> considered the distribution of the [[logit]]-transformed variable ln(''X''/(1 − ''X'')), including its moment generating function and approximations for large values of the shape parameters. This transformation extends the finite support [0, 1] based on the original variable ''X'' to infinite support in both directions of the real line (−∞, +∞). The logit of a beta variate has the [[logistic-beta distribution]]. Higher-order logarithmic moments can be derived by using the representation of a beta-distributed variable as a proportion of two independent gamma-distributed variables and differentiating through the integral. They can be expressed in terms of higher-order polygamma functions as follows: :<math>\begin{align} \operatorname{E} \left [\ln^2(X) \right ] &= (\psi(\alpha) - \psi(\alpha + \beta))^2+\psi_1(\alpha)-\psi_1(\alpha+\beta), \\ \operatorname{E} \left [\ln^2(1-X) \right ] &= (\psi(\beta) - \psi(\alpha + \beta))^2+\psi_1(\beta)-\psi_1(\alpha+\beta), \\ \operatorname{E} \left [\ln (X)\ln(1-X) \right ] &=(\psi(\alpha) - \psi(\alpha + \beta))(\psi(\beta) - \psi(\alpha + \beta)) -\psi_1(\alpha+\beta). \end{align}</math> The [[variance]] of the logarithmic variables and the [[covariance]] of ln(''X'') and ln(1 − ''X'') are therefore: :<math>\begin{align} \operatorname{cov}[\ln(X), \ln(1-X)] &= \operatorname{E}\left[\ln(X)\ln(1-X)\right] - \operatorname{E}[\ln(X)]\operatorname{E}[\ln(1-X)] = -\psi_1(\alpha+\beta) \\ & \\ \operatorname{var}[\ln X] &= \operatorname{E}[\ln^2(X)] - (\operatorname{E}[\ln(X)])^2 \\ &= \psi_1(\alpha) - \psi_1(\alpha + \beta) \\ &= \psi_1(\alpha) + \operatorname{cov}[\ln(X), \ln(1-X)] \\ & \\ \operatorname{var}[\ln (1-X)] &= \operatorname{E}[\ln^2 (1-X)] - (\operatorname{E}[\ln (1-X)])^2 \\ &= \psi_1(\beta) - \psi_1(\alpha + \beta) \\ &= \psi_1(\beta) + \operatorname{cov}[\ln (X), \ln(1-X)] \end{align}</math> where the '''[[trigamma function]]''', denoted ''ψ''<sub>1</sub>(''α''), is the second of the [[polygamma function]]s, and is defined as the derivative of the [[digamma]] function: :<math>\psi_1(\alpha) = \frac{d^2\ln\Gamma(\alpha)}{d\alpha^2}= \frac{d \psi(\alpha)}{d\alpha}. </math> The variances and covariance of the logarithmically transformed variables ''X'' and (1 − ''X'') are different, in general, because the logarithmic transformation destroys the mirror-symmetry of the original variables ''X'' and (1 − ''X''): the logarithm approaches negative infinity as the variable approaches zero. These logarithmic variances and covariance are the elements of the [[Fisher information]] matrix for the beta distribution. They are also a measure of the curvature of the log likelihood function (see section on Maximum likelihood estimation).
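The following Python sketch is an illustrative numerical check of the expressions above (it is not drawn from the cited references; it assumes SciPy is available, and the parameter values ''α'' = 2, ''β'' = 3 are arbitrary examples). It compares the digamma/trigamma formulas for the mean and variance of ln(''X'') with direct numerical integration against the density:

<syntaxhighlight lang="python">
# Illustrative check (example values alpha = 2, beta = 3): the closed forms
# E[ln X] = psi(alpha) - psi(alpha+beta) and var[ln X] = psi_1(alpha) - psi_1(alpha+beta)
# compared against direct numerical integration of the beta density.
import numpy as np
from scipy.stats import beta as beta_dist
from scipy.special import digamma, polygamma
from scipy.integrate import quad

a, b = 2.0, 3.0                      # arbitrary example shape parameters
pdf = beta_dist(a, b).pdf

mean_ln_formula = digamma(a) - digamma(a + b)             # E[ln X]
var_ln_formula = polygamma(1, a) - polygamma(1, a + b)    # var[ln X] (trigamma)

mean_ln_numeric = quad(lambda x: np.log(x) * pdf(x), 0, 1)[0]
var_ln_numeric = quad(lambda x: (np.log(x) - mean_ln_numeric) ** 2 * pdf(x), 0, 1)[0]

print(mean_ln_formula, mean_ln_numeric)   # both approximately -1.0833
print(var_ln_formula, var_ln_numeric)     # both approximately 0.4236
</syntaxhighlight>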
The variances of the log inverse variables are identical to the variances of the log variables: :<math>\begin{align} \operatorname{var}\left[\ln \left (\frac{1}{X} \right ) \right] & =\operatorname{var}[\ln(X)] = \psi_1(\alpha) - \psi_1(\alpha + \beta), \\ \operatorname{var}\left[\ln \left (\frac{1}{1-X} \right ) \right] &=\operatorname{var}[\ln (1-X)] = \psi_1(\beta) - \psi_1(\alpha + \beta), \\ \operatorname{cov}\left[\ln \left (\frac{1}{X} \right), \ln \left (\frac{1}{1-X}\right ) \right] &=\operatorname{cov}[\ln(X),\ln(1-X)]= -\psi_1(\alpha + \beta).\end{align}</math> It also follows that the variances of the [[logit]]-transformed variables are :<math>\operatorname{var}\left[\ln \left (\frac{X}{1-X} \right )\right]=\operatorname{var}\left[\ln \left (\frac{1-X}{X} \right ) \right]=-\operatorname{cov}\left [\ln \left (\frac{X}{1-X} \right ), \ln \left (\frac{1-X}{X} \right ) \right]= \psi_1(\alpha) + \psi_1(\beta).</math> ===Quantities of information (entropy)=== Given a beta distributed random variable, ''X'' ~ Beta(''α'', ''β''), the [[information entropy|differential entropy]] of ''X'' is (measured in [[Nat (unit)|nats]]),<ref>{{cite journal |first1=A. C. G. |last1=Verdugo Lazo |first2=P. N. |last2=Rathie |title=On the entropy of continuous probability distributions |journal=IEEE Trans. Inf. Theory |volume=24 |issue=1 |pages=120–122 |year=1978 |doi=10.1109/TIT.1978.1055832 }}</ref> the expected value of the negative of the logarithm of the [[probability density function]]: :<math>\begin{align} h(X) &= \operatorname{E}[-\ln(f(X;\alpha,\beta))] \\[4pt] &=\int_0^1 -f(x;\alpha,\beta)\ln(f(x;\alpha,\beta)) \, dx \\[4pt] &= \ln(\Beta(\alpha,\beta))-(\alpha-1)\psi(\alpha)-(\beta-1)\psi(\beta)+(\alpha+\beta-2) \psi(\alpha+\beta) \end{align}</math> where ''f''(''x''; ''α'', ''β'') is the [[probability density function]] of the beta distribution: :<math>f(x;\alpha,\beta) = \frac{1}{\Beta(\alpha,\beta)} x^{\alpha-1}(1-x)^{\beta-1}</math> The [[digamma function]] ''ψ'' appears in the formula for the differential entropy as a consequence of Euler's integral formula for the [[harmonic number]]s which follows from the integral: :<math>\int_0^1 \frac {1-x^{\alpha-1}}{1-x} \, dx = \psi(\alpha)-\psi(1)</math> The [[information entropy|differential entropy]] of the beta distribution is negative for all values of ''α'' and ''β'' greater than zero, except at ''α'' = ''β'' = 1 (for which values the beta distribution is the same as the [[Uniform distribution (continuous)|uniform distribution]]), where the [[information entropy|differential entropy]] reaches its [[Maxima and minima|maximum]] value of zero. It is to be expected that the maximum entropy should take place when the beta distribution becomes equal to the uniform distribution, since uncertainty is maximal when all possible events are equiprobable. For ''α'' or ''β'' approaching zero, the [[information entropy|differential entropy]] approaches its [[Maxima and minima|minimum]] value of negative infinity. For (either or both) ''α'' or ''β'' approaching zero, there is a maximum amount of order: all the probability density is concentrated at the ends, and there is zero probability density at points located between the ends. Similarly for (either or both) ''α'' or ''β'' approaching infinity, the differential entropy approaches its minimum value of negative infinity, and a maximum amount of order. 
If either ''α'' or ''β'' approaches infinity (and the other is finite) all the probability density is concentrated at an end, and the probability density is zero everywhere else. If both shape parameters are equal (the symmetric case), ''α'' = ''β'', and they approach infinity simultaneously, the probability density becomes a spike ([[Dirac delta function]]) concentrated at the middle ''x'' = 1/2, and hence there is 100% probability at the middle ''x'' = 1/2 and zero probability everywhere else. [[File:Differential Entropy Beta Distribution for alpha and beta from 1 to 5 - J. Rodal.jpg|325px]][[File:Differential Entropy Beta Distribution for alpha and beta from 0.1 to 5 - J. Rodal.jpg|325px]] The (continuous case) [[information entropy|differential entropy]] was introduced by Shannon in his original paper (where he named it the "entropy of a continuous distribution"), as the concluding part of the same paper where he defined the [[information entropy|discrete entropy]].<ref>{{cite journal |last=Shannon |first=Claude E. |title=A Mathematical Theory of Communication |journal=Bell System Technical Journal |volume=27 |issue=4 |pages=623–656 |year=1948 |doi=10.1002/j.1538-7305.1948.tb01338.x }}</ref> It has been known since then that the differential entropy may differ from the infinitesimal limit of the discrete entropy by an infinite offset; therefore the differential entropy can be negative (as it is for the beta distribution). What really matters is the relative value of entropy. Given two beta-distributed random variables, ''X''<sub>1</sub> ~ Beta(''α'', ''β'') and ''X''<sub>2</sub> ~ Beta(''{{prime|α}}'', ''{{prime|β}}''), the [[cross-entropy]] is (measured in nats)<ref name="Cover and Thomas">{{cite book|last=Cover|first=Thomas M. and Joy A. Thomas|title=Elements of Information Theory 2nd Edition (Wiley Series in Telecommunications and Signal Processing) |year=2006 |publisher=Wiley-Interscience; 2 edition |isbn=978-0471241959}}</ref> :<math>\begin{align} H(X_1,X_2) &= \int_0^1 - f(x;\alpha,\beta) \ln (f(x;\alpha',\beta')) \,dx \\[4pt] &= \ln \left(\Beta(\alpha',\beta')\right)-(\alpha'-1)\psi(\alpha)-(\beta'-1)\psi(\beta)+(\alpha'+\beta'-2)\psi(\alpha+\beta). \end{align}</math> The [[cross entropy]] has been used as an error metric to measure the distance between two hypotheses.<ref name=Plunkett>{{cite book|last=Plunkett|first=Kim, and Jeffrey Elman|title=Exercises in Rethinking Innateness: A Handbook for Connectionist Simulations (Neural Network Modeling and Connectionism)|year=1997|publisher=A Bradford Book|page=166|isbn=978-0262661058|url-access=registration|url=https://archive.org/details/exercisesinrethi0000plun}}</ref><ref name=Nallapati>{{cite thesis|last=Nallapati|first=Ramesh|title=The smoothed dirichlet distribution: understanding cross-entropy ranking in information retrieval|year=2006|publisher=Computer Science Dept., University of Massachusetts Amherst|url=http://maroo.cs.umass.edu/pub/web/getpdf.php?id=679}}</ref> For fixed ''X''<sub>1</sub>, the cross-entropy is minimized when the two distributions are identical, in which case it equals the differential entropy ''h''(''X''<sub>1</sub>). It is the information measure most closely related to the log maximum likelihood<ref name="Cover and Thomas" /> (see section on "Parameter estimation. Maximum likelihood estimation"). The relative entropy, or [[Kullback–Leibler divergence]] ''D''<sub>KL</sub>(''X''<sub>1</sub> || ''X''<sub>2</sub>), is a measure of the inefficiency of assuming that the distribution is ''X''<sub>2</sub> ~ Beta(''{{prime|α}}'', ''{{prime|β}}'') when the distribution is really ''X''<sub>1</sub> ~ Beta(''α'', ''β'').
It is defined as follows (measured in nats). :<math>\begin{align} D_{\mathrm{KL}}(X_1\parallel X_2) &= \int_0^1 f(x;\alpha,\beta) \ln \left (\frac{f(x;\alpha,\beta)}{f(x;\alpha',\beta')} \right ) \, dx \\[4pt] &= \left (\int_0^1 f(x;\alpha,\beta) \ln (f(x;\alpha,\beta)) \,dx \right )- \left (\int_0^1 f(x;\alpha,\beta) \ln (f(x;\alpha',\beta')) \, dx \right )\\[4pt] &= -h(X_1) + H(X_1,X_2)\\[4pt] &= \ln\left(\frac{\Beta(\alpha',\beta')}{\Beta(\alpha,\beta)}\right)+(\alpha-\alpha')\psi(\alpha)+(\beta-\beta')\psi(\beta)+(\alpha'-\alpha+\beta'-\beta)\psi (\alpha + \beta). \end{align} </math> The relative entropy, or [[Kullback–Leibler divergence]], is always non-negative. A few numerical examples follow: *''X''<sub>1</sub> ~ Beta(1, 1) and ''X''<sub>2</sub> ~ Beta(3, 3); ''D''<sub>KL</sub>(''X''<sub>1</sub> || ''X''<sub>2</sub>) = 0.598803; ''D''<sub>KL</sub>(''X''<sub>2</sub> || ''X''<sub>1</sub>) = 0.267864; ''h''(''X''<sub>1</sub>) = 0; ''h''(''X''<sub>2</sub>) = −0.267864 *''X''<sub>1</sub> ~ Beta(3, 0.5) and ''X''<sub>2</sub> ~ Beta(0.5, 3); ''D''<sub>KL</sub>(''X''<sub>1</sub> || ''X''<sub>2</sub>) = 7.21574; ''D''<sub>KL</sub>(''X''<sub>2</sub> || ''X''<sub>1</sub>) = 7.21574; ''h''(''X''<sub>1</sub>) = −1.10805; ''h''(''X''<sub>2</sub>) = −1.10805. The [[Kullback–Leibler divergence]] is not symmetric ''D''<sub>KL</sub>(''X''<sub>1</sub> || ''X''<sub>2</sub>) ≠ ''D''<sub>KL</sub>(''X''<sub>2</sub> || ''X''<sub>1</sub>) for the case in which the individual beta distributions Beta(1, 1) and Beta(3, 3) are symmetric, but have different entropies ''h''(''X''<sub>1</sub>) ≠ ''h''(''X''<sub>2</sub>). The value of the Kullback divergence depends on the direction traveled: whether going from a higher (differential) entropy to a lower (differential) entropy or the other way around. In the numerical example above, the Kullback divergence measures the inefficiency of assuming that the distribution is (bell-shaped) Beta(3, 3), rather than (uniform) Beta(1, 1). The "h" entropy of Beta(1, 1) is higher than the "h" entropy of Beta(3, 3) because the uniform distribution Beta(1, 1) has a maximum amount of disorder. The Kullback divergence is more than two times higher (0.598803 instead of 0.267864) when measured in the direction of decreasing entropy: the direction that assumes that the (uniform) Beta(1, 1) distribution is (bell-shaped) Beta(3, 3) rather than the other way around. In this restricted sense, the Kullback divergence is consistent with the [[second law of thermodynamics]]. The [[Kullback–Leibler divergence]] is symmetric ''D''<sub>KL</sub>(''X''<sub>1</sub> || ''X''<sub>2</sub>) = ''D''<sub>KL</sub>(''X''<sub>2</sub> || ''X''<sub>1</sub>) for the skewed cases Beta(3, 0.5) and Beta(0.5, 3) that have equal differential entropy ''h''(''X''<sub>1</sub>) = ''h''(''X''<sub>2</sub>). The symmetry condition: :<math>D_{\mathrm{KL}}(X_1\parallel X_2) = D_{\mathrm{KL}}(X_2\parallel X_1),\text{ if }h(X_1) = h(X_2),\text{ for (skewed) }\alpha \neq \beta</math> follows from the above definitions and the mirror-symmetry ''f''(''x''; ''α'', ''β'') = ''f''(1 − ''x''; ''α'', ''β'') enjoyed by the beta distribution. 
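The closed form above can be checked numerically; the following Python sketch (an illustrative aside, not taken from the cited references; it assumes SciPy is available) evaluates it for the examples listed and compares it with direct numerical integration of the defining integral:

<syntaxhighlight lang="python">
# Illustrative check of the closed-form Kullback-Leibler divergence between two
# beta distributions against direct numerical integration.
import numpy as np
from scipy.stats import beta as beta_dist
from scipy.special import betaln, digamma
from scipy.integrate import quad

def kl_beta(a1, b1, a2, b2):
    """D_KL( Beta(a1, b1) || Beta(a2, b2) ) in nats, using the closed form above."""
    return (betaln(a2, b2) - betaln(a1, b1)
            + (a1 - a2) * digamma(a1)
            + (b1 - b2) * digamma(b1)
            + (a2 - a1 + b2 - b1) * digamma(a1 + b1))

def kl_numeric(a1, b1, a2, b2):
    """Direct numerical integration of p(x) * ln(p(x)/q(x)) over (0, 1)."""
    p, q = beta_dist(a1, b1).pdf, beta_dist(a2, b2).pdf
    return quad(lambda x: p(x) * np.log(p(x) / q(x)), 0, 1)[0]

print(kl_beta(1, 1, 3, 3), kl_numeric(1, 1, 3, 3))       # both approximately 0.598803
print(kl_beta(3, 3, 1, 1))                               # approximately 0.267864 (not symmetric)
print(kl_beta(3, 0.5, 0.5, 3), kl_beta(0.5, 3, 3, 0.5))  # both approximately 7.21574 (symmetric case)
print(beta_dist(3, 3).entropy())                         # approximately -0.267864, the value h(X2) quoted above
</syntaxhighlight>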
===Relationships between statistical measures=== ====Mean, mode and median relationship==== If 1 < ''α'' < ''β'' then mode ≤ median ≤ mean.<ref name=Kerman2011>{{cite arXiv | eprint=1111.0433 | last1=Kerman | first1=Jouni | title=A closed-form approximation for the median of the beta distribution | date=2011 | class=math.ST }}</ref> Expressing the mode (valid only for ''α'', ''β'' > 1) and the mean in terms of ''α'' and ''β'' gives: : <math> \frac{ \alpha - 1 }{ \alpha + \beta - 2 } \le \text{median} \le \frac{ \alpha }{ \alpha + \beta } .</math> If 1 < ''β'' < ''α'' then the order of the inequalities is reversed. For ''α'', ''β'' > 1 the absolute distance between the mean and the median is less than 5% of the distance between the maximum and minimum values of ''x''. On the other hand, the absolute distance between the mean and the mode can reach 50% of the distance between the maximum and minimum values of ''x'', for the ([[Pathological (mathematics)|pathological]]) case of ''α'' = 1 and ''β'' = 1, for which values the beta distribution approaches the uniform distribution and the [[information entropy|differential entropy]] approaches its [[Maxima and minima|maximum]] value, and hence maximum "disorder". For example, for ''α'' = 1.0001 and ''β'' = 1.00000001: * mode = 0.9999; PDF(mode) = 1.00010 * mean = 0.500025; PDF(mean) = 1.00003 * median = 0.500035; PDF(median) = 1.00003 * mean − mode = −0.499875 * mean − median = −9.65538 × 10<sup>−6</sup> where PDF stands for the value of the [[probability density function]]. [[File:Mean Median Difference - Beta Distribution for alpha and beta from 1 to 5 - J. Rodal.jpg|325px]] [[File:Mean Mode Difference - Beta Distribution for alpha and beta from 1 to 5 - J. Rodal.jpg|325px]] ====Mean, geometric mean and harmonic mean relationship==== [[File:Mean, Median, Geometric Mean and Harmonic Mean for Beta distribution with alpha = beta from 0 to 5 - J. Rodal.png|thumb|Mean, median, geometric mean and harmonic mean for beta distribution with 0 < ''α'' = ''β'' < 5]] It is known from the [[inequality of arithmetic and geometric means]] that the geometric mean is lower than the (arithmetic) mean. Similarly, the harmonic mean is lower than the geometric mean. The accompanying plot shows that for ''α'' = ''β'', both the mean and the median are exactly equal to 1/2, regardless of the value of ''α'' = ''β'', and the mode is also equal to 1/2 for ''α'' = ''β'' > 1. However, the geometric and harmonic means are lower than 1/2, and they only approach this value asymptotically as ''α'' = ''β'' → ∞. ====Kurtosis bounded by the square of the skewness==== [[File:(alpha and beta) Parameter estimates vs. excess Kurtosis and (squared) Skewness Beta distribution - J. Rodal.png|thumb|left|Beta distribution ''α'' and ''β'' parameters vs. excess kurtosis and squared skewness]] As remarked by [[William Feller|Feller]],<ref name=Feller /> in the [[Pearson distribution|Pearson system]] the beta probability density appears as [[Pearson distribution|type I]] (any difference between the beta distribution and Pearson's type I distribution is only superficial, and it makes no difference for the following discussion regarding the relationship between kurtosis and skewness).
[[Karl Pearson]] showed, in Plate 1 of his paper <ref name=Pearson>{{cite journal | last = Pearson | first = Karl | author-link = Karl Pearson | year = 1916 | title = Mathematical contributions to the theory of evolution, XIX: Second supplement to a memoir on skew variation | journal = Philosophical Transactions of the Royal Society A | volume = 216 | issue =538–548 | pages = 429–457 | doi = 10.1098/rsta.1916.0009 | jstor=91092|bibcode = 1916RSPTA.216..429P | doi-access = free }}</ref> published in 1916, a graph with the [[kurtosis]] as the vertical axis ([[ordinate]]) and the square of the [[skewness]] as the horizontal axis ([[abscissa]]), in which a number of distributions were displayed.<ref name=Egon>{{cite journal|last=Pearson|first=Egon S.|title=Some historical reflections traced through the development of the use of frequency curves|journal=THEMIS Statistical Analysis Research Program, Technical Report 38|date=July 1969|volume=Office of Naval Research, Contract N000014-68-A-0515|issue=Project NR 042–260|url=http://www.smu.edu/Dedman/Academics/Departments/Statistics/Research/TechnicalReports}}</ref> The region occupied by the beta distribution is bounded by the following two [[Line (geometry)|lines]] in the (skewness<sup>2</sup>,kurtosis) [[Cartesian coordinate system|plane]], or the (skewness<sup>2</sup>,excess kurtosis) [[Cartesian coordinate system|plane]]: :<math>(\text{skewness})^2+1< \text{kurtosis}< \frac{3}{2} (\text{skewness})^2 + 3</math> or, equivalently, :<math>(\text{skewness})^2-2< \text{excess kurtosis}< \frac{3}{2} (\text{skewness})^2</math> At a time when there were no powerful digital computers, [[Karl Pearson]] accurately computed further boundaries,<ref name="Hahn and Shapiro">{{cite book|last1=Hahn|first1=Gerald J.|last2=Shapiro|first2=S.|title=Statistical Models in Engineering (Wiley Classics Library)|year=1994|publisher=Wiley-Interscience|isbn=978-0471040651}}</ref><ref name=Pearson /> for example, separating the "U-shaped" from the "J-shaped" distributions. The lower boundary line (excess kurtosis + 2 − skewness<sup>2</sup> = 0) is produced by skewed "U-shaped" beta distributions with both values of shape parameters ''α'' and ''β'' close to zero. The upper boundary line (excess kurtosis − (3/2) skewness<sup>2</sup> = 0) is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. [[Karl Pearson]] showed<ref name=Pearson/> that this upper boundary line (excess kurtosis − (3/2) skewness<sup>2</sup> = 0) is also the intersection with Pearson's distribution III, which has unlimited support in one direction (towards positive infinity), and can be bell-shaped or J-shaped. His son, [[Egon Pearson]], showed<ref name=Egon/> that the region (in the kurtosis/squared-skewness plane) occupied by the beta distribution (equivalently, Pearson's distribution I) as it approaches this boundary (excess kurtosis − (3/2) skewness<sup>2</sup> = 0) is shared with the [[noncentral chi-squared distribution]]. Karl Pearson<ref name=Pearson1895>{{cite journal | last = Pearson | first = Karl | author-link = Karl Pearson | year = 1895 | title = Contributions to the mathematical theory of evolution, II: Skew variation in homogeneous material | journal = Philosophical Transactions of the Royal Society | volume = 186 | pages = 343–414 | doi = 10.1098/rsta.1895.0010 | jstor=90649 | bibcode=1895RSPTA.186..343P| doi-access = free }}</ref> (Pearson 1895, pp. 
357, 360, 373–376) also showed that the [[gamma distribution]] is a Pearson type III distribution. Hence this boundary line for Pearson's type III distribution is known as the gamma line. (This can be shown from the fact that the excess kurtosis of the gamma distribution is 6/''k'' and the square of the skewness is 4/''k'', hence (excess kurtosis − (3/2) skewness<sup>2</sup> = 0) is identically satisfied by the gamma distribution regardless of the value of the parameter ''k''.) Pearson later noted that the [[chi-squared distribution]] is a special case of Pearson's type III and also shares this boundary line (as is apparent from the fact that for the [[chi-squared distribution]] the excess kurtosis is 12/''k'' and the square of the skewness is 8/''k'', hence (excess kurtosis − (3/2) skewness<sup>2</sup> = 0) is identically satisfied regardless of the value of the parameter ''k''). This is to be expected, since the chi-squared distribution ''X'' ~ χ<sup>2</sup>(''k'') is a special case of the gamma distribution, with parametrization ''X'' ~ Γ(''k''/2, 1/2), where ''k'' is a positive integer that specifies the "number of degrees of freedom" of the chi-squared distribution. An example of a beta distribution near the upper boundary (excess kurtosis − (3/2) skewness<sup>2</sup> = 0) is given by ''α'' = 0.1, ''β'' = 1000, for which the ratio (excess kurtosis)/(skewness<sup>2</sup>) = 1.49835 approaches the upper limit of 1.5 from below. An example of a beta distribution near the lower boundary (excess kurtosis + 2 − skewness<sup>2</sup> = 0) is given by ''α'' = 0.0001, ''β'' = 0.1, for which values the expression (excess kurtosis + 2)/(skewness<sup>2</sup>) = 1.01621 approaches the lower limit of 1 from above. In the infinitesimal limit for both ''α'' and ''β'' approaching zero symmetrically, the excess kurtosis reaches its minimum value of −2. This minimum value occurs at the point at which the lower boundary line intersects the vertical axis ([[ordinate]]). (However, in Pearson's original chart, the ordinate is kurtosis, instead of excess kurtosis, and it increases downwards rather than upwards.) Values for the skewness and excess kurtosis below the lower boundary (excess kurtosis + 2 − skewness<sup>2</sup> = 0) cannot occur for any distribution, and hence [[Karl Pearson]] appropriately called the region below this boundary the "impossible region". The boundary for this "impossible region" is determined by (symmetric or skewed) bimodal U-shaped distributions for which the parameters ''α'' and ''β'' approach zero and hence all the probability density is concentrated at the ends: ''x'' = 0, 1, with practically nothing in between them. Since for ''α'' ≈ ''β'' ≈ 0 the probability density is concentrated at the two ends ''x'' = 0 and ''x'' = 1, this "impossible boundary" is determined by a [[Bernoulli distribution]], where the only two possible outcomes occur with respective probabilities ''p'' and ''q'' = 1 − ''p''. For cases approaching this limit boundary with symmetry ''α'' = ''β'', skewness ≈ 0, excess kurtosis ≈ −2 (this is the lowest excess kurtosis possible for any distribution), and the probabilities are ''p'' ≈ ''q'' ≈ 1/2. For cases approaching this limit boundary with skewness, excess kurtosis ≈ −2 + skewness<sup>2</sup>, and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities <math>p = \tfrac{\beta}{\alpha + \beta}</math> at the left end ''x'' = 0 and <math>q = 1-p = \tfrac{\alpha}{\alpha + \beta}</math> at the right end ''x'' = 1.
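These bounds can be checked numerically. The following Python sketch (illustrative only, assuming SciPy is available; the first two parameter pairs are the examples given above and the third is an arbitrary interior case) verifies that the excess kurtosis falls strictly between the two boundary lines:

<syntaxhighlight lang="python">
# Illustrative check that (squared skewness, excess kurtosis) of a beta distribution
# lies strictly between the lower line (skewness^2 - 2) and the upper line (3/2) skewness^2.
from scipy.stats import beta as beta_dist

for a, b in [(0.1, 1000.0), (0.0001, 0.1), (2.0, 5.0)]:
    skew, ex_kurt = (float(m) for m in beta_dist.stats(a, b, moments='sk'))
    skew2 = skew ** 2
    lower, upper = skew2 - 2.0, 1.5 * skew2
    print(a, b, lower < ex_kurt < upper)   # True in every case
    # For (0.1, 1000): ex_kurt / skew2 is about 1.498, just below the upper limit of 3/2.
    # For (0.0001, 0.1): (ex_kurt + 2) / skew2 is about 1.016, just above the lower limit of 1.
</syntaxhighlight>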
===Symmetry=== All statements are conditional on ''α'', ''β'' > 0: * '''Probability density function''' [[Symmetry|reflection symmetry]] ::<math>f(x;\alpha,\beta) = f(1-x;\beta,\alpha)</math> * '''Cumulative distribution function''' [[Symmetry|reflection symmetry]] plus unitary [[Symmetry|translation]] ::<math>F(x;\alpha,\beta) = I_x(\alpha,\beta) = 1- F(1- x;\beta,\alpha) = 1 - I_{1-x}(\beta,\alpha)</math> * '''Mode''' [[Symmetry|reflection symmetry]] plus unitary [[Symmetry|translation]] ::<math>\operatorname{mode}(\Beta(\alpha, \beta))= 1-\operatorname{mode}(\Beta(\beta, \alpha)),\text{ if }\Beta(\beta, \alpha)\ne \Beta(1,1)</math> * '''Median''' [[Symmetry|reflection symmetry]] plus unitary [[Symmetry|translation]] ::<math>\operatorname{median} (\Beta(\alpha, \beta) )= 1 - \operatorname{median} (\Beta(\beta, \alpha))</math> * '''Mean''' [[Symmetry|reflection symmetry]] plus unitary [[Symmetry|translation]] ::<math>\mu (\Beta(\alpha, \beta) )= 1 - \mu (\Beta(\beta, \alpha) )</math> * '''Geometric means''' each is individually asymmetric, the following symmetry applies between the geometric mean based on ''X'' and the geometric mean based on its [[Reflection formula|reflection]] (1-X) ::<math>G_X (\Beta(\alpha, \beta) )=G_{(1-X)}(\Beta(\beta, \alpha) ) </math> * '''Harmonic means''' each is individually asymmetric, the following symmetry applies between the harmonic mean based on ''X'' and the harmonic mean based on its [[Reflection formula|reflection]] (1-X) ::<math>H_X (\Beta(\alpha, \beta) )=H_{(1-X)}(\Beta(\beta, \alpha) ) \text{ if } \alpha, \beta > 1 </math> . * '''Variance''' symmetry ::<math>\operatorname{var} (\Beta(\alpha, \beta) )=\operatorname{var} (\Beta(\beta, \alpha) )</math> * '''Geometric variances''' each is individually asymmetric, the following symmetry applies between the log geometric variance based on X and the log geometric variance based on its [[Reflection formula|reflection]] (1-X) ::<math>\ln(\operatorname{var_{GX}} (\Beta(\alpha, \beta))) = \ln(\operatorname{var_{G(1-X)}}(\Beta(\beta, \alpha))) </math> * '''Geometric covariance''' symmetry ::<math>\ln \operatorname{cov_{GX,(1-X)}}(\Beta(\alpha, \beta))=\ln \operatorname{cov_{GX,(1-X)}}(\Beta(\beta, \alpha))</math> * '''Mean [[absolute deviation]] around the mean''' symmetry ::<math>\operatorname{E}[|X - E[X]| ] (\Beta(\alpha, \beta))=\operatorname{E}[| X - E[X]|] (\Beta(\beta, \alpha))</math> * '''Skewness''' [[Symmetry (mathematics)|skew-symmetry]] ::<math>\operatorname{skewness} (\Beta(\alpha, \beta) )= - \operatorname{ skewness} (\Beta(\beta, \alpha) )</math> * '''Excess kurtosis''' symmetry ::<math>\text{excess kurtosis} (\Beta(\alpha, \beta) )= \text{excess kurtosis} (\Beta(\beta, \alpha) )</math> * '''Characteristic function''' symmetry of [[Real part]] (with respect to the origin of variable "t") ::<math> \text{Re} [{}_1F_1(\alpha; \alpha+\beta; it) ] = \text{Re} [ {}_1F_1(\alpha; \alpha+\beta; - it)] </math> * '''Characteristic function''' [[Symmetry (mathematics)|skew-symmetry]] of [[Imaginary part]] (with respect to the origin of variable "t") ::<math> \text{Im} [{}_1F_1(\alpha; \alpha+\beta; it) ] = - \text{Im} [ {}_1F_1(\alpha; \alpha+\beta; - it) ] </math> * '''Characteristic function''' symmetry of [[Absolute value]] (with respect to the origin of variable "t") ::<math> \text{Abs} [ {}_1F_1(\alpha; \alpha+\beta; it) ] = \text{Abs} [ {}_1F_1(\alpha; \alpha+\beta; - it) ] </math> * '''Differential entropy''' symmetry ::<math>h(\Beta(\alpha, \beta) )= h(\Beta(\beta, \alpha) )</math> * '''Relative 
entropy (also called [[Kullback–Leibler divergence]])''' symmetry ::<math>D_{\mathrm{KL}}(X_1\parallel X_2) = D_{\mathrm{KL}}(X_2\parallel X_1), \text{ if }h(X_1) = h(X_2)\text{, for (skewed) }\alpha \neq \beta</math> * '''Fisher information matrix''' symmetry ::<math>{\mathcal{I}}_{i, j} = {\mathcal{I}}_{j, i}</math> ===Geometry of the probability density function=== ====Inflection points==== [[File:Inflexion points Beta Distribution alpha and beta ranging from 0 to 5 large ptl view - J. Rodal.jpg|thumb|Inflection point location versus α and β showing regions with one inflection point]] [[File:Inflexion points Beta Distribution alpha and beta ranging from 0 to 5 large ptr view - J. Rodal.jpg|thumb|Inflection point location versus α and β showing region with two inflection points]] For certain values of the shape parameters α and β, the [[probability density function]] has [[inflection points]], at which the [[curvature]] changes sign. The position of these inflection points can be useful as a measure of the [[Statistical dispersion|dispersion]] or spread of the distribution. Defining the following quantity: :<math>\kappa =\frac{\sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}</math> Points of inflection occur,<ref name=JKB /><ref name=Wadsworth /><ref name="Handbook of Beta Distribution" /><ref name=Panik /> depending on the value of the shape parameters ''α'' and ''β'', as follows: *(''α'' > 2, ''β'' > 2) The distribution is bell-shaped (symmetric for ''α'' = ''β'' and skewed otherwise), with '''two inflection points''', equidistant from the mode: ::<math>x = \text{mode} \pm \kappa = \frac{\alpha -1 \pm \sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}</math> * (''α'' = 2, ''β'' > 2) The distribution is unimodal, positively skewed, right-tailed, with '''one inflection point''', located to the right of the mode: ::<math>x =\text{mode} + \kappa = \frac{2}{\beta}</math> * (''α'' > 2, β = 2) The distribution is unimodal, negatively skewed, left-tailed, with '''one inflection point''', located to the left of the mode: ::<math>x = \text{mode} - \kappa = 1 - \frac{2}{\alpha}</math> * (1 < ''α'' < 2, β > 2, ''α'' + ''β'' > 2) The distribution is unimodal, positively skewed, right-tailed, with '''one inflection point''', located to the right of the mode: ::<math>x =\text{mode} + \kappa = \frac{\alpha -1 +\sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}</math> *(0 < ''α'' < 1, 1 < ''β'' < 2) The distribution has a mode at the left end ''x'' = 0 and it is positively skewed, right-tailed. There is '''one inflection point''', located to the right of the mode: ::<math>x = \frac{\alpha -1 +\sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}</math> *(''α'' > 2, 1 < ''β'' < 2) The distribution is unimodal negatively skewed, left-tailed, with '''one inflection point''', located to the left of the mode: ::<math>x =\text{mode} - \kappa = \frac{\alpha -1 -\sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}</math> *(1 < ''α'' < 2, 0 < ''β'' < 1) The distribution has a mode at the right end ''x'' = 1 and it is negatively skewed, left-tailed. 
There is '''one inflection point''', located to the left of the mode: ::<math>x = \frac{\alpha -1 -\sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}</math> There are no inflection points in the remaining (symmetric and skewed) regions: U-shaped (''α'', ''β'' < 1), upside-down-U-shaped (1 < ''α'' < 2, 1 < ''β'' < 2), reverse-J-shaped (''α'' < 1, ''β'' > 2) or J-shaped (''α'' > 2, ''β'' < 1). The accompanying plots show the inflection point locations (shown vertically, ranging from 0 to 1) versus ''α'' and ''β'' (the horizontal axes ranging from 0 to 5). There are large cuts at surfaces intersecting the lines ''α'' = 1, ''β'' = 1, ''α'' = 2, and ''β'' = 2 because at these values the beta distribution changes from two modes, to one mode, to no mode. ====Shapes==== [[File:PDF for symmetric beta distribution vs. x and alpha=beta from 0 to 30 - J. Rodal.jpg|thumb|PDF for symmetric beta distribution vs. ''x'' and ''α'' = ''β'' from 0 to 30]] [[File:PDF for symmetric beta distribution vs. x and alpha=beta from 0 to 2 - J. Rodal.jpg|thumb|PDF for symmetric beta distribution vs. ''x'' and ''α'' = ''β'' from 0 to 2]] [[File:PDF for skewed beta distribution vs. x and beta= 2.5 alpha from 0 to 9 - J. Rodal.jpg|thumb|PDF for skewed beta distribution vs. ''x'' and ''β'' = 2.5''α'' from 0 to 9]] [[File:PDF for skewed beta distribution vs. x and beta= 5.5 alpha from 0 to 9 - J. Rodal.jpg|thumb|PDF for skewed beta distribution vs. ''x'' and ''β'' = 5.5''α'' from 0 to 9]] [[File:PDF for skewed beta distribution vs. x and beta= 8 alpha from 0 to 10 - J. Rodal.jpg|thumb|PDF for skewed beta distribution vs. ''x'' and ''β'' = 8''α'' from 0 to 10]] The beta density function can take a wide variety of different shapes depending on the values of the two parameters ''α'' and ''β''. The ability of the beta distribution to take this great diversity of shapes (using only two parameters) is partly responsible for its wide application in modeling actual measurements (a numerical check of some of the values listed below is given after the list): =====Symmetric (''α'' = ''β'')===== * the density function is [[symmetry|symmetric]] about 1/2 (blue & teal plots). * median = mean = 1/2. *skewness = 0. *variance = 1/(4(2''α'' + 1)) *'''''α'' = ''β'' < 1''' **U-shaped (blue plot). **bimodal: left mode = 0, right mode = 1, anti-mode = 1/2 **1/12 < var(''X'') < 1/4<ref name=JKB/> **−2 < excess kurtosis(''X'') < −6/5 ** ''α'' = ''β'' = 1/2 is the [[arcsine distribution]] *** var(''X'') = 1/8 ***excess kurtosis(''X'') = −3/2 ***CF = Rinc (t) <ref>{{Cite book|last1=Buchanan|first1=K.|last2=Rockway|first2=J.|last3=Sternberg|first3=O.|last4=Mai|first4=N. N.|title=2016 IEEE Radar Conference (RadarConf) |chapter=Sum-difference beamforming for radar applications using circularly tapered random arrays |date=May 2016|pages=1–5|doi=10.1109/RADAR.2016.7485289|isbn=978-1-5090-0863-6|s2cid=32525626|chapter-url=https://zenodo.org/record/1279364}}</ref> ** ''α'' = ''β'' → 0 is a 2-point [[Bernoulli distribution]] with equal probability 1/2 at each [[Dirac delta function]] end ''x'' = 0 and ''x'' = 1 and zero probability everywhere else. A coin toss: one face of the coin being ''x'' = 0 and the other face being ''x'' = 1. *** <math> \lim_{\alpha = \beta \to 0} \operatorname{var}(X) = \tfrac{1}{4} </math> *** <math> \lim_{\alpha = \beta \to 0} \operatorname{excess \ kurtosis}(X) = - 2</math> (a lower value is impossible for any distribution to reach).
*** The [[information entropy|differential entropy]] approaches a [[Maxima and minima|minimum]] value of −∞ *'''α = β = 1''' **the [[uniform distribution (continuous)|uniform [0, 1] distribution]] **no mode **var(''X'') = 1/12 **excess kurtosis(''X'') = −6/5 **The (negative anywhere else) [[information entropy|differential entropy]] reaches its [[Maxima and minima|maximum]] value of zero **CF = Sinc (t) *'''''α'' = ''β'' > 1''' **symmetric [[unimodal]] ** mode = 1/2. **0 < var(''X'') < 1/12<ref name=JKB/> **−6/5 < excess kurtosis(''X'') < 0 **''α'' = ''β'' = 3/2 is a semi-elliptic [0, 1] distribution, see: [[Wigner semicircle distribution]]<ref>{{Cite book|last1=Buchanan|first1=K.|last2=Flores|first2=C.|last3=Wheeland|first3=S.|last4=Jensen|first4=J.|last5=Grayson|first5=D.|last6=Huff|first6=G.|title=2017 IEEE Radar Conference (RadarConf) |chapter=Transmit beamforming for radar applications using circularly tapered random arrays |date=May 2017|pages=0112–0117|doi=10.1109/RADAR.2017.7944181|isbn=978-1-4673-8823-8|s2cid=38429370}}</ref> ***var(''X'') = 1/16. ***excess kurtosis(''X'') = −1 ***CF = 2 Jinc (t) **''α'' = ''β'' = 2 is the parabolic [0, 1] distribution ***var(''X'') = 1/20 ***excess kurtosis(''X'') = −6/7 ***CF = 3 Tinc (t) <ref>{{Cite web|last=Ryan|first=Buchanan, Kristopher|date=2014-05-29|title=Theory and Applications of Aperiodic (Random) Phased Arrays|url=http://oaktrust.library.tamu.edu/handle/1969.1/157918|language=en}}</ref> **''α'' = ''β'' > 2 is bell-shaped, with [[inflection point]]s located to either side of the mode ***0 < var(''X'') < 1/20 ***−6/7 < excess kurtosis(''X'') < 0 **''α'' = ''β'' → ∞ is a 1-point [[Degenerate distribution]] with a [[Dirac delta function]] spike at the midpoint ''x'' = 1/2 with probability 1, and zero probability everywhere else. There is 100% probability (absolute certainty) concentrated at the single point ''x'' = 1/2. ***<math> \lim_{\alpha = \beta \to \infty} \operatorname{var}(X) = 0 </math> ***<math> \lim_{\alpha = \beta \to \infty} \operatorname{excess \ kurtosis}(X) = 0</math> ***The [[information entropy|differential entropy]] approaches a [[Maxima and minima|minimum]] value of −∞ =====Skewed (''α'' ≠ ''β'')===== The density function is [[Skewness|skewed]]. An interchange of parameter values yields the [[mirror image]] (the reverse) of the initial curve, some more specific cases: *'''''α'' < 1, ''β'' < 1''' ** U-shaped ** Positive skew for ''α'' < ''β'', negative skew for ''α'' > ''β''. ** bimodal: left mode = 0, right mode = 1, anti-mode = <math>\tfrac{\alpha-1}{\alpha + \beta-2} </math> ** 0 < median < 1. ** 0 < var(''X'') < 1/4 *'''''α'' > 1, ''β'' > 1''' ** [[unimodal]] (magenta & cyan plots), **Positive skew for ''α'' < ''β'', negative skew for ''α'' > ''β''. **<math>\text{mode}= \tfrac{\alpha-1}{\alpha + \beta-2} </math> ** 0 < median < 1 ** 0 < var(''X'') < 1/12 *'''''α'' < 1, ''β'' ≥ 1''' **reverse J-shaped with a right tail, **positively skewed, **strictly decreasing, [[convex function|convex]] ** mode = 0 ** 0 < median < 1/2. 
** <math>0 < \operatorname{var}(X) < \tfrac{-11+5 \sqrt{5}}{2}, </math> (maximum variance occurs for <math>\alpha=\tfrac{-1+\sqrt{5}}{2}, \beta=1</math>, or ''α'' = '''Φ''' the [[Golden ratio|golden ratio conjugate]]) *'''''α'' ≥ 1, ''β'' < 1''' **J-shaped with a left tail, **negatively skewed, **strictly increasing, [[convex function|convex]] ** mode = 1 ** 1/2 < median < 1 ** <math>0 < \operatorname{var}(X) < \tfrac{-11+5 \sqrt{5}}{2},</math> (maximum variance occurs for <math>\alpha=1, \beta=\tfrac{-1+\sqrt{5}}{2}</math>, or ''β'' = '''Φ''' the [[Golden ratio|golden ratio conjugate]]) *'''''α'' = 1, ''β'' > 1''' **positively skewed, **strictly decreasing (red plot), **a reversed (mirror-image) power function [0,1] distribution ** mean = 1 / (''β'' + 1) ** median = 1 - 1/2<sup>1/''β''</sup> ** mode = 0 **α = 1, 1 < β < 2 ***[[concave function|concave]] *** <math>1-\tfrac{1}{\sqrt{2}}< \text{median} < \tfrac{1}{2}</math> *** 1/18 < var(''X'') < 1/12. **α = 1, β = 2 ***a straight line with slope −2, the right-[[triangular distribution]] with right angle at the left end, at ''x'' = 0 *** <math>\text{median}=1-\tfrac {1}{\sqrt{2}}</math> *** var(''X'') = 1/18 **α = 1, β > 2 ***reverse J-shaped with a right tail, ***[[convex function|convex]] *** <math>0 < \text{median} < 1-\tfrac{1}{\sqrt{2}}</math> *** 0 < var(''X'') < 1/18 *'''α > 1, β = 1''' **negatively skewed, **strictly increasing (green plot), **the power function [0, 1] distribution<ref name="Handbook of Beta Distribution" /> ** mean = α / (α + 1) ** median = 1/2<sup>1/α </sup> ** mode = 1 **2 > α > 1, β = 1 ***[[concave function|concave]] *** <math>\tfrac{1}{2} < \text{median} < \tfrac{1}{\sqrt{2}}</math> *** 1/18 < var(''X'') < 1/12 ** α = 2, β = 1 ***a straight line with slope +2, the right-[[triangular distribution]] with right angle at the right end, at ''x'' = 1 *** <math>\text{median}=\tfrac {1}{\sqrt{2}}</math> *** var(''X'') = 1/18 **α > 2, β = 1 ***J-shaped with a left tail, [[convex function|convex]] ***<math>\tfrac{1}{\sqrt{2}} < \text{median} < 1</math> *** 0 < var(''X'') < 1/18
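As an illustrative numerical check of some of the values listed above (not drawn from the cited references; it assumes SciPy is available), the following Python sketch recovers the variance 1/(4(2''α'' + 1)) and the excess kurtosis of the symmetric special cases:

<syntaxhighlight lang="python">
# Illustrative check of the variance and excess kurtosis quoted above for the
# symmetric special cases: arcsine (1/2, 1/2), uniform (1, 1),
# semicircle (3/2, 3/2) and parabolic (2, 2).
from fractions import Fraction
from scipy.stats import beta as beta_dist

cases = [
    ("arcsine",    0.5, 0.5, Fraction(1, 8),  Fraction(-3, 2)),
    ("uniform",    1.0, 1.0, Fraction(1, 12), Fraction(-6, 5)),
    ("semicircle", 1.5, 1.5, Fraction(1, 16), Fraction(-1, 1)),
    ("parabolic",  2.0, 2.0, Fraction(1, 20), Fraction(-6, 7)),
]
for name, a, b, var_expected, kurt_expected in cases:
    var, kurt = beta_dist.stats(a, b, moments='vk')   # variance, excess kurtosis
    print(name, float(var), float(var_expected), float(kurt), float(kurt_expected))
    # the computed and quoted values agree, e.g. arcsine: 0.125 and -1.5
</syntaxhighlight>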