====Method of moments====

=====Two unknown parameters=====
Two unknown parameters (<math>\hat{\alpha}, \hat{\beta}</math> of a beta distribution supported in the [0,1] interval) can be estimated, using the method of moments, with the first two moments (sample mean and sample variance) as follows. Let:
: <math>\text{sample mean(X)}=\bar{x} = \frac{1}{N}\sum_{i=1}^N X_i</math>
be the [[sample mean]] estimate and
: <math>\text{sample variance(X)} =\bar{v} = \frac{1}{N-1}\sum_{i=1}^N (X_i - \bar{x})^2</math>
be the [[sample variance]] estimate. The [[method of moments (statistics)|method-of-moments]] estimates of the parameters are
:<math>\hat{\alpha} = \bar{x} \left(\frac{\bar{x} (1 - \bar{x})}{\bar{v}} - 1 \right),</math> if <math>\bar{v} <\bar{x}(1 - \bar{x}),</math>
:<math>\hat{\beta} = (1-\bar{x}) \left(\frac{\bar{x} (1 - \bar{x})}{\bar{v}} - 1 \right),</math> if <math>\bar{v}<\bar{x}(1 - \bar{x}).</math>

When the distribution is required over a known interval other than [0, 1] with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace <math>\bar{x}</math> with <math>\frac{\bar{y}-a}{c-a},</math> and <math>\bar{v}</math> with <math>\frac{\bar{v}_Y}{(c-a)^2}</math> in the above pair of equations for the shape parameters (see the "Four unknown parameters" section below),<ref>{{Cite web|url=https://www.itl.nist.gov/div898/handbook/eda/section3/eda366h.htm|title=1.3.6.6.17. Beta Distribution|website=www.itl.nist.gov}}</ref> where:
: <math>\text{sample mean(Y)}=\bar{y} = \frac{1}{N}\sum_{i=1}^N Y_i</math>
: <math>\text{sample variance(Y)} = \bar{v}_Y = \frac{1}{N-1}\sum_{i=1}^N (Y_i - \bar{y})^2</math>

=====Four unknown parameters=====
[[File:(alpha and beta) Parameter estimates vs. excess Kurtosis and (squared) Skewness Beta distribution - J. Rodal.png|thumb|Solutions for parameter estimates vs.
(sample) excess Kurtosis and (sample) squared Skewness Beta distribution]]
All four parameters (<math>\hat{\alpha}, \hat{\beta}, \hat{a}, \hat{c}</math> of a beta distribution supported in the [''a'', ''c''] interval, see section [[Beta distribution#Four parameters|"Alternative parametrizations, Four parameters"]]) can be estimated, using the method of moments developed by [[Karl Pearson]], by equating sample and population values of the first four central moments (mean, variance, skewness and excess kurtosis).<ref name=JKB/><ref name=Elderton1906/><ref name="Elderton and Johnson">{{cite book|last=Elderton|first=William Palin and Norman Lloyd Johnson|title=Systems of Frequency Curves|year=2009|publisher=Cambridge University Press|isbn=978-0521093361}}</ref> The excess kurtosis was expressed in terms of the square of the skewness and the sample size ν = α + β (see the previous section [[Beta distribution#Kurtosis|"Kurtosis"]]) as follows:
:<math>\text{excess kurtosis} =\frac{6}{3 + \nu}\left(\frac{(2 + \nu)}{4} (\text{skewness})^2 - 1\right)\text{ if (skewness)}^2-2< \text{excess kurtosis}< \tfrac{3}{2} (\text{skewness})^2</math>
One can use this equation to solve for the sample size ν = α + β in terms of the square of the skewness and the excess kurtosis as follows:<ref name=Elderton1906/>
:<math>\hat{\nu} = \hat{\alpha} + \hat{\beta} = 3\frac{(\text{sample excess kurtosis}) - (\text{sample skewness})^2+2}{\frac{3}{2} (\text{sample skewness})^2 - \text{(sample excess kurtosis)}}</math>
:<math>\text{ if (sample skewness)}^2-2< \text{sample excess kurtosis}< \tfrac{3}{2} (\text{sample skewness})^2</math>
This is the ratio (multiplied by a factor of 3) between the previously derived limit boundaries for the beta distribution in a space (as originally done by Karl Pearson<ref name=Pearson />) defined with coordinates of the square of the skewness on one axis and the excess kurtosis on the other axis (see {{section link||Kurtosis bounded by the square of the skewness}}).
The case of zero skewness can be solved immediately, because for zero skewness α = β and hence ν = 2α = 2β, therefore α = β = ν/2:
: <math>\hat{\alpha} = \hat{\beta} = \frac{\hat{\nu}}{2}= \frac{\frac{3}{2}(\text{sample excess kurtosis}) +3}{- \text{(sample excess kurtosis)}}</math>
: <math> \text{ if sample skewness}= 0 \text{ and } -2<\text{sample excess kurtosis}<0</math>
(Excess kurtosis is negative for the beta distribution with zero skewness, ranging from -2 to 0, so that <math>\hat{\nu}</math>, and therefore the estimated shape parameters, are positive, ranging from zero (when the shape parameters approach zero and the excess kurtosis approaches -2) to infinity (when the shape parameters approach infinity and the excess kurtosis approaches zero).)

For non-zero sample skewness one needs to solve a system of two coupled equations. Since the skewness and the excess kurtosis are independent of the parameters <math>\hat{a}, \hat{c}</math>, the parameters <math>\hat{\alpha}, \hat{\beta}</math> can be uniquely determined from the sample skewness and the sample excess kurtosis, by solving the coupled equations with two known variables (sample skewness and sample excess kurtosis) and two unknowns (the shape parameters):
:<math>(\text{sample skewness})^2 = \frac{4(\hat{\beta}-\hat{\alpha})^2 (1 + \hat{\alpha} + \hat{\beta})}{\hat{\alpha} \hat{\beta} (2 + \hat{\alpha} + \hat{\beta})^2}</math>
:<math>\text{sample excess kurtosis} =\frac{6}{3 + \hat{\alpha} + \hat{\beta}}\left(\frac{(2 + \hat{\alpha} + \hat{\beta})}{4} (\text{sample skewness})^2 - 1\right)</math>
:<math>\text{ if (sample skewness)}^2-2< \text{sample excess kurtosis}< \tfrac{3}{2}(\text{sample skewness})^2</math>
resulting in the following solution:<ref name=Elderton1906/>
: <math>\hat{\alpha}, \hat{\beta} = \frac{\hat{\nu}}{2} \left (1 \pm \frac{1}{ \sqrt{1+ \frac{16 (\hat{\nu} + 1)}{(\hat{\nu} + 2)^2(\text{sample skewness})^2}}} \right )</math>
: <math>\text{ if sample skewness}\neq 0 \text{ and } (\text{sample skewness})^2-2< \text{sample excess kurtosis}< \tfrac{3}{2} (\text{sample skewness})^2</math>
Here one should take the solutions as follows: <math>\hat{\alpha}>\hat{\beta}</math> for (negative) sample skewness < 0, and <math>\hat{\alpha}<\hat{\beta}</math> for (positive) sample skewness > 0.

The accompanying plot shows these two solutions as surfaces in a space with horizontal axes of (sample excess kurtosis) and (sample squared skewness) and the shape parameters as the vertical axis. The surfaces are constrained by the condition that the sample excess kurtosis must be bounded by the sample squared skewness, as stipulated in the above equation. The two surfaces meet at the right edge defined by zero skewness. Along this right edge, both parameters are equal and the distribution is symmetric U-shaped for α = β < 1, uniform for α = β = 1, upside-down-U-shaped for 1 < α = β < 2 and bell-shaped for α = β > 2. The surfaces also meet at the front (lower) edge defined by the "impossible boundary" line (excess kurtosis + 2 - skewness<sup>2</sup> = 0). Along this front (lower) boundary both shape parameters approach zero, and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities <math>p=\tfrac{\beta}{\alpha + \beta}</math> at the left end ''x'' = 0 and <math>q = 1-p = \tfrac{\alpha}{\alpha + \beta} </math> at the right end ''x'' = 1. The two surfaces become further apart towards the rear edge. At this rear edge the surface parameters are quite different from each other. As remarked, for example, by Bowman and Shenton,<ref name="BowmanShenton">{{cite journal|last=Bowman|first=K. O.|author1-link=Kimiko O. Bowman|author2=Shenton, L. R.|title=The beta distribution, moment method, Karl Pearson and R.A. Fisher|journal=Far East J. Theo.
Stat.|year=2007|volume=23|issue=2|pages=133–164| url=http://www.csm.ornl.gov/~bowman/fjts232.pdf }}</ref> sampling in the neighborhood of the line (sample excess kurtosis - (3/2)(sample skewness)<sup>2</sup> = 0) (the just-J-shaped portion of the rear edge where blue meets beige) "is dangerously near to chaos", because at that line the denominator of the expression above for the estimate ν = α + β becomes zero, and hence ν approaches infinity as that line is approached. Bowman and Shenton<ref name="BowmanShenton" /> write that "the higher moment parameters (kurtosis and skewness) are extremely fragile (near that line). However, the mean and standard deviation are fairly reliable." The problem therefore arises for four-parameter estimation of very skewed distributions, for which the excess kurtosis approaches (3/2) times the square of the skewness. This boundary line is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. See {{section link||Kurtosis bounded by the square of the skewness}} for a numerical example and further comments about this rear-edge boundary line (sample excess kurtosis - (3/2)(sample skewness)<sup>2</sup> = 0). As remarked by Karl Pearson himself,<ref name=Pearson1936/> this issue may not be of much practical importance, as the trouble arises only for very skewed J-shaped (or mirror-image J-shaped) distributions with very different values of the shape parameters, which are unlikely to occur much in practice. The usual skewed bell-shape distributions that occur in practice do not have this parameter estimation problem.

The remaining two parameters <math>\hat{a}, \hat{c}</math> can be determined using the sample mean and the sample variance using a variety of equations.<ref name="JKB"/><ref name=Elderton1906/> One alternative is to calculate the support interval range <math>(\hat{c}-\hat{a})</math> based on the sample variance and the sample kurtosis.
For this purpose one can solve, in terms of the range <math>(\hat{c}- \hat{a})</math>, the equation expressing the excess kurtosis in terms of the sample variance and the sample size ν (see {{section link||Kurtosis}} and {{section link||Alternative parametrizations, four parameters}}):
:<math>\text{sample excess kurtosis} =\frac{6}{(3 + \hat{\nu})(2 + \hat{\nu})}\bigg(\frac{(\hat{c}- \hat{a})^2}{\text{(sample variance)}} - 6 - 5 \hat{\nu} \bigg)</math>
to obtain:
:<math> (\hat{c}- \hat{a}) = \sqrt{\text{(sample variance)}}\sqrt{6+5\hat{\nu}+\frac{(2+\hat{\nu})(3+\hat{\nu})}{6}\text{(sample excess kurtosis)}}</math>
Another alternative is to calculate the support interval range <math>(\hat{c}-\hat{a})</math> based on the sample variance and the sample skewness.<ref name=Elderton1906/> For this purpose one can solve, in terms of the range <math>(\hat{c}-\hat{a})</math>, the equation expressing the squared skewness in terms of the sample variance and the sample size ν (see the sections titled "Skewness" and "Alternative parametrizations, four parameters"):
:<math>(\text{sample skewness})^2 = \frac{4}{(2+\hat{\nu})^2}\bigg(\frac{(\hat{c}- \hat{a})^2}{ \text{(sample variance)}}-4(1+\hat{\nu})\bigg)</math>
to obtain:<ref name=Elderton1906/>
:<math> (\hat{c}- \hat{a}) = \frac{\sqrt{\text{(sample variance)}}}{2}\sqrt{(2+\hat{\nu})^2(\text{sample skewness})^2+16(1+\hat{\nu})}</math>
The remaining parameter can be determined from the sample mean and the previously obtained parameters <math>(\hat{c}-\hat{a}), \hat{\alpha}, \hat{\nu} = \hat{\alpha}+\hat{\beta}</math>:
:<math> \hat{a} = (\text{sample mean}) - \left(\frac{\hat{\alpha}}{\hat{\nu}}\right)(\hat{c}-\hat{a}) </math>
and finally, <math>\hat{c}= (\hat{c}- \hat{a}) + \hat{a} </math>.
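The four-parameter procedure above can be sketched in Python. This is an illustrative sketch, not a reference implementation: the function name <code>beta_four_param_mom</code> is invented for this example, the sample skewness and excess kurtosis are computed with one common choice of estimator (the ''G''<sub>1</sub>, ''G''<sub>2</sub> formulas discussed below), and the range is recovered from the variance-and-kurtosis equation.

```python
import math

def beta_four_param_mom(y):
    """Method-of-moments estimates (alpha, beta, a, c) of a four-parameter
    beta distribution, following the formulas in this section.
    Raises ValueError if the sample moments fall outside the region
    skewness^2 - 2 < excess kurtosis < (3/2) skewness^2."""
    n = len(y)
    mean = sum(y) / n
    var = sum((yi - mean) ** 2 for yi in y) / (n - 1)  # unbiased sample variance
    # Sample skewness G1 and sample excess kurtosis G2 (one common estimator choice)
    g1 = (n / ((n - 1) * (n - 2))) * sum((yi - mean) ** 3 for yi in y) / var ** 1.5
    g2 = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))) \
        * sum((yi - mean) ** 4 for yi in y) / var ** 2 \
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    if not (g1 ** 2 - 2 < g2 < 1.5 * g1 ** 2):
        raise ValueError("sample moments outside the admissible beta region")
    # Solve for nu = alpha + beta from skewness^2 and excess kurtosis
    nu = 3 * (g2 - g1 ** 2 + 2) / (1.5 * g1 ** 2 - g2)
    if g1 == 0:
        alpha = beta = nu / 2  # zero-skewness case: alpha = beta = nu/2
    else:
        root = 1 / math.sqrt(1 + 16 * (nu + 1) / ((nu + 2) ** 2 * g1 ** 2))
        # alpha > beta for negative skewness, alpha < beta for positive skewness
        if g1 < 0:
            alpha, beta = nu / 2 * (1 + root), nu / 2 * (1 - root)
        else:
            alpha, beta = nu / 2 * (1 - root), nu / 2 * (1 + root)
    # Support range from the sample variance and sample excess kurtosis
    rng = math.sqrt(var) * math.sqrt(6 + 5 * nu + (2 + nu) * (3 + nu) / 6 * g2)
    a = mean - (alpha / nu) * rng   # left endpoint from the sample mean
    c = a + rng                     # right endpoint
    return alpha, beta, a, c
```

As the text warns, this estimator is fragile near the line (excess kurtosis) = (3/2)(skewness)<sup>2</sup>, where the denominator in the expression for ν vanishes.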
In the above formulas one may take, for example, as estimates of the sample moments:
:<math>\begin{align}
\text{sample mean} &=\overline{y} = \frac{1}{N}\sum_{i=1}^N Y_i \\
\text{sample variance} &= \overline{v}_Y = \frac{1}{N-1}\sum_{i=1}^N (Y_i - \overline{y})^2 \\
\text{sample skewness} &= G_1 = \frac{N}{(N-1)(N-2)} \frac{\sum_{i=1}^N (Y_i-\overline{y})^3}{\overline{v}_Y^{\frac{3}{2}}} \\
\text{sample excess kurtosis} &= G_2 = \frac{N(N+1)}{(N-1)(N-2)(N-3)} \frac{\sum_{i=1}^N (Y_i - \overline{y})^4}{\overline{v}_Y^2} - \frac{3(N-1)^2}{(N-2)(N-3)}
\end{align}</math>
The estimators ''G''<sub>1</sub> for [[skewness|sample skewness]] and ''G''<sub>2</sub> for [[kurtosis|sample kurtosis]] are used by [[DAP (software)|DAP]]/[[SAS System|SAS]], [[PSPP]]/[[SPSS]], and [[Microsoft Excel|Excel]]. However, they are not used by [[BMDP]] and (according to Joanes and Gill<ref name="Joanes and Gill"/>) they were not used by [[MINITAB]] in 1998. In fact, Joanes and Gill in their 1998 study<ref name="Joanes and Gill">{{cite journal|last=Joanes|first=D. N.|author2=C. A. Gill|title=Comparing measures of sample skewness and kurtosis|journal=The Statistician|year=1998|volume=47|issue=Part 1|pages=183–189|doi=10.1111/1467-9884.00122}}</ref> concluded that the skewness and kurtosis estimators used in [[BMDP]] and in [[MINITAB]] (at that time) had smaller variance and mean-squared error in normal samples, but the skewness and kurtosis estimators used in [[DAP (software)|DAP]]/[[SAS System|SAS]] and [[PSPP]]/[[SPSS]], namely ''G''<sub>1</sub> and ''G''<sub>2</sub>, had smaller mean-squared error in samples from a very skewed distribution. It is for this reason that we have spelled out "sample skewness", etc., in the above formulas, to make it explicit that the user should choose the estimator best suited to the problem at hand, as the best estimator for skewness and kurtosis depends on the amount of skewness (as shown by Joanes and Gill<ref name="Joanes and Gill"/>).
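For comparison with the four-parameter case, the simplest estimator in this section, the two-parameter method of moments on [0, 1] from the "Two unknown parameters" subsection, can be sketched as follows (the function name <code>beta_two_param_mom</code> is invented for this example):

```python
def beta_two_param_mom(x):
    """Two-parameter method-of-moments estimates (alpha, beta) for a beta
    distribution supported on [0, 1], from the sample mean and the
    unbiased sample variance, as in the formulas above.
    Requires 0 < sample variance < mean * (1 - mean)."""
    n = len(x)
    xbar = sum(x) / n                                   # sample mean
    v = sum((xi - xbar) ** 2 for xi in x) / (n - 1)     # unbiased sample variance
    if not 0 < v < xbar * (1 - xbar):
        raise ValueError("method of moments not applicable: v >= mean*(1-mean)")
    common = xbar * (1 - xbar) / v - 1                  # shared factor in both estimates
    return xbar * common, (1 - xbar) * common
```

For data on a known interval [''a'', ''c''], one would first rescale each observation to (''y'' − ''a'')/(''c'' − ''a''), as described above.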