=====Four parameters=====
[[File:Fisher Information I(a,a) for alpha=beta vs range (c-a) and exponent alpha=beta - J. Rodal.png|thumb|Fisher Information ''I''(''a'',''a'') for ''α'' = ''β'' vs range (''c'' − ''a'') and exponent ''α'' = ''β'']]
[[File:Fisher Information I(alpha,a) for alpha=beta, vs. range (c - a) and exponent alpha=beta - J. Rodal.png|thumb|Fisher Information ''I''(''α'',''a'') for ''α'' = ''β'', vs. range (''c'' − ''a'') and exponent ''α'' = ''β'']]
If ''Y''<sub>1</sub>, ..., ''Y<sub>N</sub>'' are independent random variables each having a beta distribution with four parameters: the exponents ''α'' and ''β'', and also ''a'' (the minimum of the distribution range) and ''c'' (the maximum of the distribution range) (see {{section link||Alternative parametrizations}}, "Four parameters"), with [[probability density function]]:

:<math>f(y; \alpha, \beta, a, c) = \frac{f(x;\alpha,\beta)}{c-a} =\frac{ \left (\frac{y-a}{c-a} \right )^{\alpha-1} \left (\frac{c-y}{c-a} \right)^{\beta-1} }{(c-a)B(\alpha, \beta)}=\frac{ (y-a)^{\alpha-1} (c-y)^{\beta-1} }{(c-a)^{\alpha+\beta-1}B(\alpha, \beta)},</math>

then the joint log likelihood function per ''N'' [[independent and identically distributed random variables|iid]] observations is:

:<math>\frac{1}{N} \ln(\mathcal{L} (\alpha, \beta, a, c\mid Y))= \frac{\alpha -1}{N}\sum_{i=1}^N \ln (Y_i - a) + \frac{\beta -1}{N}\sum_{i=1}^N \ln (c - Y_i)- \ln \Beta(\alpha,\beta) - (\alpha+\beta -1) \ln (c-a).</math>
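This per-observation log likelihood is straightforward to evaluate numerically. The following is a minimal sketch assuming NumPy and SciPy; the helper name <code>loglik_beta4</code> and the sample values are illustrative, not from the cited sources.

<syntaxhighlight lang="python">
import numpy as np
from scipy.special import betaln  # ln B(alpha, beta)

def loglik_beta4(y, alpha, beta, a, c):
    """(1/N) ln L(alpha, beta, a, c | y) for the four-parameter beta
    distribution, following the formula above (illustrative helper)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    return ((alpha - 1) * np.sum(np.log(y - a)) / n
            + (beta - 1) * np.sum(np.log(c - y)) / n
            - betaln(alpha, beta)
            - (alpha + beta - 1) * np.log(c - a))

# A four-parameter beta variate is an affine transform of X ~ Beta(alpha, beta):
rng = np.random.default_rng(0)
alpha, beta, a, c = 3.0, 4.0, -1.0, 2.0
y = a + (c - a) * rng.beta(alpha, beta, size=10_000)
print(loglik_beta4(y, alpha, beta, a, c))
</syntaxhighlight>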
For the four-parameter case, the Fisher information matrix has 4 × 4 = 16 components, of which 12 are off-diagonal. Since the matrix is symmetric, only half of the off-diagonal components are distinct, so the matrix has 6 independent off-diagonal components plus 4 diagonal components: 10 independent components in all. Aryal and Nadarajah<ref name=Aryal/> calculated Fisher's information matrix for the four-parameter case as follows:

:<math>- \frac{1}{N} \frac{\partial^2\ln \mathcal{L} (\alpha, \beta, a, c\mid Y)}{\partial \alpha^2}= \operatorname{var}[\ln (X)]= \psi_1(\alpha) - \psi_1(\alpha + \beta) = \mathcal{I}_{\alpha, \alpha}= \operatorname{E}\left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L} (\alpha, \beta, a, c\mid Y)}{\partial \alpha^2} \right ] = \ln (\operatorname{var}_{GX})</math>

:<math>-\frac{1}{N} \frac{\partial^2\ln \mathcal{L} (\alpha, \beta, a, c\mid Y)}{\partial \beta^2} = \operatorname{var}[\ln (1-X)] = \psi_1(\beta) - \psi_1(\alpha + \beta) ={\mathcal{I}}_{\beta, \beta}= \operatorname{E} \left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L} (\alpha, \beta, a, c\mid Y)}{\partial \beta^2} \right ] = \ln(\operatorname{var}_{G(1-X)})</math>

:<math>-\frac{1}{N} \frac{\partial^2\ln \mathcal{L} (\alpha, \beta, a, c\mid Y)}{\partial \alpha\,\partial \beta} = \operatorname{cov}[\ln X,\ln(1-X)] = -\psi_1(\alpha+\beta) =\mathcal{I}_{\alpha, \beta}= \operatorname{E} \left [- \frac{1}{N}\frac{\partial^2\ln \mathcal{L} (\alpha, \beta, a, c\mid Y)}{\partial \alpha \, \partial \beta} \right ] = \ln(\operatorname{cov}_{G{X,(1-X)}})</math>

In the above expressions, the use of ''X'' instead of ''Y'' in the expressions var[ln(''X'')] = ln(var<sub>''GX''</sub>) is ''not an error''. The expressions in terms of the log geometric variances and log geometric covariance occur as functions of the two-parameter ''X'' ~ Beta(''α'', ''β'') parametrization because, when taking the partial derivatives with respect to the exponents (''α'', ''β'') in the four-parameter case, one obtains expressions identical to those for the two-parameter case: these terms of the four-parameter Fisher information matrix are independent of the minimum ''a'' and maximum ''c'' of the distribution's range. The only non-zero term upon double differentiation of the log likelihood function with respect to the exponents ''α'' and ''β'' is the second derivative of the log of the beta function ln(B(''α'', ''β'')), which is independent of ''a'' and ''c''; double differentiation of this term yields trigamma functions. The sections titled "Maximum likelihood", "Two unknown parameters" and "Four unknown parameters" also show this fact.

The Fisher information for ''N'' [[i.i.d.]] samples is ''N'' times the individual Fisher information (eq. 11.279, page 394 of Cover and Thomas<ref name="Cover and Thomas"/>). (Aryal and Nadarajah<ref name=Aryal/> take a single observation, ''N'' = 1, to calculate the following components of the Fisher information, which leads to the same result as considering the derivatives of the log likelihood per ''N'' observations. Moreover, the erroneous expression for <math>{\mathcal{I}}_{a, a}</math> in Aryal and Nadarajah has been corrected below.)

:<math>\begin{align} \alpha > 2: \quad \operatorname{E}\left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L} (\alpha, \beta, a, c\mid Y)}{\partial a^2} \right ] &= {\mathcal{I}}_{a, a}=\frac{\beta(\alpha+\beta-1)}{(\alpha-2)(c-a)^2} \\ \beta > 2: \quad \operatorname{E}\left[-\frac{1}{N} \frac{\partial^2\ln \mathcal{L} (\alpha, \beta, a, c\mid Y)}{\partial c^2} \right ] &= \mathcal{I}_{c, c} = \frac{\alpha(\alpha+\beta-1)}{(\beta-2)(c-a)^2} \\ \operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L} (\alpha, \beta, a, c\mid Y)}{\partial a \, \partial c} \right ] &= {\mathcal{I}}_{a, c} = \frac{(\alpha+\beta-1)}{(c-a)^2} \\ \alpha > 1: \quad \operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L} (\alpha, \beta, a, c\mid Y)}{\partial \alpha \, \partial a} \right ] &=\mathcal{I}_{\alpha, a} = \frac{\beta}{(\alpha-1)(c-a)} \\ \operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L} (\alpha, \beta, a, c\mid Y)}{\partial \alpha \, \partial c} \right ] &= {\mathcal{I}}_{\alpha, c} = \frac{1}{(c-a)} \\ \operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L} (\alpha, \beta, a, c\mid Y)}{\partial \beta \,\partial a} \right ] &= {\mathcal{I}}_{\beta, a} = -\frac{1}{(c-a)} \\ \beta > 1: \quad \operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L} (\alpha, \beta, a, c\mid Y)}{\partial \beta \, \partial c} \right ] &= \mathcal{I}_{\beta, c} = -\frac{\alpha}{(\beta-1)(c-a)} \end{align}</math>

The lower two diagonal entries of the Fisher information matrix, with respect to the parameter ''a'' (the minimum of the distribution's range), <math>\mathcal{I}_{a, a}</math>, and with respect to the parameter ''c'' (the maximum of the distribution's range), <math>\mathcal{I}_{c, c}</math>, are only defined for exponents ''α'' > 2 and ''β'' > 2 respectively.
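The ten independent components above determine the full symmetric matrix, which can be assembled numerically for inspection. The following sketch assumes NumPy and SciPy; the helper name <code>fisher_info_beta4</code> and the parameter order (''α'', ''β'', ''a'', ''c'') are illustrative choices. The result is per observation; per the Cover and Thomas remark above, multiply by ''N'' for ''N'' iid observations.

<syntaxhighlight lang="python">
import numpy as np
from scipy.special import polygamma  # polygamma(1, x) is the trigamma function

def fisher_info_beta4(alpha, beta, a, c):
    """Per-observation Fisher information matrix in the (illustrative)
    parameter order (alpha, beta, a, c). The I_{a,a} and I_{c,c} entries
    are valid expectations only for alpha > 2 and beta > 2, respectively."""
    r = c - a  # total range; the matrix depends on a and c only through r
    trigamma = lambda x: polygamma(1, x)
    I_alpha_alpha = trigamma(alpha) - trigamma(alpha + beta)
    I_beta_beta = trigamma(beta) - trigamma(alpha + beta)
    I_alpha_beta = -trigamma(alpha + beta)
    I_a_a = beta * (alpha + beta - 1) / ((alpha - 2) * r**2)
    I_c_c = alpha * (alpha + beta - 1) / ((beta - 2) * r**2)
    I_a_c = (alpha + beta - 1) / r**2
    I_alpha_a = beta / ((alpha - 1) * r)
    I_alpha_c = 1.0 / r
    I_beta_a = -1.0 / r
    I_beta_c = -alpha / ((beta - 1) * r)
    return np.array([
        [I_alpha_alpha, I_alpha_beta, I_alpha_a, I_alpha_c],
        [I_alpha_beta,  I_beta_beta,  I_beta_a,  I_beta_c ],
        [I_alpha_a,     I_beta_a,     I_a_a,     I_a_c    ],
        [I_alpha_c,     I_beta_c,     I_a_c,     I_c_c    ],
    ])
</syntaxhighlight>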
The Fisher information matrix component <math>\mathcal{I}_{a, a}</math> for the minimum ''a'' approaches infinity as the exponent ''α'' approaches 2 from above, and the component <math>\mathcal{I}_{c, c}</math> for the maximum ''c'' approaches infinity as the exponent ''β'' approaches 2 from above.

The Fisher information matrix for the four-parameter case does not depend on the individual values of the minimum ''a'' and the maximum ''c'', but only on the total range (''c'' − ''a''). Moreover, the components that depend on the range (''c'' − ''a'') depend on it only through its inverse (or the square of its inverse), so the Fisher information decreases as the range (''c'' − ''a'') increases.

The accompanying images show the Fisher information components <math>\mathcal{I}_{a, a}</math> and <math>\mathcal{I}_{\alpha, a}</math>. Images for the components <math>\mathcal{I}_{\alpha, \alpha}</math> and <math>\mathcal{I}_{\beta, \beta}</math> are shown in {{section link||Geometric variance}}. All these Fisher information components look like a basin, with the "walls" of the basin located at low values of the parameters.

The following four-parameter beta distribution Fisher information components can be expressed in terms of expectations, under the two-parameter parametrization ''X'' ~ Beta(''α'', ''β''), of the transformed ratio (1 − ''X'')/''X'' and of its mirror image ''X''/(1 − ''X''), scaled by the range (''c'' − ''a''), which may be helpful for interpretation:

:<math>\mathcal{I}_{\alpha, a} =\frac{\operatorname{E} \left[\frac{1-X}{X} \right ]}{c-a}= \frac{\beta}{(\alpha-1)(c-a)} \text{ if }\alpha > 1</math>

:<math>\mathcal{I}_{\beta, c} = -\frac{\operatorname{E} \left [\frac{X}{1-X} \right ]}{c-a}=- \frac{\alpha}{(\beta-1)(c-a)}\text{ if }\beta> 1</math>

These are also the expected values of the "inverted beta distribution" or [[beta prime distribution]] (also known as beta distribution of the second kind or [[Pearson distribution|Pearson's Type VI]])<ref name=JKB/> and its mirror image, scaled by the range (''c'' − ''a'').

Also, the following Fisher information components can be expressed in terms of the variances of the harmonic transformed variables 1/''X'' and 1/(1 − ''X''), or equivalently of the ratio-transformed variables (1 − ''X'')/''X'' and ''X''/(1 − ''X'') (note that the covariance of 1/''X'' and 1/(1 − ''X'') is negative, hence the leading minus sign in the expression for <math>\mathcal{I}_{a, c}</math>):

:<math>\begin{align} \alpha > 2: \quad \mathcal{I}_{a,a} &=\operatorname{var} \left [\frac{1}{X} \right] \left (\frac{\alpha-1}{c-a} \right )^2 =\operatorname{var} \left [\frac{1-X}{X} \right ] \left (\frac{\alpha-1}{c-a} \right)^2 = \frac{\beta(\alpha+\beta-1)}{(\alpha-2)(c-a)^2} \\ \beta > 2: \quad \mathcal{I}_{c, c} &= \operatorname{var} \left [\frac{1}{1-X} \right ] \left (\frac{\beta-1}{c-a} \right )^2 = \operatorname{var} \left [\frac{X}{1-X} \right ] \left (\frac{\beta-1}{c-a} \right )^2 =\frac{\alpha(\alpha+\beta-1)}{(\beta-2)(c-a)^2} \\ \mathcal{I}_{a, c} &=-\operatorname{cov} \left [\frac{1}{X},\frac{1}{1-X} \right ]\frac{(\alpha-1)(\beta-1)}{(c-a)^2} = -\operatorname{cov} \left [\frac{1-X}{X},\frac{X}{1-X} \right ] \frac{(\alpha-1)(\beta-1)}{(c-a)^2} =\frac{(\alpha+\beta-1)}{(c-a)^2} \end{align}</math>

See the section titled "Moments of linearly transformed, product and inverted random variables" for these expectations.
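These moment identities can be verified by Monte Carlo simulation of the underlying two-parameter variable ''X'' ~ Beta(''α'', ''β''). A minimal sketch assuming NumPy, with illustrative parameter values chosen so that ''α'', ''β'' > 2 and all the estimates below are well-behaved; each printed pair should agree to a few decimal places:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, a, c = 5.0, 4.0, -1.0, 2.0
x = rng.beta(alpha, beta, size=1_000_000)
u, v = (1 - x) / x, x / (1 - x)  # beta prime variate and its mirror image

# I_{alpha,a} = E[(1-X)/X]/(c-a) = beta/((alpha-1)(c-a))
print(np.mean(u) / (c - a), beta / ((alpha - 1) * (c - a)))

# I_{a,a} = var[(1-X)/X] ((alpha-1)/(c-a))^2 = beta(alpha+beta-1)/((alpha-2)(c-a)^2)
print(np.var(u) * ((alpha - 1) / (c - a)) ** 2,
      beta * (alpha + beta - 1) / ((alpha - 2) * (c - a) ** 2))

# I_{a,c} = -cov[(1-X)/X, X/(1-X)] (alpha-1)(beta-1)/(c-a)^2 = (alpha+beta-1)/(c-a)^2
print(-np.cov(u, v)[0, 1] * (alpha - 1) * (beta - 1) / (c - a) ** 2,
      (alpha + beta - 1) / (c - a) ** 2)
</syntaxhighlight>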
The determinant of Fisher's information matrix is of interest (for example, for the calculation of the [[Jeffreys prior]] probability). From the expressions for the individual components, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution with four parameters is:

:<math>\begin{align} \det(\mathcal{I}(\alpha,\beta,a,c)) = {} & -\mathcal{I}_{a,c}^2 \mathcal{I}_{\alpha,a} \mathcal{I}_{\alpha,\beta }+\mathcal{I}_{a,a} \mathcal{I}_{a,c} \mathcal{I}_{\alpha,c} \mathcal{I}_{\alpha ,\beta}+\mathcal{I}_{a,c}^2 \mathcal{I}_{\alpha ,\beta}^2 -\mathcal{I}_{a,a} \mathcal{I}_{c,c} \mathcal{I}_{\alpha,\beta}^2\\ & {} -\mathcal{I}_{a,c} \mathcal{I}_{\alpha,a} \mathcal{I}_{\alpha ,c} \mathcal{I}_{\beta,a}+\mathcal{I}_{a,c}^2 \mathcal{I}_{\alpha ,\alpha} \mathcal{I}_{\beta,a}+2 \mathcal{I}_{c,c} \mathcal{I}_{\alpha,a} \mathcal{I}_{\alpha,\beta} \mathcal{I}_{\beta,a}\\ & {}-2\mathcal{I}_{a,c} \mathcal{I}_{\alpha ,c} \mathcal{I}_{\alpha,\beta} \mathcal{I}_{\beta ,a}+\mathcal{I}_{\alpha ,c}^2 \mathcal{I}_{\beta ,a}^2-\mathcal{I}_{c,c} \mathcal{I}_{\alpha,\alpha} \mathcal{I}_{\beta ,a}^2+\mathcal{I}_{a,c} \mathcal{I}_{\alpha ,a}^2 \mathcal{I}_{\beta ,c}\\ & {}-\mathcal{I}_{a,a} \mathcal{I}_{a,c} \mathcal{I}_{\alpha ,\alpha } \mathcal{I}_{\beta ,c}-\mathcal{I}_{a,c} \mathcal{I}_{\alpha ,a} \mathcal{I}_{\alpha ,\beta } \mathcal{I}_{\beta ,c}+\mathcal{I}_{a,a} \mathcal{I}_{\alpha ,c} \mathcal{I}_{\alpha ,\beta } \mathcal{I}_{\beta ,c}\\ & {}-\mathcal{I}_{\alpha ,a} \mathcal{I}_{\alpha ,c} \mathcal{I}_{\beta ,a} \mathcal{I}_{\beta ,c}+\mathcal{I}_{a,c} \mathcal{I}_{\alpha ,\alpha } \mathcal{I}_{\beta ,a} \mathcal{I}_{\beta ,c}-\mathcal{I}_{c,c} \mathcal{I}_{\alpha ,a}^2 \mathcal{I}_{\beta ,\beta }\\ & {}+2 \mathcal{I}_{a,c} \mathcal{I}_{\alpha ,a} \mathcal{I}_{\alpha, c} \mathcal{I}_{\beta ,\beta }-\mathcal{I}_{a,a} \mathcal{I}_{\alpha ,c}^2 \mathcal{I}_{\beta ,\beta }-\mathcal{I}_{a,c}^2 \mathcal{I}_{\alpha ,\alpha } \mathcal{I}_{\beta ,\beta }+\mathcal{I}_{a,a} \mathcal{I}_{c,c} \mathcal{I}_{\alpha ,\alpha } \mathcal{I}_{\beta ,\beta }\text{ if }\alpha, \beta> 2 \end{align}</math>

Using [[Sylvester's criterion]] (a symmetric matrix is positive-definite if and only if all of its leading principal minors are positive), and since the diagonal components <math>{\mathcal{I}}_{a, a}</math> and <math>{\mathcal{I}}_{c, c}</math> have [[Mathematical singularity|singularities]] at ''α'' = 2 and ''β'' = 2, it follows that the Fisher information matrix for the four-parameter case is [[Positive-definite matrix|positive-definite]] for ''α'' > 2 and ''β'' > 2. Since for ''α'' > 2 and ''β'' > 2 the beta distribution is (symmetric or unsymmetric) bell-shaped, it follows that the Fisher information matrix is positive-definite only for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. Thus, important well-known distributions belonging to the four-parameter beta distribution family, like the parabolic distribution (Beta(2, 2, ''a'', ''c'')) and the [[continuous uniform distribution|uniform distribution]] (Beta(1, 1, ''a'', ''c'')), have Fisher information components (<math>\mathcal{I}_{a, a},\mathcal{I}_{c, c},\mathcal{I}_{\alpha, a},\mathcal{I}_{\beta, c}</math>) that blow up (approach infinity) in the four-parameter case (although their Fisher information components are all defined for the two-parameter case). The four-parameter [[Wigner semicircle distribution]] (Beta(3/2, 3/2, ''a'', ''c'')) and [[arcsine distribution]] (Beta(1/2, 1/2, ''a'', ''c'')) have negative Fisher information determinants for the four-parameter case.
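Rather than expanding the closed form, the determinant and definiteness can be checked numerically, reusing the <code>fisher_info_beta4</code> helper sketched earlier. Note that evaluating the semicircle exponents here is a formal substitution into the component formulas, since <math>\mathcal{I}_{a, a}</math> and <math>\mathcal{I}_{c, c}</math> are valid expectations only for ''α'', ''β'' > 2:

<syntaxhighlight lang="python">
import numpy as np
# Reuses fisher_info_beta4 from the earlier sketch.

for al, be in [(3.0, 4.0),   # bell-shaped: alpha, beta > 2
               (1.5, 1.5)]:  # Wigner semicircle, Beta(3/2, 3/2, a, c)
    I = fisher_info_beta4(al, be, a=0.0, c=1.0)
    # Sylvester's criterion: positive-definite iff every leading
    # principal minor det(I[:k, :k]) is positive.
    minors = [np.linalg.det(I[:k, :k]) for k in range(1, 5)]
    print((al, be), np.round(minors, 4))
# For (3, 4) all four minors are positive (positive-definite); for the
# semicircle exponents the formal evaluation gives a negative determinant,
# consistent with the text.
</syntaxhighlight>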