====Fisher information matrix====
Let a random variable ''X'' have a probability density ''f''(''x''; ''α''). The partial derivative with respect to the (unknown, and to be estimated) parameter α of the log [[likelihood function]] is called the [[score (statistics)|score]]. The second moment of the score is called the [[Fisher information]]:

:<math>\mathcal{I}(\alpha)=\operatorname{E} \left [\left (\frac{\partial}{\partial\alpha} \ln \mathcal{L}(\alpha\mid X) \right )^2 \right].</math>

The [[expected value|expectation]] of the [[score (statistics)|score]] is zero; therefore the Fisher information is also the second moment centered on the mean of the score: the [[variance]] of the score.

If the log [[likelihood function]] is twice differentiable with respect to the parameter α, and under certain regularity conditions,<ref name=Silvey>{{cite book|last=Silvey|first=S.D.|title=Statistical Inference|year=1975|publisher=Chapman and Hall|page=40|isbn=978-0412138201}}</ref> then the Fisher information may also be written as follows (which is often a more convenient form for calculation purposes):

:<math>\mathcal{I}(\alpha) = - \operatorname{E} \left [\frac{\partial^2}{\partial\alpha^2} \ln (\mathcal{L}(\alpha\mid X)) \right].</math>

Thus, the Fisher information is the negative of the expectation of the second [[derivative]] with respect to the parameter α of the log [[likelihood function]]. Therefore, Fisher information is a measure of the [[curvature]] of the log likelihood function of α. A flatter log likelihood curve, with low [[curvature]] (and therefore high [[Radius of curvature (mathematics)|radius of curvature]]), has low Fisher information, while a log likelihood curve with large [[curvature]] (and therefore low [[Radius of curvature (mathematics)|radius of curvature]]) has high Fisher information. When the Fisher information matrix is evaluated at the estimates of the parameters ("the observed Fisher information matrix"), it is equivalent to the replacement of the true log likelihood surface by a Taylor's series approximation, taken as far as the quadratic terms.<ref name=EdwardsLikelihood>{{cite book|last=Edwards|first=A. W. F.|title=Likelihood|year=1992 |publisher=The Johns Hopkins University Press|isbn=978-0801844430}}</ref> The word ''information'', in the context of Fisher information, refers to information about the parameters, as it relates to estimation, sufficiency, and the properties of variances of estimators. The [[Cramér–Rao bound]] states that the inverse of the Fisher information is a lower bound on the variance of any [[estimator]] of a parameter α:

:<math>\operatorname{var}[\hat\alpha] \geq \frac{1}{\mathcal{I}(\alpha)}.</math>

The precision with which one can estimate the parameter ''α'' is limited by the Fisher information of the log likelihood function.
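As an illustrative numerical sketch (not taken from the cited sources), the following Python fragment checks the identities above for a beta likelihood in which the second shape parameter is held fixed: the score has mean zero, and its variance matches the negative expected second derivative. NumPy and SciPy are assumed to be available, and the parameter values, seed and sample size are arbitrary choices.

<syntaxhighlight lang="python">
import numpy as np
from scipy.special import digamma, polygamma

# Score of a single Beta(alpha, beta) observation with respect to alpha:
#   d/d(alpha) ln f(x; alpha, beta) = ln x - (psi(alpha) - psi(alpha + beta))
alpha, beta = 2.5, 4.0                    # arbitrary illustrative values
rng = np.random.default_rng(0)
x = rng.beta(alpha, beta, size=200_000)

score = np.log(x) - (digamma(alpha) - digamma(alpha + beta))

print(score.mean())   # ~ 0: the expectation of the score is zero
print(score.var())    # ~ I(alpha): the variance of the score
print(polygamma(1, alpha) - polygamma(1, alpha + beta))   # I(alpha) = psi_1(alpha) - psi_1(alpha + beta)
</syntaxhighlight>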
The Fisher information is a measure of the minimum error involved in estimating a parameter of a distribution, and it can be viewed as a measure of the resolving power of an experiment needed to discriminate between two alternative hypotheses about a parameter.<ref name=Jaynes>{{cite book|last=Jaynes|first=E.T.|title=Probability theory, the logic of science|year=2003|publisher=Cambridge University Press|isbn=978-0521592710}}</ref>

When there are ''N'' parameters

:<math> \begin{bmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_N \end{bmatrix},</math>

then the Fisher information takes the form of an ''N''×''N'' [[positive semidefinite matrix|positive semidefinite]] [[symmetric matrix]], the Fisher information matrix, with typical element:

:<math> (\mathcal{I}(\theta))_{i, j} = \operatorname{E} \left [\left (\frac{\partial}{\partial\theta_i} \ln \mathcal{L} \right) \left(\frac \partial {\partial\theta_j} \ln \mathcal{L} \right) \right ].</math>

Under certain regularity conditions,<ref name=Silvey/> the Fisher information matrix may also be written in the following form, which is often more convenient for computation:

:<math> (\mathcal{I}(\theta))_{i, j} = - \operatorname{E} \left [\frac{\partial^2}{\partial\theta_i \, \partial\theta_j} \ln (\mathcal{L}) \right ]\,.</math>

With ''X''<sub>1</sub>, ..., ''X<sub>N</sub>'' [[iid]] random variables, an ''N''-dimensional "box" can be constructed with sides ''X''<sub>1</sub>, ..., ''X<sub>N</sub>''. Costa and Cover<ref name=CostaCover>{{cite book|last1=Costa|first1=Max|last2=Cover|first2=Thomas|title=On the similarity of the entropy power inequality and the Brunn Minkowski inequality|date=September 1983|publisher=Tech. Report 48, Dept. Statistics, Stanford University|url=https://isl.stanford.edu/people/cover/papers/transIT/0837cost.pdf}}</ref> show that the (Shannon) differential entropy ''h''(''X'') is related to the volume of the typical set (having the sample entropy close to the true entropy), while the Fisher information is related to the surface of this typical set.

=====Two parameters=====
For ''X''<sub>1</sub>, ..., ''X''<sub>''N''</sub> independent random variables each having a beta distribution parametrized with shape parameters ''α'' and ''β'', the joint log likelihood function for ''N'' [[independent and identically distributed random variables|iid]] observations is:

:<math>\ln (\mathcal{L} (\alpha, \beta\mid X) )= (\alpha - 1)\sum_{i=1}^N \ln X_i + (\beta- 1)\sum_{i=1}^N \ln (1-X_i)- N \ln \Beta(\alpha,\beta) </math>

therefore the joint log likelihood function per ''N'' [[independent and identically distributed random variables|iid]] observations is

:<math>\frac{1}{N} \ln(\mathcal{L} (\alpha, \beta\mid X)) = (\alpha - 1)\frac{1}{N}\sum_{i=1}^N \ln X_i + (\beta- 1) \frac{1}{N}\sum_{i=1}^N \ln (1-X_i)-\, \ln \Beta(\alpha,\beta). </math>

For the two parameter case, the Fisher information has 4 components: 2 diagonal and 2 off-diagonal. Since the Fisher information matrix is symmetric, only one of these off-diagonal components is independent. Therefore, the Fisher information matrix has 3 independent components (2 diagonal and 1 off-diagonal).
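The joint log likelihood per observation translates directly into code. The following sketch is illustrative only: the function name <code>beta_loglik_per_obs</code> and the sample values are chosen here for illustration, and NumPy and SciPy are assumed to be available.

<syntaxhighlight lang="python">
import numpy as np
from scipy.special import betaln

def beta_loglik_per_obs(alpha, beta, x):
    """(1/N) ln L(alpha, beta | X) for iid observations x in (0, 1)."""
    x = np.asarray(x)
    return ((alpha - 1) * np.mean(np.log(x))
            + (beta - 1) * np.mean(np.log1p(-x))
            - betaln(alpha, beta))

rng = np.random.default_rng(1)
sample = rng.beta(2.0, 3.0, size=10_000)
print(beta_loglik_per_obs(2.0, 3.0, sample))   # per-observation log likelihood at the true parameters
</syntaxhighlight>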
Aryal and Nadarajah<ref name=Aryal>{{cite journal|last=Aryal|first=Gokarna|author2=Saralees Nadarajah|title=Information matrix for beta distributions|journal=Serdica Mathematical Journal (Bulgarian Academy of Science)| year=2004| volume=30|pages=513–526|url=http://www.math.bas.bg/serdica/2004/2004-513-526.pdf}}</ref> calculated Fisher's information matrix for the four-parameter case, from which the two parameter case can be obtained as follows:

:<math>- \frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{N\,\partial \alpha^2}= \operatorname{var}[\ln (X)]= \psi_1(\alpha) - \psi_1(\alpha + \beta) ={\mathcal{I}}_{\alpha, \alpha}= \operatorname{E}\left [- \frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{N\,\partial \alpha^2} \right ] = \ln \operatorname{var}_{GX} </math>

:<math>- \frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{N\,\partial \beta^2} = \operatorname{var}[\ln (1-X)] = \psi_1(\beta) - \psi_1(\alpha + \beta) ={\mathcal{I}}_{\beta, \beta}= \operatorname{E}\left [- \frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{N\,\partial \beta^2} \right]= \ln \operatorname{var}_{G(1-X)} </math>

:<math>- \frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{N \, \partial \alpha \, \partial \beta} = \operatorname{cov}[\ln X,\ln(1-X)] = -\psi_1(\alpha+\beta) ={\mathcal{I}}_{\alpha, \beta}= \operatorname{E}\left [- \frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{N\,\partial \alpha\,\partial \beta} \right] = \ln \operatorname{cov}_{G{X,(1-X)}}</math>

Since the Fisher information matrix is symmetric

:<math> \mathcal{I}_{\alpha, \beta}= \mathcal{I}_{\beta, \alpha}= \ln \operatorname{cov}_{G{X,(1-X)}}</math>

The Fisher information components are equal to the log geometric variances and log geometric covariance. Therefore, they can be expressed as '''[[trigamma function]]s''', denoted ψ<sub>1</sub>(α), the second of the [[polygamma function]]s, defined as the derivative of the [[digamma]] function:

:<math>\psi_1(\alpha) = \frac{d^2\ln\Gamma(\alpha)}{d\alpha^2}= \frac{d \, \psi(\alpha)}{d\alpha}. </math>

These derivatives are also derived in {{section link||Two unknown parameters}}, and plots of the log likelihood function are also shown in that section. {{section link||Geometric variance and covariance}} contains plots and further discussion of the Fisher information matrix components: the log geometric variances and log geometric covariance as a function of the shape parameters α and β. {{section link||Moments of logarithmically transformed random variables}} contains formulas for moments of logarithmically transformed random variables. Images for the Fisher information components <math>\mathcal{I}_{\alpha, \alpha}, \mathcal{I}_{\beta, \beta}</math> and <math>\mathcal{I}_{\alpha, \beta}</math> are shown in {{section link||Geometric variance}}.

The determinant of Fisher's information matrix is of interest (for example, for the calculation of [[Jeffreys prior]] probability).
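As a numerical illustration of these components (not drawn from the cited references; NumPy and SciPy are assumed, the function name <code>beta_fisher_2param</code> and the parameter values are arbitrary), the per-observation Fisher information matrix built from trigamma functions can be compared with the sample covariance matrix of ln ''X'' and ln(1 − ''X''):

<syntaxhighlight lang="python">
import numpy as np
from scipy.special import polygamma

def beta_fisher_2param(alpha, beta):
    """Per-observation Fisher information matrix of Beta(alpha, beta),
    expressed through the trigamma function psi_1."""
    t_ab = polygamma(1, alpha + beta)
    i_aa = polygamma(1, alpha) - t_ab   # = var[ln X]
    i_bb = polygamma(1, beta) - t_ab    # = var[ln(1 - X)]
    i_ab = -t_ab                        # = cov[ln X, ln(1 - X)]
    return np.array([[i_aa, i_ab], [i_ab, i_bb]])

alpha, beta = 2.0, 3.0
rng = np.random.default_rng(2)
x = rng.beta(alpha, beta, size=200_000)
print(beta_fisher_2param(alpha, beta))
print(np.cov(np.log(x), np.log1p(-x)))   # sample counterpart, close to the matrix above
</syntaxhighlight>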
From the expressions for the individual components of the Fisher information matrix, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution is:

:<math>\begin{align} \det(\mathcal{I}(\alpha, \beta))&= \mathcal{I}_{\alpha, \alpha} \mathcal{I}_{\beta, \beta}-\mathcal{I}_{\alpha, \beta} \mathcal{I}_{\alpha, \beta} \\[4pt] &=(\psi_1(\alpha) - \psi_1(\alpha + \beta))(\psi_1(\beta) - \psi_1(\alpha + \beta))-( -\psi_1(\alpha+\beta))( -\psi_1(\alpha+\beta))\\[4pt] &= \psi_1(\alpha)\psi_1(\beta)-( \psi_1(\alpha)+\psi_1(\beta))\psi_1(\alpha + \beta)\\[4pt] \lim_{\alpha\to 0} \det(\mathcal{I}(\alpha, \beta)) &=\lim_{\beta \to 0} \det(\mathcal{I}(\alpha, \beta)) = \infty\\[4pt] \lim_{\alpha\to \infty} \det(\mathcal{I}(\alpha, \beta)) &=\lim_{\beta \to \infty} \det(\mathcal{I}(\alpha, \beta)) = 0 \end{align}</math>

From [[Sylvester's criterion]] (checking whether the leading principal minors are all positive), it follows that the Fisher information matrix for the two parameter case is [[Positive-definite matrix|positive-definite]] (under the standard condition that the shape parameters are positive, ''α'' > 0 and ''β'' > 0).

=====Four parameters=====
[[File:Fisher Information I(a,a) for alpha=beta vs range (c-a) and exponent alpha=beta - J. Rodal.png|thumb|Fisher Information ''I''(''a'',''a'') for ''α'' = ''β'' vs range (''c'' − ''a'') and exponent ''α'' = ''β'']]
[[File:Fisher Information I(alpha,a) for alpha=beta, vs. range (c - a) and exponent alpha=beta - J. Rodal.png|thumb|Fisher Information ''I''(''α'',''a'') for ''α'' = ''β'', vs. range (''c'' − ''a'') and exponent ''α'' = ''β'']]
If ''Y''<sub>1</sub>, ..., ''Y<sub>N</sub>'' are independent random variables each having a beta distribution with four parameters: the exponents ''α'' and ''β'', and also ''a'' (the minimum of the distribution range) and ''c'' (the maximum of the distribution range) (see the section titled "Alternative parametrizations", "Four parameters"), with [[probability density function]]:

:<math>f(y; \alpha, \beta, a, c) = \frac{f(x;\alpha,\beta)}{c-a} =\frac{ \left (\frac{y-a}{c-a} \right )^{\alpha-1} \left (\frac{c-y}{c-a} \right)^{\beta-1} }{(c-a)B(\alpha, \beta)}=\frac{ (y-a)^{\alpha-1} (c-y)^{\beta-1} }{(c-a)^{\alpha+\beta-1}B(\alpha, \beta)},</math>

then the joint log likelihood function per ''N'' [[independent and identically distributed random variables|iid]] observations is:

:<math>\frac{1}{N} \ln(\mathcal{L} (\alpha, \beta, a, c\mid Y))= \frac{\alpha -1}{N}\sum_{i=1}^N \ln (Y_i - a) + \frac{\beta -1}{N}\sum_{i=1}^N \ln (c - Y_i)- \ln \Beta(\alpha,\beta) - (\alpha+\beta -1) \ln (c-a). </math>

For the four parameter case, the Fisher information has 4 × 4 = 16 components, of which 12 are off-diagonal (16 total − 4 diagonal). Since the Fisher information matrix is symmetric, half of these off-diagonal components (12/2 = 6) are independent. Therefore, the Fisher information matrix has 6 independent off-diagonal + 4 diagonal = 10 independent components.
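The four-parameter log likelihood per observation can likewise be evaluated directly. The following sketch is illustrative only (the function name <code>beta4_loglik_per_obs</code> and all numerical values are arbitrary); it cross-checks the formula above against SciPy's built-in location-scale beta log-density.

<syntaxhighlight lang="python">
import numpy as np
from scipy.special import betaln
from scipy.stats import beta as beta_dist

def beta4_loglik_per_obs(alpha, beta, a, c, y):
    """(1/N) ln L(alpha, beta, a, c | Y) for iid observations y in (a, c)."""
    y = np.asarray(y)
    return ((alpha - 1) * np.mean(np.log(y - a))
            + (beta - 1) * np.mean(np.log(c - y))
            - betaln(alpha, beta)
            - (alpha + beta - 1) * np.log(c - a))

alpha, beta, a, c = 2.0, 3.0, -1.0, 4.0     # arbitrary illustrative values
rng = np.random.default_rng(5)
y = beta_dist.rvs(alpha, beta, loc=a, scale=c - a, size=1_000, random_state=rng)
print(beta4_loglik_per_obs(alpha, beta, a, c, y))
print(np.mean(beta_dist.logpdf(y, alpha, beta, loc=a, scale=c - a)))   # should agree
</syntaxhighlight>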
Aryal and Nadarajah<ref name=Aryal/> calculated Fisher's information matrix for the four parameter case as follows:

:<math>- \frac{1}{N} \frac{\partial^2\ln \mathcal{L} (\alpha, \beta, a, c\mid Y)}{\partial \alpha^2}= \operatorname{var}[\ln (X)]= \psi_1(\alpha) - \psi_1(\alpha + \beta) = \mathcal{I}_{\alpha, \alpha}= \operatorname{E}\left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L} (\alpha, \beta, a, c\mid Y)}{\partial \alpha^2} \right ] = \ln (\operatorname{var_{GX}}) </math>

:<math>-\frac{1}{N} \frac{\partial^2\ln \mathcal{L} (\alpha, \beta, a, c\mid Y)}{\partial \beta^2} = \operatorname{var}[\ln (1-X)] = \psi_1(\beta) - \psi_1(\alpha + \beta) ={\mathcal{I}}_{\beta, \beta}= \operatorname{E} \left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L} (\alpha, \beta, a, c\mid Y)}{\partial \beta^2} \right ] = \ln(\operatorname{var_{G(1-X)}}) </math>

:<math>-\frac{1}{N} \frac{\partial^2\ln \mathcal{L} (\alpha, \beta, a, c\mid Y)}{\partial \alpha\,\partial \beta} = \operatorname{cov}[\ln X,\ln(1-X)] = -\psi_1(\alpha+\beta) =\mathcal{I}_{\alpha, \beta}= \operatorname{E} \left [- \frac{1}{N}\frac{\partial^2\ln \mathcal{L} (\alpha, \beta, a, c\mid Y)}{\partial \alpha \, \partial \beta} \right ] = \ln(\operatorname{cov}_{G{X,(1-X)}})</math>

In the above expressions, the use of ''X'' instead of ''Y'' in the expressions var[ln(''X'')] = ln(var<sub>''GX''</sub>) is ''not an error''. The expressions in terms of the log geometric variances and log geometric covariance occur as functions of the two parameter ''X'' ~ Beta(''α'', ''β'') parametrization because, when taking the partial derivatives with respect to the exponents (''α'', ''β'') in the four parameter case, one obtains expressions identical to those for the two parameter case: these terms of the four parameter Fisher information matrix are independent of the minimum ''a'' and maximum ''c'' of the distribution's range (a numerical sketch of this independence is given below, after the remaining components). The only non-zero term upon double differentiation of the log likelihood function with respect to the exponents ''α'' and ''β'' is the second derivative of the log of the beta function: ln(B(''α'', ''β'')). This term is independent of the minimum ''a'' and maximum ''c'' of the distribution's range, and double differentiation of it results in trigamma functions. The sections titled "Maximum likelihood", "Two unknown parameters" and "Four unknown parameters" also show this fact.

The Fisher information for ''N'' [[i.i.d.]] samples is ''N'' times the individual Fisher information (eq. 11.279, page 394 of Cover and Thomas<ref name="Cover and Thomas"/>). (Aryal and Nadarajah<ref name=Aryal/> take a single observation, ''N'' = 1, to calculate the following components of the Fisher information, which leads to the same result as considering the derivatives of the log likelihood per ''N'' observations. Moreover, below the erroneous expression for <math>{\mathcal{I}}_{a, a}</math> in Aryal and Nadarajah has been corrected.)
:<math>\begin{align} \alpha > 2: \quad \operatorname{E}\left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L} (\alpha, \beta, a, c\mid Y)}{\partial a^2} \right ] &= {\mathcal{I}}_{a, a}=\frac{\beta(\alpha+\beta-1)}{(\alpha-2)(c-a)^2} \\ \beta > 2: \quad \operatorname{E}\left[-\frac{1}{N} \frac{\partial^2\ln \mathcal{L} (\alpha, \beta, a, c\mid Y)}{\partial c^2} \right ] &= \mathcal{I}_{c, c} = \frac{\alpha(\alpha+\beta-1)}{(\beta-2)(c-a)^2} \\ \operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L} (\alpha, \beta, a, c\mid Y)}{\partial a \, \partial c} \right ] &= {\mathcal{I}}_{a, c} = \frac{(\alpha+\beta-1)}{(c-a)^2} \\ \alpha > 1: \quad \operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L} (\alpha, \beta, a, c\mid Y)}{\partial \alpha \, \partial a} \right ] &=\mathcal{I}_{\alpha, a} = \frac{\beta}{(\alpha-1)(c-a)} \\ \operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L} (\alpha, \beta, a, c\mid Y)}{\partial \alpha \, \partial c} \right ] &= {\mathcal{I}}_{\alpha, c} = \frac{1}{(c-a)} \\ \operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L} (\alpha, \beta, a, c\mid Y)}{\partial \beta \,\partial a} \right ] &= {\mathcal{I}}_{\beta, a} = -\frac{1}{(c-a)} \\ \beta > 1: \quad \operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L} (\alpha, \beta, a, c\mid Y)}{\partial \beta \, \partial c} \right ] &= \mathcal{I}_{\beta, c} = -\frac{\alpha}{(\beta-1)(c-a)} \end{align}</math>

The lower two diagonal entries of the Fisher information matrix, with respect to the parameter ''a'' (the minimum of the distribution's range), <math>\mathcal{I}_{a, a}</math>, and with respect to the parameter ''c'' (the maximum of the distribution's range), <math>\mathcal{I}_{c, c}</math>, are only defined for exponents ''α'' > 2 and ''β'' > 2, respectively. The Fisher information matrix component <math>\mathcal{I}_{a, a}</math> for the minimum ''a'' approaches infinity for exponent ''α'' approaching 2 from above, and the Fisher information matrix component <math>\mathcal{I}_{c, c}</math> for the maximum ''c'' approaches infinity for exponent ''β'' approaching 2 from above.

The Fisher information matrix for the four parameter case does not depend on the individual values of the minimum ''a'' and the maximum ''c'', but only on the total range (''c'' − ''a''). Moreover, the components of the Fisher information matrix that depend on the range (''c'' − ''a'') depend on it only through its inverse (or the square of the inverse), so that the Fisher information decreases for increasing range (''c'' − ''a''). The accompanying images show the Fisher information components <math>\mathcal{I}_{a, a}</math> and <math>\mathcal{I}_{\alpha, a}</math>. Images for the Fisher information components <math>\mathcal{I}_{\alpha, \alpha}</math> and <math>\mathcal{I}_{\beta, \beta}</math> are shown in {{section link||Geometric variance}}. All these Fisher information components look like a basin, with the "walls" of the basin being located at low values of the parameters.
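Two of these properties can be checked numerically. The sketch below is illustrative only (NumPy and SciPy are assumed, the helper name <code>I_aa</code> and all parameter values are arbitrary): it verifies that the (''α'', ''α'') component involves only the standardized variable (''Y'' − ''a'')/(''c'' − ''a''), hence does not depend on ''a'' and ''c'', and that <math>\mathcal{I}_{a, a}</math> depends on ''a'' and ''c'' only through the range (''c'' − ''a''), decreasing as the range grows.

<syntaxhighlight lang="python">
import numpy as np
from scipy.special import polygamma
from scipy.stats import beta as beta_dist

# 1) The (alpha, alpha) component involves only X = (Y - a)/(c - a),
#    so it is the same whatever the end points a and c are.
alpha, beta, a, c = 2.5, 3.5, -1.0, 4.0     # arbitrary illustrative values
rng = np.random.default_rng(3)
y = beta_dist.rvs(alpha, beta, loc=a, scale=c - a, size=200_000, random_state=rng)
x = (y - a) / (c - a)
print(np.var(np.log(x)))                                   # ~ I_{alpha, alpha}
print(polygamma(1, alpha) - polygamma(1, alpha + beta))    # trigamma form, no a or c involved

# 2) I_{a,a} depends on a and c only through the range (c - a) and
#    decreases as the range grows (alpha > 2 is required for it to exist).
def I_aa(alpha, beta, a, c):
    return beta * (alpha + beta - 1) / ((alpha - 2) * (c - a) ** 2)

print(I_aa(3.0, 4.0, a=0.0, c=2.0))   # range 2
print(I_aa(3.0, 4.0, a=5.0, c=7.0))   # same range 2, identical value
print(I_aa(3.0, 4.0, a=0.0, c=4.0))   # range doubled, information divided by 4
</syntaxhighlight>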
The following four-parameter-beta-distribution Fisher information components can be expressed in terms of expectations, under the two-parameter ''X'' ~ Beta(''α'', ''β'') parametrization, of the transformed ratio (1 − ''X'')/''X'' and of its mirror image ''X''/(1 − ''X''), scaled by the range (''c'' − ''a''), which may be helpful for interpretation:

:<math>\mathcal{I}_{\alpha, a} =\frac{\operatorname{E} \left[\frac{1-X}{X} \right ]}{c-a}= \frac{\beta}{(\alpha-1)(c-a)} \text{ if }\alpha > 1</math>

:<math>\mathcal{I}_{\beta, c} = -\frac{\operatorname{E} \left [\frac{X}{1-X} \right ]}{c-a}=- \frac{\alpha}{(\beta-1)(c-a)}\text{ if }\beta> 1</math>

These are also the expected values of the "inverted beta distribution" or [[beta prime distribution]] (also known as beta distribution of the second kind or [[Pearson distribution|Pearson's Type VI]])<ref name=JKB/> and its mirror image, scaled by the range (''c'' − ''a'').

Also, the following Fisher information components can be expressed in terms of the variances of the harmonic transforms 1/''X'' and 1/(1 − ''X''), or equivalently of the ratio-transformed variables (1 − ''X'')/''X'' and ''X''/(1 − ''X''), as follows:

:<math>\begin{align} \alpha > 2: \quad \mathcal{I}_{a,a} &=\operatorname{var} \left [\frac{1}{X} \right] \left (\frac{\alpha-1}{c-a} \right )^2 =\operatorname{var} \left [\frac{1-X}{X} \right ] \left (\frac{\alpha-1}{c-a} \right)^2 = \frac{\beta(\alpha+\beta-1)}{(\alpha-2)(c-a)^2} \\ \beta > 2: \quad \mathcal{I}_{c, c} &= \operatorname{var} \left [\frac{1}{1-X} \right ] \left (\frac{\beta-1}{c-a} \right )^2 = \operatorname{var} \left [\frac{X}{1-X} \right ] \left (\frac{\beta-1}{c-a} \right )^2 =\frac{\alpha(\alpha+\beta-1)}{(\beta-2)(c-a)^2} \\ \mathcal{I}_{a, c} &=\operatorname{cov} \left [\frac{1}{X},\frac{1}{1-X} \right ]\frac{(\alpha-1)(\beta-1)}{(c-a)^2} = \operatorname{cov} \left [\frac{1-X}{X},\frac{X}{1-X} \right ] \frac{(\alpha-1)(\beta-1)}{(c-a)^2} =\frac{(\alpha+\beta-1)}{(c-a)^2} \end{align}</math>

See the section "Moments of linearly transformed, product and inverted random variables" for these expectations.

The determinant of Fisher's information matrix is of interest (for example, for the calculation of [[Jeffreys prior]] probability).
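A quick Monte Carlo check of these ratio-transform identities follows (illustrative only; the parameter values are arbitrary, with ''α'' chosen well above 2 so that the sample variance of (1 − ''X'')/''X'' is stable, and NumPy is assumed):

<syntaxhighlight lang="python">
import numpy as np

# Monte Carlo sketch of the ratio-transform identities:
#   E[(1 - X)/X] = beta/(alpha - 1)
#   var[(1 - X)/X] * ((alpha - 1)/(c - a))**2 = I_{a,a}
alpha, beta, a, c = 5.0, 2.0, 0.0, 5.0
rng = np.random.default_rng(4)
x = rng.beta(alpha, beta, size=1_000_000)
ratio = (1 - x) / x

print(ratio.mean(), beta / (alpha - 1))
print(ratio.var() * ((alpha - 1) / (c - a)) ** 2,
      beta * (alpha + beta - 1) / ((alpha - 2) * (c - a) ** 2))
</syntaxhighlight>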
From the expressions for the individual components, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution with four parameters is:

:<math>\begin{align} \det(\mathcal{I}(\alpha,\beta,a,c)) = {} & -\mathcal{I}_{a,c}^2 \mathcal{I}_{\alpha,a} \mathcal{I}_{\alpha,\beta }+\mathcal{I}_{a,a} \mathcal{I}_{a,c} \mathcal{I}_{\alpha,c} \mathcal{I}_{\alpha ,\beta}+\mathcal{I}_{a,c}^2 \mathcal{I}_{\alpha ,\beta}^2 -\mathcal{I}_{a,a} \mathcal{I}_{c,c} \mathcal{I}_{\alpha,\beta}^2\\ & {} -\mathcal{I}_{a,c} \mathcal{I}_{\alpha,a} \mathcal{I}_{\alpha ,c} \mathcal{I}_{\beta,a}+\mathcal{I}_{a,c}^2 \mathcal{I}_{\alpha ,\alpha} \mathcal{I}_{\beta,a}+2 \mathcal{I}_{c,c} \mathcal{I}_{\alpha,a} \mathcal{I}_{\alpha,\beta} \mathcal{I}_{\beta,a}\\ & {}-2\mathcal{I}_{a,c} \mathcal{I}_{\alpha ,c} \mathcal{I}_{\alpha,\beta} \mathcal{I}_{\beta ,a}+\mathcal{I}_{\alpha ,c}^2 \mathcal{I}_{\beta ,a}^2-\mathcal{I}_{c,c} \mathcal{I}_{\alpha,\alpha} \mathcal{I}_{\beta ,a}^2+\mathcal{I}_{a,c} \mathcal{I}_{\alpha ,a}^2 \mathcal{I}_{\beta ,c}\\ & {}-\mathcal{I}_{a,a} \mathcal{I}_{a,c} \mathcal{I}_{\alpha ,\alpha } \mathcal{I}_{\beta ,c}-\mathcal{I}_{a,c} \mathcal{I}_{\alpha ,a} \mathcal{I}_{\alpha ,\beta } \mathcal{I}_{\beta ,c}+\mathcal{I}_{a,a} \mathcal{I}_{\alpha ,c} \mathcal{I}_{\alpha ,\beta } \mathcal{I}_{\beta ,c}\\ & {}-\mathcal{I}_{\alpha ,a} \mathcal{I}_{\alpha ,c} \mathcal{I}_{\beta ,a} \mathcal{I}_{\beta ,c}+\mathcal{I}_{a,c} \mathcal{I}_{\alpha ,\alpha } \mathcal{I}_{\beta ,a} \mathcal{I}_{\beta ,c}-\mathcal{I}_{c,c} \mathcal{I}_{\alpha ,a}^2 \mathcal{I}_{\beta ,\beta }\\ & {}+2 \mathcal{I}_{a,c} \mathcal{I}_{\alpha ,a} \mathcal{I}_{\alpha, c} \mathcal{I}_{\beta ,\beta }-\mathcal{I}_{a,a} \mathcal{I}_{\alpha ,c}^2 \mathcal{I}_{\beta ,\beta }-\mathcal{I}_{a,c}^2 \mathcal{I}_{\alpha ,\alpha } \mathcal{I}_{\beta ,\beta }+\mathcal{I}_{a,a} \mathcal{I}_{c,c} \mathcal{I}_{\alpha ,\alpha } \mathcal{I}_{\beta ,\beta }\text{ if }\alpha, \beta> 2 \end{align}</math>

Using [[Sylvester's criterion]] (checking whether the leading principal minors are all positive), and since the diagonal components <math>{\mathcal{I}}_{a, a}</math> and <math>{\mathcal{I}}_{c, c}</math> have [[Mathematical singularity|singularities]] at ''α'' = 2 and ''β'' = 2, it follows that the Fisher information matrix for the four parameter case is [[Positive-definite matrix|positive-definite]] for ''α'' > 2 and ''β'' > 2. Since for ''α'' > 2 and ''β'' > 2 the beta distribution is (symmetric or unsymmetric) bell-shaped, it follows that the Fisher information matrix is positive-definite only for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. Thus, important well-known distributions belonging to the four-parameter beta distribution family, like the parabolic distribution (Beta(2,2,''a'',''c'')) and the [[continuous uniform distribution|uniform distribution]] (Beta(1,1,''a'',''c'')), have Fisher information components (<math>\mathcal{I}_{a, a},\mathcal{I}_{c, c},\mathcal{I}_{\alpha, a},\mathcal{I}_{\beta, c}</math>) that blow up (approach infinity) in the four-parameter case (although their Fisher information components are all defined for the two parameter case). The four-parameter [[Wigner semicircle distribution]] (Beta(3/2,3/2,''a'',''c'')) and [[arcsine distribution]] (Beta(1/2,1/2,''a'',''c'')) have negative Fisher information determinants for the four-parameter case.
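The positive-definiteness claim can also be checked numerically via Sylvester's criterion. The following sketch is illustrative only: the function name <code>beta4_fisher_minors</code>, the ordering of the parameters as (''α'', ''β'', ''a'', ''c'') and the numerical values are arbitrary choices, and NumPy and SciPy are assumed.

<syntaxhighlight lang="python">
import numpy as np
from scipy.special import polygamma

def beta4_fisher_minors(alpha, beta, a, c):
    """Assemble the per-observation Fisher matrix, ordered (alpha, beta, a, c),
    and return its leading principal minors (Sylvester's criterion)."""
    r = c - a
    t = polygamma(1, alpha + beta)
    iAA, iBB, iAB = polygamma(1, alpha) - t, polygamma(1, beta) - t, -t
    iAa, iAc = beta / ((alpha - 1) * r), 1 / r
    iBa, iBc = -1 / r, -alpha / ((beta - 1) * r)
    iaa = beta * (alpha + beta - 1) / ((alpha - 2) * r ** 2)
    icc = alpha * (alpha + beta - 1) / ((beta - 2) * r ** 2)
    iac = (alpha + beta - 1) / r ** 2
    I = np.array([[iAA, iAB, iAa, iAc],
                  [iAB, iBB, iBa, iBc],
                  [iAa, iBa, iaa, iac],
                  [iAc, iBc, iac, icc]])
    return [np.linalg.det(I[:k, :k]) for k in range(1, 5)]

# Bell-shaped case with alpha, beta > 2: all leading principal minors are positive,
# so the matrix is positive-definite; the last minor is det I(alpha, beta, a, c).
print(beta4_fisher_minors(3.0, 4.0, 0.0, 1.0))
</syntaxhighlight>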