
====Maximum likelihood====

=====Two unknown parameters=====
[[File:Max (Joint Log Likelihood per N) for Beta distribution Maxima at alpha=beta=2 - J. Rodal.png|thumb|Max (joint log likelihood/''N'') for beta distribution maxima at ''α''&nbsp;=&nbsp;''β''&nbsp;=&nbsp;2]]
[[File:Max (Joint Log Likelihood per N) for Beta distribution Maxima at alpha=beta= 0.25,0.5,1,2,4,6,8 - J. Rodal.png|thumb|Max (joint log likelihood/''N'') for Beta distribution maxima at ''α''&nbsp;=&nbsp;''β''&nbsp;&isin;&nbsp;{0.25,0.5,1,2,4,6,8}]]

As is also the case for [[maximum likelihood]] estimates for the [[gamma distribution]], the maximum likelihood estimates for the beta distribution do not have a general closed form solution for arbitrary values of the shape parameters. If ''X''<sub>1</sub>, ..., ''X<sub>N</sub>'' are independent random variables each having a beta distribution, the joint log likelihood function for ''N'' [[independent and identically distributed random variables|iid]] observations is:

:<math>\begin{align}
\ln\, \mathcal{L} (\alpha, \beta\mid X) &= \sum_{i=1}^N \ln \left (\mathcal{L}_i (\alpha, \beta\mid X_i) \right )\\
&= \sum_{i=1}^N \ln \left (f(X_i;\alpha,\beta) \right ) \\
&= \sum_{i=1}^N \ln \left (\frac{X_i^{\alpha-1}(1-X_i)^{\beta-1}}{\Beta(\alpha,\beta)} \right ) \\
&= (\alpha - 1)\sum_{i=1}^N \ln (X_i) + (\beta- 1)\sum_{i=1}^N  \ln (1-X_i) - N \ln \Beta(\alpha,\beta)
\end{align}</math>
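
The last line of this derivation can be evaluated directly. The following is a minimal Python sketch, assuming all observations lie strictly inside (0, 1); the function names <code>log_beta</code> and <code>beta_log_likelihood</code> and the sample values are illustrative and not taken from any cited source.

```python
import math

def log_beta(a, b):
    """Natural log of the beta function B(a, b), via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_log_likelihood(xs, alpha, beta):
    """Joint log likelihood of iid observations xs in (0, 1)."""
    n = len(xs)
    sum_log_x = sum(math.log(x) for x in xs)
    sum_log_1mx = sum(math.log(1.0 - x) for x in xs)
    return ((alpha - 1.0) * sum_log_x
            + (beta - 1.0) * sum_log_1mx
            - n * log_beta(alpha, beta))

# Hypothetical sample, for illustration only.
sample = [0.2, 0.5, 0.7, 0.9]
print(beta_log_likelihood(sample, 2.0, 2.0))
```

For ''α''&nbsp;=&nbsp;''β''&nbsp;=&nbsp;1 (the uniform distribution) every term vanishes and the log likelihood is zero for any sample.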

To find the maximum, take the partial derivative of the log likelihood with respect to each shape parameter and set it equal to zero. This yields the [[maximum likelihood]] equations for the shape parameters:

:<math>\frac{\partial \ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \alpha} = \sum_{i=1}^N \ln X_i -N\frac{\partial \ln \Beta(\alpha,\beta)}{\partial \alpha}=0</math>
:<math>\frac{\partial \ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \beta} = \sum_{i=1}^N  \ln (1-X_i)- N\frac{\partial \ln \Beta(\alpha,\beta)}{\partial \beta}=0</math>

where:

:<math>\frac{\partial \ln \Beta(\alpha,\beta)}{\partial \alpha} = -\frac{\partial \ln \Gamma(\alpha+\beta)}{\partial \alpha}+ \frac{\partial \ln \Gamma(\alpha)}{\partial \alpha}+ \frac{\partial \ln \Gamma(\beta)}{\partial \alpha}=-\psi(\alpha + \beta) + \psi(\alpha) + 0</math>
:<math>\frac{\partial \ln \Beta(\alpha,\beta)}{\partial \beta}= - \frac{\partial \ln \Gamma(\alpha+\beta)}{\partial \beta}+ \frac{\partial \ln \Gamma(\alpha)}{\partial \beta} + \frac{\partial \ln \Gamma(\beta)}{\partial \beta}=-\psi(\alpha + \beta) + 0 + \psi(\beta)</math>

since the '''[[digamma function]]''' denoted ψ(α) is defined as the [[logarithmic derivative]] of the [[gamma function]]:<ref name=Abramowitz/>

:<math>\psi(\alpha) =\frac {\partial\ln \Gamma(\alpha)}{\partial \alpha}</math>
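
The derivative of ln&nbsp;Β(''α'',&nbsp;''β'') with respect to ''α'' can be checked numerically. The sketch below is a hedged illustration: it assumes a simple digamma implementation (upward recurrence followed by the standard asymptotic series) and compares the analytic expression with a central finite difference. The helper names and the test point are hypothetical.

```python
import math

def digamma(x):
    """Digamma function psi(x) for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:            # psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series

def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def dlogbeta_dalpha(alpha, beta):
    """Analytic derivative of ln B(alpha, beta) with respect to alpha."""
    return digamma(alpha) - digamma(alpha + beta)

# Illustrative test point and step size (assumptions, not from the source).
alpha, beta, h = 2.5, 4.0, 1e-5
numeric = (log_beta(alpha + h, beta) - log_beta(alpha - h, beta)) / (2.0 * h)
analytic = dlogbeta_dalpha(alpha, beta)
print(numeric, analytic)
```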

To ensure that the values with zero tangent slope are indeed a maximum (instead of a saddle-point or a minimum) one also has to satisfy the condition that the curvature is negative.  This amounts to requiring that the second partial derivatives with respect to the shape parameters be negative:

:<math>\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \alpha^2}= -N\frac{\partial^2\ln \Beta(\alpha,\beta)}{\partial \alpha^2}<0</math>
:<math>\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \beta^2} = -N\frac{\partial^2\ln \Beta(\alpha,\beta)}{\partial \beta^2}<0</math>

Using the previous equations, this is equivalent to:

:<math>\frac{\partial^2\ln \Beta(\alpha,\beta)}{\partial \alpha^2} = \psi_1(\alpha)-\psi_1(\alpha + \beta) > 0</math>
:<math>\frac{\partial^2\ln \Beta(\alpha,\beta)}{\partial \beta^2} = \psi_1(\beta) -\psi_1(\alpha + \beta) > 0</math>

where the '''[[trigamma function]]''', denoted ''ψ''<sub>1</sub>(''α''), is the second of the [[polygamma function]]s, and is defined as the derivative of the [[digamma function]]:

:<math>\psi_1(\alpha) = \frac{\partial^2\ln\Gamma(\alpha)}{\partial \alpha^2}=\, \frac{\partial\, \psi(\alpha)}{\partial \alpha}.</math>

These conditions are equivalent to stating that the variances of the logarithmically transformed variables are positive, since:

:<math>\operatorname{var}[\ln (X)] = \operatorname{E}[\ln^2 (X)] - (\operatorname{E}[\ln (X)])^2 = \psi_1(\alpha) - \psi_1(\alpha + \beta) </math>
:<math>\operatorname{var}[\ln (1-X)] = \operatorname{E}[\ln^2 (1-X)] - (\operatorname{E}[\ln (1-X)])^2 = \psi_1(\beta) - \psi_1(\alpha + \beta) </math>

Therefore, the condition of negative curvature at a maximum is equivalent to the statements:

: <math>  \operatorname{var}[\ln (X)] > 0</math>
: <math>  \operatorname{var}[\ln (1-X)] > 0</math>
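
As a numerical illustration of these variances, the following sketch evaluates ''ψ''<sub>1</sub>(''α'')&nbsp;−&nbsp;''ψ''<sub>1</sub>(''α''&nbsp;+&nbsp;''β''). It assumes a simple trigamma implementation (upward recurrence plus the standard asymptotic series); the function names are hypothetical.

```python
def trigamma(x):
    """Trigamma function psi_1(x) for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:            # psi_1(x) = psi_1(x + 1) + 1/x^2
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv * inv2 * (1.0 / 6.0 - inv2 * (1.0 / 30.0 - inv2 / 42.0))
    return result + inv + 0.5 * inv2 + series

def var_log_x(alpha, beta):
    """var[ln X] for X ~ Beta(alpha, beta)."""
    return trigamma(alpha) - trigamma(alpha + beta)

def var_log_1mx(alpha, beta):
    """var[ln(1 - X)] for X ~ Beta(alpha, beta)."""
    return trigamma(beta) - trigamma(alpha + beta)

# Illustrative shape parameters.
print(var_log_x(2.0, 3.0), var_log_1mx(2.0, 3.0))
```

For integer arguments the recurrence ''ψ''<sub>1</sub>(''x''&nbsp;+&nbsp;1) = ''ψ''<sub>1</sub>(''x'')&nbsp;−&nbsp;1/''x''<sup>2</sup> gives a closed form, for example var[ln&nbsp;''X''] = 1/4 + 1/9 + 1/16 for ''α''&nbsp;=&nbsp;2, ''β''&nbsp;=&nbsp;3.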

Alternatively, the condition of negative curvature at a maximum is also equivalent to stating that the following [[logarithmic derivative]]s of the [[geometric mean]]s ''G<sub>X</sub>'' and ''G<sub>(1−X)</sub>'' are positive, since:

: <math>\psi_1(\alpha) - \psi_1(\alpha + \beta) = \frac{\partial \ln G_X}{\partial \alpha} > 0</math>
: <math>\psi_1(\beta)  - \psi_1(\alpha + \beta) = \frac{\partial \ln G_{(1-X)}}{\partial \beta} > 0</math>

While these slopes are indeed positive, the other slopes are negative:

:<math>\frac{\partial\, \ln G_X}{\partial \beta}, \frac{\partial \ln G_{(1-X)}}{\partial \alpha} < 0.</math>

The slopes of the mean and the median with respect to ''α'' and ''β'' display similar sign behavior.

From the condition that at a maximum, the partial derivative with respect to the shape parameter equals zero, we obtain the following system of coupled [[maximum likelihood estimate]] equations (for the average log-likelihoods) that needs to be inverted to obtain the  (unknown) shape parameter estimates <math>\hat{\alpha},\hat{\beta}</math> in terms of the (known) average of logarithms of the samples ''X''<sub>1</sub>, ..., ''X<sub>N</sub>'':<ref name=JKB />

:<math>\begin{align}
\hat{\operatorname{E}}[\ln (X)] &= \psi(\hat{\alpha}) - \psi(\hat{\alpha} + \hat{\beta})=\frac{1}{N}\sum_{i=1}^N \ln X_i =  \ln \hat{G}_X \\
\hat{\operatorname{E}}[\ln(1-X)] &= \psi(\hat{\beta}) - \psi(\hat{\alpha} + \hat{\beta})=\frac{1}{N}\sum_{i=1}^N \ln (1-X_i)= \ln \hat{G}_{(1-X)}
\end{align}</math>

where we recognize <math>\ln \hat{G}_X</math> as the logarithm of the sample [[geometric mean]] and <math>\ln \hat{G}_{(1-X)}</math> as the logarithm of the sample [[geometric mean]] based on (1&nbsp;−&nbsp;''X''), the mirror-image of&nbsp;''X''. For <math>\hat{\alpha}=\hat{\beta}</math>, it follows that  <math>\hat{G}_X=\hat{G}_{(1-X)} </math>. The sample geometric means are:

:<math>\begin{align}
\hat{G}_X &= \prod_{i=1}^N (X_i)^{1/N} \\
\hat{G}_{(1-X)} &= \prod_{i=1}^N (1-X_i)^{1/N}
\end{align}</math>

These coupled equations containing [[digamma function]]s of the shape parameter estimates <math>\hat{\alpha},\hat{\beta}</math> must be solved by numerical methods as done, for example, by Beckman and Tietjen.<ref>{{cite journal|last=Beckman|first=R. J.|author2=G. L. Tietjen|title=Maximum likelihood estimation for the beta distribution|journal=Journal of Statistical Computation and Simulation|year=1978|volume=7|issue=3–4|pages=253–258|doi=10.1080/00949657808810232}}</ref> Gnanadesikan et al. give numerical solutions for a few cases.<ref>{{cite journal |last=Gnanadesikan |first=R., Pinkham and Hughes|title=Maximum likelihood estimation of the parameters of the beta distribution from smallest order statistics |journal=Technometrics |year=1967|volume=9|issue=4|pages=607–620 |doi=10.2307/1266199|jstor=1266199}}</ref> [[Norman Lloyd Johnson|N.L.Johnson]] and [[Samuel Kotz|S.Kotz]]<ref name=JKB /> suggest that for "not too small" shape parameter estimates <math>\hat{\alpha},\hat{\beta}</math>, the logarithmic approximation to the digamma function <math>\psi(\hat{\alpha}) \approx \ln(\hat{\alpha}-\tfrac{1}{2})</math> may be used to obtain initial values for an iterative solution, since the equations resulting from this approximation can be solved exactly:

:<math>\ln \frac{\hat{\alpha} - \frac{1}{2}}{\hat{\alpha} + \hat{\beta} - \frac{1}{2}}  \approx  \ln \hat{G}_X </math>
:<math>\ln \frac{\hat{\beta} - \frac{1}{2}}{\hat{\alpha} + \hat{\beta} - \frac{1}{2}}\approx \ln \hat{G}_{(1-X)} </math>

which leads to the following solution for the initial values (of the estimate shape parameters in terms of the sample geometric means) for an iterative solution:

:<math>\hat{\alpha}\approx \tfrac{1}{2} + \frac{\hat{G}_{X}}{2(1-\hat{G}_X-\hat{G}_{(1-X)})} \text{ if } \hat{\alpha} >1</math>
:<math>\hat{\beta}\approx \tfrac{1}{2} + \frac{\hat{G}_{(1-X)}}{2(1-\hat{G}_X-\hat{G}_{(1-X)})} \text{ if } \hat{\beta} > 1</math>
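
These initial values are straightforward to compute from a sample. The sketch below is illustrative only: the function names and the sample are hypothetical, and the sample is assumed to contain at least two distinct values strictly inside (0, 1) so that the denominator is positive.

```python
import math

def geometric_means(xs):
    """Sample geometric means of X and of (1 - X)."""
    n = len(xs)
    g_x = math.exp(sum(math.log(x) for x in xs) / n)
    g_1mx = math.exp(sum(math.log(1.0 - x) for x in xs) / n)
    return g_x, g_1mx

def initial_shapes(g_x, g_1mx):
    """Initial values from the approximation psi(a) ~ ln(a - 1/2)."""
    denom = 2.0 * (1.0 - g_x - g_1mx)
    return 0.5 + g_x / denom, 0.5 + g_1mx / denom

# Hypothetical sample, for illustration only.
sample = [0.2, 0.4, 0.6, 0.8]
alpha0, beta0 = initial_shapes(*geometric_means(sample))
print(alpha0, beta0)
```

As a consistency check, geometric means of exactly 1/3 and 3/5 satisfy the two approximate equations for shape parameters 3 and 5, and the formulas return those values.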

Alternatively, the estimates provided by the method of moments can instead be used as initial values for an iterative solution of the maximum likelihood coupled equations in terms of the digamma functions.
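
The iterative solution of the coupled digamma equations can be sketched with Newton's method. This is one possible scheme and not necessarily the one used in the cited references. The sketch assumes simple series implementations of the digamma and trigamma functions, uses step halving only to keep both parameters positive, and starts from the logarithmic approximation given above. All function names are hypothetical.

```python
import math

def digamma(x):
    """Digamma function psi(x) for x > 0."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series

def trigamma(x):
    """Trigamma function psi_1(x) for x > 0."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv * inv2 * (1.0 / 6.0 - inv2 * (1.0 / 30.0 - inv2 / 42.0))
    return result + inv + 0.5 * inv2 + series

def solve_beta_mle(ln_gx, ln_g1x, alpha, beta, tol=1e-10, max_iter=100):
    """Newton iteration for psi(a) - psi(a+b) = ln_gx, psi(b) - psi(a+b) = ln_g1x."""
    for _ in range(max_iter):
        psi_ab = digamma(alpha + beta)
        r1 = digamma(alpha) - psi_ab - ln_gx      # residual of first equation
        r2 = digamma(beta) - psi_ab - ln_g1x      # residual of second equation
        t_ab = trigamma(alpha + beta)
        j11 = trigamma(alpha) - t_ab              # Jacobian entries
        j22 = trigamma(beta) - t_ab
        j12 = -t_ab
        det = j11 * j22 - j12 * j12
        d_alpha = (j22 * r1 - j12 * r2) / det     # Newton step: solve J d = r
        d_beta = (j11 * r2 - j12 * r1) / det
        step = 1.0
        while alpha - step * d_alpha <= 0.0 or beta - step * d_beta <= 0.0:
            step *= 0.5                           # keep both parameters positive
        alpha -= step * d_alpha
        beta -= step * d_beta
        if max(abs(d_alpha), abs(d_beta)) < tol:
            break
    return alpha, beta

# Synthetic check: build the log geometric means from known parameters.
true_alpha, true_beta = 2.0, 5.0
ln_gx = digamma(true_alpha) - digamma(true_alpha + true_beta)
ln_g1x = digamma(true_beta) - digamma(true_alpha + true_beta)
g_x, g_1mx = math.exp(ln_gx), math.exp(ln_g1x)
denom = 2.0 * (1.0 - g_x - g_1mx)
alpha0, beta0 = 0.5 + g_x / denom, 0.5 + g_1mx / denom
alpha_hat, beta_hat = solve_beta_mle(ln_gx, ln_g1x, alpha0, beta0)
print(alpha_hat, beta_hat)
```

In practice <code>ln_gx</code> and <code>ln_g1x</code> would be the averages of ln&nbsp;''X<sub>i</sub>'' and ln(1&nbsp;−&nbsp;''X<sub>i</sub>'') over the sample, and the starting point could equally be taken from the method of moments.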

When the distribution is required over a known interval other than [0, 1]  with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace ln(''X<sub>i</sub>'') in the first equation with

:<math>\ln \frac{Y_i-a}{c-a}</math>,

and replace ln(1−''X<sub>i</sub>'') in the second equation with

:<math>\ln \frac{c-Y_i}{c-a}</math>

(see "Alternative parametrizations, four parameters" section below).

If one of the shape parameters is known, the problem is considerably simplified.  The following [[logit]] transformation can be used to solve for the unknown shape parameter (for skewed cases such that <math>\hat{\alpha}\neq\hat{\beta}</math>; in the symmetric case the two parameters are equal, so both are known when one is known):

:<math>\hat{\operatorname{E}} \left[\ln \left(\frac{X}{1-X} \right) \right]=\psi(\hat{\alpha}) - \psi(\hat{\beta})=\frac{1}{N}\sum_{i=1}^N \ln\frac{X_i}{1-X_i} =  \ln \hat{G}_X - \ln \left(\hat{G}_{(1-X)}\right) </math>

This [[logit]] transformation is the logarithm of the transformation that divides the variable ''X'' by its mirror-image (''X''/(1&nbsp;−&nbsp;''X'')), resulting in the "inverted beta distribution"  or [[beta prime distribution]] (also known as beta distribution of the second kind or [[Pearson distribution|Pearson's Type VI]]) with support [0, +∞). As previously discussed in the section "Moments of logarithmically transformed random variables," the [[logit]] transformation <math>\ln\frac{X}{1-X}</math>, studied by Johnson,<ref name=JohnsonLogInv/> extends the finite support [0, 1] based on the original variable ''X'' to infinite support in both directions of the real line (−∞, +∞).

If, for example, <math>\hat{\beta}</math> is known, the unknown parameter <math>\hat{\alpha}</math> can be obtained in terms of the inverse<ref name=invpsi.m>{{cite web|last=Fackler |first=Paul|title=Inverse Digamma Function (Matlab)|url=http://hips.seas.harvard.edu/content/inverse-digamma-function-matlab|publisher=Harvard University School of Engineering and Applied Sciences|access-date=2012-08-18}}</ref> digamma function of the right hand side of this equation:

:<math>\psi(\hat{\alpha})=\frac{1}{N}\sum_{i=1}^N \ln\frac{X_i}{1-X_i} + \psi(\hat{\beta}) </math>
:<math>\hat{\alpha}=\psi^{-1}(\ln \hat{G}_X - \ln \hat{G}_{(1-X)} + \psi(\hat{\beta})) </math>
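
The inverse digamma function has no closed form, but it can be computed by Newton's method. The sketch below is an illustration under stated assumptions: it uses simple series implementations of the digamma and trigamma functions and a commonly used starting value (exp(''y'')&nbsp;+&nbsp;1/2 for moderate ''y'', and −1/(''y''&nbsp;−&nbsp;''ψ''(1)) for strongly negative ''y''). It is not the cited Matlab routine, and the function names are hypothetical.

```python
import math

def digamma(x):
    """Digamma function psi(x) for x > 0."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series

def trigamma(x):
    """Trigamma function psi_1(x) for x > 0."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv * inv2 * (1.0 / 6.0 - inv2 * (1.0 / 30.0 - inv2 / 42.0))
    return result + inv + 0.5 * inv2 + series

def inverse_digamma(y, tol=1e-12, max_iter=50):
    """Solve psi(x) = y for x > 0 by Newton's method."""
    if y >= -2.22:
        x = math.exp(y) + 0.5
    else:
        x = -1.0 / (y - digamma(1.0))
    for _ in range(max_iter):
        dx = (digamma(x) - y) / trigamma(x)
        x -= dx
        if abs(dx) < tol:
            break
    return x

def alpha_given_beta(ln_gx, ln_g1x, beta):
    """Estimate alpha from the log geometric means when beta is known."""
    return inverse_digamma(ln_gx - ln_g1x + digamma(beta))

# Synthetic check with known parameters alpha = 3, beta = 2.
ln_gx = digamma(3.0) - digamma(5.0)
ln_g1x = digamma(2.0) - digamma(5.0)
print(alpha_given_beta(ln_gx, ln_g1x, 2.0))
```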

In particular, if one of the shape parameters has a value of unity, for example for <math>\hat{\beta} = 1</math> (the power function distribution with bounded support [0,1]), using the identity ψ(''x'' + 1) = ψ(''x'') + 1/''x'' in the equation <math>\psi(\hat{\alpha}) - \psi(\hat{\alpha} + \hat{\beta})= \ln \hat{G}_X</math>, the maximum likelihood estimator for the unknown parameter <math>\hat{\alpha}</math> is,<ref name=JKB /> exactly:

:<math>\hat{\alpha}= - \frac{1}{\frac{1}{N}\sum_{i=1}^N \ln X_i}= - \frac{1}{ \ln \hat{G}_X} </math>

The beta has support [0, 1], therefore <math>\hat{G}_X < 1</math>, and hence <math>(-\ln \hat{G}_X) >0</math>, and therefore <math>\hat{\alpha} >0.</math>
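
The closed-form estimator for <math>\hat{\beta} = 1</math> is simple enough to state in a few lines. A minimal sketch follows, with a hypothetical function name and a synthetic sample chosen so that the mean of the logarithms is exactly −2:

```python
import math

def alpha_mle_beta_one(xs):
    """Maximum likelihood estimate of alpha when beta = 1 is known."""
    mean_log = sum(math.log(x) for x in xs) / len(xs)
    return -1.0 / mean_log

# Synthetic sample: the logarithms are -1, -2 and -3, so their mean is -2.
sample = [math.exp(-1.0), math.exp(-2.0), math.exp(-3.0)]
print(alpha_mle_beta_one(sample))
```

Because every observation lies in (0, 1), the mean of the logarithms is negative and the estimate is positive, as stated above.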

In conclusion, the maximum likelihood estimates of the shape parameters of a beta distribution are (in general) a complicated function of the sample [[geometric mean]], and of the sample [[geometric mean]] based on (1&nbsp;−&nbsp;''X''), the mirror-image of ''X''.  One may ask: if the variance (in addition to the mean) is necessary to estimate two shape parameters with the method of moments, why is the (logarithmic or geometric) variance not necessary to estimate two shape parameters with the maximum likelihood method, for which the geometric means alone suffice?  The answer is that the mean does not provide as much information as the geometric mean.  For a beta distribution with equal shape parameters ''α''&nbsp;=&nbsp;''β'', the mean is exactly 1/2, regardless of the value of the shape parameters, and therefore regardless of the value of the statistical dispersion (the variance).  On the other hand, the geometric mean of a beta distribution with equal shape parameters ''α''&nbsp;=&nbsp;''β'' depends on the value of the shape parameters, and therefore it contains more information.  Also, the geometric mean of a beta distribution does not satisfy the symmetry conditions satisfied by the mean; therefore, by employing both the geometric mean based on ''X'' and the geometric mean based on (1&nbsp;−&nbsp;''X''), the maximum likelihood method is able to provide best estimates for both parameters ''α'' and ''β'', without need of employing the variance.

One can express the joint log likelihood per ''N'' [[independent and identically distributed random variables|iid]] observations in terms of the ''[[sufficient statistic]]s'' (the sample geometric means) as follows:

:<math>\frac{\ln \mathcal{L} (\alpha, \beta\mid X)}{N} = (\alpha - 1)\ln \hat{G}_X + (\beta- 1)\ln \hat{G}_{(1-X)}- \ln \Beta(\alpha,\beta).</math>

We can plot the joint log likelihood per ''N'' observations for fixed values of the sample geometric means to see the behavior of the likelihood function as a function of the shape parameters α and β. In such a plot, the shape parameter estimators <math>\hat{\alpha},\hat{\beta}</math> correspond to the maxima of the likelihood function. See the accompanying graph that shows that all the likelihood functions intersect at α = β = 1, which corresponds to the values of the shape parameters that give the maximum entropy (the maximum entropy occurs for shape parameters equal to unity: the uniform distribution).  It is evident from the plot that the likelihood function gives sharp peaks for values of the shape parameter estimators close to zero, but that for values of the shape parameter estimators greater than one, the likelihood function becomes quite flat, with less defined peaks.  Consequently, the maximum likelihood parameter estimation method for the beta distribution becomes less precise for larger values of the shape parameter estimators, as the uncertainty in the peak definition increases with the value of the shape parameter estimators.  One can arrive at the same conclusion by noticing that the expression for the curvature of the likelihood function is in terms of the geometric variances:

:<math>\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \alpha^2}= -N\operatorname{var}[\ln X]</math>
:<math>\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \beta^2} = -N\operatorname{var}[\ln (1-X)]</math>

These variances (and therefore the curvatures) are much larger for small values of the shape parameters α and β. However, for shape parameter values α, β > 1, the variances (and therefore the curvatures) flatten out.  Equivalently, this result follows from the [[Cramér–Rao bound]], since the diagonal [[Fisher information]] matrix components for the beta distribution are these logarithmic variances. The [[Cramér–Rao bound]] states that the [[variance]] of any ''unbiased'' estimator <math>\hat{\alpha}</math> of α is bounded by the [[multiplicative inverse|reciprocal]] of the [[Fisher information]]:

:<math>\mathrm{var}(\hat{\alpha})\geq\frac{1}{\operatorname{var}[\ln X]}\geq\frac{1}{\psi_1(\hat{\alpha}) - \psi_1(\hat{\alpha} + \hat{\beta})}</math>
:<math>\mathrm{var}(\hat{\beta}) \geq\frac{1}{\operatorname{var}[\ln (1-X)]}\geq\frac{1}{\psi_1(\hat{\beta}) - \psi_1(\hat{\alpha} + \hat{\beta})}</math>

so the variance of the estimators increases with increasing α and β, as the logarithmic variances decrease.
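
The growth of these lower bounds with the shape parameters can be tabulated numerically. The sketch below evaluates the reciprocal of the logarithmic variance exactly as written in the expressions above, for a few symmetric cases ''α''&nbsp;=&nbsp;''β''. It assumes a simple series implementation of the trigamma function, and the function names are hypothetical.

```python
def trigamma(x):
    """Trigamma function psi_1(x) for x > 0."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv * inv2 * (1.0 / 6.0 - inv2 * (1.0 / 30.0 - inv2 / 42.0))
    return result + inv + 0.5 * inv2 + series

def reciprocal_log_variance(alpha, beta):
    """1 / (psi_1(alpha) - psi_1(alpha + beta)), the bound as written above."""
    return 1.0 / (trigamma(alpha) - trigamma(alpha + beta))

# Symmetric cases alpha = beta, matching the values used in the figures.
shapes = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0]
bounds = [reciprocal_log_variance(s, s) for s in shapes]
for s, b in zip(shapes, bounds):
    print(s, b)
```

For ''α''&nbsp;=&nbsp;''β''&nbsp;=&nbsp;1 the difference of trigamma functions is exactly 1, so the tabulated value is 1.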

Also one can express the joint log likelihood per ''N'' [[independent and identically distributed random variables|iid]] observations in terms of the [[digamma function]] expressions for the logarithms of the sample geometric means as follows:

:<math>\frac{\ln\, \mathcal{L} (\alpha, \beta\mid X)}{N} = (\alpha - 1)(\psi(\hat{\alpha}) - \psi(\hat{\alpha} + \hat{\beta}))+(\beta- 1)(\psi(\hat{\beta}) - \psi(\hat{\alpha} + \hat{\beta}))- \ln \Beta(\alpha,\beta)</math>

This expression is identical to the negative of the cross-entropy (see section on "Quantities of information (entropy)").  Therefore, finding the maximum of the joint log likelihood of the shape parameters, per ''N'' [[independent and identically distributed random variables|iid]] observations, is identical to finding the minimum of the cross-entropy for the beta distribution, as a function of the shape parameters.

:<math>\frac{\ln\, \mathcal{L} (\alpha, \beta\mid X)}{N} = - H = -h - D_{\mathrm{KL}} = -\ln\Beta(\alpha,\beta)+(\alpha-1)\psi(\hat{\alpha})+(\beta-1)\psi(\hat{\beta})-(\alpha+\beta-2)\psi(\hat{\alpha}+\hat{\beta})</math>

with the cross-entropy defined as follows:

:<math>H = \int_{0}^1 - f(X;\hat{\alpha},\hat{\beta}) \ln (f(X;\alpha,\beta)) \, {\rm d}X </math>

=====Four unknown parameters=====
The procedure is similar to the one followed in the two unknown parameter case. If ''Y''<sub>1</sub>, ..., ''Y<sub>N</sub>'' are independent random variables each having a beta distribution with four parameters, the joint log likelihood function for ''N'' [[independent and identically distributed random variables|iid]] observations is:

:<math>\begin{align}
\ln\, \mathcal{L} (\alpha, \beta, a, c\mid Y) &= \sum_{i=1}^N \ln\,\mathcal{L}_i (\alpha, \beta, a, c\mid Y_i)\\
&= \sum_{i=1}^N \ln\,f(Y_i; \alpha, \beta, a, c) \\
&= \sum_{i=1}^N \ln\,\frac{(Y_i-a)^{\alpha-1} (c-Y_i)^{\beta-1} }{(c-a)^{\alpha+\beta-1}\Beta(\alpha, \beta)}\\
&= (\alpha - 1)\sum_{i=1}^N  \ln (Y_i - a) + (\beta- 1)\sum_{i=1}^N  \ln (c - Y_i)- N \ln \Beta(\alpha,\beta) - N (\alpha+\beta - 1) \ln (c - a)
\end{align}</math>
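
The four-parameter log likelihood can be evaluated in the same way as the two-parameter one. In the minimal sketch below, the function names are hypothetical, and returning negative infinity when an observation falls on or outside the interval (''a'',&nbsp;''c'') is a convention assumed here, not something prescribed by the source.

```python
import math

def log_beta(p, q):
    """Natural log of the beta function B(p, q)."""
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)

def beta4_log_likelihood(ys, alpha, beta, a, c):
    """Joint log likelihood for the four-parameter beta distribution on (a, c)."""
    if min(ys) <= a or max(ys) >= c:
        return float("-inf")   # assumed convention for points outside (a, c)
    n = len(ys)
    sum_log_lower = sum(math.log(y - a) for y in ys)
    sum_log_upper = sum(math.log(c - y) for y in ys)
    return ((alpha - 1.0) * sum_log_lower
            + (beta - 1.0) * sum_log_upper
            - n * log_beta(alpha, beta)
            - n * (alpha + beta - 1.0) * math.log(c - a))

# Hypothetical sample on the interval (1, 3), for illustration only.
sample = [1.5, 2.0, 2.5]
print(beta4_log_likelihood(sample, 2.0, 2.0, 1.0, 3.0))
```

For ''α''&nbsp;=&nbsp;''β''&nbsp;=&nbsp;1 the expression reduces to −''N''&nbsp;ln(''c''&nbsp;−&nbsp;''a''), the log likelihood of the uniform distribution on (''a'',&nbsp;''c'').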

Finding the maximum involves taking the partial derivative of the log likelihood with respect to each of the four parameters and setting it equal to zero, which yields the [[maximum likelihood]] equations:

:<math>\frac{\partial \ln \mathcal{L} (\alpha, \beta, a, c\mid Y) }{\partial \alpha}= \sum_{i=1}^N  \ln (Y_i - a) - N(-\psi(\alpha + \beta) + \psi(\alpha))- N \ln (c - a)= 0</math>
:<math>\frac{\partial \ln \mathcal{L} (\alpha, \beta, a, c\mid Y) }{\partial \beta} = \sum_{i=1}^N  \ln (c - Y_i) - N(-\psi(\alpha + \beta)  + \psi(\beta))- N \ln (c - a)= 0</math>
:<math>\frac{\partial \ln \mathcal{L} (\alpha, \beta, a, c\mid Y) }{\partial a} = -(\alpha - 1) \sum_{i=1}^N  \frac{1}{Y_i - a} \,+ N (\alpha+\beta - 1)\frac{1}{c - a}= 0</math>
:<math>\frac{\partial \ln \mathcal{L} (\alpha, \beta, a, c\mid Y) }{\partial c} = (\beta- 1) \sum_{i=1}^N  \frac{1}{c - Y_i} \,- N (\alpha+\beta - 1) \frac{1}{c - a} = 0</math>

These equations can be rearranged as the following system of four coupled equations (the first two equations involve geometric means and the last two involve harmonic means) in terms of the maximum likelihood estimates for the four parameters <math>\hat{\alpha}, \hat{\beta}, \hat{a}, \hat{c}</math>:

:<math>\frac{1}{N}\sum_{i=1}^N  \ln \frac{Y_i - \hat{a}}{\hat{c}-\hat{a}} = \psi(\hat{\alpha})-\psi(\hat{\alpha} +\hat{\beta} )=  \ln \hat{G}_X</math>
:<math>\frac{1}{N}\sum_{i=1}^N  \ln \frac{\hat{c} - Y_i}{\hat{c}-\hat{a}} =  \psi(\hat{\beta})-\psi(\hat{\alpha} + \hat{\beta})=  \ln \hat{G}_{(1-X)}</math>
:<math>\frac{1}{\frac{1}{N}\sum_{i=1}^N  \frac{\hat{c} - \hat{a}}{Y_i - \hat{a}}} = \frac{\hat{\alpha} - 1}{\hat{\alpha}+\hat{\beta} - 1}=  \hat{H}_X</math>
:<math>\frac{1}{\frac{1}{N}\sum_{i=1}^N  \frac{\hat{c} - \hat{a}}{\hat{c} - Y_i}} = \frac{\hat{\beta}- 1}{\hat{\alpha}+\hat{\beta} - 1} =  \hat{H}_{1-X}</math>

with sample geometric means:

:<math>\hat{G}_X = \prod_{i=1}^{N} \left (\frac{Y_i - \hat{a}}{\hat{c}-\hat{a}} \right )^{\frac{1}{N}}</math>
:<math>\hat{G}_{(1-X)} = \prod_{i=1}^{N} \left (\frac{\hat{c} - Y_i}{\hat{c}-\hat{a}} \right )^{\frac{1}{N}}</math>

The parameters <math>\hat{a}, \hat{c}</math> are embedded inside the geometric mean expressions in a nonlinear way (to the power 1/''N'').  This precludes, in general, a closed form solution, even for an initial value approximation for iteration purposes.  One alternative is to use as initial values for iteration the values obtained from the method of moments solution for the four parameter case.  Furthermore, the expressions for the harmonic means are well-defined only for <math>\hat{\alpha}, \hat{\beta} > 1</math>, which precludes a maximum likelihood solution for shape parameters less than unity in the four-parameter case. Fisher's information matrix for the four parameter case is [[Positive-definite matrix|positive-definite]] only for α, β > 2 (for further discussion, see section on Fisher information matrix, four parameter case), for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. The following Fisher information components (that represent the expectations of the curvature of the log likelihood function) have [[mathematical singularity|singularities]] at the following values:

:<math>\alpha = 2: \quad \operatorname{E} \left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L} (\alpha, \beta, a, c\mid Y)}{\partial a^2} \right ]= {\mathcal{I}}_{a, a}</math>
:<math>\beta = 2: \quad \operatorname{E}\left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L} (\alpha, \beta, a, c\mid Y)}{\partial c^2} \right ] = {\mathcal{I}}_{c, c}</math>
:<math>\alpha = 1: \quad \operatorname{E}\left [- \frac{1}{N}\frac{\partial^2\ln \mathcal{L} (\alpha, \beta, a, c\mid Y)}{\partial \alpha \partial a}\right ] = {\mathcal{I}}_{\alpha, a} </math>
:<math>\beta = 1: \quad \operatorname{E}\left [- \frac{1}{N}\frac{\partial^2\ln \mathcal{L} (\alpha, \beta, a, c\mid Y)}{\partial \beta \partial c} \right ] = {\mathcal{I}}_{\beta, c}  </math>

(for further discussion see section on Fisher information matrix). Thus, it is not possible to strictly carry out the maximum likelihood estimation for some well known distributions belonging to the four-parameter beta distribution family, like the [[continuous uniform distribution|uniform distribution]] (Beta(1, 1, ''a'', ''c'')), and the [[arcsine distribution]] (Beta(1/2, 1/2, ''a'', ''c'')).  [[Norman Lloyd Johnson|N.L.Johnson]] and [[Samuel Kotz|S.Kotz]]<ref name=JKB /> ignore the equations for the harmonic means and instead suggest "If a and c are unknown, and maximum likelihood estimators of ''a'', ''c'', α and β are required, the above procedure (for the two unknown parameter case, with ''X'' transformed as ''X'' = (''Y''&nbsp;−&nbsp;''a'')/(''c''&nbsp;−&nbsp;''a'')) can be repeated using a succession of trial values of ''a'' and ''c'', until the pair (''a'', ''c'') for which maximum likelihood (given ''a'' and ''c'') is as great as possible, is attained" (where, for the purpose of clarity, their notation for the parameters has been translated into the present notation).