=== Regularity conditions === In the context of parameter estimation, the likelihood function is usually assumed to obey certain conditions, known as regularity conditions. These conditions are {{em|assumed}} in various proofs involving likelihood functions, and need to be verified in each particular application. For maximum likelihood estimation, the existence of a global maximum of the likelihood function is of the utmost importance. By the [[extreme value theorem]], it suffices that the likelihood function is [[Continuous function|continuous]] on a [[compactness|compact]] parameter space for the maximum likelihood estimator to exist.<ref>{{cite book |first1=Christian |last1=Gouriéroux |author-link=Christian Gouriéroux |first2=Alain |last2=Monfort |year=1995 |title=Statistics and Econometric Models |location=New York |publisher=Cambridge University Press |isbn=0-521-40551-3 |page=161 |url=https://books.google.com/books?id=gqI-pAP2JZ8C&pg=PA161 }}</ref> While the continuity assumption is usually met, the compactness assumption about the parameter space is often not, as the bounds of the true parameter values might be unknown. In that case, [[Concave function|concavity]] of the likelihood function plays a key role. 
More specifically, if the likelihood function is twice continuously differentiable on the <var>k</var>-dimensional parameter space <math display="inline"> \Theta </math> assumed to be an [[Open set|open]] [[Connected space|connected]] subset of <math display="inline"> \mathbb{R}^{k} \,,</math> then there exists a unique maximum <math display="inline">\hat{\theta} \in \Theta</math> if the [[Hessian matrix|matrix of second partials]] <math display="block"> \mathbf{H}(\theta) \equiv \left[\, \frac{ \partial^2 L }{\, \partial \theta_i \, \partial \theta_j \,} \,\right]_{i,j=1}^{k} \;</math> is [[negative definite]] for every <math display="inline">\, \theta \in \Theta \,</math> at which the gradient <math display="inline">\; \nabla L \equiv \left[\, \frac{ \partial L }{\, \partial \theta_i \,} \,\right]_{i=1}^{k} \;</math> vanishes, and if the likelihood function vanishes on the [[Boundary (topology)|boundary]] of the parameter space, <math display="inline">\; \partial \Theta \;,</math> i.e., <math display="block"> \lim_{\theta \to \partial \Theta} L(\theta) = 0 \;,</math> which may include the points at infinity if <math display="inline"> \, \Theta \, </math> is unbounded. Mäkeläinen and co-authors prove this result using [[Morse theory]] while informally appealing to a mountain pass property.<ref>{{cite journal |first1=Timo |last1=Mäkeläinen |first2=Klaus |last2=Schmidt |first3=George P.H. |last3=Styan |year=1981 |title=On the existence and uniqueness of the maximum likelihood estimate of a vector-valued parameter in fixed-size samples |journal=[[Annals of Statistics]] |volume=9 |issue=4 |pages=758–767 |doi=10.1214/aos/1176345516 |jstor=2240844 |doi-access=free }}</ref> Mascarenhas restates their proof using the [[mountain pass theorem]].<ref>{{cite journal |first=W.F. 
|last=Mascarenhas |year=2011 |title=A mountain pass lemma and its implications regarding the uniqueness of constrained minimizers |journal=Optimization |volume=60 |issue=8–9 |pages=1121–1159 |doi=10.1080/02331934.2010.527973 |s2cid=15896597 }}</ref> In the proofs of [[Consistent estimator|consistency]] and asymptotic normality of the maximum likelihood estimator, additional assumptions are made about the probability densities that form the basis of a particular likelihood function. These conditions were first established by Chanda.<ref>{{cite journal |first=K.C. |last=Chanda |year=1954 |title=A note on the consistency and maxima of the roots of likelihood equations |journal=[[Biometrika]] |volume=41 |issue=1–2 |pages=56–61 |doi=10.2307/2333005 |jstor=2333005 }}</ref> In particular, for [[almost all]] <math display="inline">x</math>, and for all <math display="inline">\, \theta \in \Theta \,,</math> <math display="block">\frac{\partial \log f}{\partial \theta_r} \,, \quad \frac{\partial^2 \log f}{\partial \theta_r \partial \theta_s} \,, \quad \frac{\partial^3 \log f}{\partial \theta_r \, \partial \theta_s \, \partial \theta_t} \,</math> exist for all <math display="inline">\, r, s, t = 1, 2, \ldots, k \,</math> in order to ensure the existence of a [[Taylor expansion]]. Second, for almost all <math display="inline">x</math> and for every <math display="inline">\, \theta \in \Theta \,</math> it must be that <math display="block"> \left| \frac{\partial f}{\partial \theta_r} \right| < F_r(x) \,, \quad \left| \frac{\partial^2 f}{\partial \theta_r \, \partial \theta_s} \right| < F_{rs}(x) \,, \quad \left| \frac{\partial^3 f}{\partial \theta_r \, \partial \theta_s \, \partial \theta_t} \right| < H_{rst}(x) </math> where <math display="inline">H</math> is such that <math display="inline">\, \int_{-\infty}^{\infty} H_{rst}(z) \mathrm{d}z \leq M < \infty \;.</math> This boundedness of the derivatives is needed to allow for [[differentiation under the integral sign]]. 
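The Hessian condition above can be checked numerically for a concrete model. The following sketch (an illustration, not part of the article) uses a normal sample parametrized by <math display="inline">(\mu, \log\sigma)</math> so that the parameter space is the open connected set <math display="inline">\mathbb{R}^2</math>; it confirms with finite differences that the gradient of the log-likelihood vanishes at the closed-form MLE and that the Hessian there is negative definite. The model, data, and step sizes are illustrative assumptions.

```python
# Hedged numerical sketch: gradient ~ 0 and Hessian negative definite at the
# MLE of a normal log-likelihood in (mu, log sigma). Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=500)

def log_lik(theta):
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)  # log-sigma parametrization keeps the space open
    return np.sum(-0.5 * np.log(2 * np.pi) - log_sigma
                  - 0.5 * ((x - mu) / sigma) ** 2)

def num_grad(f, theta, h=1e-5):
    # Central finite-difference gradient.
    g = np.zeros_like(theta)
    for i in range(len(theta)):
        e = np.zeros_like(theta)
        e[i] = h
        g[i] = (f(theta + e) - f(theta - e)) / (2 * h)
    return g

def num_hess(f, theta, h=1e-4):
    # Central finite-difference matrix of second partials.
    k = len(theta)
    H = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            ei = np.zeros(k); ei[i] = h
            ej = np.zeros(k); ej[j] = h
            H[i, j] = (f(theta + ei + ej) - f(theta + ei - ej)
                       - f(theta - ei + ej) + f(theta - ei - ej)) / (4 * h * h)
    return H

# Closed-form MLE for the normal: sample mean and (biased) sample std.
theta_hat = np.array([x.mean(), np.log(x.std())])

grad = num_grad(log_lik, theta_hat)
eigvals = np.linalg.eigvalsh(num_hess(log_lik, theta_hat))

print("gradient vanishes at MLE:", np.allclose(grad, 0, atol=1e-2))
print("Hessian negative definite:", bool(np.all(eigvals < 0)))
```

Because the Hessian is negative definite wherever the gradient vanishes, the stationary point found is the unique maximum in this parametrization.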
And lastly, it is assumed that the [[information matrix]], <math display="block">\mathbf{I}(\theta) = \int_{-\infty}^{\infty} \frac{\partial \log f}{\partial \theta_r}\ \frac{\partial \log f}{\partial \theta_s}\ f\ \mathrm{d}z </math> is [[positive definite]] and <math display="inline">\, \left| \mathbf{I}(\theta) \right| \,</math> is finite. This ensures that the [[Score (statistics)|score]] has a finite variance.<ref>{{cite book |first1=Edward |last1=Greenberg |first2=Charles E. Jr. |last2=Webster |title=Advanced Econometrics: A Bridge to the Literature |location=New York, NY |publisher=John Wiley & Sons |year=1983 |isbn=0-471-09077-8 |pages=24–25 }}</ref> The above conditions are sufficient, but not necessary. That is, a model that does not meet these regularity conditions may or may not have a maximum likelihood estimator with the properties mentioned above. Further, in the case of non-independently or non-identically distributed observations, additional properties may need to be assumed. In Bayesian statistics, almost identical regularity conditions are imposed on the likelihood function in order to prove asymptotic normality of the [[posterior probability]],<ref>{{cite journal |first1=C. C. |last1=Heyde |first2=I. M. |last2=Johnstone |title=On Asymptotic Posterior Normality for Stochastic Processes |journal=Journal of the Royal Statistical Society |series=Series B (Methodological) |volume=41 |issue=2 |year=1979 |pages=184–189 |doi=10.1111/j.2517-6161.1979.tb01071.x }}</ref><ref>{{cite journal |first=Chan-Fu |last=Chen |title=On Asymptotic Normality of Limiting Density Functions with Bayesian Implications |journal=Journal of the Royal Statistical Society |series=Series B (Methodological) |volume=47 |issue=3 |year=1985 |pages=540–546 |doi=10.1111/j.2517-6161.1985.tb01384.x }}</ref> and therefore to justify a [[Laplace approximation]] of the posterior in large samples.<ref>{{cite book |first1=Robert E. |last1=Kass |first2=Luke |last2=Tierney |first3=Joseph B. 
|last3=Kadane |chapter=The Validity of Posterior Expansions Based on Laplace's Method |editor-first=S. |editor-last=Geisser |editor2-first=J. S. |editor2-last=Hodges |editor3-first=S. J. |editor3-last=Press |editor4-first=A. |editor4-last=Zellner |pages=473–488 |publisher=Elsevier |title=Bayesian and Likelihood Methods in Statistics and Econometrics |year=1990 |isbn=0-444-88376-2 }}</ref>
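A consequence of the last condition can be illustrated by simulation (an illustration, not part of the article): under a regular model, the score has mean zero and its variance equals the Fisher information. The sketch below uses an exponential distribution with rate <math display="inline">\lambda</math>, an assumed model chosen because its per-observation information has the closed form <math display="inline">1/\lambda^2</math>.

```python
# Hedged Monte Carlo sketch: for an exponential model with rate lam, the
# per-observation score 1/lam - x has mean 0 and variance I(lam) = 1/lam**2.
import numpy as np

rng = np.random.default_rng(1)
lam = 2.0
n = 200_000
x = rng.exponential(scale=1.0 / lam, size=n)

# Score of one observation: d/d lam of log(lam * exp(-lam * x)) = 1/lam - x
score = 1.0 / lam - x

fisher_info = 1.0 / lam**2  # closed-form information for one observation

print("mean of score near 0:", abs(score.mean()) < 0.01)
print("variance of score near I(lam):", abs(score.var() - fisher_info) < 0.01)
```

The finite, positive information guarantees that this variance exists, which is what the asymptotic-normality arguments cited above rely on.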