== Principles ==
We model a set of observations as a random [[Sample (statistics)|sample]] from an unknown [[joint probability distribution]] which is expressed in terms of a set of [[statistical parameters|parameters]]. The goal of maximum likelihood estimation is to determine the parameters for which the observed data have the highest joint probability. We write the parameters governing the joint distribution as a vector <math>\; \theta = \left[ \theta_{1},\, \theta_2,\, \ldots,\, \theta_k \right]^{\mathsf{T}} \;</math> so that this distribution falls within a [[parametric family]] <math>\; \{ f(\cdot\,;\theta) \mid \theta \in \Theta \} \;,</math> where <math>\, \Theta \,</math> is called the ''[[parameter space]]'', a finite-dimensional subset of [[Euclidean space]]. Evaluating the joint density at the observed data sample <math>\; \mathbf{y} = (y_1, y_2, \ldots, y_n) \;</math> gives a real-valued function,
<math display="block">\mathcal{L}_{n}(\theta) = \mathcal{L}_{n}(\theta; \mathbf{y}) = f_{n}(\mathbf{y}; \theta) \;,</math>
which is called the [[likelihood function]]. For [[Independence (probability theory)|independent random variables]], <math>f_{n}(\mathbf{y}; \theta)</math> will be the product of univariate [[Probability density function|density functions]]:
<math display="block">f_{n}(\mathbf{y}; \theta) = \prod_{k=1}^n \, f_k^\mathsf{univar}(y_k; \theta) ~.</math>

Maximum likelihood estimation finds the values of the model parameters that maximize the likelihood function over the parameter space,<ref name=":0">{{cite journal |last=Myung |first=I.J. |year=2003 |title=Tutorial on maximum likelihood estimation |journal=[[Journal of Mathematical Psychology]] |volume=47 |issue=1 |pages=90–100 |doi=10.1016/S0022-2496(02)00028-7 }}</ref> that is:
<math display="block"> \hat{\theta} = \underset{\theta\in\Theta}{\operatorname{arg\;max}}\,\mathcal{L}_{n}(\theta\,;\mathbf{y}) ~. </math>
Intuitively, this selects the parameter values that make the observed data most probable. The specific value <math>~ \hat{\theta} = \hat{\theta}_{n}(\mathbf{y}) \in \Theta ~</math> that maximizes the likelihood function <math>\, \mathcal{L}_{n} \,</math> is called the maximum likelihood estimate. Further, if the function <math>\; \hat{\theta}_{n} : \mathbb{R}^{n} \to \Theta \;</math> so defined is [[measurable function|measurable]], then it is called the maximum likelihood [[estimator]]. It is generally a function defined over the [[sample space]], i.e. taking a given sample as its argument. A [[Necessity and sufficiency|sufficient but not necessary]] condition for its existence is for the likelihood function to be [[Continuous function|continuous]] over a parameter space <math>\, \Theta \,</math> that is [[Compact space|compact]].<ref>{{cite book |first1=Christian |last1=Gourieroux |first2=Alain |last2=Monfort |year=1995 |title=Statistics and Econometric Models |publisher=Cambridge University Press |isbn=0-521-40551-3 |page=[https://archive.org/details/statisticseconom00gour_434/page/n172 161] |url=https://archive.org/details/statisticseconom00gour_434 |url-access=limited}}</ref> For an [[Open set|open]] <math>\, \Theta \,</math> the likelihood function may increase without ever reaching a supremum value.
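As a concrete illustration of this maximization, the following minimal Python sketch (added for exposition; the simulated data and solver settings are assumptions, not part of the cited sources) finds the maximizer of the likelihood of an i.i.d. exponential model numerically and compares it with the analytic solution <math>\hat\lambda = 1/\bar{y}\,</math>.

<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
y = rng.exponential(scale=2.0, size=25)   # assumed example data (true rate 0.5)

def likelihood(lam, y):
    """Joint density of an i.i.d. exponential sample evaluated at the data."""
    return np.prod(lam * np.exp(-lam * y))

# The optimizer minimizes, so pass the negative likelihood; the parameter
# space (0, infinity) is approximated here by a bounded search interval.
res = minimize_scalar(lambda lam: -likelihood(lam, y),
                      bounds=(1e-6, 10.0), method="bounded")

print("numerical MLE:", res.x)            # argmax of the likelihood
print("analytic MLE 1/mean(y):", 1.0 / y.mean())
# For large samples the raw product underflows, which is one reason the
# log-likelihood introduced below is preferred in practice.
</syntaxhighlight>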
In practice, it is often convenient to work with the [[natural logarithm]] of the likelihood function, called the [[log-likelihood]]:
<math display="block"> \ell(\theta\,;\mathbf{y}) = \ln \mathcal{L}_{n}(\theta\,;\mathbf{y}) ~. </math>
Since the logarithm is a [[monotonic function]], the maximum of <math>\; \ell(\theta\,;\mathbf{y}) \;</math> occurs at the same value of <math>\theta</math> as does the maximum of <math>\, \mathcal{L}_{n} ~.</math><ref>{{cite book |first=Edward J. |last=Kane |year=1968 |title=Economic Statistics and Econometrics |location=New York, NY |publisher=Harper & Row |page=[https://archive.org/details/economicstatisti00kane/page/n200 179] |url=https://archive.org/details/economicstatisti00kane |url-access=registration}}</ref> If <math>\ell(\theta\,;\mathbf{y})</math> is [[Differentiable function|differentiable]] in <math>\, \Theta \,,</math> [[Derivative test|necessary conditions]] for the occurrence of a maximum (or a minimum) are
<math display="block">\frac{\partial \ell}{\partial \theta_{1}} = 0, \quad \frac{\partial \ell}{\partial \theta_{2}} = 0, \quad \ldots, \quad \frac{\partial \ell}{\partial \theta_{k}} = 0 ~,</math>
known as the likelihood equations. For some models, these equations can be explicitly solved for <math>\, \widehat{\theta\,} \,,</math> but in general no closed-form solution to the maximization problem is known or available, and an MLE can only be found via [[Mathematical optimization|numerical optimization]]. Another problem is that in finite samples, there may exist multiple [[Zero of a function|roots]] for the likelihood equations.<ref>{{cite book |first1=Christopher G. |last1=Small |first2=Jinfang |last2=Wang |year=2003 |chapter=Working with roots |title=Numerical Methods for Nonlinear Estimating Equations |publisher=Oxford University Press |isbn=0-19-850688-0 |pages=74–124 |chapter-url=https://books.google.com/books?id=hMrwQVllY5AC&pg=PA74 }}</ref> Whether the identified root <math>\, \widehat{\theta\,} \,</math> of the likelihood equations is indeed a (local) maximum depends on whether the matrix of second-order partial and cross-partial derivatives, the so-called [[Hessian matrix]]
<math display="block">\mathbf{H}\left(\widehat{\theta\,}\right) = \begin{bmatrix} \left. \frac{\partial^2 \ell}{\partial \theta_1^2} \right|_{\theta=\widehat{\theta\,}} & \left. \frac{\partial^2 \ell}{\partial \theta_1 \, \partial \theta_2} \right|_{\theta=\widehat{\theta\,}} & \dots & \left. \frac{\partial^2 \ell}{\partial \theta_1 \, \partial \theta_k} \right|_{\theta=\widehat{\theta\,}} \\ \left. \frac{\partial^2 \ell}{\partial \theta_2 \, \partial \theta_1} \right|_{\theta=\widehat{\theta\,}} & \left. \frac{\partial^2 \ell}{\partial \theta_2^2} \right|_{\theta=\widehat{\theta\,}} & \dots & \left. \frac{\partial^2 \ell}{\partial \theta_2 \, \partial \theta_k} \right|_{\theta=\widehat{\theta\,}} \\ \vdots & \vdots & \ddots & \vdots \\ \left. \frac{\partial^2 \ell}{\partial \theta_k \, \partial \theta_1} \right|_{\theta=\widehat{\theta\,}} & \left. \frac{\partial^2 \ell}{\partial \theta_k \, \partial \theta_2} \right|_{\theta=\widehat{\theta\,}} & \dots & \left. \frac{\partial^2 \ell}{\partial \theta_k^2} \right|_{\theta=\widehat{\theta\,}} \end{bmatrix} ~,</math>
is [[negative semi-definite]] at <math>\widehat{\theta\,}</math>, as this indicates local [[Concave function|concavity]]. Conveniently, most common [[probability distribution]]s – in particular the [[exponential family]] – are [[Logarithmically concave function|logarithmically concave]].<ref>{{cite book |first1=Robert E. |last1=Kass |first2=Paul W. |last2=Vos |year=1997 |title=Geometrical Foundations of Asymptotic Inference |page=14 |location=New York, NY |publisher=John Wiley & Sons |isbn=0-471-82668-5 |url=https://books.google.com/books?id=e43EAIfUPCwC&pg=PA14 }}</ref><ref>{{cite web |first=Alecos |last=Papadopoulos |date=25 September 2013 |title=Why we always put log() before the joint pdf when we use MLE (Maximum likelihood Estimation)? |website=[[Stack Exchange]] |url=https://stats.stackexchange.com/q/70975 }}</ref>
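The same numerical treatment applies when the likelihood equations are inconvenient to solve directly. The sketch below (an added illustration; the normal model, the simulated data, and the finite-difference helper are assumptions for the example) maximizes the log-likelihood of a normal model numerically, compares the result with the closed-form solution of the likelihood equations, and checks local concavity by verifying that a numerically approximated Hessian has only negative eigenvalues.

<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.5, size=200)        # assumed example data

def loglik(theta, y):
    """Log-likelihood of an i.i.d. normal sample with parameters (mu, sigma)."""
    mu, sigma = theta
    if sigma <= 0:
        return -np.inf
    return np.sum(-0.5 * np.log(2 * np.pi) - np.log(sigma)
                  - (y - mu) ** 2 / (2 * sigma ** 2))

# Numerical maximization (minimize the negative log-likelihood).
res = minimize(lambda t: -loglik(t, y), x0=[0.0, 1.0], method="Nelder-Mead")
print("numerical MLE:", res.x)
print("closed-form MLE:", [y.mean(), y.std(ddof=0)])  # solution of the likelihood equations

def hessian(f, x, eps=1e-4):
    """Central finite-difference approximation of the Hessian of f at x."""
    x = np.asarray(x, dtype=float)
    k = len(x)
    H = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            ei, ej = np.eye(k)[i] * eps, np.eye(k)[j] * eps
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * eps ** 2)
    return H

H = hessian(lambda t: loglik(t, y), res.x)
print("Hessian eigenvalues:", np.linalg.eigvalsh(H))  # all negative at a local maximum
</syntaxhighlight>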
=== Restricted parameter space ===
{{Distinguish|restricted maximum likelihood}}
While the domain of the likelihood function – the [[parameter space]] – is generally a finite-dimensional subset of [[Euclidean space]], additional [[Restriction (mathematics)|restriction]]s sometimes need to be incorporated into the estimation process. The parameter space can be expressed as
<math display="block">\Theta = \left\{ \theta : \theta \in \mathbb{R}^{k},\; h(\theta) = 0 \right\} ~,</math>
where <math>\; h(\theta) = \left[ h_{1}(\theta), h_{2}(\theta), \ldots, h_{r}(\theta) \right] \;</math> is a [[vector-valued function]] mapping <math>\, \mathbb{R}^{k} \,</math> into <math>\; \mathbb{R}^{r} ~.</math> Estimating the true parameter <math>\theta</math> belonging to <math>\Theta</math> then, as a practical matter, means finding the maximum of the likelihood function subject to the [[Constraint (mathematics)|constraint]] <math>~h(\theta) = 0 ~.</math>

Theoretically, the most natural approach to this [[constrained optimization]] problem is the method of substitution, that is "filling out" the restrictions <math>\; h_{1}, h_{2}, \ldots, h_{r} \;</math> to a set <math>\; h_{1}, h_{2}, \ldots, h_{r}, h_{r+1}, \ldots, h_{k} \;</math> in such a way that <math>\; h^{\ast} = \left[ h_{1}, h_{2}, \ldots, h_{k} \right] \;</math> is a [[one-to-one function]] from <math>\mathbb{R}^{k}</math> to itself, and reparameterizing the likelihood function by setting <math>\; \phi_{i} = h_{i}(\theta_{1}, \theta_{2}, \ldots, \theta_{k}) ~.</math><ref name="Silvey p79">{{cite book |first=S. D. |last=Silvey |year=1975 |title=Statistical Inference |location=London, UK |publisher=Chapman and Hall |isbn=0-412-13820-4 |page=79 |url=https://books.google.com/books?id=qIKLejbVMf4C&pg=PA79 }}</ref> Because of the equivariance of the maximum likelihood estimator, the properties of the MLE apply to the restricted estimates also.<ref>{{cite web |first=David |last=Olive |year=2004 |title=Does the MLE maximize the likelihood? |website=Southern Illinois University |url=http://lagrange.math.siu.edu/Olive/simle.pdf }}</ref> For instance, in a [[multivariate normal distribution]] the [[covariance matrix]] <math>\, \Sigma \,</math> must be [[Positive-definite matrix|positive-definite]]; this restriction can be imposed by replacing <math>\; \Sigma = \Gamma^{\mathsf{T}} \Gamma \;,</math> where <math>\Gamma</math> is a real [[upper triangular matrix]] and <math>\Gamma^{\mathsf{T}}</math> is its [[transpose]].<ref>{{cite journal |first=Daniel P. |last=Schwallie |year=1985 |title=Positive definite maximum likelihood covariance estimators |journal=Economics Letters |volume=17 |issue=1–2 |pages=115–117 |doi=10.1016/0165-1765(85)90139-9 }}</ref>
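For the covariance example just given, the reparameterization <math>\Sigma = \Gamma^{\mathsf{T}} \Gamma</math> can be sketched as follows (an added illustration; the simulated data, the helper <code>gamma_from</code>, and the solver settings are assumptions rather than a prescribed implementation). The free entries of the upper triangular <math>\Gamma</math> are optimized without constraints, and the resulting <math>\widehat\Sigma</math> is compared with the closed-form MLE, the <math>1/n</math>-scaled sample covariance of the centered data.

<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
d, n = 3, 500
A = rng.normal(size=(d, d))
Y = rng.normal(size=(n, d)) @ A            # assumed example data
Yc = Y - Y.mean(axis=0)                    # center at the MLE of the mean

def gamma_from(v):
    """Fill an upper triangular Gamma with the entries of the unconstrained vector v."""
    G = np.zeros((d, d))
    G[np.triu_indices(d)] = v
    return G

def negloglik(v):
    G = gamma_from(v)
    Sigma = G.T @ G                        # positive semi-definite by construction
    sign, logdet = np.linalg.slogdet(Sigma)
    if sign <= 0:                          # guard against a singular Sigma
        return np.inf
    quad = np.einsum('ij,jk,ik->', Yc, np.linalg.inv(Sigma), Yc)
    return 0.5 * (n * (d * np.log(2 * np.pi) + logdet) + quad)

v0 = np.eye(d)[np.triu_indices(d)]         # start from Gamma = identity
res = minimize(negloglik, v0, method="Nelder-Mead",
               options={"maxiter": 20000, "xatol": 1e-8, "fatol": 1e-8})
Sigma_hat = gamma_from(res.x).T @ gamma_from(res.x)

print(np.round(Sigma_hat, 3))              # restricted (positive semi-definite) MLE
print(np.round(Yc.T @ Yc / n, 3))          # closed-form MLE for comparison
</syntaxhighlight>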
In practice, restrictions are usually imposed using the method of Lagrange multipliers which, given the constraints as defined above, leads to the ''restricted likelihood equations''
<math display="block">\frac{\partial \ell}{\partial \theta} - \frac{\partial h(\theta)^\mathsf{T}}{\partial \theta} \lambda = 0</math>
and <math>h(\theta) = 0 \;,</math> where <math>~ \lambda = \left[ \lambda_{1}, \lambda_{2}, \ldots, \lambda_{r}\right]^\mathsf{T} ~</math> is a column-vector of [[Lagrange multiplier]]s and <math>\; \frac{\partial h(\theta)^\mathsf{T}}{\partial \theta} \;</math> is the {{mvar|k × r}} [[Jacobian matrix]] of partial derivatives.<ref name="Silvey p79"/> Naturally, if the constraints are not binding at the maximum, the Lagrange multipliers should be zero.<ref>{{cite book |first=Jan R. |last=Magnus |year=2017 |title=Introduction to the Theory of Econometrics |location=Amsterdam |publisher=VU University Press |pages=64–65 |isbn=978-90-8659-766-6}}</ref> This in turn allows for a statistical test of the "validity" of the constraint, known as the [[Lagrange multiplier test]].

=== Nonparametric maximum likelihood estimation ===
Nonparametric maximum likelihood estimation can be performed using the [[empirical likelihood]].
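The empirical likelihood is itself a constrained maximization of the kind described above: probability weights <math>p_i</math> on the observed points are chosen to maximize <math>\textstyle\sum_i \ln p_i</math> subject to <math>\textstyle\sum_i p_i = 1</math> and, for a profile, a moment constraint. The following minimal sketch (an added illustration; the data, the hypothesised mean <code>mu0</code>, and the solver choice are assumptions) profiles the empirical likelihood of a mean with a general-purpose constrained optimizer; with the moment constraint removed, the maximizer is the empirical distribution <math>p_i = 1/n</math>.

<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, scale=1.0, size=30)    # assumed example data
mu0 = 0.8                                      # hypothesised mean to profile
n = len(y)

# Maximize the nonparametric log-likelihood sum(log p_i) over probability
# weights on the observations, subject to the weights summing to one and
# (for the empirical-likelihood profile) the weighted mean equalling mu0.
res = minimize(
    lambda p: -np.sum(np.log(p)),
    x0=np.full(n, 1.0 / n),                    # empirical distribution as starting point
    method="SLSQP",
    bounds=[(1e-10, 1.0)] * n,
    constraints=[
        {"type": "eq", "fun": lambda p: np.sum(p) - 1.0},
        {"type": "eq", "fun": lambda p: p @ y - mu0},
    ],
)

# Empirical log-likelihood ratio for mu0 (zero when mu0 equals the sample mean).
elr = -res.fun - n * np.log(1.0 / n)
print("weights sum to:", res.x.sum(), " log-likelihood ratio:", elr)
</syntaxhighlight>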