===Profile likelihood===
The dimension of a maximization problem can be reduced by ''concentrating'' the likelihood function on a subset of parameters: the nuisance parameters are expressed as functions of the parameters of interest and substituted into the likelihood function.<ref>{{cite book |first=Takeshi |last=Amemiya |author-link=Takeshi Amemiya |title=Advanced Econometrics |chapter=Concentrated Likelihood Function |location=Cambridge |publisher=Harvard University Press |year=1985 |pages=[https://archive.org/details/advancedeconomet00amem/page/125 125–127] |isbn=978-0-674-00560-0 |chapter-url=https://books.google.com/books?id=0bzGQE14CwEC&pg=PA125 |url-access=registration |url=https://archive.org/details/advancedeconomet00amem/page/125 }}</ref><ref>{{cite book |first1=Russell |last1=Davidson |first2=James G. |last2=MacKinnon |author-link2=James G. MacKinnon |title=Estimation and Inference in Econometrics |chapter=Concentrating the Loglikelihood Function |location=New York |publisher=Oxford University Press |year=1993 |pages=267–269 |isbn=978-0-19-506011-9 }}</ref> In general, for a likelihood function depending on a parameter vector <math display="inline">\mathbf{\theta}</math> that can be partitioned into <math display="inline">\mathbf{\theta} = \left( \mathbf{\theta}_{1} : \mathbf{\theta}_{2} \right)</math>, and where a correspondence <math display="inline">\mathbf{\hat{\theta}}_{2} = \mathbf{\hat{\theta}}_{2} \left( \mathbf{\theta}_{1} \right)</math> can be determined explicitly, concentration reduces the [[Computational complexity|computational burden]] of the original maximization problem.<ref>{{cite book |first1=Christian |last1=Gourieroux |first2=Alain |last2=Monfort |title=Statistics and Econometric Models |chapter=Concentrated Likelihood Function |location=New York |publisher=Cambridge University Press |year=1995 |isbn=978-0-521-40551-5 |pages=170–175 |chapter-url=https://books.google.com/books?id=gqI-pAP2JZ8C&pg=PA170 }}</ref>
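
As a sketch in symbols, writing <math display="inline">\ell</math> for the log-likelihood (a notation assumed here for illustration), the concentration step can be summarized as
<math display="block">\mathbf{\hat{\theta}}_{2} \left( \mathbf{\theta}_{1} \right) = \underset{\mathbf{\theta}_{2}}{\operatorname{arg\,max}} \; \ell \left( \mathbf{\theta}_{1}, \mathbf{\theta}_{2} \right), \qquad \ell_{c} \left( \mathbf{\theta}_{1} \right) = \ell \left( \mathbf{\theta}_{1}, \mathbf{\hat{\theta}}_{2} \left( \mathbf{\theta}_{1} \right) \right),</math>
so that, provided the maxima exist, maximizing the concentrated function <math display="inline">\ell_{c}</math> over <math display="inline">\mathbf{\theta}_{1}</math> alone yields the same <math display="inline">\mathbf{\hat{\theta}}_{1}</math> as maximizing <math display="inline">\ell</math> jointly over both blocks.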

For instance, in a [[linear regression]] with normally distributed errors, <math display="inline">\mathbf{y} = \mathbf{X} \beta + u</math>, the coefficient vector can be [[Partition of a set|partitioned]] into <math display="inline">\beta = \left[ \beta_{1} : \beta_{2} \right]</math> (and the [[design matrix]] correspondingly into <math display="inline">\mathbf{X} = \left[ \mathbf{X}_{1} : \mathbf{X}_{2} \right]</math>). Maximizing with respect to <math display="inline">\beta_{2}</math> for a given <math display="inline">\beta_{1}</math> yields an optimal value function <math display="inline">\beta_{2} (\beta_{1}) = \left( \mathbf{X}_{2}^{\mathsf{T}} \mathbf{X}_{2} \right)^{-1} \mathbf{X}_{2}^{\mathsf{T}} \left( \mathbf{y} - \mathbf{X}_{1} \beta_{1} \right)</math>. Using this result, the maximum likelihood estimator for <math display="inline">\beta_{1}</math> can then be derived as
<math display="block">\hat{\beta}_{1} = \left( \mathbf{X}_{1}^{\mathsf{T}} \left( \mathbf{I} - \mathbf{P}_{2} \right) \mathbf{X}_{1} \right)^{-1} \mathbf{X}_{1}^{\mathsf{T}} \left( \mathbf{I} - \mathbf{P}_{2} \right) \mathbf{y}</math>
where <math display="inline">\mathbf{P}_{2} = \mathbf{X}_{2} \left( \mathbf{X}_{2}^{\mathsf{T}} \mathbf{X}_{2} \right)^{-1} \mathbf{X}_{2}^{\mathsf{T}}</math> is the [[projection matrix]] onto the column space of <math display="inline">\mathbf{X}_{2}</math>. This result is known as the [[Frisch–Waugh–Lovell theorem]].
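
The regression result above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the simulated data, the seed, and all variable names are illustrative assumptions rather than part of the cited sources. It compares the concentrated estimator of <math display="inline">\beta_{1}</math> with the corresponding block of the ordinary least squares estimate, which coincides with the maximum likelihood estimate when the errors are normal.

```python
import numpy as np

# Simulated data (illustrative assumption): 2 parameters of interest,
# 3 nuisance parameters including an intercept.
rng = np.random.default_rng(0)
n = 200
X1 = rng.normal(size=(n, 2))
X2 = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta1_true = np.array([1.5, -0.7])
beta2_true = np.array([0.3, 2.0, -1.0])
y = X1 @ beta1_true + X2 @ beta2_true + rng.normal(scale=0.5, size=n)

# Joint maximization over all coefficients (least squares on X = [X1 : X2]).
X = np.hstack([X1, X2])
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)

# Concentrated estimator: project out X2, then solve for beta_1 only.
P2 = X2 @ np.linalg.solve(X2.T @ X2, X2.T)   # projection onto col(X2)
M2 = np.eye(n) - P2                          # I - P2
beta1_conc = np.linalg.solve(X1.T @ M2 @ X1, X1.T @ M2 @ y)

# Recover the nuisance block from the optimal value function beta_2(beta_1).
beta2_conc = np.linalg.solve(X2.T @ X2, X2.T @ (y - X1 @ beta1_conc))

print(np.allclose(beta1_conc, beta_full[:2]))
print(np.allclose(beta2_conc, beta_full[2:]))
```

Forming the <math display="inline">n \times n</math> matrix <math display="inline">\mathbf{I} - \mathbf{P}_{2}</math> explicitly keeps the code close to the formula; for large samples one would more plausibly regress <math display="inline">\mathbf{y}</math> and <math display="inline">\mathbf{X}_{1}</math> on <math display="inline">\mathbf{X}_{2}</math> and work with the residuals.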

Graphically, concentration amounts to slicing the likelihood surface along the ridge of values of the nuisance parameter <math display="inline">\beta_{2}</math> that maximize the likelihood function for each given <math display="inline">\beta_{1}</math>, which creates an [[Contour line|isometric]] [[Topographic profile|profile]] of the likelihood function; the result of this procedure is therefore also known as ''profile likelihood''.<ref>{{citation |first=Andrew |last=Pickles |title=An Introduction to Likelihood Analysis |location=Norwich |publisher=W. H. Hutchins & Sons |year=1985 |isbn=0-86094-190-6 |pages=[https://archive.org/details/introductiontoli0000pick/page/21 21–24] |mode=cs1 |url=https://archive.org/details/introductiontoli0000pick/page/21 }}</ref><ref>{{cite book |first=Benjamin M. |last=Bolker |title=Ecological Models and Data in R |publisher=Princeton University Press |year=2008 |isbn=978-0-691-12522-0 |pages=187–189 |url=https://books.google.com/books?id=flyBd1rpqeoC&pg=PA188 }}</ref> In addition to being graphed, the profile likelihood can also be used to compute [[confidence interval]]s that often have better small-sample properties than those based on asymptotic [[Standard error (statistics)|standard errors]] calculated from the full likelihood.<ref>{{citation|last=Aitkin|first=Murray|title=GLIM 82: Proceedings of the International Conference on Generalised Linear Models|pages=76–86|year=1982|chapter=Direct Likelihood Inference|publisher=Springer|isbn=0-387-90777-7|author-link=Murray Aitkin|mode=cs1}}</ref><ref>{{citation |first1=D. J. |last1=Venzon |first2=S. H. |last2=Moolgavkar |title=A Method for Computing Profile-Likelihood-Based Confidence Intervals |journal=[[Journal of the Royal Statistical Society]] |series=Series C (Applied Statistics) |volume=37 |issue=1 |year=1988 |pages=87–94 |doi=10.2307/2347496 |jstor=2347496 |mode=cs1 }}</ref>
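
One common way of turning a profile likelihood into a confidence interval is to collect all values of the parameter of interest whose profile log-likelihood lies within half a chi-squared quantile of its maximum. The following is a minimal sketch of that idea for a deliberately simple model, a normal sample with mean <math display="inline">\mu</math> of interest and variance <math display="inline">\sigma^{2}</math> as the nuisance parameter. The model, the data, the function names, and the use of plain bisection are illustrative assumptions; this is not the algorithm of the cited references.

```python
import math

def profile_loglik(mu, data):
    """Profile log-likelihood of mu; sigma^2 is replaced by its
    maximizer for the given mu, sigma2_hat(mu) = mean((x - mu)^2)."""
    n = len(data)
    sigma2_hat = sum((x - mu) ** 2 for x in data) / n
    return -0.5 * n * (math.log(2 * math.pi * sigma2_hat) + 1)

def profile_interval(data, crit=3.841458820694124, tol=1e-10):
    """Approximate 95% interval: all mu with
    2 * (max profile loglik - profile loglik(mu)) <= crit,
    where crit is the 0.95 quantile of chi-squared with 1 d.o.f."""
    n = len(data)
    mu_hat = sum(data) / n
    l_max = profile_loglik(mu_hat, data)

    def excess(mu):
        return 2 * (l_max - profile_loglik(mu, data)) - crit

    def boundary(direction):
        # Step outwards until the threshold is exceeded, then bisect.
        step = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / n)
        while excess(mu_hat + direction * step) < 0:
            step *= 2
        inside, outside = mu_hat, mu_hat + direction * step
        while abs(outside - inside) > tol:
            mid = 0.5 * (inside + outside)
            if excess(mid) < 0:
                inside = mid
            else:
                outside = mid
        return 0.5 * (inside + outside)

    return boundary(-1), boundary(+1)

# Hypothetical sample, for illustration only.
data = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.9, 4.4]
lower, upper = profile_interval(data)
print(lower, upper)
```

In this particular model the profile is symmetric about the sample mean and the interval has a closed form, which makes the sketch easy to check; in models where the nuisance parameters have no explicit maximizer, each evaluation of the profile would instead require an inner numerical optimization.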