==Likelihoods that eliminate nuisance parameters==
In many cases, the likelihood is a function of more than one parameter but interest focuses on the estimation of only one, or at most a few of them, with the others being considered as [[nuisance parameter]]s. Several alternative approaches have been developed to eliminate such nuisance parameters, so that a likelihood can be written as a function of only the parameter (or parameters) of interest: the main approaches are profile, conditional, and marginal likelihoods.<ref>{{cite book |title=In All Likelihood: Statistical Modelling and Inference Using Likelihood |first=Yudi |last=Pawitan |year=2001 |publisher=[[Oxford University Press]] }}</ref><ref>{{cite web |author=Wen Hsiang Wei |url=http://web.thu.edu.tw/wenwei/www/glmpdfmargin.htm |title=Generalized Linear Model - course notes |pages=Chapter 5 |publisher=[[Tunghai University]] |location=Taichung, Taiwan |access-date=2017-10-01 }}</ref> These approaches are also useful when a high-dimensional likelihood surface needs to be reduced to one or two parameters of interest in order to allow a [[Graph of a function|graph]].

===Profile likelihood===
It is possible to reduce the dimensions by concentrating the likelihood function for a subset of parameters by expressing the nuisance parameters as functions of the parameters of interest and replacing them in the likelihood function.<ref>{{cite book |first=Takeshi |last=Amemiya |author-link=Takeshi Amemiya |title=Advanced Econometrics |chapter=Concentrated Likelihood Function |location=Cambridge |publisher=Harvard University Press |year=1985 |pages=[https://archive.org/details/advancedeconomet00amem/page/125 125–127] |isbn=978-0-674-00560-0 |chapter-url=https://books.google.com/books?id=0bzGQE14CwEC&pg=PA125 |url-access=registration |url=https://archive.org/details/advancedeconomet00amem/page/125 }}</ref><ref>{{cite book |first1=Russell |last1=Davidson |first2=James G. |last2=MacKinnon |author-link2=James G. MacKinnon |title=Estimation and Inference in Econometrics |chapter=Concentrating the Loglikelihood Function |location=New York |publisher=Oxford University Press |year=1993 |pages=267–269 |isbn=978-0-19-506011-9 }}</ref> In general, for a likelihood function depending on the parameter vector <math display="inline">\mathbf{\theta}</math> that can be partitioned into <math display="inline">\mathbf{\theta} = \left( \mathbf{\theta}_{1} : \mathbf{\theta}_{2} \right)</math>, and where a correspondence <math display="inline">\mathbf{\hat{\theta}}_{2} = \mathbf{\hat{\theta}}_{2} \left( \mathbf{\theta}_{1} \right)</math> can be determined explicitly, concentration reduces the [[Computational complexity|computational burden]] of the original maximization problem.<ref>{{cite book |first1=Christian |last1=Gourieroux |first2=Alain |last2=Monfort |title=Statistics and Econometric Models |chapter=Concentrated Likelihood Function |location=New York |publisher=Cambridge University Press |year=1995 |isbn=978-0-521-40551-5 |pages=170–175 |chapter-url=https://books.google.com/books?id=gqI-pAP2JZ8C&pg=PA170 }}</ref>

For instance, in a [[linear regression]] with normally distributed errors, <math display="inline">\mathbf{y} = \mathbf{X} \beta + u</math>, the coefficient vector could be [[Partition of a set|partitioned]] into <math display="inline">\beta = \left[ \beta_{1} : \beta_{2} \right]</math> (and consequently the [[design matrix]] <math display="inline">\mathbf{X} = \left[ \mathbf{X}_{1} : \mathbf{X}_{2} \right]</math>). Maximizing with respect to <math display="inline">\beta_{2}</math> yields an optimal value function <math display="inline">\beta_{2} (\beta_{1}) = \left( \mathbf{X}_{2}^{\mathsf{T}} \mathbf{X}_{2} \right)^{-1} \mathbf{X}_{2}^{\mathsf{T}} \left( \mathbf{y} - \mathbf{X}_{1} \beta_{1} \right)</math>. Using this result, the maximum likelihood estimator for <math display="inline">\beta_{1}</math> can then be derived as
<math display="block">\hat{\beta}_{1} = \left( \mathbf{X}_{1}^{\mathsf{T}} \left( \mathbf{I} - \mathbf{P}_{2} \right) \mathbf{X}_{1} \right)^{-1} \mathbf{X}_{1}^{\mathsf{T}} \left( \mathbf{I} - \mathbf{P}_{2} \right) \mathbf{y}</math>
where <math display="inline">\mathbf{P}_{2} = \mathbf{X}_{2} \left( \mathbf{X}_{2}^{\mathsf{T}} \mathbf{X}_{2} \right)^{-1} \mathbf{X}_{2}^{\mathsf{T}}</math> is the [[projection matrix]] onto the column space of <math display="inline">\mathbf{X}_{2}</math>. This result is known as the [[Frisch–Waugh–Lovell theorem]].

Graphically, the procedure of concentration is equivalent to slicing the likelihood surface along the ridge of values of the nuisance parameter <math display="inline">\beta_{2}</math> that maximize the likelihood function, creating an [[Contour line|isometric]] [[Topographic profile|profile]] of the likelihood function for a given <math display="inline">\beta_{1}</math>; the result of this procedure is therefore also known as the ''profile likelihood''.<ref>{{citation |first=Andrew |last=Pickles |title=An Introduction to Likelihood Analysis |location=Norwich |publisher=W. H. Hutchins & Sons |year=1985 |isbn=0-86094-190-6 |pages=[https://archive.org/details/introductiontoli0000pick/page/21 21–24] |mode=cs1 |url=https://archive.org/details/introductiontoli0000pick/page/21 }}</ref><ref>{{cite book |first=Benjamin M. |last=Bolker |title=Ecological Models and Data in R |publisher=Princeton University Press |year=2008 |isbn=978-0-691-12522-0 |pages=187–189 |url=https://books.google.com/books?id=flyBd1rpqeoC&pg=PA188 }}</ref> In addition to being graphed, the profile likelihood can also be used to compute [[confidence interval]]s that often have better small-sample properties than those based on asymptotic [[Standard error (statistics)|standard errors]] calculated from the full likelihood.<ref>{{citation |last=Aitkin |first=Murray |title=GLIM 82: Proceedings of the International Conference on Generalised Linear Models |pages=76–86 |year=1982 |chapter=Direct Likelihood Inference |publisher=Springer |isbn=0-387-90777-7 |author-link=Murray Aitkin |mode=cs1 }}</ref><ref>{{citation |first1=D. J. |last1=Venzon |first2=S. H. |last2=Moolgavkar |title=A Method for Computing Profile-Likelihood-Based Confidence Intervals |journal=[[Journal of the Royal Statistical Society]] |series=Series C (Applied Statistics) |volume=37 |issue=1 |year=1988 |pages=87–94 |doi=10.2307/2347496 |jstor=2347496 |mode=cs1 }}</ref>
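The equivalence between the concentrated (profile) estimator above and the corresponding block of the full maximum likelihood estimator can be checked numerically. The following Python sketch uses simulated data and illustrative variable names, none of which come from the cited sources: it forms the projection matrix <math display="inline">\mathbf{P}_{2}</math>, applies the concentrated formula for <math display="inline">\hat{\beta}_{1}</math>, and compares the result with ordinary least squares on the full design matrix.

<syntaxhighlight lang="python">
# Minimal numerical check of the concentrated (profile) likelihood estimator
# for linear regression, using the Frisch-Waugh-Lovell formula above.
# Data are simulated; all names are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n = 200
X1 = rng.normal(size=(n, 2))          # columns of interest
X2 = rng.normal(size=(n, 3))          # nuisance columns
beta1_true = np.array([1.0, -2.0])
beta2_true = np.array([0.5, 0.0, 3.0])
y = X1 @ beta1_true + X2 @ beta2_true + rng.normal(size=n)

# Projection matrix onto the column space of X2, and its annihilator I - P2
P2 = X2 @ np.linalg.inv(X2.T @ X2) @ X2.T
M2 = np.eye(n) - P2

# Concentrated-likelihood (profile) estimator of beta_1
beta1_profile = np.linalg.solve(X1.T @ M2 @ X1, X1.T @ M2 @ y)

# Full maximum likelihood (ordinary least squares) on [X1 : X2]
X = np.hstack([X1, X2])
beta_full = np.linalg.solve(X.T @ X, X.T @ y)

print(beta1_profile)    # matches the first two entries of beta_full
print(beta_full[:2])
</syntaxhighlight>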
===Conditional likelihood===
Sometimes it is possible to find a [[sufficient statistic]] for the nuisance parameters, and conditioning on this statistic results in a likelihood which does not depend on the nuisance parameters.<ref>{{cite journal |first1=J. D. |last1=Kalbfleisch |first2=D. A. |last2=Sprott |title=Marginal and Conditional Likelihoods |journal=Sankhyā: The Indian Journal of Statistics |series=Series A |volume=35 |issue=3 |year=1973 |pages=311–328 |jstor=25049882 }}</ref> One example occurs in 2×2 tables, where conditioning on all four marginal totals leads to a conditional likelihood based on the non-central [[hypergeometric distribution]]. This form of conditioning is also the basis for [[Fisher's exact test]].
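As a numerical illustration of this conditioning, the following Python sketch evaluates the conditional likelihood of the odds ratio in a 2×2 table given all four margins, which is proportional to a non-central hypergeometric probability; the table, grid, and function names are illustrative assumptions, not taken from the cited sources. Fisher's exact test evaluates the same conditional distribution at an odds ratio of one.

<syntaxhighlight lang="python">
# Sketch of a conditional likelihood for the odds ratio psi in a 2x2 table,
# conditioning on all four marginal totals (non-central hypergeometric form).
# The table and grid are illustrative.
import numpy as np
from scipy.special import comb
from scipy.stats import fisher_exact

table = np.array([[8, 2],
                  [1, 5]])
a = table[0, 0]
row1, row2 = table.sum(axis=1)     # fixed row totals
col1 = table[:, 0].sum()           # fixed first-column total

def conditional_likelihood(psi):
    """L(psi | margins): proportional to the non-central hypergeometric pmf at a."""
    u = np.arange(max(0, col1 - row2), min(row1, col1) + 1)   # support of the cell count
    weights = comb(row1, u) * comb(row2, col1 - u) * psi**u
    return comb(row1, a) * comb(row2, col1 - a) * psi**a / weights.sum()

# Conditional maximum likelihood estimate of the odds ratio by grid search
grid = np.linspace(0.01, 100, 10000)
psi_hat = grid[np.argmax([conditional_likelihood(p) for p in grid])]

# Fisher's exact test uses the same conditional distribution at psi = 1
_, p_value = fisher_exact(table)
print(psi_hat, p_value)
</syntaxhighlight>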
===Marginal likelihood===
{{Main|Marginal likelihood}}
Sometimes we can remove the nuisance parameters by considering a likelihood based on only part of the information in the data, for example by using the set of ranks rather than the numerical values. Another example occurs in linear [[mixed model]]s, where considering a likelihood for the residuals only after fitting the fixed effects leads to [[residual maximum likelihood]] estimation of the variance components.

===Partial likelihood===
A partial likelihood is an adaptation of the full likelihood such that only a part of the parameters (the parameters of interest) occur in it.<ref>{{citation |last=Cox |first=D. R. |author-link=David Cox (statistician) |title=Partial likelihood |journal=[[Biometrika]] |year=1975 |volume=62 |issue=2 |pages=269–276 |doi=10.1093/biomet/62.2.269 |mr=0400509 |mode=cs1 }}</ref> It is a key component of the [[proportional hazards model]]: using a restriction on the hazard function, the likelihood does not contain the shape of the hazard over time.
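As an illustration, the Cox partial log-likelihood can be written down and maximized directly. The following Python sketch uses simulated survival data without tied event times; the data-generating process and all names are illustrative assumptions rather than material from the cited source. The baseline hazard, that is, the shape of the hazard over time, never enters the function being maximized.

<syntaxhighlight lang="python">
# Sketch of the Cox proportional-hazards partial log-likelihood on simulated
# survival data (no tied event times); names and data are illustrative.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 300
x = rng.normal(size=(n, 2))                 # covariates
beta_true = np.array([0.8, -0.5])
# Exponential event times with rate exp(x @ beta); independent censoring
t_event = rng.exponential(1.0 / np.exp(x @ beta_true))
t_cens = rng.exponential(2.0, size=n)
time = np.minimum(t_event, t_cens)
event = (t_event <= t_cens).astype(float)   # 1 = observed event, 0 = censored

def neg_partial_loglik(beta):
    """Negative Cox partial log-likelihood; the baseline hazard has dropped out."""
    eta = x @ beta
    # Risk set for subject i: everyone whose observed time is >= time[i]
    at_risk = time[None, :] >= time[:, None]
    log_risk = np.log((at_risk * np.exp(eta)[None, :]).sum(axis=1))
    return -np.sum(event * (eta - log_risk))

fit = minimize(neg_partial_loglik, x0=np.zeros(2), method="BFGS")
print(fit.x)   # close to beta_true; no baseline-hazard parameters appear
</syntaxhighlight>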