{{Short description|Statistical test that compares goodness of fit}}
{{About|the statistical test that compares goodness of fit|a general description of the likelihood ratio|Likelihood ratio|the use of likelihood ratios in interpreting diagnostic tests|Likelihood ratios in diagnostic testing}}

In [[statistics]], the '''likelihood-ratio test''' is a [[hypothesis test]] that involves comparing the [[goodness of fit]] of two competing [[statistical model]]s, typically one found by [[Mathematical optimization|maximization]] over the entire [[parameter space]] and another found after imposing some [[Constraint (mathematics)|constraint]], based on the ratio of their [[likelihood function|likelihoods]]. If the more constrained model (i.e., the [[null hypothesis]]) is supported by the [[Realization (probability)|observed data]], the two likelihoods should not differ by more than [[sampling error]].<ref>{{cite book |first=Gary |last=King |author-link=Gary King (political scientist) |title=Unifying Political Methodology: The Likelihood Theory of Statistical Inference |location=New York |publisher=Cambridge University Press |year=1989 |isbn=0-521-36697-6 |page=84 |url=https://books.google.com/books?id=cligOwrd7XoC&pg=PA84 }}</ref> Thus the likelihood-ratio test tests whether this ratio is [[Statistical significance|significantly different]] from one, or equivalently whether its [[natural logarithm]] is significantly different from zero.

The likelihood-ratio test, also known as the '''Wilks test''',<ref>{{cite book |first1=Bing |last1=Li |first2=G. Jogesh |last2=Babu |title=A Graduate Course on Statistical Inference |publisher=Springer |year=2019 |page=331 |isbn=978-1-4939-9759-6 }}</ref> is the oldest of the three classical approaches to hypothesis testing, together with the [[Lagrange multiplier test]] and the [[Wald test]].<ref>{{cite book |first1=G. S. |last1=Maddala |author-link=G. S. Maddala |first2=Kajal |last2=Lahiri |title=Introduction to Econometrics |location=New York |publisher=Wiley |edition=Fourth |year=2010 |page=200 }}</ref> In fact, the latter two can be conceptualized as approximations to the likelihood-ratio test, and are asymptotically equivalent.<ref>{{cite journal |first=A. |last=Buse |title=The Likelihood Ratio, Wald, and Lagrange Multiplier Tests: An Expository Note |journal=[[The American Statistician]] |volume=36 |issue=3a |year=1982 |pages=153–157 |doi=10.1080/00031305.1982.10482817 }}</ref><ref>{{cite book |first=Andrew |last=Pickles |title=An Introduction to Likelihood Analysis |location=Norwich |publisher=W. H. Hutchins & Sons |year=1985 |isbn=0-86094-190-6 |pages=[https://archive.org/details/introductiontoli0000pick/page/24 24–27] |url=https://archive.org/details/introductiontoli0000pick/page/24 }}</ref><ref>{{cite book |first=Thomas A. |last=Severini |title=Likelihood Methods in Statistics |location=New York |publisher=Oxford University Press |year=2000 |isbn=0-19-850650-3 |pages=120–121 }}</ref> In the case of comparing two models each of which has no unknown [[statistical parameters|parameters]], use of the likelihood-ratio test can be justified by the [[Neyman–Pearson lemma]]. The lemma demonstrates that the test has the highest [[statistical power|power]] among all competitors.<ref name="NeymanPearson1933">{{citation | last1 = Neyman | first1 = J. | author-link1 = Jerzy Neyman | last2 = Pearson | first2 = E. S. | author-link2 = Egon Pearson | doi = 10.1098/rsta.1933.0009 | title = On the problem of the most efficient tests of statistical hypotheses | journal = [[Philosophical Transactions of the Royal Society of London A]] | volume = 231 | issue = 694–706 | pages = 289–337 | year = 1933 | jstor = 91247 | bibcode = 1933RSPTA.231..289N | url = http://www.stats.org.uk/statistical-inference/NeymanPearson1933.pdf | doi-access = free }}</ref>
==Definition==

===General===
Suppose that we have a [[statistical model]] with [[Statistical parameter|parameter space]] <math>\Theta</math>. A [[null hypothesis]] is often stated by saying that the parameter <math>\theta</math> lies in a specified subset <math>\Theta_0</math> of <math>\Theta</math>. The [[alternative hypothesis]] is thus that <math>\theta</math> lies in the [[Complement (set theory)|complement]] of <math>\Theta_0</math>, i.e. in <math>\Theta ~ \backslash ~ \Theta_0</math>, which is denoted by <math>\Theta_0^\text{c}</math>. The likelihood ratio test statistic for the null hypothesis <math>H_0 \, : \, \theta \in \Theta_0</math> is given by:<ref>{{cite book |first=Karl-Rudolf |last=Koch |author-link=Karl-Rudolf Koch |title=Parameter Estimation and Hypothesis Testing in Linear Models |url=https://archive.org/details/parameterestimat0000koch |url-access=registration |location=New York |publisher=Springer |year=1988 |isbn=0-387-18840-1 |page=[https://archive.org/details/parameterestimat0000koch/page/306 306]}}</ref>
:<math>\lambda_\text{LR} = -2 \ln \left[ \frac{~ \sup_{\theta \in \Theta_0} \mathcal{L}(\theta) ~}{~ \sup_{\theta \in \Theta} \mathcal{L}(\theta) ~} \right]</math>
where the quantity inside the brackets is called the likelihood ratio. Here, the <math>\sup</math> notation refers to the [[supremum]]. As all likelihoods are positive, and as the constrained maximum cannot exceed the unconstrained maximum, the likelihood ratio is [[Bounded set|bounded]] between zero and one.

Often the likelihood-ratio test statistic is expressed as a difference between the [[log-likelihood]]s
:<math>\lambda_\text{LR} = -2 \left[~ \ell( \theta_0 ) - \ell( \hat{\theta} ) ~\right]</math>
where
:<math>\ell( \hat{\theta} ) \equiv \ln \left[~ \sup_{\theta \in \Theta} \mathcal{L}(\theta) ~\right]~</math>
is the logarithm of the maximized likelihood function <math>\mathcal{L}</math>, and <math>\ell(\theta_0)</math> is the maximal value in the special case that the null hypothesis is true (but not necessarily a value that maximizes <math>\mathcal{L}</math> for the sampled data), and
:<math> \theta_0 \in \Theta_0 \qquad \text{ and } \qquad \hat{\theta} \in \Theta~</math>
denote the respective [[arg max|arguments of the maxima]] and the allowed ranges in which they are embedded. Multiplying by −2 ensures that, by [[Wilks' theorem]], <math>\lambda_\text{LR}</math> converges asymptotically to being [[chi-squared distribution|{{mvar|χ}}²-distributed]] if the null hypothesis is true.<ref>{{cite book |first=S.D. |last=Silvey |title=Statistical Inference |location=London |publisher=Chapman & Hall |year=1970 |pages=112–114 |isbn=0-412-13820-4}}</ref> The [[Sampling distribution|finite-sample distribution]]s of likelihood-ratio statistics are generally unknown.<ref>{{cite book |first1=Ron C. |last1=Mittelhammer |author-link=Ron C. Mittelhammer |first2=George G. |last2=Judge |author-link2=George Judge |first3=Douglas J. |last3=Miller |title=Econometric Foundations |location=New York |publisher=Cambridge University Press |year=2000 |isbn=0-521-62394-4 |page=66}}</ref>

The likelihood-ratio test requires that the models be [[Statistical model#Nested models|nested]] – i.e. the more complex model can be transformed into the simpler model by imposing constraints on the former's parameters. Many common test statistics are tests for nested models and can be phrased as log-likelihood ratios or approximations thereof: e.g. the [[Z-test|''Z''-test]], the [[F-test|''F''-test]], the [[G-test|''G''-test]], and [[Pearson's chi-squared test]]; for an illustration with the [[Student's t-test#One-sample t-test|one-sample ''t''-test]], see below. If the models are not nested, a generalization of the test can usually be used instead; for details, see ''[[relative likelihood]]''.
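The statistic can be computed directly from the two maximized log-likelihoods. The following minimal sketch (an illustration only; the simulated data, the choice of <code>mu0</code>, and the Gaussian model are assumptions of this example rather than anything drawn from the sources above) evaluates <math>\lambda_\text{LR}</math> for a nested pair in which the null hypothesis fixes the mean of a normal distribution while leaving its variance free:

<syntaxhighlight lang="python">
# Illustrative sketch: lambda_LR = -2 [ l(theta_0) - l(theta_hat) ] for
# nested Gaussian models. The data and mu0 below are assumed values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=0.3, scale=1.0, size=100)  # simulated sample
mu0 = 0.0                                     # mean fixed by H0

# Unconstrained MLEs: sample mean, and variance with divisor n.
mu_hat = x.mean()
sigma2_hat = np.mean((x - mu_hat) ** 2)
loglik_full = stats.norm.logpdf(x, mu_hat, np.sqrt(sigma2_hat)).sum()

# Constrained MLE under H0 (mu = mu0): only the variance is free.
sigma2_0 = np.mean((x - mu0) ** 2)
loglik_null = stats.norm.logpdf(x, mu0, np.sqrt(sigma2_0)).sum()

lambda_LR = -2.0 * (loglik_null - loglik_full)
print(lambda_LR)  # large values are evidence against H0
</syntaxhighlight>

Here both suprema are available in closed form; for models without closed-form estimates, each would be obtained by numerical maximization of the likelihood.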
===Case of simple hypotheses===
{{Main|Neyman–Pearson lemma}}
A simple-vs.-simple hypothesis test has completely specified models under both the null hypothesis and the alternative hypothesis, which for convenience are written in terms of fixed values of a notional parameter <math>\theta</math>:
:<math>
\begin{align}
H_0 &:& \theta=\theta_0 ,\\
H_1 &:& \theta=\theta_1 .
\end{align}
</math>
In this case, under either hypothesis, the distribution of the data is fully specified: there are no unknown parameters to estimate. For this case, a variant of the likelihood-ratio test is available:<ref>{{cite book |last1=Mood |first1=A.M. |last2=Graybill |first2=F.A. |first3=D.C. |last3=Boes |year=1974 |title=Introduction to the Theory of Statistics |edition=3rd |publisher=[[McGraw-Hill]] |at=§9.2}}</ref><ref name="Stuart et al. 20.10–20.13">{{citation |last1=Stuart |first1=A. |last2=Ord |first2=K. |last3=Arnold |first3=S. |year=1999 |title=Kendall's Advanced Theory of Statistics |volume=2A |publisher=[[Edward Arnold (publisher)|Arnold]] |at=§§20.10–20.13}}</ref>
:<math>
\Lambda(x) = \frac{~\mathcal{L}(\theta_0\mid x) ~}{~\mathcal{L}(\theta_1\mid x) ~}.
</math>
Some older references may use the reciprocal of the function above as the definition.<ref>{{citation |author1-last=Cox |author1-first=D. R. |author1-link=David Cox (statistician) |author2-last=Hinkley |author2-first=D. V. |author2-link=David Hinkley |title=Theoretical Statistics |publisher=[[Chapman & Hall]] |year=1974 |isbn=0-412-12420-3 |page=92 }}</ref> Thus, the likelihood ratio is small if the alternative model is better than the null model.

The likelihood-ratio test provides the decision rule as follows:
:If <math>~\Lambda > c ~</math>, do not reject <math>H_0</math>;
:If <math>~\Lambda < c ~</math>, reject <math>H_0</math>;
:If <math>~\Lambda = c ~</math>, reject <math>H_0</math> with probability <math>~q~</math>.
The values <math>c</math> and <math>q</math> are usually chosen to obtain a specified [[significance level]] <math>\alpha</math>, via the relation
:<math> q \cdot \operatorname{P}(\Lambda=c \mid H_0)~+~\operatorname{P}(\Lambda < c \mid H_0)~=~\alpha~. </math>
The [[Neyman–Pearson lemma]] states that this likelihood-ratio test is the [[Statistical power|most powerful]] among all level <math>\alpha</math> tests for this case.<ref name="NeymanPearson1933"/><ref name="Stuart et al. 20.10–20.13"/>
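As a concrete illustration (a sketch only; the two Gaussian hypotheses, the data, and the threshold <code>c</code> are assumed for the example), the decision rule can be applied to fully specified models. With continuous data, <math>\Lambda = c</math> occurs with probability zero, so the randomized case is not needed:

<syntaxhighlight lang="python">
# Illustrative sketch of a simple-vs-simple likelihood-ratio decision:
# H0: N(0, 1) versus H1: N(1, 1), both fully specified (assumed models).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=1.0, scale=1.0, size=30)  # data simulated under H1

loglik_0 = stats.norm.logpdf(x, 0.0, 1.0).sum()  # ln L(theta_0 | x)
loglik_1 = stats.norm.logpdf(x, 1.0, 1.0).sum()  # ln L(theta_1 | x)
Lambda = np.exp(loglik_0 - loglik_1)             # likelihood ratio

c = 0.5  # assumed threshold; in practice chosen so the test has size alpha
print("reject H0" if Lambda < c else "do not reject H0")
</syntaxhighlight>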
==Interpretation==
The likelihood ratio is a function of the data <math>x</math>; therefore, it is a [[statistic]], although unusual in that the statistic's value depends on a parameter, <math>\theta</math>. The likelihood-ratio test rejects the null hypothesis if the value of this statistic is too small. How small is too small depends on the significance level of the test, i.e. on what probability of [[Type I error]] is considered tolerable (Type I errors consist of the rejection of a null hypothesis that is true).

The [[numerator]] corresponds to the likelihood of an observed outcome under the [[null hypothesis]]. The [[denominator]] corresponds to the maximum likelihood of an observed outcome, with the parameters varying over the whole parameter space. The numerator of this ratio is less than the denominator; so, the likelihood ratio is between 0 and 1. Low values of the likelihood ratio mean that the observed result was much less likely to occur under the null hypothesis than under the alternative. High values of the statistic mean that the observed outcome was nearly as likely to occur under the null hypothesis as under the alternative, and so the null hypothesis cannot be rejected.

===An example===
The following example is adapted and abridged from {{Harvtxt|Stuart|Ord|Arnold|1999|loc=§22.2}}.

Suppose that we have a random sample of size {{mvar|n}} from a normally distributed population. Both the mean, {{mvar|μ}}, and the standard deviation, {{mvar|σ}}, of the population are unknown. We want to test whether the mean is equal to a given value, {{math|''μ''{{sub|0}} }}. Thus, our null hypothesis is {{math|''H''{{sub|0}}: ''μ'' {{=}} ''μ''{{sub|0}} }} and our alternative hypothesis is {{math|''H''{{sub|1}}: ''μ'' ≠ ''μ''{{sub|0}} }}. The likelihood function is
:<math>\mathcal{L}(\mu,\sigma \mid x) = \left(2\pi\sigma^2\right)^{-n/2} \exp\left( -\sum_{i=1}^n \frac{(x_i -\mu)^2}{2\sigma^2}\right)\,.</math>
With some calculation (omitted here), it can then be shown that
:<math>\lambda_\text{LR} = n \ln\left[ 1 + \frac{t^2}{n-1}\right] </math>
where {{mvar|t}} is the [[t-statistic|{{mvar|t}}-statistic]] with {{math|''n'' − 1}} degrees of freedom. Hence we may use the known exact distribution of {{math|''t''{{sub|''n''−1}}}} to draw inferences.
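The identity above can be checked numerically. The following sketch (an illustration; the sample size, seed, and parameter values are assumptions of the example) computes <math>\lambda_\text{LR}</math> both directly from the two maximized likelihoods and via the {{mvar|t}}-statistic:

<syntaxhighlight lang="python">
# Numerical check that lambda_LR = n * ln(1 + t^2/(n-1)) for the
# one-sample problem. All numbers below are assumed example values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, mu0 = 50, 0.0
x = rng.normal(loc=0.4, scale=2.0, size=n)

# Direct evaluation from the two maximized likelihoods.
mu_hat = x.mean()
ll_full = stats.norm.logpdf(x, mu_hat, np.sqrt(np.mean((x - mu_hat) ** 2))).sum()
ll_null = stats.norm.logpdf(x, mu0, np.sqrt(np.mean((x - mu0) ** 2))).sum()
lambda_direct = -2.0 * (ll_null - ll_full)

# Evaluation via the t-statistic (sample standard deviation, divisor n-1).
t = (mu_hat - mu0) / (x.std(ddof=1) / np.sqrt(n))
lambda_via_t = n * np.log(1.0 + t ** 2 / (n - 1))

print(np.isclose(lambda_direct, lambda_via_t))  # True
</syntaxhighlight>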
==Asymptotic distribution: Wilks’ theorem==
{{Main|Wilks' theorem}}
If the distribution of the likelihood ratio corresponding to a particular null and alternative hypothesis can be explicitly determined, then it can be used directly to form decision regions (to sustain or reject the null hypothesis). In most cases, however, the exact distribution of the likelihood ratio corresponding to specific hypotheses is very difficult to determine.{{Citation needed|date=September 2018}}

Assuming {{math|''H''<sub>0</sub>}} is true, there is a fundamental result by [[Samuel S. Wilks]]: as the sample size <math>n</math> approaches [[Infinity|<math>\infty</math>]], and if the null hypothesis lies strictly within the interior of the parameter space, the test statistic <math>\lambda_\text{LR}</math> defined above is [[Asymptotic theory (statistics)|asymptotically]] [[chi-squared distribution|chi-squared distributed]] (<math>\chi^2</math>) with [[degrees of freedom (statistics)|degrees of freedom]] equal to the difference in dimensionality of <math>\Theta</math> and <math>\Theta_0</math>.<ref>{{cite journal |last=Wilks |first=S.S. |author-link=Samuel S. Wilks |doi=10.1214/aoms/1177732360 |title=The large-sample distribution of the likelihood ratio for testing composite hypotheses |journal=[[Annals of Mathematical Statistics]] |volume=9 |issue=1 |pages=60–62 |year=1938 |doi-access=free}}</ref> This implies that for a great variety of hypotheses, we can calculate the likelihood ratio <math>\lambda_\text{LR}</math> for the data and then compare the observed value to the <math>\chi^2</math> value corresponding to a desired [[statistical significance]], as an ''approximate'' statistical test. Other extensions exist.{{which|date=March 2019}}
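In practice the approximate test can be carried out as follows (a sketch with assumed numbers; the degrees of freedom equal the number of free parameters removed by the null hypothesis):

<syntaxhighlight lang="python">
# Illustrative use of Wilks' theorem: compare an observed lambda_LR
# against the chi-squared distribution. Both numbers are assumed values.
from scipy import stats

lambda_LR = 6.3  # assumed observed value of the statistic
df = 1           # one parameter constrained by the null hypothesis

p_value = stats.chi2.sf(lambda_LR, df)  # survival function = 1 - CDF
print(p_value)   # reject H0 at level alpha if p_value < alpha
</syntaxhighlight>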
==See also==
*[[Akaike information criterion]]
*[[Bayes factor]]
*[[Johansen test]]
*[[Model selection]]
*[[Vuong's closeness test]]
*[[Sup-LR test]]
*[[Error exponents in hypothesis testing]]

==References==
{{Reflist}}

==Further reading==
* {{Citation | title= Likelihood ratios: A simple and flexible statistic for empirical psychologists | journal= [[Psychonomic Bulletin & Review]] | year= 2004 | author1-first= Scott | author1-last= Glover | author2-first= Peter | author2-last= Dixon | volume= 11 | issue= 5 | pages= 791–806 | doi= 10.3758/BF03196706 | pmid= 15732688 | doi-access= free }}
* {{Citation | first1= Leonhard | last1= Held | first2= Daniel | last2= Sabanés Bové | title= Applied Statistical Inference—Likelihood and Bayes | year= 2014 | publisher= [[Springer Science+Business Media|Springer]]}}
* {{Citation | author-last= Kalbfleisch | author-first= J. G. | author-link= James G. Kalbfleisch | year= 1985 | title= Probability and Statistical Inference | volume= 2 | publisher= [[Springer-Verlag]]}}
* {{Citation | author1-first= Michael D. | author1-last= Perlman | author2-first= Lang | author2-last= Wu | title= The emperor's new tests | journal= [[Statistical Science]] | year= 1999 | volume= 14 | issue= 4 | pages= 355–381 | doi= 10.1214/ss/1009212517 | doi-access= free }}
* {{Citation | author-last= Perneger | author-first= Thomas V. | title= Sifting the evidence: Likelihood ratios are alternatives to P values | journal= [[The BMJ]] | year= 2001 | volume= 322 | issue= 7295 | pages= 1184–5 | pmc= 1120301 | pmid= 11379590 | doi= 10.1136/bmj.322.7295.1184}}
* {{Citation | last1= Pinheiro | first1= José C. | last2= Bates | first2= Douglas M. | year= 2000 | title= Mixed-Effects Models in S and S-PLUS | publisher= [[Springer-Verlag]] | pages= 82–93 }}
* {{cite journal | title= Efficiency Testing of Prediction Markets: Martingale Approach, Likelihood Ratio and Bayes Factor Analysis | year= 2021 | first1= Mark | last1= Richard | first2= Jan | last2= Vecer | journal= Risks | volume= 9 | issue= 2 | page= 31 | doi= 10.3390/risks9020031 | doi-access= free | hdl= 10419/258120 | hdl-access= free }}
* {{Citation | first= Daniel L. | last= Solomon | title= A note on the non-equivalence of the Neyman-Pearson and generalized likelihood ratio tests for testing a simple null versus a simple alternative hypothesis | journal= [[The American Statistician]] | year= 1975 | volume= 29 | issue= 2 | pages= 101–102 | doi= 10.1080/00031305.1975.10477383 | url= https://ecommons.cornell.edu/bitstream/1813/32605/1/BU-510-M.pdf | hdl= 1813/32605 | hdl-access= free }}

==External links==
* [http://www.itl.nist.gov/div898/handbook/apr/section2/apr233.htm Practical application of likelihood ratio test described]
* [https://cran.r-project.org/web/packages/SPRT/SPRT.pdf R Package: Wald's Sequential Probability Ratio Test]
* [https://web.archive.org/web/20150504130014/http://faculty.vassar.edu/lowry/clin2.html Richard Lowry's Predictive Values and Likelihood Ratios] Online Clinical Calculator

{{Statistics|inference}}

{{DEFAULTSORT:Likelihood-Ratio Test}}
[[Category:Statistical ratios]]
[[Category:Statistical tests]]