==Background and interpretation==

===Historical remarks===
{{see also|History of statistics|History of probability}}

The term "likelihood" has been in use in English since at least late [[Middle English]].<ref>"likelihood", ''[[Shorter Oxford English Dictionary]]'' (2007).</ref> Its formal use to refer to a specific [[Function (mathematics)|function]] in mathematical statistics was proposed by [[Ronald Fisher]],<ref>{{Citation | title=On the history of maximum likelihood in relation to inverse probability and least squares| first= A. | last=Hald |author-link=Anders Hald |journal=[[Statistical Science]] |volume= 14| issue=2 |year=1999 | pages =214–222 | doi=10.1214/ss/1009212248 | jstor = 2676741|url=http://projecteuclid.org/download/pdf_1/euclid.ss/1009212248 |mode=cs1 | doi-access=free }}</ref> in two research papers published in 1921<ref>{{citation | last=Fisher | first=R.A. |author-link=Ronald Fisher | journal= Metron | title= On the "probable error" of a coefficient of correlation deduced from a small sample | volume=1 | year=1921 | pages=3–32 |mode=cs1 }}</ref> and 1922.<ref name=Fisher1922>{{citation | last=Fisher | first=R.A. |author-link=Ronald Fisher | journal= Philosophical Transactions of the Royal Society A | title=On the mathematical foundations of theoretical statistics | volume=222 | issue=594–604 | year=1922 | pages=309–368 | url=http://digital.library.adelaide.edu.au/dspace/handle/2440/15172 | jstor=91208 | jfm = 48.1280.02 |doi=10.1098/rsta.1922.0009 | bibcode=1922RSPTA.222..309F |mode=cs1 | doi-access=free | hdl=2440/15172 | hdl-access=free }}</ref> The 1921 paper introduced what is today called a "likelihood interval"; the 1922 paper introduced the term "[[method of maximum likelihood]]". Quoting Fisher:

{{Cquote|[I]n 1922, I proposed the term 'likelihood,' in view of the fact that, with respect to [the parameter], it is not a probability, and does not obey the laws of probability, while at the same time it bears to the problem of rational choice among the possible values of [the parameter] a relation similar to that which probability bears to the problem of predicting events in games of chance. . . . Whereas, however, in relation to psychological judgment, likelihood has some resemblance to probability, the two concepts are wholly distinct. . . ."<ref>{{citation |last=Klemens |first=Ben |title=Modeling with Data: Tools and Techniques for Scientific Computing |publisher= [[Princeton University Press]] |year=2008 |page=329 |mode=cs1 }}</ref>}}

The concept of likelihood should not be confused with probability, as Sir Ronald Fisher emphasized:

{{Cquote|I stress this because in spite of the emphasis that I have always laid upon the difference between probability and likelihood there is still a tendency to treat likelihood as though it were a sort of probability. The first result is thus that there are two different measures of rational belief appropriate to different cases. Knowing the population we can express our incomplete knowledge of, or expectation of, the sample in terms of probability; knowing the sample we can express our incomplete knowledge of the population in terms of likelihood.<ref>{{citation | last = Fisher | first = Ronald | authorlink=Ronald Fisher | title = Inverse Probability | year = 1930 | journal = [[Mathematical Proceedings of the Cambridge Philosophical Society]] | volume = 26 | issue = 4 | pages= 528–535 | doi = 10.1017/S0305004100016297 | bibcode = 1930PCPS...26..528F |mode=cs1 }}</ref>}}

Fisher's invention of statistical likelihood was a reaction against an earlier form of reasoning called [[inverse probability]].<ref>{{citation | last1 = Fienberg | first1 = Stephen E | year = 1997 | title = Introduction to R.A. Fisher on inverse probability and likelihood | journal = [[Statistical Science]] | volume = 12 | issue = 3| page = 161 | doi = 10.1214/ss/1030037905 |mode=cs1 | doi-access = free }}</ref> His use of the term "likelihood" fixed the meaning of the term within mathematical statistics.

[[A. W. F. Edwards]] (1972) established the axiomatic basis for use of the log-likelihood ratio as a measure of relative support for one hypothesis against another. The ''support function'' is then the natural logarithm of the likelihood function. Both terms are used in [[phylogenetics]], but were not adopted in a general treatment of the topic of statistical evidence.<ref>{{citation |last=Royall |first=R. |year=1997 |title=Statistical Evidence |publisher=[[Chapman & Hall]] |mode=cs1 }}</ref>

===Interpretations under different foundations===
Among statisticians, there is no consensus about what the [[Foundations of statistics|foundation of statistics]] should be. Four main paradigms have been proposed for the foundation: [[frequentism]], [[Bayesianism]], [[likelihoodism]], and [[Akaike information criterion|AIC-based]] statistics.<ref name="BF11">{{Citation |editor1-last= Bandyopadhyay |editor1-first= P. S. |editor-first2= M. R. |editor-last2= Forster | title = Philosophy of Statistics | publisher= [[North-Holland Publishing]] | year = 2011 |mode=cs1 }}</ref> The interpretation of likelihood differs under each of the proposed foundations; the four interpretations are described in the subsections below.

====Frequentist interpretation====
{{empty section|date=March 2019}}

====Bayesian interpretation====
In [[Bayesian inference]], one can speak about the likelihood of any proposition or [[random variable]] given another random variable: for example, the likelihood of a parameter value or of a [[statistical model]] (see [[marginal likelihood]]), given specified data or other evidence.<ref name='good1950'>I. J. Good: ''Probability and the Weighing of Evidence'' (Griffin 1950), §6.1</ref><ref name='jeffreys1983'>H. Jeffreys: ''Theory of Probability'' (3rd ed., Oxford University Press 1983), §1.22</ref><ref name='jaynes2003'>E. T. Jaynes: ''Probability Theory: The Logic of Science'' (Cambridge University Press 2003), §4.1</ref><ref name='lindley1980'>D. V. Lindley: ''Introduction to Probability and Statistics from a Bayesian Viewpoint. Part 1: Probability'' (Cambridge University Press 1980), §1.6</ref> Nevertheless, the likelihood function remains the same entity, with the additional interpretations of (i) a [[Conditional probability distribution|conditional density]] of the data given the parameter (since the parameter is then a random variable) and (ii) a measure or amount of information brought by the data about the parameter value or even the model.<ref name='good1950'/><ref name='jeffreys1983'/><ref name='jaynes2003'/><ref name='lindley1980'/><ref name='gelmanetal2014'>A. Gelman, J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, D. B. Rubin: ''Bayesian Data Analysis'' (3rd ed., Chapman & Hall/CRC 2014), §1.3</ref>

Due to the introduction of a probability structure on the parameter space or on the collection of models, it is possible for a parameter value or a statistical model to have a large likelihood value for given data and yet have a low ''probability'', or vice versa.<ref name='jaynes2003'/><ref name='gelmanetal2014'/> This is often the case in medical contexts.<ref>{{citation |first1=H. C. |last1=Sox |first2=M. C. |last2=Higgins |first3=D. K. |last3=Owens |title=Medical Decision Making |edition=2nd |publisher=Wiley |year=2013 |doi=10.1002/9781118341544 |isbn=9781118341544 |at=chapters 3–4 }}</ref> Following [[Bayes' Rule]], the likelihood, when seen as a conditional density, can be multiplied by the [[prior probability]] density of the parameter and then normalized to give a [[posterior probability]] density.<ref name='good1950'/><ref name='jeffreys1983'/><ref name='jaynes2003'/><ref name='lindley1980'/><ref name="gelmanetal2014"/> More generally, the likelihood of an unknown quantity <math display="inline">X</math> given another unknown quantity <math display="inline">Y</math> is proportional to the ''probability of <math display="inline">Y</math> given <math display="inline">X</math>''.<ref name='good1950'/><ref name='jeffreys1983'/><ref name='jaynes2003'/><ref name='lindley1980'/><ref name='gelmanetal2014'/>
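As an illustration, writing <math display="inline">\theta</math> for the parameter, <math display="inline">x</math> for the observed data, <math display="inline">p(\theta)</math> for the prior density, and <math display="inline">\mathcal{L}(\theta \mid x) = p(x \mid \theta)</math> for the likelihood, the posterior density is
<math display="block">p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{\int p(x \mid \theta')\, p(\theta')\, \mathrm{d}\theta'} \propto \mathcal{L}(\theta \mid x)\, p(\theta),</math>
so the observed data enter the posterior only through the likelihood function.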
====Likelihoodist interpretation====
{{more footnotes needed|date=April 2019}}

In frequentist statistics, the likelihood function is itself a [[statistic]] that summarizes a single sample from a population, whose calculated value depends on a choice of several parameters ''θ''<sub>1</sub> ... ''θ''<sub>''p''</sub>, where ''p'' is the number of parameters in some already-selected [[statistical model]]. The value of the likelihood serves as a figure of merit for the choice of parameter values, and the parameter set with maximum likelihood is the best choice, given the data available. The specific calculation of the likelihood is the probability that the observed sample would be assigned, assuming that the chosen model and the values of the several parameters '''''θ''''' give an accurate approximation of the [[frequency distribution]] of the population from which the observed sample was drawn. Heuristically, a good choice of parameters is one that renders the sample actually observed as probable as possible, after the fact. [[Wilks' theorem]] quantifies this heuristic rule by showing that the difference between the logarithm of the likelihood generated by the estimate's parameter values and the logarithm of the likelihood generated by the population's "true" (but unknown) parameter values is asymptotically [[chi-squared distribution|χ<sup>2</sup> distributed]].
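More precisely, writing <math display="inline">\ell(\theta) = \ln \mathcal{L}(\theta)</math> for the log-likelihood, <math display="inline">\hat\theta</math> for the maximum likelihood estimate, and <math display="inline">\theta_0</math> for the population's "true" parameter values, the theorem states that, under regularity conditions,
<math display="block">2\left[\ell(\hat\theta) - \ell(\theta_0)\right] \;\xrightarrow{d}\; \chi^2_p</math>
as the sample size grows, where <math display="inline">p</math> is the number of free parameters in the model.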
Each independent sample's maximum likelihood estimate is a separate estimate of the "true" parameter set describing the population sampled. Successive estimates from many independent samples will cluster together, with the population's "true" set of parameter values hidden somewhere in their midst. The difference between the logarithm of the maximum likelihood and the logarithms of adjacent parameter sets' likelihoods may be used to draw a [[confidence region]] on a plot whose co-ordinates are the parameters ''θ''<sub>1</sub> ... ''θ''<sub>''p''</sub>. The region surrounds the maximum-likelihood estimate, and all points (parameter sets) within that region differ at most in log-likelihood by some fixed value. The [[chi-squared distribution|χ<sup>2</sup> distribution]] given by [[Wilks' theorem]] converts the region's log-likelihood differences into the "confidence" that the population's "true" parameter set lies inside. The art of choosing the fixed log-likelihood difference is to make the confidence acceptably high while keeping the region acceptably small (a narrow range of estimates).

As more data are observed, instead of being used to make independent estimates, they can be combined with the previous samples to make a single combined sample, and that large sample may be used for a new maximum likelihood estimate. As the size of the combined sample increases, the size of the likelihood region with the same confidence shrinks. Eventually, either the size of the confidence region is very nearly a single point, or the entire population has been sampled; in both cases, the estimated parameter set is essentially the same as the population parameter set.

====AIC-based interpretation====
{{expand section|date=March 2019}}
Under the [[Akaike information criterion|AIC]] paradigm, likelihood is interpreted within the context of [[information theory]].<ref>{{Citation | first=H. |last=Akaike |author-link=Hirotugu Akaike | contribution = Prediction and entropy | pages=1–24 | title= A Celebration of Statistics | editor1-first= A. C. | editor1-last= Atkinson | editor2-first= S. E. | editor2-last= Fienberg | editor2-link= Stephen Fienberg | year = 1985 | publisher= Springer |mode=cs1 }}</ref><ref>{{Citation | author1-first= Y. | author1-last= Sakamoto | author2-first= M. | author2-last= Ishiguro | author3-first= G. | author3-last= Kitagawa | title= Akaike Information Criterion Statistics | year= 1986 | publisher= [[D. Reidel]] | at= Part I |mode=cs1 }}</ref><ref>{{Citation |last1=Burnham |first1=K. P. |last2=Anderson |first2=D. R. |year=2002 |title=Model Selection and Multimodel Inference: A practical information-theoretic approach |edition=2nd |publisher= [[Springer-Verlag]] | at= chap. 7 |mode=cs1 }}</ref>
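In particular, for a candidate model with <math display="inline">k</math> estimated parameters and maximized likelihood <math display="inline">\hat{\mathcal{L}}</math>, the criterion is
<math display="block">\mathrm{AIC} = 2k - 2\ln\hat{\mathcal{L}},</math>
so competing models are compared through their maximized log-likelihoods, penalized for the number of parameters they use.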