==Definition==

The Bayes factor is the ratio of two marginal likelihoods; that is, the [[likelihood function|likelihoods]] of two statistical models integrated over the [[prior probability|prior probabilities]] of their parameters.<ref>{{cite book |first=Jeff |last=Gill |authorlink=Jeff Gill (academic) |title=Bayesian Methods : A Social and Behavioral Sciences Approach |location= |publisher=Chapman & Hall |year=2002 |isbn=1-58488-288-3 |chapter=Bayesian Hypothesis Testing and the Bayes Factor |pages=199–237 }}</ref>

The [[posterior probability]] <math>\Pr(M|D)</math> of a model ''M'' given data ''D'' is given by [[Bayes' theorem]]:

:<math>\Pr(M|D) = \frac{\Pr(D|M)\Pr(M)}{\Pr(D)}.</math>

The key data-dependent term <math>\Pr(D|M)</math> represents the probability that some data are produced under the assumption of the model ''M''; evaluating it correctly is the key to Bayesian model comparison.

Given a [[model selection]] problem in which one wishes to choose between two models on the basis of observed data ''D'', the plausibility of the two different models ''M''<sub>1</sub> and ''M''<sub>2</sub>, parametrised by model parameter vectors <math> \theta_1 </math> and <math> \theta_2 </math>, is assessed by the Bayes factor ''K'' given by

:<math> K = \frac{\Pr(D|M_1)}{\Pr(D|M_2)} = \frac{\int \Pr(\theta_1|M_1)\Pr(D|\theta_1,M_1)\,d\theta_1} {\int \Pr(\theta_2|M_2)\Pr(D|\theta_2,M_2)\,d\theta_2} = \frac{\frac{\Pr(M_1|D)\Pr(D)}{\Pr(M_1)}}{\frac{\Pr(M_2|D)\Pr(D)}{\Pr(M_2)}} = \frac{\Pr(M_1|D)}{\Pr(M_2|D)}\frac{\Pr(M_2)}{\Pr(M_1)}. </math>

When the two models have equal prior probability, so that <math>\Pr(M_1) = \Pr(M_2)</math>, the Bayes factor is equal to the ratio of the posterior probabilities of ''M''<sub>1</sub> and ''M''<sub>2</sub>.
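The integrals above can be computed in closed form in simple cases. As a hypothetical illustration (the data and models here are assumed for the sketch, not taken from the text above), suppose ''D'' is 115 successes in 200 Bernoulli trials, ''M''<sub>1</sub> fixes the success probability at 1/2, and ''M''<sub>2</sub> gives it a Uniform(0, 1) prior; the Beta integral then makes the marginal likelihood under ''M''<sub>2</sub> exactly 1/(''n''+1):

```python
from math import comb

# Hypothetical example: D = s successes in n trials.
# M1: success probability fixed at 1/2 (no free parameter, nothing to integrate).
# M2: success probability theta with a Uniform(0, 1) prior.
n, s = 200, 115

# Marginal likelihood under M1: the binomial probability at theta = 1/2.
p_d_m1 = comb(n, s) * 0.5 ** n

# Marginal likelihood under M2: integrating C(n,s) * theta^s * (1-theta)^(n-s)
# against the uniform prior is a Beta integral, which equals 1/(n+1) exactly.
p_d_m2 = 1.0 / (n + 1)

K = p_d_m1 / p_d_m2  # Bayes factor in favour of M1
print(round(K, 3))
```

Note that ''K'' here only slightly exceeds 1, even though 115/200 is noticeably above 1/2: averaging the likelihood of ''M''<sub>2</sub> over its whole prior dilutes its fit.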
If, instead of the Bayes factor integral, the likelihood corresponding to the [[Maximum likelihood|maximum likelihood estimate]] of the parameters of each statistical model is used, the test becomes a classical [[likelihood-ratio test]]. Unlike a likelihood-ratio test, this Bayesian model comparison does not depend on any single set of parameters, as it integrates over all parameters in each model (with respect to the respective priors). An advantage of the use of Bayes factors is that it automatically, and quite naturally, includes a penalty for including too much model structure.<ref name=kassraftery1995>{{Cite journal |author1=Robert E. Kass |author2=Adrian E. Raftery |name-list-style=amp |year=1995|title=Bayes Factors|url=http://www.andrew.cmu.edu/user/kk3n/simplicity/KassRaftery1995.pdf|journal=Journal of the American Statistical Association|volume= 90 |number= 430|page= 791|doi=10.2307/2291091|jstor=2291091 }}</ref> It thus guards against [[overfitting]].

For models where an explicit version of the likelihood is not available or too costly to evaluate numerically, [[approximate Bayesian computation]] can be used for model selection in a Bayesian framework,<ref name= Toni2009b>{{cite journal |author1=Toni, T. |author2=Stumpf, M.P.H. |year = 2009 |title = Simulation-based model selection for dynamical systems in systems and population biology |journal = Bioinformatics |volume = 26 |pages = 104–10 |doi = 10.1093/bioinformatics/btp619 |url= |pmid = 19880371 |issue = 1 |pmc = 2796821 |arxiv=0911.1705 }}</ref> with the caveat that approximate-Bayesian estimates of Bayes factors are often biased.<ref name=Robert2011>{{cite journal |last1 = Robert |first1 = C.P. |author2=J. Cornuet |author3=J. Marin |author4=N.S. Pillai |name-list-style=amp | year = 2011 | title = Lack of confidence in approximate Bayesian computation model choice | journal = Proceedings of the National Academy of Sciences | volume = 108 | issue = 37 | pages = 15112–15117 | doi = 10.1073/pnas.1102900108 | pmid = 21876135 | pmc=3174657|bibcode=2011PNAS..10815112R |doi-access = free }}</ref>

Other approaches are:
* to treat model comparison as a [[Decision theory#Choice under uncertainty|decision problem]], computing the expected value or cost of each model choice;
* to use [[minimum message length]] (MML);
* to use [[minimum description length]] (MDL).
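The contrast between the maximum-likelihood ratio and the Bayes factor can be made concrete. In a hypothetical binomial setting (assumed for this sketch: 115 successes in 200 trials, ''M''<sub>1</sub> fixing the success probability at 1/2, ''M''<sub>2</sub> leaving it free with a uniform prior), the likelihood ratio evaluated at the MLE always favours the richer model, while the Bayes factor, which integrates over the prior, applies an automatic complexity penalty:

```python
from math import comb

# Hypothetical data: s successes in n trials.
# M1: theta = 1/2 fixed; M2: theta free (uniform prior for the Bayes factor).
n, s = 200, 115

p_d_m1 = comb(n, s) * 0.5 ** n  # likelihood under M1

# Classical likelihood ratio: evaluate M2 at its maximum likelihood estimate.
theta_hat = s / n
p_d_m2_ml = comb(n, s) * theta_hat ** s * (1 - theta_hat) ** (n - s)
lr = p_d_m2_ml / p_d_m1  # >= 1 by construction, so M2 always "wins"

# Bayes factor: M2's marginal likelihood under the uniform prior is 1/(n+1).
bayes_factor = p_d_m1 / (1.0 / (n + 1))

print(lr, bayes_factor)
```

Here the MLE-based ratio comes out well above 1 (favouring the free-parameter model), whereas the Bayes factor modestly favours the simpler model, illustrating the built-in guard against overfitting described above.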