{{Short description|Distribution of an uncertain quantity}}
{{Bayesian statistics}}

A '''prior probability distribution''' of an uncertain quantity, simply called the '''prior''', is its assumed [[probability distribution]] before some evidence is taken into account. For example, the prior could be the probability distribution representing the relative proportions of voters who will vote for a particular politician in a future election. The unknown quantity may be a [[parameter]] of the model or a [[latent variable]] rather than an [[observable variable]].

In [[Bayesian statistics]], [[Bayes' rule]] prescribes how to update the prior with new information to obtain the [[posterior probability distribution]], which is the conditional distribution of the uncertain quantity given new data. Historically, the choice of priors was often constrained to a [[Conjugate prior|conjugate family]] of a given [[likelihood function]], so that it would result in a tractable posterior of the same family. The widespread availability of [[Markov chain Monte Carlo]] methods, however, has made this less of a concern.
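
As an illustration of a conjugate update, the following minimal sketch assumes a beta prior on the success probability of a Bernoulli likelihood. In that case the posterior is again a beta distribution, whose parameters are the prior parameters plus the observed counts of successes and failures. The class <code>Beta</code>, the function <code>update</code>, the prior parameters, and the data are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beta:
    """Beta distribution described by its two shape parameters (illustrative helper)."""
    alpha: float
    beta: float

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def update(prior: Beta, observations: list[int]) -> Beta:
    """Apply Bayes' rule for Bernoulli observations (1 = success, 0 = failure).

    Because the beta family is conjugate to the Bernoulli likelihood,
    the posterior is obtained by adding the counts to the prior parameters.
    """
    successes = sum(observations)
    failures = len(observations) - successes
    return Beta(prior.alpha + successes, prior.beta + failures)


# Assumed prior and made-up data, for illustration only.
prior = Beta(2.0, 2.0)
data = [1, 0, 1, 1, 0, 1, 1, 1]  # 6 successes, 2 failures
posterior = update(prior, data)

print(prior.mean(), posterior.mean())
```

With a non-conjugate prior no such closed form is generally available, which is the situation in which sampling methods such as Markov chain Monte Carlo are typically used.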

There are many ways to construct a prior distribution.<ref>{{cite book |first=Christian |last=Robert |authorlink=Christian Robert |chapter=From Prior Information to Prior Distributions |pages=89–136 |title=The Bayesian Choice |location=New York |publisher=Springer |year=1994 |isbn=0-387-94296-3 }}</ref> In some cases, a prior may be determined from past information, such as previous experiments. A prior can also be ''elicited'' from the purely subjective assessment of an experienced expert.<ref>{{cite book |first=Kathryn |last=Chaloner |authorlink=Kathryn Chaloner |chapter=Elicitation of Prior Distributions |editor-first=Donald A. |editor-last=Berry |editor2-first=Dalene |editor2-last=Stangl |title=Bayesian Biostatistics |location=New York |publisher=Marcel Dekker |year=1996 |pages=141–156 |isbn=0-8247-9334-X }}</ref><ref>{{cite journal |first1=Petrus |last1=Mikkola |first2=Osvaldo A. |last2=Martin |first3=Suyog |last3=Chandramouli |first4=Marcelo |last4=Hartmann |first5=Oriol |last5=Abril Pla |first6=Owen |last6=Thomas |first7=Henri |last7=Pesonen |first8=Jukka |last8=Corander |first9=Aki |last9=Vehtari |first10=Samuel |last10=Kaski |first11=Paul-Christian |last11=Bürkner |first12=Arto |last12=Klami |display-authors=1 |title=Prior Knowledge Elicitation: The Past, Present, and Future |journal=Bayesian Analysis |date=2024 |issue=4 |volume=19 |doi=10.1214/23-BA1381 |s2cid=244798734 |hdl=11336/183197 |hdl-access=free }}</ref><ref>{{cite journal | last1 = Icazatti | first1 = Alejandro | last2 = Abril-Pla | first2 = Oriol | last3 = Klami | first3 = Arto | last4 = Martin | first4 = Osvaldo A. | title = PreliZ: A tool-box for prior elicitation | journal = Journal of Open Source Software | date = September 2023 | volume = 8 | issue = 89 | page = 5499 | doi = 10.21105/joss.05499| doi-access = free | bibcode = 2023JOSS....8.5499I }}
</ref> When no information is available, an '''uninformative prior''' may be adopted as justified by the [[principle of indifference]].<ref name="Zellner1971">{{cite book |last=Zellner |first=Arnold |authorlink=Arnold Zellner |chapter=Prior Distributions to Represent 'Knowing Little' |pages=41–53 |title=An Introduction to Bayesian Inference in Econometrics |location=New York |publisher=John Wiley & Sons |year=1971 |isbn=0-471-98165-6 }}</ref><ref>{{cite journal |first1=Harold J. |last1=Price |first2=Allison R. |last2=Manson |title=Uninformative priors for Bayes' theorem |journal=AIP Conf. Proc. |volume=617 |year=2001 |pages=379–391 |doi=10.1063/1.1477060 }}</ref> In modern applications, priors are also often chosen for their mechanical properties, such as [[Bayesian interpretation of kernel regularization|regularization]] and [[feature selection]].<ref>{{cite journal |first1=Juho |last1=Piironen |first2=Aki |last2=Vehtari |title=Sparsity information and regularization in the horseshoe and other shrinkage priors |journal=Electronic Journal of Statistics |volume=11 |issue=2 |pages=5018–5051 |year=2017 |doi=10.1214/17-EJS1337SI |doi-access=free |arxiv=1707.01694 }}</ref><ref>{{cite journal |first1=Daniel |last1=Simpson |first2=Håvard |last2=Rue |first3=Andrea |last3=Riebler |first4=Thiago G. |last4=Martins |first5=Sigrunn H. |last5=Sørbye |display-authors=1 |title=Penalising Model Component Complexity: A Principled, Practical Approach to Constructing Priors |journal=Statistical Science |volume=32 |issue=1 |pages=1–28 |year=2017 |doi=10.1214/16-STS576 |s2cid=88513041 |arxiv=1403.4630 }}</ref><ref>{{cite journal |first=Vincent |last=Fortuin |title=Priors in Bayesian Deep Learning: A Review |journal=International Statistical Review |volume=90 |issue=3 |year=2022 |pages=563–591 |doi=10.1111/insr.12502 |hdl=20.500.11850/547969 |s2cid=234681651 |hdl-access=free }}</ref>
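
One standard way in which a prior acts as a regularizer can be sketched as follows. The example assumes a one-parameter linear model ''y'' = ''wx'' + noise with Gaussian noise of known standard deviation and a zero-mean Gaussian prior on ''w''; under these assumptions the posterior mode coincides with a ridge-type penalized least-squares estimate. The function name <code>map_slope</code> and the data are hypothetical.

```python
def map_slope(xs, ys, noise_sd, prior_sd):
    """Posterior mode of w in y = w*x + noise, with w ~ Normal(0, prior_sd**2).

    Setting the derivative of the log posterior to zero gives
    w = sum(x*y) / (sum(x*x) + (noise_sd / prior_sd)**2),
    i.e. least squares with an added quadratic (ridge) penalty.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    penalty = (noise_sd / prior_sd) ** 2  # penalty strength implied by the prior
    return sxy / (sxx + penalty)


# Made-up data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

least_squares = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
weak_prior = map_slope(xs, ys, noise_sd=1.0, prior_sd=100.0)   # nearly flat prior
strong_prior = map_slope(xs, ys, noise_sd=1.0, prior_sd=0.1)   # tight prior around 0

print(least_squares, weak_prior, strong_prior)
```

A narrower prior shrinks the estimate more strongly toward zero, while a very wide prior leaves it close to the ordinary least-squares value.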

The prior distributions of model parameters will often depend on parameters of their own. Uncertainty about these [[Hyperparameter (Bayesian statistics)|hyperparameter]]s can, in turn, be expressed as [[hyperprior]] probability distributions. For example, if one uses a [[beta distribution]] with parameters ''α'' and ''β'' to model the distribution of the parameter ''p'' of a [[Bernoulli distribution]], then:
* ''p'' is a parameter of the underlying system (Bernoulli distribution), and
* ''α'' and ''β'' are parameters of the prior distribution (beta distribution); hence ''hyper''parameters.
In principle, priors can be decomposed into many conditional levels of distributions, so-called ''hierarchical priors''.<ref>{{cite book |last=Congdon |first=Peter D. |chapter=Regression Techniques using Hierarchical Priors |pages=253–315 |title=Bayesian Hierarchical Models |location=Boca Raton |publisher=CRC Press |edition=2nd |year=2020|isbn=978-1-03-217715-1 }}</ref>
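
The layered structure of such a prior can be sketched by sampling from the top level downward. The example below assumes, purely for illustration, gamma hyperpriors on ''α'' and ''β''; the function name <code>sample_hierarchical</code> and its default values are hypothetical.

```python
import random


def sample_hierarchical(rng, n_trials, shape=2.0, scale=1.0):
    """Draw once from a three-level model:

    hyperprior:  alpha, beta ~ Gamma(shape, scale)   (assumed choice)
    prior:       p ~ Beta(alpha, beta)
    likelihood:  x_i ~ Bernoulli(p), i = 1..n_trials
    """
    alpha = rng.gammavariate(shape, scale)
    beta = rng.gammavariate(shape, scale)
    p = rng.betavariate(alpha, beta)
    xs = [1 if rng.random() < p else 0 for _ in range(n_trials)]
    return alpha, beta, p, xs


rng = random.Random(0)  # fixed seed so that the run is reproducible
alpha, beta, p, xs = sample_hierarchical(rng, n_trials=10)
print(alpha, beta, p, xs)
```

Each level is conditional on the one above it: the hyperparameters are drawn first, the parameter ''p'' is drawn given the hyperparameters, and the observations are drawn given ''p''.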