Prior probability
== Improper priors ==
Let events <math>A_1, A_2, \ldots, A_n</math> be mutually exclusive and exhaustive. If Bayes' theorem is written as <math display="block">P(A_i\mid B) = \frac{P(B \mid A_i) P(A_i)}{\sum_j P(B\mid A_j)P(A_j)}\, ,</math> then it is clear that the same result would be obtained if all the prior probabilities ''P''(''A''<sub>''i''</sub>) and ''P''(''A''<sub>''j''</sub>) were multiplied by a given constant; the same would be true for a [[continuous random variable]]. If the summation in the denominator converges, the posterior probabilities will still sum (or integrate) to 1 even if the prior values do not, and so the priors may only need to be specified in the correct proportion. Taking this idea further, in many cases the sum or integral of the prior values may not even need to be finite to get sensible answers for the posterior probabilities. When this is the case, the prior is called an '''improper prior'''. However, the posterior distribution need not be a proper distribution if the prior is improper.<ref>{{cite journal |first1=A. P. |last1=Dawid |first2=M. |last2=Stone |first3=J. V. |last3=Zidek |title=Marginalization Paradoxes in Bayesian and Structural Inference |journal=Journal of the Royal Statistical Society |series=Series B (Methodological) |volume=35 |issue=2 |year=1973 |pages=189–233 |doi=10.1111/j.2517-6161.1973.tb00952.x |jstor=2984907 }}</ref> This is clear from the case where event ''B'' is independent of all of the ''A''<sub>''j''</sub>.
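The cancellation of the constant can be checked numerically. The sketch below uses made-up likelihoods and priors for three hypotheses (the values are illustrative, not from the article) and verifies that scaling the prior by any positive constant leaves the posterior unchanged:

```python
import numpy as np

# Hypothetical discrete example: three mutually exclusive, exhaustive
# hypotheses A_1..A_3; the likelihoods P(B | A_i) are chosen arbitrarily.
likelihood = np.array([0.7, 0.2, 0.1])
prior = np.array([0.5, 0.3, 0.2])  # a proper prior (sums to 1)

def posterior(prior, likelihood):
    """Bayes' theorem: P(A_i | B) = P(B|A_i) P(A_i) / sum_j P(B|A_j) P(A_j)."""
    unnorm = likelihood * prior
    return unnorm / unnorm.sum()

p1 = posterior(prior, likelihood)
p2 = posterior(1000.0 * prior, likelihood)  # same prior scaled by a constant

# The constant multiplies numerator and denominator alike, so it cancels.
assert np.allclose(p1, p2)
```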
Statisticians sometimes use improper priors as [[uninformative prior]]s.<ref>{{cite book|last1=Christensen|first1=Ronald|last2=Johnson|first2=Wesley|last3=Branscum|first3=Adam|last4=Hanson|first4=Timothy E.|title=Bayesian Ideas and Data Analysis : An Introduction for Scientists and Statisticians|date=2010|publisher=CRC Press|location=Hoboken|isbn=9781439894798|page=69}}</ref> For example, if they need a prior distribution for the mean and variance of a random variable, they may assume ''p''(''m'', ''v'') ~ 1/''v'' (for ''v'' > 0) which would suggest that any value for the mean is "equally likely" and that a value for the positive variance becomes "less likely" in inverse proportion to its value. Many authors (Lindley, 1973; De Groot, 1937; Kass and Wasserman, 1996){{Citation needed|date=December 2008}} warn against the danger of over-interpreting those priors since they are not probability densities. The only relevance they have is found in the corresponding posterior, as long as it is well-defined for all observations. (The [[Beta distribution#Haldane.27s prior probability .28Beta.280.2C0.29.29|Haldane prior]] is a typical counterexample.{{Clarify|reason=counterexample of what?|date=May 2011}}{{Citation needed|date=May 2011}}) By contrast, [[likelihood function]]s do not need to be integrated, and a likelihood function that is uniformly 1 corresponds to the absence of data (all models are equally likely, given no data): Bayes' rule multiplies a prior by the likelihood, and an empty product is just the constant likelihood 1. However, without starting with a prior probability distribution, one does not end up getting a [[posterior probability]] distribution, and thus cannot integrate or compute expected values or loss. See {{slink|Likelihood function|Non-integrability}} for details. 
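A minimal numerical sketch of this point, using simulated data (the sample and grid below are assumptions for illustration): with a flat improper prior ''p''(''m'') ~ 1 on the mean of a normal sample with known variance, the posterior is just the normalised likelihood, and it is a perfectly proper density once any data are observed.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=20)  # simulated observations, variance known (=1)

# Grid over the mean m. The improper flat prior p(m) ~ 1 contributes only a
# constant factor, so the posterior is the likelihood, normalised.
m = np.linspace(-10.0, 10.0, 4001)
dm = m[1] - m[0]
log_lik = -0.5 * ((data[None, :] - m[:, None]) ** 2).sum(axis=1)
post = np.exp(log_lik - log_lik.max())
post /= post.sum() * dm  # normalise: the posterior is proper

assert abs(post.sum() * dm - 1.0) < 1e-9
# Posterior mean equals the sample mean, as theory predicts for a flat prior.
assert abs((m * post).sum() * dm - data.mean()) < 1e-3
```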
=== Examples ===
Examples of improper priors include:
* The [[uniform distribution (continuous)|uniform distribution]] on an infinite interval (i.e., a half-line or the entire real line).
* Beta(0,0), the [[beta distribution]] for ''α''=0, ''β''=0 (uniform distribution on [[log-odds]] scale).
* The logarithmic prior on the [[positive reals]] (uniform distribution on [[log scale]]).{{Citation needed|date=October 2010}}
These functions, interpreted as uniform distributions, can also be interpreted as the [[likelihood function]] in the absence of data, but are not proper priors.
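The Beta(0,0) (Haldane) case can be probed directly. Combining it with ''s'' successes in ''n'' Bernoulli trials gives a posterior proportional to Beta(''s'', ''n''−''s''), whose normalising constant is the Beta function B(''s'', ''n''−''s''). A small sketch (the trial count is an arbitrary illustration) shows the posterior is proper only when 0 < ''s'' < ''n'':

```python
import math

n = 10  # hypothetical number of Bernoulli trials

def haldane_posterior_norm(s, n):
    """Normalising constant B(s, n-s) of the Beta(s, n-s) posterior obtained
    from the Haldane prior Beta(0,0) with s successes in n trials."""
    return math.gamma(s) * math.gamma(n - s) / math.gamma(n)

# Proper (finite normalising constant) whenever 0 < s < n:
assert all(haldane_posterior_norm(s, n) > 0 for s in range(1, n))

# Improper at the boundary: Gamma(0) has a pole, so for s = 0 (or s = n)
# there is no finite normalising constant and math.gamma raises.
try:
    haldane_posterior_norm(0, n)
    proper = True
except ValueError:
    proper = False
assert not proper
```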