===Bayesian prediction===
*The [[posterior predictive distribution]] is the distribution of a new data point, marginalized over the posterior: <math display="block">p(\tilde{x} \mid \mathbf{X},\alpha) = \int p(\tilde{x} \mid \theta) \, p(\theta \mid \mathbf{X},\alpha) \, d\theta</math>
*The [[prior predictive distribution]] is the distribution of a new data point, marginalized over the prior: <math display="block">p(\tilde{x} \mid \alpha) = \int p(\tilde{x} \mid \theta) \, p(\theta \mid \alpha) \, d\theta</math>

Bayesian theory calls for the use of the posterior predictive distribution to do [[predictive inference]], i.e., to [[prediction|predict]] the distribution of a new, unobserved data point. That is, instead of a fixed point as a prediction, a distribution over possible points is returned. Only in this way is the entire posterior distribution of the parameter(s) used. By comparison, prediction in [[frequentist statistics]] often involves finding an optimum point estimate of the parameter(s)—e.g., by [[maximum likelihood]] or [[maximum a posteriori estimation]] (MAP)—and then plugging this estimate into the formula for the distribution of a data point. This has the disadvantage that it does not account for any uncertainty in the value of the parameter, and hence will underestimate the [[variance]] of the predictive distribution.

In some instances, frequentist statistics can work around this problem. For example, [[confidence interval]]s and [[prediction interval]]s for a [[normal distribution]] with unknown [[mean]] and [[variance]] are constructed using a [[Student's t-distribution]]. This correctly estimates the variance, because (1) the average of normally distributed random variables is also normally distributed, and (2) the predictive distribution of a normally distributed data point with unknown mean and variance, using conjugate or uninformative priors, has a Student's t-distribution. In Bayesian statistics, however, the posterior predictive distribution can always be determined exactly—or at least to an arbitrary level of precision when numerical methods are used.

Both types of predictive distributions have the form of a [[compound probability distribution]] (as does the [[marginal likelihood]]). In fact, if the prior distribution is a [[conjugate prior]], such that the prior and posterior distributions come from the same family, it can be seen that both prior and posterior predictive distributions also come from the same family of compound distributions. The only difference is that the posterior predictive distribution uses the updated values of the hyperparameters (applying the Bayesian update rules given in the [[conjugate prior]] article), while the prior predictive distribution uses the values of the hyperparameters that appear in the prior distribution.
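As a concrete illustration, consider a Beta–Bernoulli model (a standard conjugate pair, chosen here for definiteness): each data point is a [[Bernoulli distribution|Bernoulli]] trial with success probability <math>\theta</math>, and the prior on <math>\theta</math> is a [[beta distribution]] with hyperparameters <math>a</math> and <math>b</math>. The prior predictive probability of a success is
<math display="block">p(\tilde{x}=1 \mid a,b) = \int_0^1 \theta \, p(\theta \mid a,b) \, d\theta = \frac{a}{a+b}.</math>
After observing <math>s</math> successes in <math>n</math> trials, the posterior is a beta distribution with updated hyperparameters <math>a+s</math> and <math>b+n-s</math>, so the posterior predictive probability of a success is
<math display="block">p(\tilde{x}=1 \mid \mathbf{X},a,b) = \frac{a+s}{a+b+n}.</math>
Both predictive distributions here are Bernoulli; they differ only in that the posterior predictive uses the updated hyperparameters, as described above.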