
=== Estimation of parameters ===
For a distribution with unknown parameters, a direct approach to prediction is to estimate the parameters and then use the associated quantile function – for example, one could use the sample mean <math>\overline{X}</math> as an estimate for ''μ'' and the [[sample variance]] ''s''<sup>2</sup> as an estimate for ''σ''<sup>2</sup>. There are two natural choices for ''s''<sup>2</sup> here: dividing by <math>(n-1)</math> yields an unbiased estimate, while dividing by ''n'' yields the [[maximum likelihood estimator]], and either might be used. One then uses the quantile function with these estimated parameters, <math>\Phi^{-1}_{\overline{X},s^2}</math>, to give a prediction interval.

This approach is usable, but the resulting interval will not have the repeated sampling interpretation<ref>{{Harvtxt|Geisser|1993|pp=[https://books.google.com/books?id=wfdlBZ_iwZoC 8–9]}}</ref> – it is not a predictive confidence interval.
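The plug-in approach and its shortfall can be sketched in Python with the standard-library <code>statistics</code> module. The function names <code>plugin_interval</code> and <code>plugin_coverage</code> and the simulation settings (''n''&nbsp;=&nbsp;5, ''p''&nbsp;=&nbsp;0.9, 10,000 repetitions, a fixed seed) are illustrative assumptions. The simulation draws a fresh observation after each sample and counts how often it lands inside the plug-in interval; for ''n''&nbsp;=&nbsp;5 the hit rate comes out near 0.79 rather than the nominal 0.9.

```python
import random
import statistics
from statistics import NormalDist


def plugin_interval(xs, p, mle=False):
    """Central 100p% range of N(xbar, s^2) with the estimates plugged in.

    mle=False uses the unbiased s^2 (divide by n - 1);
    mle=True uses the maximum likelihood estimate (divide by n).
    """
    xbar = statistics.mean(xs)
    var = statistics.pvariance(xs) if mle else statistics.variance(xs)
    dist = NormalDist(xbar, var ** 0.5)
    return dist.inv_cdf((1 - p) / 2), dist.inv_cdf((1 + p) / 2)


def plugin_coverage(n=5, p=0.9, reps=10_000, seed=1):
    """Fraction of repetitions in which a fresh draw lands inside the plug-in range."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        lo, hi = plugin_interval(xs, p)
        hits += lo <= rng.gauss(0.0, 1.0) <= hi
    return hits / reps


print(plugin_coverage())  # roughly 0.79 for n = 5, short of the nominal 0.9
```

The shortfall arises because the plug-in range ignores both the sampling error in <math>\overline{X}</math> and the extra variability of ''s''.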

The following subsections use the sample mean:
:<math>\overline{X} = (X_1+\cdots+X_n)/n</math>
and the (unbiased) sample variance:
:<math>s^2 = {1 \over n-1}\sum_{i=1}^n (X_i-\overline{X})^2</math>
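These two definitions translate directly into code. The helper name <code>sample_mean_and_variance</code> below is hypothetical; Python's <code>statistics.variance</code> follows the same ''n''&nbsp;&minus;&nbsp;1 convention, while <code>statistics.pvariance</code> divides by ''n'' (the maximum likelihood choice mentioned above).

```python
import math
import statistics


def sample_mean_and_variance(xs):
    """Return (xbar, s2) as defined above; s2 divides by n - 1 (unbiased)."""
    n = len(xs)
    if n < 2:
        raise ValueError("s^2 needs at least two observations")
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    return xbar, s2


xbar, s2 = sample_mean_and_variance([1.0, 2.0, 3.0, 4.0, 5.0])
print(xbar, s2)  # 3.0 2.5

# The standard library uses the same n - 1 convention in statistics.variance.
data = [2.3, 1.9, 3.1, 2.8, 2.2]
assert math.isclose(sample_mean_and_variance(data)[1], statistics.variance(data))
```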

==== Unknown mean, known variance ====
Given<ref>{{Harvtxt|Geisser|1993|p=[https://books.google.com/books?id=wfdlBZ_iwZoC 7–]}}</ref> a normal distribution with unknown mean ''μ'' but known variance <math>\sigma^2</math>, the sample mean <math>\overline{X}</math> of the observations <math>X_1,\dots,X_n</math> has distribution <math>N(\mu,\sigma^2/n),</math> while the future observation <math>X_{n+1}</math> has distribution <math>N(\mu,\sigma^2)</math> and is independent of <math>\overline{X}</math>. Taking the difference of these cancels the ''μ'', and because the two are independent their variances add, yielding a normal distribution with mean 0 and variance <math>\sigma^2+(\sigma^2/n);</math> thus
:<math>\frac{X_{n+1}-\overline{X}}{\sqrt{\sigma^2+(\sigma^2/n)}} \sim N(0,1).</math>
Solving for <math>X_{n+1}</math> gives the prediction distribution <math>N(\overline{X},\sigma^2+(\sigma^2/n)),</math> from which one can compute intervals as before. This is a predictive confidence interval in the sense that if one uses a quantile range of 100''p''%, then on repeated applications of this computation, the future observation <math>X_{n+1}</math> will fall in the predicted interval 100''p''% of the time.
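A minimal Python sketch of this interval follows, assuming the known standard deviation ''σ'' is passed in by the caller; the function names and simulation settings are illustrative. The coverage estimate repeats the whole procedure (draw a sample, build the interval, draw <math>X_{n+1}</math>) and should land close to the nominal ''p''.

```python
import math
import random
from statistics import NormalDist, fmean


def known_variance_interval(xs, sigma, p):
    """100p% prediction interval for X_{n+1} when sigma is known.

    Uses the prediction distribution N(xbar, sigma^2 + sigma^2/n).
    """
    n = len(xs)
    xbar = fmean(xs)
    z = NormalDist().inv_cdf((1 + p) / 2)  # 100(1 + p)/2-th percentile of N(0, 1)
    half_width = z * sigma * math.sqrt(1 + 1 / n)
    return xbar - half_width, xbar + half_width


def coverage(n=5, mu=10.0, sigma=2.0, p=0.9, reps=20_000, seed=2):
    """Fraction of repetitions in which X_{n+1} falls inside the interval."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        lo, hi = known_variance_interval(xs, sigma, p)
        hits += lo <= rng.gauss(mu, sigma) <= hi
    return hits / reps


print(coverage())  # close to the nominal 0.9
```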

The prediction distribution <math>N(\overline{X},\sigma^2+(\sigma^2/n))</math> is more conservative than a normal distribution with the estimated mean <math>\overline{X}</math> and the known variance <math>\sigma^2</math>: it uses the compound variance <math>\sigma^2+(\sigma^2/n)</math> and hence yields slightly wider intervals. This widening is necessary for the desired predictive confidence property to hold.
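To quantify the widening: relative to the naive interval with variance <math>\sigma^2</math>, every half-width is multiplied by
:<math>\sqrt{\frac{\sigma^2+\sigma^2/n}{\sigma^2}} = \sqrt{1+\frac{1}{n}},</math>
which is <math>\sqrt{1.25}\approx 1.118</math> for ''n''&nbsp;=&nbsp;4 (about 12% wider) but only <math>\sqrt{1.01}\approx 1.005</math> for ''n''&nbsp;=&nbsp;100, so the correction matters mainly for small samples.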

==== Known mean, unknown variance ====
Conversely, given a normal distribution with known mean ''μ'' but unknown variance <math>\sigma^2</math>, 
the sample variance <math>s^2</math> of the observations <math>X_1,\dots,X_n</math> has, up to scale, a [[chi-squared distribution|<math>\chi_{n-1}^2</math> distribution]]; more precisely:
:<math>\frac{(n-1)s^2}{\sigma^2} \sim \chi_{n-1}^2.</math>
On the other hand, the future observation <math>X_{n+1}</math> has distribution <math>N(\mu,\sigma^2).</math>
Taking the ratio of the future observation's residual <math>X_{n+1}-\mu</math> to the sample standard deviation ''s'' cancels ''σ''; because <math>X_{n+1}</math> is independent of ''s'', the ratio follows a [[Student's t-distribution]] with ''n''&nbsp;&minus;&nbsp;1 [[degrees of freedom (statistics)|degrees of freedom]] (see its [[Student's t-distribution#Derivation|derivation]]):

:<math>\frac{X_{n+1}-\mu}{s} \sim T_{n-1}.</math>

Solving for <math>X_{n+1}</math> gives the prediction distribution <math>\mu + s\,T_{n-1},</math> from which one can compute intervals as before.

The t-based prediction distribution is more conservative than a normal distribution with the known mean ''μ'' and the estimated standard deviation <math>s</math>: because the t-distribution has heavier tails than the normal distribution, it yields wider intervals. This is necessary for the desired predictive confidence property to hold.
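A minimal Python sketch of the known-mean interval <math>\mu \pm t\,s</math> follows, with ''t'' the 100(1&nbsp;+&nbsp;''p'')/2-th percentile of <math>T_{n-1}</math>. The standard library has no Student-t quantile function, so the sketch assumes the caller supplies the quantile (from tables, or from <code>scipy.stats.t.ppf</code> where SciPy is installed). The coverage check hard-codes 2.7764, the tabulated 97.5th percentile of the t-distribution with 4 degrees of freedom, and is therefore tied to ''n''&nbsp;=&nbsp;5 and ''p''&nbsp;=&nbsp;0.95. Names and settings are illustrative.

```python
import random
import statistics


def known_mean_interval(xs, mu, t_q):
    """Prediction interval mu ± t_q * s for X_{n+1} when the mean mu is known.

    t_q must be the 100(1 + p)/2-th percentile of Student's t with n - 1
    degrees of freedom, supplied by the caller (from tables, or
    scipy.stats.t.ppf((1 + p) / 2, n - 1) if SciPy is available).
    """
    s = statistics.stdev(xs)  # n - 1 divisor, matching the definition of s^2
    return mu - t_q * s, mu + t_q * s


def known_mean_coverage(mu=1.0, sigma=0.5, reps=20_000, seed=4):
    """Estimated coverage of the 95% interval for samples of size n = 5."""
    n = 5
    t_q = 2.7764  # tabulated 97.5th percentile of T_4; valid only for n = 5, p = 0.95
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        lo, hi = known_mean_interval(xs, mu, t_q)
        hits += lo <= rng.gauss(mu, sigma) <= hi
    return hits / reps


print(known_mean_coverage())  # close to 0.95
```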

==== Unknown mean, unknown variance ====
Combining the above for a normal distribution <math>N(\mu,\sigma^2)</math> with both ''μ'' and ''σ''<sup>2</sup> unknown yields the following ancillary statistic:<ref>{{Harvtxt|Geisser|1993|loc=[https://books.google.com/books?id=wfdlBZ_iwZoC Example 2.2, pp. 9–10]}}</ref>
:<math>\frac{X_{n+1}-\overline{X}}{s\sqrt{1+1/n}} \sim T_{n-1}</math>
This simple combination is possible because the sample mean and sample variance of a normal sample are independent statistics (and both are independent of <math>X_{n+1}</math>); this independence holds for no other distribution, and in fact characterizes the normal distribution.
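The independence claim can be probed by simulation. The hypothetical function below estimates the correlation between <math>\overline{X}</math> and <math>s^2</math> over many samples of size 5; the exponential distribution is an arbitrary non-normal comparison chosen for illustration. A correlation near zero for normal samples is consistent with independence but does not by itself prove it, whereas the clearly positive correlation for exponential samples does show that the sample mean and variance are dependent there.

```python
import random
import statistics


def mean_variance_correlation(sampler, n=5, reps=10_000, seed=5):
    """Pearson correlation between the sample mean and s^2 over repeated samples.

    sampler(rng) must return one draw from the distribution of interest.
    """
    rng = random.Random(seed)
    means, variances = [], []
    for _ in range(reps):
        xs = [sampler(rng) for _ in range(n)]
        means.append(statistics.fmean(xs))
        variances.append(statistics.variance(xs))
    m_bar, v_bar = statistics.fmean(means), statistics.fmean(variances)
    cov = sum((a - m_bar) * (b - v_bar) for a, b in zip(means, variances))
    # Root sums of squares; the common 1/reps factors cancel in the ratio.
    rss_m = sum((a - m_bar) ** 2 for a in means) ** 0.5
    rss_v = sum((b - v_bar) ** 2 for b in variances) ** 0.5
    return cov / (rss_m * rss_v)


normal = mean_variance_correlation(lambda rng: rng.gauss(0.0, 1.0))
exponential = mean_variance_correlation(lambda rng: rng.expovariate(1.0))
print(normal, exponential)  # about 0 for normal samples, clearly positive for exponential
```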

Solving for <math>X_{n+1}</math> yields the prediction distribution 
:<math>\overline{X} + s\sqrt{1+1/n} \cdot T_{n-1}</math>

The probability of <math>X_{n+1}</math> falling in a given interval is then:
:<math>\Pr\left(\overline{X}-T_{n-1,a} s\sqrt{1+(1/n)}\leq X_{n+1}   \leq\overline{X}+T_{n-1,a} s\sqrt{1+(1/n)}\,\right)=p</math>

where ''T<sub>n-1,a</sub>'' is the 100((1&nbsp;+&nbsp;''p'')/2)<sup>th</sup> [[percentile]] of [[Student's t-distribution]] with ''n''&nbsp;&minus;&nbsp;1 degrees of freedom, so that the central range from &minus;''T<sub>n-1,a</sub>'' to ''T<sub>n-1,a</sub>'' has probability ''p''. Therefore, the numbers

:<math>\overline{X} \pm T_{n-1,a} s \sqrt{1+(1/n)}</math>

are the endpoints of a 100''p''% prediction interval for <math>X_{n+1}</math>.
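The endpoints can be computed with the Python standard library alone. This sketch takes ''p'' to be the coverage probability, so the quantile used is the 100(1&nbsp;+&nbsp;''p'')/2-th percentile of the t-distribution with ''n''&nbsp;&minus;&nbsp;1 degrees of freedom. Because <code>statistics</code> has no Student-t quantile function, it includes small hypothetical helpers (<code>t_pdf</code>, <code>t_cdf</code>, <code>t_ppf</code>) that integrate the t density with Simpson's rule and invert the CDF by bisection; where SciPy is available, <code>scipy.stats.t.ppf((1 + p) / 2, n - 1)</code> gives the same quantile.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """CDF from Simpson's rule on [0, |x|] and symmetry about 0 (steps must be even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half = total * h / 3
    return 0.5 + half if x >= 0 else 0.5 - half


def t_ppf(q, df):
    """Quantile of Student's t for 0.5 <= q < 1, by bisection on t_cdf."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < q:  # grow the bracket until it contains the quantile
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def prediction_interval(xs, p):
    """Endpoints xbar ± T * s * sqrt(1 + 1/n) of a 100p% prediction interval for X_{n+1}."""
    n = len(xs)
    xbar, s = statistics.fmean(xs), statistics.stdev(xs)
    t = t_ppf((1 + p) / 2, n - 1)  # 100(1 + p)/2-th percentile of T_{n-1}
    half_width = t * s * math.sqrt(1 + 1 / n)
    return xbar - half_width, xbar + half_width


print(prediction_interval([1.0, 2.0, 3.0, 4.0, 5.0], 0.95))  # approximately (-1.809, 7.809)
```

Bisection on a numerically integrated CDF is slower than a library routine but needs nothing beyond <code>math</code> and <code>statistics</code>; for ''n''&nbsp;=&nbsp;5 and ''p''&nbsp;=&nbsp;0.95 it reproduces the tabulated value 2.776.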