== Normal distribution ==

Given a sample from a [[normal distribution]], whose parameters are unknown, it is possible to give prediction intervals in the frequentist sense, i.e., an interval [''a'', ''b''] based on statistics of the sample such that on repeated experiments, ''X''<sub>''n''+1</sub> falls in the interval the desired percentage of the time; one may call these "predictive [[confidence interval]]s".<ref>{{harvtxt|Geisser|1993|p= [https://books.google.com/books?id=wfdlBZ_iwZoC 6]}}: Chapter 2: Non-Bayesian predictive approaches</ref>

A general technique of frequentist prediction intervals is to find and compute a [[pivotal quantity]] of the observables ''X''<sub>1</sub>, ..., ''X''<sub>''n''</sub>, ''X''<sub>''n''+1</sub> – meaning a function of observables and parameters whose probability distribution does not depend on the parameters – that can be inverted to give a probability of the future observation ''X''<sub>''n''+1</sub> falling in some interval computed in terms of the observed values so far, <math>X_1,\dots,X_n.</math> Such a pivotal quantity, depending only on observables, is called an [[ancillary statistic]].<ref>{{Harvtxt|Geisser|1993|p=[https://books.google.com/books?id=wfdlBZ_iwZoC 7]}}</ref> The usual method of constructing pivotal quantities is to take the difference of two variables that depend on location, so that location cancels out, and then take the ratio of two variables that depend on scale, so that scale cancels out. The most familiar pivotal quantity is the [[Student's t-statistic]], which can be derived by this method and is used in the sequel.

=== Known mean, known variance ===
{{see also|68–95–99.7 rule}}
A prediction interval [''ℓ'', ''u''] for a future observation ''X'' in a normal distribution ''N''(''μ'', ''σ''<sup>2</sup>) with known [[mean]] and [[variance]] may be calculated from
:<math>\gamma=P(\ell<X<u)=P\left(\frac{\ell-\mu} \sigma < \frac{X-\mu} \sigma < \frac{u-\mu} \sigma \right)=P\left(\frac{\ell-\mu} \sigma < Z < \frac{u-\mu} \sigma \right),</math>
where <math>Z=\frac{X-\mu}{\sigma}</math>, the [[standard score]] of ''X'', is distributed as standard normal. Hence
:<math>\frac{\ell-\mu} \sigma = -z, \quad \frac{u-\mu} \sigma = z,</math>
or
:<math>\ell=\mu-z\sigma, \quad u=\mu+z\sigma,</math>
with ''z'' the [[quantile]] in the standard normal distribution for which
:<math>\gamma=P(-z<Z<z),</math>
or, equivalently,
:<math>\tfrac 12(1-\gamma)=P(Z>z).</math>
{|class="wikitable" align="left" style="margin-right:1em;"
! Prediction<br> interval !! z
|-
| 75% || 1.15<ref name=MedicalStatisticsA2>Table A2 in {{Harvtxt|Sterne|Kirkwood|2003|p=472}}</ref>
|-
| 90% || 1.64<ref name=MedicalStatisticsA2/>
|-
| 95% || 1.96<ref name=MedicalStatisticsA2/>
|-
| 99% || 2.58<ref name=MedicalStatisticsA2/>
|}
[[File:Standard score and prediction interval.svg|thumb|250px|right|Prediction interval (on the [[y-axis]]) given from z (the quantile of the [[standard score]], on the [[x-axis]]). The y-axis is logarithmically compressed (but the values on it are not modified).]]
The prediction interval is conventionally written as
:<math>\left[\mu- z\sigma,\ \mu + z\sigma \right]. </math>
For example, to calculate the 95% prediction interval for a normal distribution with a mean (''μ'') of 5 and a standard deviation (''σ'') of 1, take ''z'' ≈ 2. The lower limit of the prediction interval is then approximately 5 − (2⋅1) = 3, and the upper limit is approximately 5 + (2⋅1) = 7, giving a prediction interval of approximately 3 to 7.
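To make the arithmetic concrete, the following minimal Python sketch (an illustration only, not drawn from the cited sources; it assumes [[SciPy]] is available, and the variable names are chosen for this example) computes the same 95% interval using the exact quantile rather than the rounded ''z'' ≈ 2:

<syntaxhighlight lang="python">
from scipy.stats import norm

mu, sigma = 5.0, 1.0   # known mean and standard deviation
gamma = 0.95           # desired coverage

# Two-sided quantile: P(-z < Z < z) = gamma, so z is the (1 + gamma)/2 quantile.
z = norm.ppf((1 + gamma) / 2)           # ~1.96

lower, upper = mu - z * sigma, mu + z * sigma
print(lower, upper)                     # ~3.04, ~6.96 (roughly 3 to 7)
</syntaxhighlight>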
[[File:Cumulative distribution function for normal distribution, mean 0 and sd 1.svg|270px|thumb|right|Diagram showing the [[cumulative distribution function]] for the normal distribution with mean (''μ'') 0 and variance (''σ''<sup>2</sup>) 1. In addition to the [[quantile function]], the prediction interval for any standard score can be calculated by (1 − (1 − <span style="font-size:100%;">Φ</span><sub>''μ'',''σ''<sup>2</sup></sub>(standard score))⋅2). For example, a standard score of ''x'' = 1.96 gives <span style="font-size:100%;">Φ</span><sub>''μ'',''σ''<sup>2</sup></sub>(1.96) = 0.9750 corresponding to a prediction interval of (1 − (1 − 0.9750)⋅2) = 0.9500 = 95%.]]

=== Estimation of parameters ===

For a distribution with unknown parameters, a direct approach to prediction is to estimate the parameters and then use the associated quantile function – for example, one could use the sample mean <math>\overline{X}</math> as an estimate for ''μ'' and the [[sample variance]] ''s''<sup>2</sup> as an estimate for ''σ''<sup>2</sup>. There are two natural choices for ''s''<sup>2</sup> here – dividing by <math>(n-1)</math> yields an unbiased estimate, while dividing by ''n'' yields the [[maximum likelihood estimator]], and either might be used. One then uses the quantile function with these estimated parameters <math>\Phi^{-1}_{\overline{X},s^2}</math> to give a prediction interval.

This approach is usable, but the resulting interval will not have the repeated sampling interpretation<ref>{{Harvtxt|Geisser|1993|pp=[https://books.google.com/books?id=wfdlBZ_iwZoC 8–9]}}</ref> – it is not a predictive confidence interval.

For the sequel, use the sample mean
:<math>\overline{X} = (X_1+\cdots+X_n)/n</math>
and the (unbiased) sample variance
:<math>s^2 = {1 \over n-1}\sum_{i=1}^n (X_i-\overline{X})^2.</math>

==== Unknown mean, known variance ====

Given<ref>{{Harvtxt|Geisser|1993|p=[https://books.google.com/books?id=wfdlBZ_iwZoC 7–]}}</ref> a normal distribution with unknown mean ''μ'' but known variance <math>\sigma^2</math>, the sample mean <math>\overline{X}</math> of the observations <math>X_1,\dots,X_n</math> has distribution <math>N(\mu,\sigma^2/n),</math> while the future observation <math>X_{n+1}</math> has distribution <math>N(\mu,\sigma^2).</math> Taking the difference of these cancels the ''μ'' and yields a normal distribution of variance <math>\sigma^2+(\sigma^2/n),</math> thus
:<math>\frac{X_{n+1}-\overline{X}}{\sqrt{\sigma^2+(\sigma^2/n)}} \sim N(0,1).</math>
Solving for <math>X_{n+1}</math> gives the prediction distribution <math>N(\overline{X},\sigma^2+(\sigma^2/n)),</math> from which one can compute intervals as before. This is a predictive confidence interval in the sense that if one uses a quantile range of 100''p''%, then on repeated applications of this computation, the future observation <math>X_{n+1}</math> will fall in the predicted interval 100''p''% of the time.

Notice that this prediction distribution is more conservative than using the estimated mean <math>\overline{X}</math> and known variance <math>\sigma^2</math>, as it uses the compound variance <math>\sigma^2+(\sigma^2/n)</math>, hence yields slightly wider intervals. This is necessary for the desired confidence interval property to hold.
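As a sketch of the same computation (illustrative only; the sample values below are hypothetical), the prediction distribution <math>N(\overline{X},\sigma^2+(\sigma^2/n))</math> can be evaluated as:

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import norm

sigma = 1.0                               # known standard deviation
x = np.array([4.1, 5.3, 4.8, 5.6, 4.9])   # hypothetical observations X_1, ..., X_n
n, xbar = len(x), x.mean()

gamma = 0.95
z = norm.ppf((1 + gamma) / 2)
# Width uses the compound variance sigma^2 + sigma^2/n, not sigma^2 alone.
half_width = z * np.sqrt(sigma**2 + sigma**2 / n)
print(xbar - half_width, xbar + half_width)
</syntaxhighlight>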
==== Known mean, unknown variance ====

Conversely, given a normal distribution with known mean ''μ'' but unknown variance <math>\sigma^2</math>, the sample variance <math>s^2</math> of the observations <math>X_1,\dots,X_n</math> has, up to scale, a [[chi-squared distribution|<math>\chi_{n-1}^2</math> distribution]]; more precisely:
:<math>\frac{(n-1)s^2}{\sigma^2} \sim \chi_{n-1}^2.</math>
On the other hand, the future observation <math>X_{n+1}</math> has distribution <math>N(\mu,\sigma^2).</math> Taking the ratio of the future observation residual <math>X_{n+1}-\mu</math> and the sample standard deviation ''s'' cancels the ''σ'', yielding a [[Student's t-distribution]] with ''n'' − 1 [[degrees of freedom (statistics)|degrees of freedom]] (see its [[Student's t-distribution#Derivation|derivation]]):
:<math>\frac{X_{n+1}-\mu} s \sim T_{n-1}.</math>
Solving for <math>X_{n+1}</math> gives the prediction distribution <math>\mu \pm sT_{n-1},</math> from which one can compute intervals as before.

Notice that this prediction distribution is more conservative than using a normal distribution with the estimated standard deviation <math>s</math> and known mean ''μ'', as it uses the t-distribution instead of the normal distribution, hence yields wider intervals. This is necessary for the desired confidence interval property to hold.

==== Unknown mean, unknown variance ====

Combining the above for a normal distribution <math>N(\mu,\sigma^2)</math> with both ''μ'' and ''σ''<sup>2</sup> unknown yields the following ancillary statistic:<ref>{{Harvtxt|Geisser|1993|loc=[https://books.google.com/books?id=wfdlBZ_iwZoC Example 2.2, p. 9–10]}}</ref>
:<math>\frac{X_{n+1}-\overline{X}}{s\sqrt{1+1/n}} \sim T_{n-1}.</math>
This simple combination is possible because the sample mean and sample variance of the normal distribution are independent statistics; this is true only for the normal distribution, and in fact characterizes the normal distribution.

Solving for <math>X_{n+1}</math> yields the prediction distribution
:<math>\overline{X} + s\sqrt{1+1/n} \cdot T_{n-1}.</math>
The probability of <math>X_{n+1}</math> falling in a given interval is then
:<math>\Pr\left(\overline{X}-T_{n-1,a} s\sqrt{1+(1/n)}\leq X_{n+1} \leq\overline{X}+T_{n-1,a} s\sqrt{1+(1/n)}\,\right)=p,</math>
where ''T<sub>n−1,a</sub>'' is the critical value of [[Student's t-distribution]] with ''n'' − 1 degrees of freedom whose upper-tail probability is ''a'' = (1 − ''p'')/2, i.e. the 100((1 + ''p'')/2)<sup>th</sup> [[percentile]]. Therefore, the numbers
:<math>\overline{X} \pm T_{n-1,a} s \sqrt{1+(1/n)}</math>
are the endpoints of a 100''p''% prediction interval for <math>X_{n+1}</math>.
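For illustration, a minimal Python sketch of this final interval (the sample is hypothetical; it assumes SciPy's t-distribution quantile function <code>t.ppf</code>):

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import t

x = np.array([4.1, 5.3, 4.8, 5.6, 4.9])   # hypothetical observations X_1, ..., X_n
n = len(x)
xbar = x.mean()
s = x.std(ddof=1)                           # unbiased sample standard deviation (divides by n - 1)

p = 0.95
t_crit = t.ppf((1 + p) / 2, df=n - 1)       # T_{n-1,a} with upper-tail probability a = (1 - p)/2
half_width = t_crit * s * np.sqrt(1 + 1/n)
print(xbar - half_width, xbar + half_width)  # endpoints of the 100p% prediction interval
</syntaxhighlight>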