{{Short description|Estimate of an interval in which future observations will fall}}
{{distinguish|Prediction error}}
{{Multiple issues|
{{Original research|date=May 2021}}
{{More footnotes needed|date=May 2021}}
}}

In [[statistical inference]], specifically [[predictive inference]], a '''prediction interval''' is an estimate of an [[interval (statistics)|interval]] in which a future observation will fall, with a certain probability, given what has already been observed. Prediction intervals are often used in [[regression analysis]].

A simple example is given by a six-sided die with face values ranging from 1 to 6. The confidence interval for the estimated expected value of the face value will be around 3.5 and will become narrower with a larger sample size. However, the prediction interval for the next roll will approximately range from 1 to 6, even with any number of samples seen so far.
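The contrast in the die example can be illustrated with a small simulation. The following Python sketch is illustrative only: the function name <code>die_summary</code>, the seed, and the use of a normal-approximation 95% confidence interval for the mean are assumptions made for the illustration.

```python
import random
import statistics

def die_summary(n, seed=0):
    """Roll a fair six-sided die n times and summarise the sample."""
    rng = random.Random(seed)
    rolls = [rng.randint(1, 6) for _ in range(n)]
    mean = statistics.mean(rolls)
    # Normal-approximation 95% confidence interval for the expected value.
    half_width = 1.96 * statistics.stdev(rolls) / n ** 0.5
    return {
        "ci_mean": (mean - half_width, mean + half_width),
        "observed_range": (min(rolls), max(rolls)),
    }

for n in (100, 10_000):
    print(n, die_summary(n))
```

As the number of rolls grows, the confidence interval for the mean shrinks, while the range of values a single roll can take stays at 1 to 6.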

Prediction intervals are used in both [[frequentist statistics]] and [[Bayesian statistics]]. A prediction interval bears the same relationship to a future observation that a frequentist [[confidence interval]] or Bayesian [[credible interval]] bears to an unobservable population parameter: prediction intervals predict the distribution of individual future points, whereas confidence intervals and credible intervals of parameters predict the distribution of estimates of the true population mean or other quantity of interest that cannot be observed.

== Introduction ==
If one makes the [[parametric statistics|parametric assumption]] that the underlying distribution is a [[normal distribution]], and has a sample set {''X''<sub>1</sub>,&nbsp;...,&nbsp;''X''<sub>''n''</sub>}, then confidence intervals and credible intervals may be used to estimate the [[population mean]] ''μ'' and [[population standard deviation]] ''σ'' of the underlying population, while prediction intervals may be used to estimate the value of the next sample variable, ''X''<sub>''n''+1</sub>.

Alternatively, in [[#Bayesian statistics|Bayesian terms]], a prediction interval can be described as a credible interval for the variable itself, rather than for a parameter of the distribution thereof.

The concept of prediction intervals need not be restricted to inference about a single future sample value but can be extended to more complicated cases. For example, in the context of river flooding where analyses are often based on annual values of the largest flow within the year, there may be interest in making inferences about the largest flood likely to be experienced within the next 50 years.

Since prediction intervals are only concerned with past and future observations, rather than unobservable population parameters, they are advocated as a better method than confidence intervals by some statisticians, such as [[Seymour Geisser]],{{Citation needed|date=August 2009}} following the focus on observables by [[Bruno de Finetti]].{{Citation needed|date=August 2009}}

== Normal distribution ==
Given a sample from a [[normal distribution]], whose parameters are unknown, it is possible to give prediction intervals in the frequentist sense, i.e., an interval [''a'',&nbsp;''b''] based on statistics of the sample such that on repeated experiments, ''X''<sub>''n''+1</sub> falls in the interval the desired percentage of the time; one may call these "predictive [[confidence interval]]s".<ref>{{harvtxt|Geisser|1993|p= [https://books.google.com/books?id=wfdlBZ_iwZoC 6]}}: Chapter 2: Non-Bayesian predictive approaches</ref>

A general technique of frequentist prediction intervals is to find and compute a [[pivotal quantity]] of the observables ''X''<sub>1</sub>,&nbsp;...,&nbsp;''X''<sub>''n''</sub>,&nbsp;''X''<sub>''n''+1</sub> – meaning a function of observables and parameters whose probability distribution does not depend on the parameters – that can be inverted to give a probability of the future observation ''X''<sub>''n''+1</sub> falling in some interval computed in terms of the observed values so far, <math>X_1,\dots,X_n.</math> Such a pivotal quantity, depending only on observables, is called an [[ancillary statistic]].<ref>{{Harvtxt|Geisser|1993|p=[https://books.google.com/books?id=wfdlBZ_iwZoC 7]}}</ref> The usual method of constructing pivotal quantities is to take the difference of two variables that depend on location, so that location cancels out, and then take the ratio of two variables that depend on scale, so that scale cancels out.
The most familiar pivotal quantity is the [[Student's t-statistic]], which can be derived by this method and is used in the sequel.

=== Known mean, known variance ===
{{see also|68–95–99.7 rule}}

A prediction interval [''ℓ'',''u''] for a future observation ''X'' in a normal distribution ''N''(''μ'',''σ''<sup>2</sup>) with known [[mean]] and [[variance]] may be calculated from

:<math>\gamma=P(\ell<X<u)=P\left(\frac{\ell-\mu} \sigma < \frac{X-\mu} \sigma < \frac{u-\mu} \sigma \right)=P\left(\frac{\ell-\mu} \sigma < Z < \frac{u-\mu} \sigma \right),</math>

where <math>Z=\frac{X-\mu}{\sigma}</math>, the [[standard score]] of ''X'', has a standard normal distribution.

Hence

:<math>\frac{\ell-\mu} \sigma = -z, \quad \frac{u-\mu} \sigma = z,</math>

or

:<math>\ell=\mu-z\sigma, \quad u=\mu+z\sigma,</math>

with ''z'' the [[quantile]] in the standard normal distribution for which:

:<math>\gamma=P(-z<Z<z).</math>
or equivalently:

:<math>\tfrac 12(1-\gamma)=P(Z>z).</math>

{|class="wikitable" align="left" style="margin-right:1em;"
! Prediction<br> interval !! z
|-
| 75% || 1.15<ref name=MedicalStatisticsA2>Table A2 in {{Harvtxt|Sterne|Kirkwood|2003|p=472}}</ref>
|-
| 90% || 1.64<ref name=MedicalStatisticsA2/>
|-
| 95% || 1.96<ref name=MedicalStatisticsA2/>
|-
| 99% || 2.58<ref name=MedicalStatisticsA2/>
|}
[[File:Standard score and prediction interval.svg|thumb|250px|right|Prediction interval (on the [[y-axis]]) given from z (the quantile of the [[standard score]], on the [[x-axis]]). The y-axis is logarithmically compressed (but the values on it are not modified).]]

The prediction interval is conventionally written as:
:<math>\left[\mu- z\sigma,\  \mu + z\sigma \right]. </math>

For example, to calculate the 95% prediction interval for a normal distribution with a mean (''μ'') of 5 and a standard deviation (''σ'') of 1, take ''z'' to be approximately 2. The lower limit of the prediction interval is then approximately 5&nbsp;&minus;&nbsp;(2&sdot;1) = 3, and the upper limit is approximately 5&nbsp;+&nbsp;(2&sdot;1) = 7, giving a prediction interval of approximately 3 to 7.
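The same calculation can be sketched in Python using the standard library's <code>statistics.NormalDist</code>. The function name <code>normal_prediction_interval</code> is an assumption made for this illustration; the quantile ''z'' is obtained from the inverse cumulative distribution function rather than from a table.

```python
from statistics import NormalDist

def normal_prediction_interval(mu, sigma, gamma):
    """Central prediction interval [mu - z*sigma, mu + z*sigma] with coverage gamma."""
    # z satisfies P(Z > z) = (1 - gamma) / 2 for a standard normal Z.
    z = NormalDist().inv_cdf(1 - (1 - gamma) / 2)
    return mu - z * sigma, mu + z * sigma

lower, upper = normal_prediction_interval(mu=5, sigma=1, gamma=0.95)
print(round(lower, 2), round(upper, 2))  # 3.04 6.96
```

Using the unrounded quantile (about 1.96 instead of 2) gives limits slightly inside the approximate interval of 3 to 7.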

[[File:Cumulative distribution function for normal distribution, mean 0 and sd 1.svg|270px|thumb|right|Diagram showing the [[cumulative distribution function]] for the normal distribution with mean (''μ'') 0 and variance (''σ''<sup>2</sup>)&nbsp;1. In addition to the [[quantile function]], the prediction interval for any standard score can be calculated by (1&nbsp;&minus;&nbsp;(1&nbsp;&minus;&nbsp;<span style="font-size:100%;">Φ</span><sub>''μ'',''σ''<sup>2</sup></sub>(standard score))&sdot;2). For example, a standard score of ''x''&nbsp;=&nbsp;1.96 gives <span style="font-size:100%;">Φ</span><sub>''μ'',''σ''<sup>2</sup></sub>(1.96)&nbsp;=&nbsp;0.9750 corresponding to a prediction interval of (1&nbsp;&minus;&nbsp;(1&nbsp;&minus;&nbsp;0.9750)&sdot;2) =&nbsp;0.9500&nbsp;=&nbsp;95%.]]

=== Estimation of parameters ===
For a distribution with unknown parameters, a direct approach to prediction is to estimate the parameters and then use the associated quantile function – for example, one could use the sample mean <math>\overline{X}</math> as estimate for ''μ'' and the [[sample variance]] ''s''<sup>2</sup> as an estimate for ''σ''<sup>2</sup>. There are two natural choices for ''s''<sup>2</sup> here – dividing by <math>(n-1)</math> yields an unbiased estimate, while dividing by ''n'' yields the [[maximum likelihood estimator]], and either might be used. One then uses the quantile function with these estimated parameters <math>\Phi^{-1}_{\overline{X},s^2}</math> to give a prediction interval.

This approach is usable, but the resulting interval will not have the repeated sampling interpretation<ref>{{Harvtxt|Geisser|1993|pp=[https://books.google.com/books?id=wfdlBZ_iwZoC 8–9]}}</ref> – it is not a predictive confidence interval.
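The loss of the repeated sampling interpretation can be checked empirically. The following Python sketch is a simulation under assumed settings (standard normal data, a small sample size, a fixed seed, and the hypothetical function name <code>plug_in_coverage</code>); it estimates how often the plug-in interval <math>\overline{X} \pm z s</math> contains the next observation.

```python
import random
import statistics
from statistics import NormalDist

def plug_in_coverage(n, gamma=0.95, trials=10_000, seed=1):
    """Estimate how often the plug-in interval contains the next observation."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - gamma) / 2)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        x_bar = statistics.mean(sample)
        s = statistics.stdev(sample)  # square root of the unbiased sample variance
        x_next = rng.gauss(0.0, 1.0)  # the future observation
        if x_bar - z * s <= x_next <= x_bar + z * s:
            hits += 1
    return hits / trials

print(plug_in_coverage(n=5))
```

For a small sample such as ''n''&nbsp;=&nbsp;5, the estimated coverage comes out noticeably below the nominal 95%, which is why the plug-in interval is not a predictive confidence interval.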

For the sequel, use the sample mean:
:<math>\overline{X} = (X_1+\cdots+X_n)/n</math>
and the (unbiased) sample variance:
:<math>s^2 = {1 \over n-1}\sum_{i=1}^n (X_i-\overline{X})^2</math>

==== Unknown mean, known variance ====
Given<ref>{{Harvtxt|Geisser|1993|p=[https://books.google.com/books?id=wfdlBZ_iwZoC 7–]}}</ref> a normal distribution with unknown mean ''μ'' but known variance <math>\sigma^2</math>, the sample mean <math>\overline{X}</math> of the observations <math>X_1,\dots,X_n</math> has distribution <math>N(\mu,\sigma^2/n),</math> while the future observation <math>X_{n+1}</math> has distribution <math>N(\mu,\sigma^2).</math> Taking the difference of these cancels the ''μ'' and yields a normal distribution of variance <math>\sigma^2+(\sigma^2/n),</math> thus
:<math>\frac{X_{n+1}-\overline{X}}{\sqrt{\sigma^2+(\sigma^2/n)}} \sim N(0,1).</math>
Solving for <math>X_{n+1}</math> gives the prediction distribution <math>N(\overline{X},\sigma^2+(\sigma^2/n)),</math> from which one can compute intervals as before. This is a predictive confidence interval in the sense that if one uses a quantile range of 100''p''%, then on repeated applications of this computation, the future observation <math>X_{n+1}</math> will fall in the predicted interval 100''p''% of the time.

Notice that this prediction distribution is more conservative than using the estimated mean <math>\overline{X}</math> and known variance <math>\sigma^2</math>, as this uses compound variance <math>\sigma^2+(\sigma^2/n)</math>, hence yields slightly wider intervals. This is necessary for the desired confidence interval property to hold.
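A minimal Python sketch of this case follows. The function name <code>prediction_interval_known_sigma</code> and the sample values are assumptions made for illustration; the interval is taken from the quantiles of <math>N(\overline{X},\sigma^2+(\sigma^2/n))</math>.

```python
from statistics import NormalDist, mean

def prediction_interval_known_sigma(sample, sigma, p):
    """Interval for X_{n+1} based on N(x_bar, sigma^2 + sigma^2 / n)."""
    n = len(sample)
    x_bar = mean(sample)
    # NormalDist takes the standard deviation, so take the square root
    # of the compound variance sigma^2 + sigma^2 / n.
    predictive = NormalDist(x_bar, (sigma**2 + sigma**2 / n) ** 0.5)
    return predictive.inv_cdf((1 - p) / 2), predictive.inv_cdf((1 + p) / 2)

sample = [4.1, 5.3, 4.8, 5.9, 5.0]  # made-up data for illustration
print(prediction_interval_known_sigma(sample, sigma=1.0, p=0.95))
```

With ''n''&nbsp;=&nbsp;5 the half-width is larger than the known-mean value ''zσ'' by a factor of <math>\sqrt{1+1/5}</math>.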

==== Known mean, unknown variance ====
Conversely, given a normal distribution with known mean ''μ'' but unknown variance <math>\sigma^2</math>, 
the sample variance <math>s^2</math> of the observations <math>X_1,\dots,X_n</math> has, up to scale, a [[chi-squared distribution|<math>\chi_{n-1}^2</math> distribution]]; more precisely:
:<math>\frac{(n-1)s^2}{\sigma^2} \sim \chi_{n-1}^2.</math>
On the other hand, the future observation <math>X_{n+1}</math> has distribution <math>N(\mu,\sigma^2).</math>
Taking the ratio of the future observation residual <math>X_{n+1}-\mu</math> and the sample standard deviation ''s'' cancels the ''σ'', yielding a [[Student's t-distribution]] with ''n''&nbsp;&minus;&nbsp;1 [[degrees of freedom (statistics)|degrees of freedom]] (see its [[Student's t-distribution#Derivation|derivation]]):

: <math>\frac{X_{n+1}-\mu} s \sim T_{n-1}.</math>

Solving for <math>X_{n+1}</math> gives the prediction distribution <math>\mu \pm sT_{n-1},</math> from which one can compute intervals as before.

Notice that this prediction distribution is more conservative than using a normal distribution with the estimated standard deviation <math>s</math> and known mean ''μ'', as it uses the t-distribution instead of the normal distribution, hence yields wider intervals. This is necessary for the desired confidence interval property to hold.
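The coverage of the resulting interval <math>\mu \pm t s</math> can be checked by simulation. In the following Python sketch, the function name <code>known_mean_coverage</code>, the parameter values, and the seed are assumptions; the value 2.262 is the tabulated 97.5th percentile of the t-distribution with 9 degrees of freedom, matching ''n''&nbsp;=&nbsp;10 and a 95% interval.

```python
import random
import statistics

def known_mean_coverage(n, t_quantile, mu=0.0, sigma=2.0, trials=10_000, seed=3):
    """Estimate how often mu ± t * s contains the next observation."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
        x_next = rng.gauss(mu, sigma)  # the future observation
        if abs(x_next - mu) <= t_quantile * s:
            hits += 1
    return hits / trials

print(known_mean_coverage(n=10, t_quantile=2.262))
```

The estimated coverage should be close to 0.95, whatever value of ''σ'' is used to generate the data.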

==== Unknown mean, unknown variance ====
Combining the above for a normal distribution <math>N(\mu,\sigma^2)</math> with both ''μ'' and ''σ''<sup>2</sup> unknown yields the following ancillary statistic:<ref>{{Harvtxt|Geisser|1993|loc=[https://books.google.com/books?id=wfdlBZ_iwZoC Example 2.2, p. 9–10]}}</ref>
:<math>\frac{X_{n+1}-\overline{X}}{s\sqrt{1+1/n}} \sim T_{n-1}</math>
This simple combination is possible because the sample mean and sample variance of the normal distribution are independent statistics; this is only true for the normal distribution, and in fact characterizes the normal distribution.

Solving for <math>X_{n+1}</math> yields the prediction distribution 
:<math>\overline{X} + s\sqrt{1+1/n} \cdot T_{n-1}</math>

The probability of <math>X_{n+1}</math> falling in a given interval is then:
:<math>\Pr\left(\overline{X}-T_{n-1,a} s\sqrt{1+(1/n)}\leq X_{n+1}   \leq\overline{X}+T_{n-1,a} s\sqrt{1+(1/n)}\,\right)=p</math>

where ''T<sub>n-1,a</sub>'' is the 100((1&nbsp;+&nbsp;''p'')/2)<sup>th</sup> [[percentile]] of [[Student's t-distribution]] with ''n''&nbsp;&minus;&nbsp;1 degrees of freedom. Therefore, the numbers

:<math>\overline{X} \pm T_{n-1,a} s \sqrt{1+(1/n)}</math>

are the endpoints of a 100''p''% prediction interval for <math>X_{n+1}</math>.
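The endpoints can be computed with a few lines of Python. This is a sketch under stated assumptions: the function name <code>t_prediction_interval</code> and the sample are made up, and the t quantile is passed in by the caller (here 2.262, the tabulated upper 2.5% point for 9 degrees of freedom, so that the central interval has probability 0.95). In practice the quantile could come from a statistics table or a statistics library.

```python
import statistics

def t_prediction_interval(sample, t_quantile):
    """Endpoints x_bar ± t * s * sqrt(1 + 1/n) for the next observation.

    t_quantile is the upper quantile of Student's t-distribution with
    n - 1 degrees of freedom that leaves the desired probability in the
    central interval; it is supplied by the caller.
    """
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # uses the unbiased (n - 1) variance
    half_width = t_quantile * s * (1 + 1 / n) ** 0.5
    return x_bar - half_width, x_bar + half_width

# Made-up sample of n = 10 values, so the t-distribution has 9 degrees of freedom.
sample = [4.1, 5.3, 4.8, 5.9, 5.0, 4.4, 5.6, 5.2, 4.7, 5.0]
print(t_prediction_interval(sample, t_quantile=2.262))
```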

== Non-parametric methods ==
One can compute prediction intervals without any assumptions on the population, i.e. in a [[non-parametric statistics|non-parametric]] way.

The residual [[Bootstrapping (statistics)|bootstrap]] method can be used for constructing non-parametric prediction intervals.

=== Conformal Prediction ===
{{Main article|conformal prediction}}
The conformal prediction method applies in general; a simple special case uses the sample minimum and maximum as the boundaries of a prediction interval.
If one has a sample of identically distributed random variables {''X''<sub>1</sub>,&nbsp;...,&nbsp;''X''<sub>''n''</sub>}, then the probability that the next observation ''X''<sub>''n''+1</sub> will be the largest is 1/(''n''&nbsp;+&nbsp;1), since, barring ties, each of the ''n''&nbsp;+&nbsp;1 observations is equally likely to be the maximum. In the same way, the probability that ''X''<sub>''n''+1</sub> will be the smallest is 1/(''n''&nbsp;+&nbsp;1). The other (''n''&nbsp;&minus;&nbsp;1)/(''n''&nbsp;+&nbsp;1) of the time, ''X''<sub>''n''+1</sub> falls between the [[sample maximum]] and [[sample minimum]] of the sample {''X''<sub>1</sub>,&nbsp;...,&nbsp;''X''<sub>''n''</sub>}. Thus, denoting the sample maximum and minimum by ''M'' and ''m'', this yields an (''n''&nbsp;&minus;&nbsp;1)/(''n''&nbsp;+&nbsp;1) prediction interval of [''m'',&nbsp;''M''].{{citation needed|reason=are we sure it's (n-1)/n+1) and not (n-2)/n+1)|date=January 2025}}

Notice that while this gives the probability that a future observation will fall in a range, it does not give any estimate of where in that range it will fall. Notably, if the observation falls outside the range of observed values, it may be far outside it. See [[extreme value theory]] for further discussion. Formally, this applies not just to sampling from a population, but to any [[exchangeable sequence]] of random variables, which need not be independent.
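The coverage of the interval [''m'',&nbsp;''M''] can be checked by simulation. The following Python sketch assumes independent draws from a uniform distribution (any continuous distribution would serve) and uses the hypothetical function name <code>min_max_coverage</code>; it compares the observed frequency with (''n''&nbsp;&minus;&nbsp;1)/(''n''&nbsp;+&nbsp;1).

```python
import random

def min_max_coverage(n, trials=50_000, seed=2):
    """Fraction of trials in which X_{n+1} falls within [min, max] of n draws."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.random() for _ in range(n)]
        x_next = rng.random()  # the next observation
        if min(sample) <= x_next <= max(sample):
            hits += 1
    return hits / trials

n = 9
print(min_max_coverage(n), (n - 1) / (n + 1))
```

For ''n''&nbsp;=&nbsp;9 the simulated frequency should be close to 8/10.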

== Contrast with other intervals ==
{{Main article|Interval estimation}}
{{see also|Tolerance interval|Quantile regression}}

=== Contrast with confidence intervals ===
{{main article|Confidence interval}}
In the formula for the predictive confidence interval ''no mention'' is made of the unobservable parameters ''μ'' and ''σ'' of population mean and standard deviation – the observed ''sample'' statistics <math>\overline{X}_n</math> and <math>S_n</math> of sample mean and standard deviation are used, and what is estimated is the outcome of ''future'' samples.

When considering prediction intervals, rather than using sample statistics as estimators of population parameters and applying confidence intervals to these estimates, one considers "the next sample" <math>X_{n+1}</math> as ''itself'' a statistic, and computes its [[sampling distribution]].

In parameter confidence intervals, one estimates population parameters; if one wishes to interpret this as prediction of the next sample, one models "the next sample" as a draw from this estimated population, using the (estimated) ''population'' distribution. By contrast, in predictive confidence intervals, one uses the ''sampling'' distribution of (a statistic of) a sample of ''n'' or ''n''&nbsp;+&nbsp;1 observations from such a population, and the population distribution is not directly used, though the assumption about its form (though not the values of its parameters) is used in computing the sampling distribution.

== In regression analysis ==
{{further|Regression analysis#Prediction (interpolation and extrapolation)|Mean and predicted outcome}}

A common application of prediction intervals is to [[regression analysis]].
Suppose the data is being modeled by a straight line ([[simple linear regression]]):
:<math>y_i=\alpha+\beta x_i +\varepsilon_i\,</math>
where <math>y_i</math> is the [[response variable]], <math>x_i</math> is the [[explanatory variable]], ''ε<sub>i</sub>'' is a random error term, and <math>\alpha</math> and <math>\beta</math> are parameters.

Given estimates <math>\hat \alpha</math> and <math>\hat \beta</math> for the parameters, such as from [[ordinary least squares]], the predicted response value ''y''<sub>''d''</sub> for a given explanatory value ''x''<sub>''d''</sub> is

:<math>\hat{y}_d=\hat\alpha+\hat\beta x_d ,</math>

(the point on the regression line), while the actual response would be

:<math>y_d=\alpha+\beta x_d +\varepsilon_d.  \,</math>

The [[point estimate]] <math>\hat{y}_d</math> is called the ''[[mean response]]'', and is an estimate of the [[expected value]] of ''y''<sub>''d''</sub>, <math>E(y\mid x_d).</math>

A prediction interval instead gives an interval in which one expects ''y''<sub>''d''</sub> to fall; this is not necessary if the actual parameters ''α'' and ''β'' are known (together with the error term ''ε<sub>i</sub>''), but if one is estimating from a [[Sampling (statistics)|sample]], then one may use the [[standard error]] of the estimates for the intercept and slope (<math>\hat\alpha</math> and <math>\hat\beta</math>), as well as their correlation, to compute a prediction interval.

In regression, {{Harvtxt|Faraway|2002|p=39}} makes a distinction between intervals for predictions of the mean response vs. for predictions of observed response—affecting essentially the inclusion or not of the unity term within the square root in the expansion factors [[#Unknown mean, unknown variance|above]]; for details, see {{Harvtxt|Faraway|2002}}.
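As an illustration, the following Python sketch computes a prediction interval for a new response in simple linear regression. It uses the standard textbook formula for least-squares fits with normally distributed errors, <math>\hat{y}_d \pm t\, s \sqrt{1 + 1/n + (x_d-\bar{x})^2/S_{xx}}</math>, which is assumed here and is not stated in the text above. The function name <code>regression_prediction_interval</code> and the data are made up, and the t quantile (with ''n''&nbsp;&minus;&nbsp;2 degrees of freedom) is supplied by the caller.

```python
import statistics

def regression_prediction_interval(xs, ys, x_d, t_quantile):
    """Prediction interval for a new response at x_d in simple linear regression."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    # Ordinary least squares estimates of slope and intercept.
    beta_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    # Residual standard error with n - 2 degrees of freedom.
    rss = sum((y - alpha_hat - beta_hat * x) ** 2 for x, y in zip(xs, ys))
    s = (rss / (n - 2)) ** 0.5
    y_hat = alpha_hat + beta_hat * x_d
    # The leading 1 under the square root accounts for the new error term;
    # dropping it gives the interval for the mean response instead.
    half_width = t_quantile * s * (1 + 1 / n + (x_d - x_bar) ** 2 / s_xx) ** 0.5
    return y_hat - half_width, y_hat + half_width

# Made-up data; 3.182 is the tabulated 97.5th percentile of the
# t-distribution with n - 2 = 3 degrees of freedom.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
print(regression_prediction_interval(xs, ys, x_d=6.0, t_quantile=3.182))
```

The leading 1 inside the square root is the unity term mentioned above: keeping it gives an interval for an observed response, and removing it gives the narrower interval for the mean response.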

== Bayesian statistics ==
{{see also|Posterior predictive distribution}}
[[Seymour Geisser]], a proponent of predictive inference, gives predictive applications of [[Bayesian statistics]].<ref>{{Harvtxt|Geisser|1993}}</ref>

In Bayesian statistics, one can compute (Bayesian) prediction intervals from the [[posterior probability]] of the random variable, as a [[credible interval]]. In theoretical work, credible intervals are not often calculated for the prediction of future events, but for inference of parameters – i.e., credible intervals of a parameter, not for the outcomes of the variable itself. However, particularly where applications are concerned with possible extreme values of yet to be observed cases, credible intervals for such values can be of practical importance.

==Applications==
Prediction intervals are commonly used as definitions of [[reference range]]s, such as [[reference ranges for blood tests]] to give an idea of whether a [[blood test]] is normal or not. For this purpose, the most commonly used prediction interval is the 95% prediction interval, and a reference range based on it can be called a ''standard reference range''.

==See also==
{{colbegin}}
*[[Extrapolation]]
*[[Posterior probability]]
*[[Prediction]]
*[[Confidence and prediction bands|Prediction band]]
*[[Seymour Geisser]]
*[[Statistical model validation]]
*[[Trend estimation]]
{{colend}}

==Notes==
{{reflist}}

==References==
* {{citation | first= Julian J. | last= Faraway | year= 2002 | url=https://cran.r-project.org/doc/contrib/Faraway-PRA.pdf | title= Practical Regression and Anova using R}}
* {{citation|author-first=Seymour | author-last= Geisser | author-link= Seymour Geisser |year=1993|title=Predictive Inference|publisher= [[CRC Press]]}}
* {{citation |last1=Sterne |first1=Jonathan |last2=Kirkwood |first2=Betty R. |title=Essential Medical Statistics |publisher=[[Blackwell Science]] |year=2003 |isbn=0-86542-871-9 |url-access=registration |url=https://archive.org/details/essentialmedical00kirk }}

==Further reading==
*{{cite journal | doi = 10.2307/1391361 | last1 = Chatfield | first1 = C. | year = 1993 | title = Calculating Interval Forecasts | journal = [[Journal of Business & Economic Statistics]] | volume = 11 | issue = 2| pages = 121–135 | jstor = 1391361 }}
*{{cite journal | doi = 10.1093/biomet/92.3.529 | last1 = Lawless | first1 = J. F. | last2 = Fredette | first2 = M. | year = 2005 | title = Frequentist prediction intervals and predictive distributions | journal = [[Biometrika]] | volume = 92 | issue = 3| pages = 529–542 | doi-access = free }}
*{{cite journal | last1 = Meade | first1 = N. | last2 = Islam | first2 = T. | year = 1995 | title = Prediction Intervals for Growth Curve Forecasts | doi =10.1002/for.3980140502| journal = [[Journal of Forecasting]] | volume = 14 | issue = 5| pages = 413–430 }}
* ISO 16269-8 Standard Interpretation of Data, Part 8, Determination of Prediction Intervals

{{statistics|inference|collapsed}}

{{DEFAULTSORT:Prediction Interval}}
[[Category:Statistical forecasting]]
[[Category:Regression analysis]]
[[Category:Statistical intervals]]