Prediction interval
=== Estimation of parameters ===
For a distribution with unknown parameters, a direct approach to prediction is to estimate the parameters and then use the associated quantile function; for example, one could use the sample mean <math>\overline{X}</math> as an estimate for ''μ'' and the [[sample variance]] ''s''<sup>2</sup> as an estimate for ''σ''<sup>2</sup>. There are two natural choices for ''s''<sup>2</sup> here: dividing by <math>(n-1)</math> yields an unbiased estimate, while dividing by ''n'' yields the [[maximum likelihood estimator]], and either might be used. One then uses the quantile function with these estimated parameters, <math>\Phi^{-1}_{\overline{X},s^2}</math>, to give a prediction interval.

This approach is usable, but the resulting interval will not have the repeated sampling interpretation<ref>{{Harvtxt|Geisser|1993|pp=[https://books.google.com/books?id=wfdlBZ_iwZoC 8–9]}}</ref>: it is not a predictive confidence interval.

For the sequel, use the sample mean:
:<math>\overline{X} = (X_1+\cdots+X_n)/n</math>
and the (unbiased) sample variance:
:<math>s^2 = {1 \over n-1}\sum_{i=1}^n (X_i-\overline{X})^2</math>

==== Unknown mean, known variance ====
Given<ref>{{Harvtxt|Geisser|1993|p=[https://books.google.com/books?id=wfdlBZ_iwZoC 7–]}}</ref> a normal distribution with unknown mean ''μ'' but known variance <math>\sigma^2</math>, the sample mean <math>\overline{X}</math> of the observations <math>X_1,\dots,X_n</math> has distribution <math>N(\mu,\sigma^2/n),</math> while the future observation <math>X_{n+1}</math> has distribution <math>N(\mu,\sigma^2).</math> Because <math>X_{n+1}</math> is independent of <math>\overline{X}</math>, taking the difference of these cancels the ''μ'' and yields a normal distribution of variance <math>\sigma^2+(\sigma^2/n),</math> thus
:<math>\frac{X_{n+1}-\overline{X}}{\sqrt{\sigma^2+(\sigma^2/n)}} \sim N(0,1).</math>
Solving for <math>X_{n+1}</math> gives the prediction distribution <math>N(\overline{X},\sigma^2+(\sigma^2/n)),</math> from which one can compute intervals as before.
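A minimal Python sketch of the known-variance case, for illustration only: it computes the central interval from the prediction distribution <math>N(\overline{X},\sigma^2+(\sigma^2/n))</math> and estimates by simulation how often <math>X_{n+1}</math> lands inside it, next to a narrower interval that plugs in <math>\overline{X}</math> but ignores its sampling variance. The function names, the settings ''μ'' = 10, ''σ'' = 2, ''n'' = 5, ''p'' = 0.9 and the random seed are hypothetical choices, not taken from the source.

```python
import random
from statistics import NormalDist, fmean


def known_variance_interval(xs, sigma, p):
    """Central 100p% interval from the prediction distribution N(xbar, sigma^2 + sigma^2/n)."""
    n = len(xs)
    xbar = fmean(xs)
    z = NormalDist().inv_cdf((1 + p) / 2)  # upper (1 + p)/2 quantile of N(0, 1)
    half_width = z * sigma * (1 + 1 / n) ** 0.5
    return xbar - half_width, xbar + half_width


def plug_in_interval(xs, sigma, p):
    """Narrower interval from N(xbar, sigma^2): ignores the sampling variance of xbar."""
    xbar = fmean(xs)
    z = NormalDist().inv_cdf((1 + p) / 2)
    return xbar - z * sigma, xbar + z * sigma


def coverage(interval_fn, mu=10.0, sigma=2.0, n=5, p=0.9, trials=20000, seed=1):
    """Fraction of simulated trials in which X_{n+1} falls inside the interval."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]  # observed X_1, ..., X_n
        lo, hi = interval_fn(xs, sigma, p)
        x_next = rng.gauss(mu, sigma)  # the future observation X_{n+1}
        hits += lo <= x_next <= hi
    return hits / trials


pred = coverage(known_variance_interval)
plug = coverage(plug_in_interval)
print(f"predictive: {pred:.3f}, plug-in: {plug:.3f}")
```

In theory the first interval contains <math>X_{n+1}</math> 90% of the time, while the plug-in interval does so only about 86.7% of the time (<math>2\Phi(z_{0.95}/\sqrt{1.2})-1</math> for ''n'' = 5), so the simulated fractions should land near those values.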
This is a predictive confidence interval in the sense that if one uses a quantile range of 100''p''%, then on repeated applications of this computation, the future observation <math>X_{n+1}</math> will fall in the predicted interval 100''p''% of the time. Notice that this prediction distribution is more conservative than the distribution <math>N(\overline{X},\sigma^2)</math> obtained by plugging in the estimated mean and the known variance: it uses the compound variance <math>\sigma^2+(\sigma^2/n)</math> and hence yields slightly wider intervals. This is necessary for the desired confidence interval property to hold.

==== Known mean, unknown variance ====
Conversely, given a normal distribution with known mean ''μ'' but unknown variance <math>\sigma^2</math>, the sample variance <math>s^2</math> of the observations <math>X_1,\dots,X_n</math> has, up to scale, a [[chi-squared distribution|<math>\chi_{n-1}^2</math> distribution]]; more precisely:
:<math>\frac{(n-1)s^2}{\sigma^2} \sim \chi_{n-1}^2.</math>
On the other hand, the future observation <math>X_{n+1}</math> has distribution <math>N(\mu,\sigma^2)</math> and is independent of <math>s^2</math>. Taking the ratio of the future observation residual <math>X_{n+1}-\mu</math> and the sample standard deviation ''s'' cancels the ''σ'', yielding a [[Student's t-distribution]] with ''n'' − 1 [[degrees of freedom (statistics)|degrees of freedom]] (see its [[Student%27s_t-distribution#Derivation|derivation]]):
:<math>\frac{X_{n+1}-\mu}{s} \sim T_{n-1}.</math>
Solving for <math>X_{n+1}</math> gives the prediction distribution <math>\mu + sT_{n-1},</math> from which one can compute intervals as before. Notice that this prediction distribution is more conservative than using a normal distribution with the estimated standard deviation <math>s</math> and known mean ''μ'', as it uses the t-distribution instead of the normal distribution and hence yields wider intervals. This is necessary for the desired confidence interval property to hold.
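A small simulation sketch of the known-mean case, assuming ''n'' = 5 and ''p'' = 0.95, with ''μ'', ''σ'' and the seed chosen arbitrarily. Since the Python standard library has no t quantile function, the 97.5th percentile of the t-distribution with 4 degrees of freedom is hard-coded from a standard table (about 2.7764); the function names are hypothetical. The same simulated data are scored against the t-based interval <math>\mu \pm T_{4,0.975}\,s</math> and against the normal-based interval <math>\mu \pm 1.96\,s</math>.

```python
import random
from statistics import NormalDist, stdev


def known_mean_interval(mu, xs, quantile):
    """Interval mu ± quantile * s for X_{n+1}, with s the unbiased sample std. dev."""
    s = stdev(xs)  # divides by n - 1, matching the definition of s^2 above
    return mu - quantile * s, mu + quantile * s


def coverage(quantile, mu=0.0, sigma=3.0, n=5, trials=20000, seed=3):
    """Fraction of simulated trials in which X_{n+1} falls inside the interval."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]  # observed X_1, ..., X_n
        lo, hi = known_mean_interval(mu, xs, quantile)
        hits += lo <= rng.gauss(mu, sigma) <= hi  # draw X_{n+1} and test it
    return hits / trials


# 95% intervals with n = 5, i.e. n - 1 = 4 degrees of freedom.
T_4_975 = 2.776445  # 97.5th percentile of Student's t with 4 df (standard table value)
Z_975 = NormalDist().inv_cdf(0.975)  # about 1.96

t_cov = coverage(T_4_975)
z_cov = coverage(Z_975)
print(f"t-based: {t_cov:.3f}, normal-based: {z_cov:.3f}")
```

The t-based interval should cover close to 95% of the time, whereas the normal-based one covers only about 88% in theory with 4 degrees of freedom, which is the shortfall the wider t quantile corrects.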
==== Unknown mean, unknown variance ====
Combining the above for a normal distribution <math>N(\mu,\sigma^2)</math> with both ''μ'' and ''σ''<sup>2</sup> unknown yields the following ancillary statistic:<ref>{{Harvtxt|Geisser|1993|loc=[https://books.google.com/books?id=wfdlBZ_iwZoC Example 2.2, pp. 9–10]}}</ref>
:<math>\frac{X_{n+1}-\overline{X}}{s\sqrt{1+1/n}} \sim T_{n-1}</math>
This simple combination is possible because the sample mean and sample variance of the normal distribution are independent statistics; this is only true for the normal distribution, and in fact characterizes the normal distribution. Solving for <math>X_{n+1}</math> yields the prediction distribution
:<math>\overline{X} + s\sqrt{1+1/n} \cdot T_{n-1}</math>
The probability of <math>X_{n+1}</math> falling in a given interval is then:
:<math>\Pr\left(\overline{X}-T_{n-1,a} s\sqrt{1+(1/n)}\leq X_{n+1} \leq\overline{X}+T_{n-1,a} s\sqrt{1+(1/n)}\,\right)=p</math>
where ''T<sub>n-1,a</sub>'' is the 100((1 + ''p'')/2)<sup>th</sup> [[percentile]] of [[Student's t-distribution]] with ''n'' − 1 degrees of freedom. Therefore, the numbers
:<math>\overline{X} \pm T_{n-1,a} s \sqrt{1+(1/n)}</math>
are the endpoints of a 100''p''% prediction interval for <math>X_{n+1}</math>.
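A self-contained Python sketch of the final formula <math>\overline{X} \pm T_{n-1,a} s \sqrt{1+(1/n)}</math>. Because the standard library lacks a t quantile, the sketch evaluates the t density with <code>math.lgamma</code>, integrates it by Simpson's rule and inverts the resulting CDF by bisection; this numerical route is an assumption made here to avoid dependencies, and a library routine (for example SciPy's <code>scipy.stats.t.ppf</code>) would normally be used instead. The function names, the sample data and the simulation settings are hypothetical.

```python
import math
import random
from statistics import fmean, stdev


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """CDF of t: Simpson's rule on [0, |x|] (steps must be even) plus symmetry about 0."""
    h = abs(x) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(x), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_ppf(q, df, tol=1e-9):
    """Quantile of t for 0 < q < 1, found by bisection on t_cdf."""
    if q < 0.5:
        return -t_ppf(1 - q, df, tol)
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < q:  # widen the bracket until it contains the quantile
        lo, hi = hi, 2 * hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def prediction_interval(xs, p, t_quantile=None):
    """Endpoints xbar ± T_{n-1,a} * s * sqrt(1 + 1/n), where a = (1 + p)/2."""
    n = len(xs)
    if t_quantile is None:
        t_quantile = t_ppf((1 + p) / 2, n - 1)
    xbar, s = fmean(xs), stdev(xs)  # sample mean and unbiased sample std. dev.
    half_width = t_quantile * s * math.sqrt(1 + 1 / n)
    return xbar - half_width, xbar + half_width


def coverage(p=0.9, n=5, mu=10.0, sigma=2.0, trials=20000, seed=4):
    """Fraction of simulated trials in which X_{n+1} lands in the interval."""
    rng = random.Random(seed)
    t_quantile = t_ppf((1 + p) / 2, n - 1)  # depends only on p and n, so compute once
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]  # observed X_1, ..., X_n
        lo, hi = prediction_interval(xs, p, t_quantile)
        hits += lo <= rng.gauss(mu, sigma) <= hi  # draw X_{n+1} and test it
    return hits / trials


print(prediction_interval([4.0, 5.0, 6.0, 7.0, 3.0], 0.95))
print(coverage())
```

For the sample 3, 4, 5, 6, 7 the 95% interval comes out at about 5 ± 4.81, and the simulated coverage of the 90% interval should be close to 0.90, matching the repeated-sampling interpretation above.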