Student's t-distribution
==Occurrence and applications==

===In frequentist statistical inference===
Student's {{mvar|t}} distribution arises in a variety of statistical estimation problems where the goal is to estimate an unknown parameter, such as a mean value, in a setting where the data are observed with additive [[errors and residuals in statistics|errors]]. If (as in nearly all practical statistical work) the population [[standard deviation]] of these errors is unknown and has to be estimated from the data, the {{mvar|t}} distribution is often used to account for the extra uncertainty that results from this estimation. In most such problems, if the standard deviation of the errors were known, a normal distribution would be used instead of the {{mvar|t}} distribution.

[[Confidence interval]]s and [[hypothesis test]]s are two statistical procedures in which the [[quantile]]s of the sampling distribution of a particular statistic (e.g. the [[standard score]]) are required. In any situation where this statistic is a [[linear function]] of the [[data]], divided by the usual estimate of the standard deviation, the resulting quantity can be rescaled and centered to follow Student's {{mvar|t}} distribution. Statistical analyses involving means, weighted means, and regression coefficients all lead to statistics having this form.

Quite often, textbook problems will treat the population standard deviation as if it were known and thereby avoid the need to use the Student's {{mvar|t}} distribution. These problems are generally of two kinds: (1) those in which the sample size is so large that one may treat a data-based estimate of the [[variance]] as if it were certain, and (2) those that illustrate mathematical reasoning, in which the problem of estimating the standard deviation is temporarily ignored because that is not the point that the author or instructor is then explaining.
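The need for {{mvar|t}} rather than normal quantiles for such studentized statistics can be checked with a short Monte Carlo sketch in Python (the sample size, seed, and trial count below are illustrative choices, not values from this article):

```python
import math
import random
import statistics

random.seed(0)

# Sketch (hypothetical parameters): the studentized mean
# t = (xbar - mu) / (s / sqrt(n)) of a small normal sample follows a
# t distribution with n - 1 degrees of freedom, not a normal one.
# For n = 6 (5 df), P(|T| > 2.571) = 0.05, while a standard normal
# puts only about 0.010 of its mass beyond 2.571.
n, trials = 6, 20000
exceed = 0
for _ in range(trials):
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    xbar = sum(xs) / n
    s = statistics.stdev(xs)            # sample standard deviation
    t = (xbar - 0.0) / (s / math.sqrt(n))
    if abs(t) > 2.571:                  # t table: 97.5% one-sided, 5 df
        exceed += 1
frac = exceed / trials                  # close to 0.05, not 0.010
```

With the normal critical value 1.96 in place of 2.571, the rejection rate would be well above the nominal 5%, which is exactly the extra uncertainty the {{mvar|t}} distribution accounts for.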
====Hypothesis testing====
A number of statistics can be shown to have {{mvar|t}} distributions for samples of moderate size under [[null hypothesis|null hypotheses]] that are of interest, so that the {{mvar|t}} distribution forms the basis for significance tests. For example, the distribution of [[Spearman's rank correlation coefficient]] {{mvar|ρ}}, in the null case (zero correlation), is well approximated by the {{mvar|t}} distribution for sample sizes above about 20.{{citation needed|date=November 2010}}

====Confidence intervals====
Suppose the number {{mvar|A}} is chosen so that
:<math>\operatorname{\mathbb P}\left\{\ -A < T < A\ \right\} = 0.9\ ,</math>
when {{mvar|T}} has a {{mvar|t}} distribution with {{nobr|{{math|''n'' − 1}}}} degrees of freedom. By symmetry, this is the same as saying that {{mvar|A}} satisfies
:<math>\operatorname{\mathbb P}\left\{\ T < A\ \right\} = 0.95\ ,</math>
so {{mvar|A}} is the "95th percentile" of this probability distribution, or <math>A = t_{(0.05,n-1)} ~.</math> Then
:<math>\operatorname{\mathbb P}\left\{\ -A < \frac{\ \overline{X}_n - \mu\ }{ S_n/\sqrt{n\ } } < A\ \right\} = 0.9\ ,</math>
where {{nobr|''S''{{sub|''n''}}}} is the sample standard deviation of the observed values. This is equivalent to
:<math>\operatorname{\mathbb P}\left\{\ \overline{X}_n - A\ \frac{ S_n }{\ \sqrt{n\ }\ } < \mu < \overline{X}_n + A\ \frac{ S_n }{\ \sqrt{n\ }\ }\ \right\} = 0.9 ~.</math>
Therefore, the interval whose endpoints are
:<math>\overline{X}_n \pm A\ \frac{ S_n }{\ \sqrt{n\ }\ }</math>
is a 90% [[confidence interval]] for {{mvar|μ}}. Therefore, if we find the mean of a set of observations that we can reasonably expect to have a normal distribution, we can use the {{mvar|t}} distribution to examine whether the confidence limits on that mean include some theoretically predicted value, such as the value predicted on a [[null hypothesis]].
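The interval construction above can be sketched in Python; here the critical value {{math|''A'' {{=}} ''t''{{sub|(0.05, ''n''−1)}}}} is read from a {{mvar|t}} table rather than computed, and the sample numbers are hypothetical:

```python
import math

def t_confidence_interval(mean, s, n, t_crit):
    """Two-sided confidence interval: mean +/- t_crit * s / sqrt(n).

    t_crit is the critical value A = t_(alpha, n-1) read from a
    t table; for n = 11 (10 degrees of freedom) at the 90% two-sided
    level it is 1.812.
    """
    half_width = t_crit * s / math.sqrt(n)
    return mean - half_width, mean + half_width

# Hypothetical sample: 11 observations with sample mean 5.0 and
# sample standard deviation 2.0 (values chosen for illustration).
lo, hi = t_confidence_interval(5.0, 2.0, 11, 1.812)
```

The same helper gives any other level by swapping in the appropriate tabulated critical value.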
It is this result that is used in the [[Student's t-test|Student's {{mvar|t}} test]]s: since the difference between the means of samples from two normal distributions is itself distributed normally, the {{mvar|t}} distribution can be used to examine whether that difference can reasonably be supposed to be zero.

If the data are normally distributed, the one-sided {{nobr|{{math|(1 − ''α'')}} upper}} confidence limit (UCL) of the mean can be calculated using the following equation:
:<math>\mathsf{UCL}_{1-\alpha} = \overline{X}_n + t_{\alpha,n-1}\ \frac{ S_n }{\ \sqrt{n\ }\ } ~.</math>
The resulting UCL is the greatest mean value that the interval will cover for a given confidence level and sample size. In other words, <math>\overline{X}_n</math> being the mean of the set of observations, the probability that the mean of the distribution is less than {{nobr|UCL{{sub|{{math|1 − ''α''}}}}}} is equal to the confidence {{nobr|level {{math|1 − ''α''}}.}}

====Prediction intervals====
The {{mvar|t}} distribution can be used to construct a [[prediction interval]] for an unobserved sample from a normal distribution with unknown mean and variance.

===In Bayesian statistics===
The Student's {{mvar|t}} distribution, especially in its three-parameter (location-scale) version, arises frequently in [[Bayesian statistics]] as a result of its connection with the normal distribution. Whenever the [[variance]] of a normally distributed [[random variable]] is unknown and a [[conjugate prior]] placed over it follows an [[inverse gamma distribution]], the resulting [[marginal distribution]] of the variable will follow a Student's {{mvar|t}} distribution. Equivalent constructions with the same results involve a conjugate [[scaled-inverse-chi-squared distribution]] over the variance, or a conjugate gamma distribution over the [[Precision (statistics)|precision]].
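The normal-with-inverse-gamma-variance construction can be illustrated by simulation; the hyperparameters below are hypothetical, chosen so that the marginal is a Student {{mvar|t}} with 5 degrees of freedom and unit scale:

```python
import math
import random

random.seed(42)

# Marginalizing Normal(0, sigma^2) over sigma^2 ~ Inv-Gamma(a, b)
# yields a Student t with 2a degrees of freedom and scale sqrt(b/a).
# Hypothetical hyperparameters: a = b = 2.5, so the marginal is a
# t distribution with 5 df and scale 1.
a, b = 2.5, 2.5
samples = []
for _ in range(20000):
    # gammavariate(shape, scale); 1/Gamma(a, scale=1/b) ~ Inv-Gamma(a, b)
    sigma2 = 1.0 / random.gammavariate(a, 1.0 / b)
    samples.append(random.gauss(0.0, math.sqrt(sigma2)))

# Heavier tails than the standard normal: P(|X| > 3) is about 0.003
# for a normal but roughly 0.03 for a t with 5 degrees of freedom.
tail_frac = sum(1 for x in samples if abs(x) > 3.0) / len(samples)
```

The mixing over the variance is what fattens the tails relative to any single normal distribution.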
If an [[improper prior]] proportional to {{sfrac|1|''σ''²}} is placed over the variance, the {{mvar|t}} distribution also arises. This is the case regardless of whether the mean of the normally distributed variable is known, is unknown and distributed according to a [[conjugate prior|conjugate]] normally distributed prior, or is unknown and distributed according to an improper constant prior. Related situations that also produce a {{mvar|t}} distribution are:
* The [[marginal distribution|marginal]] [[posterior distribution]] of the unknown mean of a normally distributed variable, with unknown prior mean and variance following the above model.
* The [[prior predictive distribution]] and [[posterior predictive distribution]] of a new normally distributed data point when a series of [[independent identically distributed]] normally distributed data points have been observed, with prior mean and variance as in the above model.

===Robust parametric modeling===
The {{mvar|t}} distribution is often used as an alternative to the normal distribution as a model for data, which often have heavier tails than the normal distribution allows for; see e.g. Lange et al.<ref>{{cite journal |vauthors=Lange KL, Little RJ, Taylor JM |date=1989 |title=Robust Statistical Modeling Using the {{mvar|t}} Distribution |url=https://cloudfront.escholarship.org/dist/prd/content/qt27s1d3h7/qt27s1d3h7.pdf |journal=[[Journal of the American Statistical Association]] |volume=84 |issue=408 |pages=881–896 |doi=10.1080/01621459.1989.10478852 |jstor=2290063}}</ref> The classical approach was to identify [[outlier (statistics)|outliers]] (e.g., using [[Grubbs's test]]) and exclude or downweight them in some way. However, it is not always easy to identify outliers (especially in [[curse of dimensionality|high dimensions]]), and the {{mvar|t}} distribution is a natural choice of model for such data and provides a parametric approach to [[robust statistics]].
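As a sketch of this parametric approach, the location and scale of a {{mvar|t}} model with fixed degrees of freedom can be fit by iteratively reweighting the data (an EM-style scheme; the data, the choice {{math|''ν'' {{=}} 5}}, and the function name are illustrative, not from the references):

```python
def robust_t_mean(xs, nu=5.0, iters=50):
    """EM-style location/scale fit for a Student t with fixed df nu.

    Each point gets weight (nu + 1) / (nu + z^2), where z is its
    standardized residual, so gross outliers are downweighted
    automatically instead of being excluded by hand.
    """
    mu = sum(xs) / len(xs)                               # start at the mean
    s2 = sum((x - mu) ** 2 for x in xs) / len(xs) or 1.0
    for _ in range(iters):
        w = [(nu + 1.0) / (nu + (x - mu) ** 2 / s2) for x in xs]
        mu = sum(wi * x for wi, x in zip(w, xs)) / sum(w)
        s2 = sum(wi * (x - mu) ** 2 for wi, x in zip(w, xs)) / len(xs)
    return mu, s2

# Hypothetical data with one gross outlier: the ordinary mean is
# 21.25, while the t-based fit stays near the bulk of the data at 10.
data = [9.8, 10.1, 9.9, 10.2, 10.0, 10.1, 9.9, 100.0]
mu, s2 = robust_t_mean(data)
```

The fixed low value of ν here mirrors the practice discussed below of holding the degrees of freedom constant and estimating only the other parameters.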
A Bayesian account can be found in Gelman et al.<ref>{{cite book |title=Bayesian Data Analysis |vauthors=Gelman AB, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB |publisher=CRC Press |year=2014 |isbn=9781439898208 |location=Boca Raton, Florida |pages=293 |chapter=Computationally efficient Markov chain simulation |display-authors=3}}</ref> The degrees of freedom parameter controls the kurtosis of the distribution and is correlated with the scale parameter. The likelihood can have multiple local maxima and, as such, it is often necessary to fix the degrees of freedom at a fairly low value and estimate the other parameters taking this as given. Some authors{{Citation needed|date=June 2015}} report that values between 3 and 9 are often good choices. Venables and Ripley{{Citation needed|date=June 2015}} suggest that a value of 5 is often a good choice.

===Student's {{mvar|t}} process===
For practical [[Regression analysis|regression]] and [[prediction]] needs, Student's {{mvar|t}} processes were introduced; these are generalisations of the Student {{mvar|t}} distributions to functions. A Student's {{mvar|t}} process is constructed from the Student {{mvar|t}} distributions in the same way that a [[Gaussian process]] is constructed from the [[Multivariate normal distribution|Gaussian distributions]]. For a [[Gaussian process]], all sets of values have a multidimensional Gaussian distribution.
Analogously, <math>X(t)</math> is a Student {{mvar|t}} process on an interval <math>I=[a,b]</math> if the corresponding values of the process <math>X(t_1), \ldots, X(t_n)</math> (<math>t_i \in I</math>) have a joint [[Multivariate t-distribution|multivariate Student {{mvar|t}} distribution]].<ref name="Shah2014">{{cite journal |last1=Shah |first1=Amar |last2=Wilson |first2=Andrew Gordon |last3=Ghahramani |first3=Zoubin |year=2014 |title=Student {{mvar|t}} processes as alternatives to Gaussian processes |journal=JMLR |volume=33 |issue=Proceedings of the 17th International Conference on Artificial Intelligence and Statistics (AISTATS) 2014, Reykjavik, Iceland |pages=877–885 |arxiv=1402.4306 |url=http://proceedings.mlr.press/v33/shah14.pdf}}</ref> These processes are used for regression, prediction, Bayesian optimization and related problems. For multivariate regression and multi-output prediction, multivariate Student {{mvar|t}} processes have been introduced and used.<ref name="Zexun2020">{{cite journal |last1=Chen |first1=Zexun |last2=Wang |first2=Bo |last3=Gorban |first3=Alexander N. |year=2019 |title=Multivariate Gaussian and Student {{mvar|t}} process regression for multi-output prediction |journal=Neural Computing and Applications |volume=32 |issue=8 |pages=3005–3028 |doi=10.1007/s00521-019-04687-8 |doi-access=free |arxiv=1703.04455}}</ref>

===Table of selected values===
The following table lists values for {{mvar|t}} distributions with {{mvar|ν}} degrees of freedom for a range of one-sided or two-sided critical regions. The first column is {{mvar|ν}}, the percentages along the top are confidence levels, and the numbers in the body of the table are the <math>t_{\alpha,n-1}</math> factors described in the section on [[#Confidence intervals|confidence intervals]].
The last row with infinite {{mvar|ν}} gives critical points for a normal distribution, since a {{mvar|t}} distribution with infinitely many degrees of freedom is a normal distribution. (See [[#Related distributions|Related distributions]] above.)

{| class="wikitable"
|-
! ''One-sided'' !! 75% !! 80% !! 85% !! 90% !! 95% !! 97.5% !! 99% !! 99.5% !! 99.75% !! 99.9% !! 99.95%
|-
! ''Two-sided'' !! 50% !! 60% !! 70% !! 80% !! 90% !! 95% !! 98% !! 99% !! 99.5% !! 99.8% !! 99.9%
|-
! 1
| 1.000 || 1.376 || 1.963 || 3.078 || 6.314 || 12.706 || 31.821 || 63.657 || 127.321 || 318.309 || 636.619
|-
! 2
| 0.816 || 1.061 || 1.386 || 1.886 || 2.920 || 4.303 || 6.965 || 9.925 || 14.089 || 22.327 || 31.599
|-
! 3
| 0.765 || 0.978 || 1.250 || 1.638 || 2.353 || 3.182 || 4.541 || 5.841 || 7.453 || 10.215 || 12.924
|-
! 4
| 0.741 || 0.941 || 1.190 || 1.533 || 2.132 || 2.776 || 3.747 || 4.604 || 5.598 || 7.173 || 8.610
|-
! 5
| 0.727 || 0.920 || 1.156 || 1.476 || 2.015 || 2.571 || 3.365 || 4.032 || 4.773 || 5.893 || 6.869
|-
! 6
| 0.718 || 0.906 || 1.134 || 1.440 || 1.943 || 2.447 || 3.143 || 3.707 || 4.317 || 5.208 || 5.959
|-
! 7
| 0.711 || 0.896 || 1.119 || 1.415 || 1.895 || 2.365 || 2.998 || 3.499 || 4.029 || 4.785 || 5.408
|-
! 8
| 0.706 || 0.889 || 1.108 || 1.397 || 1.860 || 2.306 || 2.896 || 3.355 || 3.833 || 4.501 || 5.041
|-
! 9
| 0.703 || 0.883 || 1.100 || 1.383 || 1.833 || 2.262 || 2.821 || 3.250 || 3.690 || 4.297 || 4.781
|-
! 10
| 0.700 || 0.879 || 1.093 || 1.372 || 1.812 || 2.228 || 2.764 || 3.169 || 3.581 || 4.144 || 4.587
|-
! 11
| 0.697 || 0.876 || 1.088 || 1.363 || 1.796 || 2.201 || 2.718 || 3.106 || 3.497 || 4.025 || 4.437
|-
! 12
| 0.695 || 0.873 || 1.083 || 1.356 || 1.782 || 2.179 || 2.681 || 3.055 || 3.428 || 3.930 || 4.318
|-
! 13
| 0.694 || 0.870 || 1.079 || 1.350 || 1.771 || 2.160 || 2.650 || 3.012 || 3.372 || 3.852 || 4.221
|-
! 14
| 0.692 || 0.868 || 1.076 || 1.345 || 1.761 || 2.145 || 2.624 || 2.977 || 3.326 || 3.787 || 4.140
|-
! 15
| 0.691 || 0.866 || 1.074 || 1.341 || 1.753 || 2.131 || 2.602 || 2.947 || 3.286 || 3.733 || 4.073
|-
! 16
| 0.690 || 0.865 || 1.071 || 1.337 || 1.746 || 2.120 || 2.583 || 2.921 || 3.252 || 3.686 || 4.015
|-
! 17
| 0.689 || 0.863 || 1.069 || 1.333 || 1.740 || 2.110 || 2.567 || 2.898 || 3.222 || 3.646 || 3.965
|-
! 18
| 0.688 || 0.862 || 1.067 || 1.330 || 1.734 || 2.101 || 2.552 || 2.878 || 3.197 || 3.610 || 3.922
|-
! 19
| 0.688 || 0.861 || 1.066 || 1.328 || 1.729 || 2.093 || 2.539 || 2.861 || 3.174 || 3.579 || 3.883
|-
! 20
| 0.687 || 0.860 || 1.064 || 1.325 || 1.725 || 2.086 || 2.528 || 2.845 || 3.153 || 3.552 || 3.850
|-
! 21
| 0.686 || 0.859 || 1.063 || 1.323 || 1.721 || 2.080 || 2.518 || 2.831 || 3.135 || 3.527 || 3.819
|-
! 22
| 0.686 || 0.858 || 1.061 || 1.321 || 1.717 || 2.074 || 2.508 || 2.819 || 3.119 || 3.505 || 3.792
|-
! 23
| 0.685 || 0.858 || 1.060 || 1.319 || 1.714 || 2.069 || 2.500 || 2.807 || 3.104 || 3.485 || 3.767
|-
! 24
| 0.685 || 0.857 || 1.059 || 1.318 || 1.711 || 2.064 || 2.492 || 2.797 || 3.091 || 3.467 || 3.745
|-
! 25
| 0.684 || 0.856 || 1.058 || 1.316 || 1.708 || 2.060 || 2.485 || 2.787 || 3.078 || 3.450 || 3.725
|-
! 26
| 0.684 || 0.856 || 1.058 || 1.315 || 1.706 || 2.056 || 2.479 || 2.779 || 3.067 || 3.435 || 3.707
|-
! 27
| 0.684 || 0.855 || 1.057 || 1.314 || 1.703 || 2.052 || 2.473 || 2.771 || 3.057 || 3.421 || 3.690
|-
! 28
| 0.683 || 0.855 || 1.056 || 1.313 || 1.701 || 2.048 || 2.467 || 2.763 || 3.047 || 3.408 || 3.674
|-
! 29
| 0.683 || 0.854 || 1.055 || 1.311 || 1.699 || 2.045 || 2.462 || 2.756 || 3.038 || 3.396 || 3.659
|-
! 30
| 0.683 || 0.854 || 1.055 || 1.310 || 1.697 || 2.042 || 2.457 || 2.750 || 3.030 || 3.385 || 3.646
|-
! 40
| 0.681 || 0.851 || 1.050 || 1.303 || 1.684 || 2.021 || 2.423 || 2.704 || 2.971 || 3.307 || 3.551
|-
! 50
| 0.679 || 0.849 || 1.047 || 1.299 || 1.676 || 2.009 || 2.403 || 2.678 || 2.937 || 3.261 || 3.496
|-
! 60
| 0.679 || 0.848 || 1.045 || 1.296 || 1.671 || 2.000 || 2.390 || 2.660 || 2.915 || 3.232 || 3.460
|-
! 80
| 0.678 || 0.846 || 1.043 || 1.292 || 1.664 || 1.990 || 2.374 || 2.639 || 2.887 || 3.195 || 3.416
|-
! 100
| 0.677 || 0.845 || 1.042 || 1.290 || 1.660 || 1.984 || 2.364 || 2.626 || 2.871 || 3.174 || 3.390
|-
! 120
| 0.677 || 0.845 || 1.041 || 1.289 || 1.658 || 1.980 || 2.358 || 2.617 || 2.860 || 3.160 || 3.373
|-
! ∞
| 0.674 || 0.842 || 1.036 || 1.282 || 1.645 || 1.960 || 2.326 || 2.576 || 2.807 || 3.090 || 3.291
|-
! ''One-sided'' !! 75% !! 80% !! 85% !! 90% !! 95% !! 97.5% !! 99% !! 99.5% !! 99.75% !! 99.9% !! 99.95%
|-
! ''Two-sided'' !! 50% !! 60% !! 70% !! 80% !! 90% !! 95% !! 98% !! 99% !! 99.5% !! 99.8% !! 99.9%
|}

; Calculating the confidence interval
: Let's say we have a sample with size 11, sample mean 10, and sample variance 2. For 90% confidence with 10 degrees of freedom, the one-sided {{mvar|t}} value from the table is 1.372.
Then with confidence interval calculated from
:<math>\overline{X}_n \pm t_{\alpha,\nu}\ \frac{S_n}{\ \sqrt{n\ }\ }\ ,</math>
we determine that with 90% confidence we have a true mean lying below
:<math>10 + 1.372\ \frac{ \sqrt{2\ } }{\ \sqrt{11\ }\ } = 10.585 ~.</math>
In other words, 90% of the times that an upper threshold is calculated by this method from particular samples, this upper threshold exceeds the true mean. And with 90% confidence we have a true mean lying above
:<math>10 - 1.372\ \frac{ \sqrt{2\ } }{\ \sqrt{11\ }\ } = 9.415 ~.</math>
In other words, 90% of the times that a lower threshold is calculated by this method from particular samples, this lower threshold lies below the true mean. So, at 80% confidence (calculated from 100% − 2 × (1 − 90%) = 80%), we have a true mean lying within the interval
:<math>\left(\ 10 - 1.372\ \frac{ \sqrt{2\ } }{\ \sqrt{11\ }\ },\ 10 + 1.372\ \frac{ \sqrt{2\ } }{\ \sqrt{11\ }\ }\ \right) = (\ 9.415,\ 10.585\ ) ~.</math>
Saying that 80% of the times that upper and lower thresholds are calculated by this method from a given sample, the true mean is both below the upper threshold and above the lower threshold is not the same as saying that there is an 80% probability that the true mean lies between a particular pair of upper and lower thresholds that have been calculated by this method; see [[confidence interval]] and [[prosecutor's fallacy]].

Nowadays, statistical software, such as the [[R (programming language)|R programming language]], and functions available in many [[Spreadsheet|spreadsheet programs]] compute values of the {{mvar|t}} distribution and its inverse without tables.
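The worked example above can be reproduced in a few lines of Python, using only the tabulated critical value:

```python
import math

# Worked example from the text: sample size 11, sample mean 10,
# sample variance 2; the one-sided 90% t value for 10 degrees of
# freedom is 1.372 (from the table).
n, mean, var, t_crit = 11, 10.0, 2.0, 1.372
half = t_crit * math.sqrt(var) / math.sqrt(n)
lower, upper = mean - half, mean + half
# upper is about 10.585 and lower about 9.415, matching the text.
```

In practice the critical value itself would come from software (e.g. `qt(0.90, 10)` in R) rather than a printed table.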