Student's t-distribution
===In frequentist statistical inference===
Student's {{mvar|t}} distribution arises in a variety of statistical estimation problems where the goal is to estimate an unknown parameter, such as a mean value, in a setting where the data are observed with additive [[errors and residuals in statistics|errors]]. If (as in nearly all practical statistical work) the population [[standard deviation]] of these errors is unknown and has to be estimated from the data, the {{mvar|t}} distribution is often used to account for the extra uncertainty that results from this estimation. In most such problems, if the standard deviation of the errors were known, a normal distribution would be used instead of the {{mvar|t}} distribution.

[[Confidence interval]]s and [[hypothesis test]]s are two statistical procedures in which the [[quantile]]s of the sampling distribution of a particular statistic (e.g. the [[standard score]]) are required. In any situation where this statistic is a [[linear function]] of the [[data]], divided by the usual estimate of the standard deviation, the resulting quantity can be rescaled and centered to follow Student's {{mvar|t}} distribution. Statistical analyses involving means, weighted means, and regression coefficients all lead to statistics having this form.

Quite often, textbook problems will treat the population standard deviation as if it were known and thereby avoid the need to use the Student's {{mvar|t}} distribution. These problems are generally of two kinds: (1) those in which the sample size is so large that one may treat a data-based estimate of the [[variance]] as if it were certain, and (2) those that illustrate mathematical reasoning, in which the problem of estimating the standard deviation is temporarily ignored because that is not the point that the author or instructor is then explaining.
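The extra uncertainty from estimating the standard deviation can be seen in a small simulation. The sketch below is illustrative only: the function names, the cutoff of 1.96, the sample sizes and the seed are assumptions chosen for the example, not part of the article. It draws normal samples, forms the studentized mean, and counts how often its absolute value exceeds the normal 97.5th percentile.

```python
import math
import random


def studentized_mean(n, rng, mu=0.0, sigma=1.0):
    """Return (sample mean - mu) / (S_n / sqrt(n)) for n normal draws."""
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    mean = sum(xs) / n
    # Sample variance with the usual n - 1 divisor.
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return (mean - mu) / math.sqrt(var / n)


def tail_fraction(n, cutoff=1.96, trials=10_000, seed=0):
    """Fraction of simulated statistics whose absolute value exceeds cutoff."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(abs(studentized_mean(n, rng)) > cutoff for _ in range(trials))
    return hits / trials


small_n = tail_fraction(5)   # 4 degrees of freedom: heavy tails
large_n = tail_fraction(50)  # 49 degrees of freedom: close to normal
print(f"n=5:  P(|T| > 1.96) ~ {small_n:.3f}")
print(f"n=50: P(|T| > 1.96) ~ {large_n:.3f}")
```

If the standard deviation were known, the fraction would be close to 0.05 for every sample size. With an estimated standard deviation, theory gives about 0.12 for {{math|''n'' {{=}} 5}}, and the simulated fraction for small samples should be correspondingly larger than for large ones.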
====Hypothesis testing====
A number of statistics can be shown to have {{mvar|t}} distributions for samples of moderate size under [[null hypothesis|null hypotheses]] that are of interest, so that the {{mvar|t}} distribution forms the basis for significance tests. For example, the distribution of [[Spearman's rank correlation coefficient]] {{mvar|ρ}}, in the null case (zero correlation) is well approximated by the {{mvar|t}} distribution for sample sizes above about 20.{{citation needed|date=November 2010}}

====Confidence intervals====
Suppose the number ''A'' is so chosen that
:<math>\ \operatorname{\mathbb P}\left\{\ -A < T < A\ \right\} = 0.9\ ,</math>
when {{mvar|T}} has a {{mvar|t}} distribution with {{nobr|{{math|''n'' − 1}} }} degrees of freedom. By symmetry, this is the same as saying that {{mvar|A}} satisfies
:<math>\ \operatorname{\mathbb P}\left\{\ T < A\ \right\} = 0.95\ ,</math>
so ''A'' is the "95th percentile" of this probability distribution, or <math>\ A = t_{(0.05,n-1)} ~.</math> Then
:<math>\ \operatorname{\mathbb P}\left\{\ -A < \frac{\ \overline{X}_n - \mu\ }{ S_n/\sqrt{n\ } } < A\ \right\} = 0.9\ ,</math>
where {{nobr|''S''{{sub|''n''}} }} is the sample standard deviation of the observed values. This is equivalent to
:<math>\ \operatorname{\mathbb P}\left\{\ \overline{X}_n - A \frac{ S_n }{\ \sqrt{n\ }\ } < \mu < \overline{X}_n + A\ \frac{ S_n }{\ \sqrt{n\ }\ }\ \right\} = 0.9.</math>
Therefore, the interval whose endpoints are
:<math>\ \overline{X}_n\ \pm A\ \frac{ S_n }{\ \sqrt{n\ }\ }\ </math>
is a 90% [[confidence interval]] for μ. Therefore, if we find the mean of a set of observations that we can reasonably expect to have a normal distribution, we can use the {{mvar|t}} distribution to examine whether the confidence limits on that mean include some theoretically predicted value, such as the value predicted on a [[null hypothesis]].
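The 90% interval for the mean can be computed numerically as a minimal sketch using only the Python standard library. Everything named here is an assumption made for illustration: the helper functions <code>t_pdf</code>, <code>t_cdf</code>, <code>t_quantile</code> and <code>mean_confidence_interval</code>, the numerical method (Simpson's rule plus bisection), and the made-up sample. A statistics library would normally supply the quantile directly.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T < x): Simpson's rule on [0, |x|], then symmetry about zero."""
    a = abs(x)
    if a == 0.0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


def t_quantile(p, df):
    """Value A with P(T < A) = p, found by bisection on the CDF.

    The bracket [-1000, 1000] is an assumption that covers ordinary
    confidence levels; extreme p with very few degrees of freedom
    would need a wider one.
    """
    lo, hi = -1000.0, 1000.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mean_confidence_interval(data, level=0.90):
    """Two-sided interval: sample mean +/- A * S_n / sqrt(n)."""
    n = len(data)
    mean = statistics.fmean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    a = t_quantile(1 - (1 - level) / 2, n - 1)  # 95th percentile for 90%
    half_width = a * s / math.sqrt(n)
    return mean - half_width, mean + half_width


sample = [4.9, 5.1, 5.3, 4.7, 5.0, 5.2, 4.8, 5.4, 5.1, 4.5]  # made-up data
low, high = mean_confidence_interval(sample)
print(f"A = {t_quantile(0.95, len(sample) - 1):.4f}")
print(f"90% confidence interval for the mean: ({low:.3f}, {high:.3f})")
```

For ten observations the multiplier ''A'' is the 95th percentile of the {{mvar|t}} distribution with 9 degrees of freedom, which standard tables give as about 1.833, noticeably larger than the normal value of about 1.645.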
It is this result that is used in the [[Student's t-test|Student's {{mvar|t}} test]]s: since the difference between the means of samples from two normal distributions is itself distributed normally, the {{mvar|t}} distribution can be used to examine whether that difference can reasonably be supposed to be zero.

If the data are normally distributed, the one-sided {{nobr|{{math|(1 − ''α'')}} upper}} confidence limit (UCL) of the mean can be calculated using the following equation:
:<math>\mathsf{UCL}_{1-\alpha} = \overline{X}_n + t_{\alpha,n-1}\ \frac{ S_n }{\ \sqrt{n\ }\ } ~.</math>
The resulting UCL is the largest value of the mean that is consistent with the data at the given confidence level and sample size. In other words, with <math>\overline{X}_n</math> the mean of the set of observations, the probability that the mean of the distribution is less than {{nobr|UCL{{sub|{{math|1 − ''α''}} }} }} is equal to the confidence {{nobr|level {{math|1 − ''α''}} .}}

====Prediction intervals====
The {{mvar|t}} distribution can be used to construct a [[prediction interval]] for an unobserved sample from a normal distribution with unknown mean and variance.
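
As a sketch of how such an interval arises, assume (this setup is not stated above) that the observed values and a future observation <math>X_{n+1}</math> are independent draws from the same normal distribution with mean <math>\mu</math> and variance <math>\sigma^2</math>. Then <math>X_{n+1} - \overline{X}_n</math> is normal with mean zero and variance <math>\sigma^2 \left( 1 + 1/n \right)</math>, and it is independent of <math>S_n</math>, so
:<math>\ \frac{\ X_{n+1} - \overline{X}_n\ }{ S_n \sqrt{1 + 1/n\ } }\ </math>
has a {{mvar|t}} distribution with {{nobr|{{math|''n'' − 1}} }} degrees of freedom. With ''A'' the 95th percentile of that distribution, the interval whose endpoints are
:<math>\ \overline{X}_n\ \pm A\ S_n \sqrt{1 + \frac{1}{n}\ }\ </math>
is then a 90% prediction interval for <math>X_{n+1}</math>. It is wider than the corresponding confidence interval for the mean because it must also cover the variability of the new observation itself.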