==Properties==

===Moments===
For <math>\nu > 1\ ,</math> the [[raw moment]]s of the {{mvar|t}} distribution are
:<math>\operatorname{\mathbb E}\left\{\ T^k\ \right\} = \begin{cases} \quad 0 & k \text{ odd }, \quad 0 < k < \nu\ , \\ {} \\ \frac{1}{\ \sqrt{\pi\ }\ \Gamma\left(\frac{\ \nu\ }{ 2 }\right)}\ \left[\ \Gamma\!\left(\frac{\ k + 1\ }{ 2 }\right)\ \Gamma\!\left(\frac{\ \nu - k\ }{ 2 }\right)\ \nu^{\frac{\ k\ }{ 2 }}\ \right] & k \text{ even }, \quad 0 < k < \nu ~.\\ \end{cases}</math>
Moments of order <math>\ \nu\ </math> or higher do not exist.<ref>{{cite book |vauthors=Casella G, Berger RL |year=1990 |title=Statistical Inference |publisher=Duxbury Resource Center |isbn=9780534119584 |page=56}}</ref>

The term for <math>\ 0 < k < \nu\ ,</math> {{mvar|k}} even, may be simplified using the properties of the [[gamma function]] to
:<math>\operatorname{\mathbb E}\left\{\ T^k\ \right\} = \nu^{ \frac{\ k\ }{ 2 } }\ \prod_{j=1}^{k/2}\ \frac{~ 2j - 1 ~}{ \nu - 2j } \qquad k \text{ even}, \quad 0 < k < \nu ~.</math>

For a {{mvar|t}} distribution with <math>\ \nu\ </math> degrees of freedom, the [[expected value]] is <math>\ 0\ </math> if <math>\ \nu > 1\ ,</math> and its [[variance]] is <math>\ \frac{ \nu }{\ \nu-2\ }\ </math> if <math>\ \nu > 2 ~.</math> The [[skewness]] is 0 if <math>\ \nu > 3\ </math> and the [[excess kurtosis]] is <math>\ \frac{ 6 }{\ \nu - 4\ }\ </math> if <math>\ \nu > 4 ~.</math>

===How the {{mvar|t}} distribution arises (characterization) {{anchor|Characterization}}===

====As the distribution of a test statistic====
Student's ''t''-distribution with <math>\nu</math> degrees of freedom can be defined as the distribution of the [[random variable]] ''T'' with<ref name="JKB">{{Cite book |title=Continuous Univariate Distributions |vauthors=Johnson NL, Kotz S, Balakrishnan N |publisher=Wiley |year=1995 |isbn=9780471584940 |edition=2nd |volume=2 |chapter=Chapter 28}}</ref><ref name="Hogg">{{cite book |title=Introduction to Mathematical Statistics |vauthors=Hogg RV, Craig AT |publisher=Macmillan |year=1978 |edition=4th |location=New York |asin=B010WFO0SA |postscript=. Sections 4.4 and 4.8 |author-link=Robert V. Hogg}}</ref>
:<math> T=\frac{Z}{\sqrt{V/\nu}} = Z \sqrt{\frac{\nu}{V}},</math>
where
* ''Z'' is a standard normal with [[expected value]] 0 and variance 1;
* ''V'' has a [[chi-squared distribution]] ({{nowrap|1=<span style="font-family:serif">''χ''</span><sup>2</sup>-distribution}}) with <math>\nu</math> [[Degrees of freedom (statistics)|degrees of freedom]];
* ''Z'' and ''V'' are [[statistical independence|independent]].

A different distribution is that of the random variable defined, for a given constant ''μ'', by
:<math>(Z+\mu)\sqrt{\frac{\nu}{V}}.</math>
This random variable has a [[noncentral t-distribution|noncentral ''t''-distribution]] with [[noncentrality parameter]] ''μ''. This distribution is important in studies of the [[statistical power|power]] of Student's ''t''-test.
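The defining construction of ''T'' above is easy to check numerically. The following is a minimal simulation sketch, not part of the article's derivation; it assumes NumPy and SciPy are available, and the values of <code>nu</code> and <code>n</code> are arbitrary illustrative choices. It draws ''T'' = ''Z''/√(''V''/''ν'') directly and compares the result against the {{mvar|t}} distribution and the moments given above:

<syntaxhighlight lang="python">
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
nu = 10          # degrees of freedom (illustrative choice)
n = 100_000      # number of simulated draws

# T = Z / sqrt(V / nu), with Z standard normal and V chi-squared(nu), independent
z = rng.standard_normal(n)
v = rng.chisquare(nu, size=n)
t_sim = z / np.sqrt(v / nu)

# Kolmogorov-Smirnov test against Student's t with nu degrees of freedom:
# a large p-value means the simulated draws are consistent with that distribution
print(stats.kstest(t_sim, stats.t(df=nu).cdf))

# Sample variance and excess kurtosis should approach nu/(nu-2) = 1.25
# and 6/(nu-4) = 1 for nu = 10 (both moments exist since nu > 4)
print(t_sim.var(), stats.kurtosis(t_sim))
</syntaxhighlight>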
=====Derivation=====
Suppose ''X''<sub>1</sub>, ..., ''X''<sub>''n''</sub> are [[statistical independence|independent]] realizations of the normally distributed random variable ''X'', which has expected value ''μ'' and [[variance]] ''σ''<sup>2</sup>. Let
:<math>\overline{X}_n = \frac{1}{n}(X_1+\cdots+X_n)</math>
be the sample mean, and
:<math>s^2 = \frac{1}{n-1} \sum_{i=1}^n \left(X_i - \overline{X}_n\right)^2</math>
be an unbiased estimate of the variance from the sample. It can be shown that the random variable
:<math>V = (n-1)\frac{s^2}{\sigma^2}</math>
has a chi-squared distribution with <math>\nu = n - 1</math> degrees of freedom (by [[Cochran's theorem]]).<ref>{{cite journal |authorlink1=William Gemmell Cochran |last1=Cochran |first1=W. G. |date=1934 |title=The distribution of quadratic forms in a normal system, with applications to the analysis of covariance |journal=[[Mathematical Proceedings of the Cambridge Philosophical Society]] |volume=30 |issue=2 |pages=178–191 |bibcode=1934PCPS...30..178C |doi=10.1017/S0305004100016595 |s2cid=122547084}}</ref> It is readily shown that the quantity
:<math>Z = \left(\overline{X}_n - \mu\right) \frac{\sqrt{n}}{\sigma}</math>
is normally distributed with mean 0 and variance 1, since the sample mean <math>\overline{X}_n</math> is normally distributed with mean ''μ'' and variance ''σ''<sup>2</sup>/''n''. Moreover, it is possible to show that these two random variables (the normally distributed one ''Z'' and the chi-squared-distributed one ''V'') are independent. Consequently ''Z'' and ''V'' satisfy exactly the conditions in the definition above, so the [[pivotal quantity]]
:<math display="inline">T \equiv \frac{Z}{\sqrt{V/\nu}} = \left(\overline{X}_n - \mu\right) \frac{\sqrt{n}}{s},</math>
which differs from ''Z'' in that the exact standard deviation ''σ'' is replaced by the sample standard error ''s'', has a Student's ''t''-distribution as defined above. Notice that the unknown population variance ''σ''<sup>2</sup> does not appear in ''T'': it enters both the numerator and the denominator, so it cancels. Gosset intuitively obtained the probability density function stated above, with <math>\nu</math> equal to ''n'' − 1, and Fisher proved it in 1925.<ref name="Fisher 1925 90–104"/>

The distribution of the test statistic ''T'' depends on <math>\nu</math>, but not on ''μ'' or ''σ''; this lack of dependence on ''μ'' and ''σ'' is what makes the ''t''-distribution important in both theory and practice.

====Sampling distribution of t-statistic====
The {{mvar|t}} distribution arises as the sampling distribution of the {{mvar|t}} statistic. Below, the one-sample {{mvar|t}} statistic is discussed; for the corresponding two-sample {{mvar|t}} statistic, see [[Student's t-test]].

=====Unbiased variance estimate=====
Let <math>\ x_1, \ldots, x_n \sim {\mathcal N}(\mu, \sigma^2)\ </math> be independent and identically distributed samples from a normal distribution with mean <math>\mu</math> and variance <math>\ \sigma^2 ~.</math> The sample mean and unbiased [[sample variance]] are given by
:<math>\begin{align}
  \bar{x} &= \frac{\ x_1+\cdots+x_n\ }{ n }\ , \\[5pt]
  s^2 &= \frac{ 1 }{\ n-1\ }\ \sum_{i=1}^n (x_i - \bar{x})^2 ~.
\end{align}</math>
The resulting (one-sample) {{mvar|t}} statistic is given by
:<math> t = \frac{\bar{x} - \mu}{\ s / \sqrt{n\ }\ } \sim t_{n - 1}\ ,</math>
and is distributed according to a Student's {{mvar|t}} distribution with <math>\ n - 1\ </math> degrees of freedom. Thus for inference purposes the {{mvar|t}} statistic is a useful "[[pivotal quantity]]" when the mean and variance <math>(\mu, \sigma^2)</math> are unknown population parameters, in the sense that the {{mvar|t}} statistic then has a probability distribution that depends on neither <math>\mu</math> nor <math>\ \sigma^2 ~.</math>
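As a quick numerical illustration of this pivotality, the sketch below (again assuming NumPy and SciPy; the population parameters <code>mu</code> and <code>sigma</code> are arbitrary choices) simulates many one-sample {{mvar|t}} statistics and compares them with <math>t_{n-1}</math>:

<syntaxhighlight lang="python">
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu, sigma, n = 2.0, 3.0, 8       # arbitrary population parameters and sample size
reps = 50_000                    # number of simulated samples

x = rng.normal(mu, sigma, size=(reps, n))
xbar = x.mean(axis=1)
s = x.std(axis=1, ddof=1)        # ddof=1 gives the unbiased variance estimate
t_stat = (xbar - mu) / (s / np.sqrt(n))

# The statistic should follow Student's t with n - 1 degrees of freedom,
# no matter which mu and sigma generated the data
print(stats.kstest(t_stat, stats.t(df=n - 1).cdf))
</syntaxhighlight>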
=====ML variance estimate=====
Instead of the unbiased estimate <math>\ s^2\ </math> we may also use the maximum likelihood estimate
:<math>\ s^2_\mathsf{ML} = \frac{\ 1\ }{ n }\ \sum_{i=1}^n (x_i - \bar{x})^2\ </math>
yielding the statistic
:<math>\ t_\mathsf{ML} = \frac{\bar{x} - \mu}{\sqrt{s^2_\mathsf{ML}/n\ }} = \sqrt{\frac{n}{n-1}\ }\ t ~.</math>
This is distributed according to the location-scale {{mvar|t}} distribution:
:<math> t_\mathsf{ML} \sim \operatorname{\ell st}(0,\ \tau^2=n/(n-1),\ n-1) ~.</math>

====Compound distribution of normal with inverse gamma distribution====
The location-scale {{mvar|t}} distribution results from [[compound distribution|compounding]] a [[Normal distribution|Gaussian distribution]] (normal distribution) with [[mean]] <math>\ \mu\ </math> and unknown [[variance]], with an [[inverse gamma distribution]] placed over the variance with parameters <math>\ a = \frac{\ \nu\ }{ 2 }\ </math> and <math>b = \frac{\ \nu\ \tau^2\ }{ 2 } ~.</math> In other words, the [[random variable]] ''X'' is assumed to have a Gaussian distribution with an unknown variance distributed as inverse gamma, and then the variance is [[marginalized out]] (integrated out).

Equivalently, this distribution results from compounding a Gaussian distribution with a [[scaled-inverse-chi-squared distribution]] with parameters <math>\nu</math> and <math>\ \tau^2 ~.</math> The scaled-inverse-chi-squared distribution is exactly the same distribution as the inverse gamma distribution, but with a different parameterization, i.e. <math>\ \nu = 2\ a, \; {\tau}^2 = \frac{\ b\ }{ a } ~.</math>

This characterization is useful because in [[Bayesian statistics]] the inverse gamma distribution is the [[conjugate prior]] distribution of the variance of a Gaussian distribution. As a result, the location-scale {{mvar|t}} distribution arises naturally in many Bayesian inference problems.<ref>{{Cite book |title=Bayesian Data Analysis |vauthors=Gelman AB, Carlin JS, Rubin DB, Stern HS |publisher=Chapman & Hall |year=1997 |isbn=9780412039911 |edition=2nd |location=Boca Raton, FL |pages=68}}</ref>
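The compounding step can be verified by direct simulation. The sketch below (assuming NumPy and SciPy; the parameter values are arbitrary) draws a variance from the inverse gamma distribution with the parameters given above, then draws ''X'' from the resulting Gaussian, and compares the marginal draws with the location-scale {{mvar|t}} distribution:

<syntaxhighlight lang="python">
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
mu, tau2, nu = 0.0, 2.0, 7.0     # arbitrary location, squared scale, and dof
n = 100_000

# sigma^2 ~ InvGamma(a = nu/2, b = nu*tau2/2), then X | sigma^2 ~ N(mu, sigma^2)
var = stats.invgamma(a=nu / 2, scale=nu * tau2 / 2).rvs(n, random_state=rng)
x = rng.normal(mu, np.sqrt(var))

# Marginally, X should follow the location-scale t distribution lst(mu, tau2, nu)
lst = stats.t(df=nu, loc=mu, scale=np.sqrt(tau2))
print(stats.kstest(x, lst.cdf))
</syntaxhighlight>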
====Maximum entropy distribution====
Student's {{mvar|t}} distribution is the [[maximum entropy probability distribution]] for a random variate ''X'' having a certain value of <math>\ \operatorname{\mathbb E}\left\{\ \ln(\nu+X^2)\ \right\}\ </math>.<ref>{{cite journal |vauthors=Park SY, Bera AK |date=2009 |title=Maximum entropy autoregressive conditional heteroskedasticity model |journal=[[Journal of Econometrics]] |volume=150 |issue=2 |pages=219–230 |doi=10.1016/j.jeconom.2008.12.014}}</ref>{{Clarify|reason=It is not clear what is meant by "fixed" in this context. An older and more to-the-point source ( https://link.springer.com/content/pdf/10.1007/BF02481032.pdf ) demonstrates that the Student's t distribution with {{mvar|ν}} d.o.f. is the maximum entropy solution to a specific problem, for which, in addition to one more constraint, ℰ{ ln( 1 + X²/ν)} equals some constant which is predetermined for every {{mvar|ν}}.|date=December 2020}}{{Better source needed|date=December 2020|reason=The source does not obviously state this, although it touches upon something related.}} This follows immediately from the observation that the pdf can be written in [[exponential family]] form with <math>\ln(\nu+X^2)</math> as sufficient statistic.

===Integral of Student's probability density function and {{mvar|p}}-value===
The function {{nobr|{{math|''A''(''t'' {{!}} ''ν'')}} }} is the integral of Student's probability density function, {{math|''f''(''t'')}}, between {{mvar|-t}} and {{mvar|t}}, for {{nobr|{{math| ''t'' ≥ 0 }} .}} It thus gives the probability that a value of ''t'' less than that calculated from observed data would occur by chance. Therefore, the function {{nobr|{{math|''A''(''t'' {{!}} ''ν'')}} }} can be used when testing whether the difference between the means of two sets of data is statistically significant, by calculating the corresponding value of {{mvar|t}} and the probability of its occurrence if the two sets of data were drawn from the same population. This is used in a variety of situations, particularly in [[t test|{{mvar|t}} tests]]. For the statistic {{mvar|t}}, with {{mvar|ν}} degrees of freedom, {{nobr|{{math|''A''(''t'' {{!}} ''ν'')}} }} is the probability that {{mvar|t}} would be less than the observed value if the two means were the same (provided that the smaller mean is subtracted from the larger, so that {{nobr|{{math| ''t'' ≥ 0}} ).}} It can be easily calculated from the [[cumulative distribution function]] {{math|''F''{{sub|''ν''}}(''t'')}} of the {{mvar|t}} distribution:
:<math> A( t \mid \nu) = F_\nu(t) - F_\nu(-t) = 1 - I_{ \frac{\nu}{\nu +t^2} }\!\left(\frac{\nu}{2},\frac{1}{2}\right),</math>
where {{nobr| {{math| ''I{{sub|x}}''(''a'', ''b'') }} }} is the regularized [[Beta function#Incomplete beta function|incomplete beta function]]. For statistical hypothesis testing this function is used to construct the [[p-value|''p''-value]].
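As a numerical check of the identity above (a minimal sketch assuming SciPy; the values of <code>t</code> and <code>nu</code> are arbitrary), {{nobr|{{math|''A''(''t'' {{!}} ''ν'')}} }} can be computed both from the cumulative distribution function and from the regularized incomplete beta function:

<syntaxhighlight lang="python">
import numpy as np
from scipy import stats, special

def A(t, nu):
    """P(-t < T < t) for T ~ Student's t with nu degrees of freedom."""
    return stats.t(df=nu).cdf(t) - stats.t(df=nu).cdf(-t)

t, nu = 2.0, 10.0
# Equivalent form: 1 - I_x(nu/2, 1/2) with x = nu / (nu + t^2),
# where special.betainc is the regularized incomplete beta function
a_beta = 1.0 - special.betainc(nu / 2, 0.5, nu / (nu + t**2))

print(np.isclose(A(t, nu), a_beta))   # True: the two forms agree
print(1.0 - A(t, nu))                 # two-sided p-value for the observed t
</syntaxhighlight>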