== Properties ==

=== Cochran's theorem ===
{{Main|Cochran's theorem}}

The following is a special case of Cochran's theorem.

'''Theorem.''' If <math>Z_1,...,Z_n</math> are [[independence (probability theory)|independent]], identically distributed (i.i.d.) [[standard normal]] random variables, then
<math>\sum_{t=1}^n(Z_t - \bar Z)^2 \sim \chi^2_{n-1}</math>
where <math>\bar Z = \frac{1}{n} \sum_{t=1}^n Z_t.</math>

{{hidden begin|style=width:100%|ta1=center|border=1px #aaa solid|title=[Proof]}}
'''Proof.''' Let <math>Z\sim\mathcal{N}(\bar 0,1\!\!1)</math> be a vector of <math>n</math> independent normally distributed random variables, and <math>\bar Z</math> their average. Then
<math> \sum_{t=1}^n(Z_t-\bar Z)^2 ~=~ \sum_{t=1}^n Z_t^2 -n\bar Z^2 ~=~ Z^\top[1\!\!1 -{\textstyle\frac1n}\bar 1\bar 1^\top]Z ~=:~ Z^\top\!M Z </math>
where <math>1\!\!1</math> is the identity matrix and <math>\bar 1</math> the all-ones vector. <math>M</math> has one eigenvector <math>b_1:={\textstyle\frac{1}{\sqrt{n}}} \bar 1</math> with eigenvalue <math>0</math>, and <math>n-1</math> eigenvectors <math>b_2,...,b_n</math> (all orthogonal to <math>b_1</math>) with eigenvalue <math>1</math>, which can be chosen so that <math>Q:=(b_1,...,b_n)</math> is an orthogonal matrix. Since also <math>X:=Q^\top\!Z\sim\mathcal{N}(\bar 0,Q^\top\!1\!\!1 Q) =\mathcal{N}(\bar 0,1\!\!1)</math>, we have
<math> \sum_{t=1}^n(Z_t-\bar Z)^2 ~=~ Z^\top\!M Z ~=~ X^\top\!Q^\top\!M Q X ~=~ X_2^2+...+X_n^2 ~\sim~ \chi^2_{n-1}, </math>
which proves the claim.
{{hidden end}}

=== Additivity ===
It follows from the definition of the chi-squared distribution that the sum of independent chi-squared variables is also chi-squared distributed. Specifically, if <math>X_1, \ldots, X_n</math> are independent chi-squared variables with <math>k_1, \ldots, k_n</math> degrees of freedom, respectively, then <math>Y = X_1 + \cdots + X_n</math> is chi-squared distributed with <math>k_1 + \cdots + k_n</math> degrees of freedom.

=== Sample mean ===
The sample mean of <math>n</math> [[i.i.d.]] chi-squared variables of degree <math>k</math> is distributed according to a gamma distribution with shape <math>\alpha</math> and scale <math>\theta</math> parameters:
:<math> \overline X = \frac{1}{n} \sum_{i=1}^n X_i \sim \operatorname{Gamma}\left(\alpha=n\, k /2, \theta= 2/n \right) \qquad \text{where } X_i \sim \chi^2(k).</math>

[[#Asymptotic properties|Asymptotically]], given that for a shape parameter <math> \alpha </math> going to infinity a gamma distribution converges towards a normal distribution with expectation <math> \mu = \alpha\cdot \theta </math> and variance <math> \sigma^2 = \alpha\, \theta^2 </math>, the sample mean converges towards
:<math> \overline X \xrightarrow{n \to \infty} N(\mu = k, \sigma^2 = 2\, k /n ). </math>

Note that the same result follows from the [[central limit theorem]], since each chi-squared variable of degree <math>k</math> has expectation <math> k </math> and variance <math> 2\,k </math>, so the variance of the sample mean <math> \overline{X}</math> is <math> \sigma^2 = \frac{2k}{n} </math>.

=== Entropy ===
The [[differential entropy]] is given by
: <math> h = -\int_{0}^\infty f(x;\,k)\ln f(x;\,k) \, dx = \frac k 2 + \ln \left[2\,\Gamma \left(\frac k 2 \right)\right] + \left(1-\frac k 2 \right)\, \psi\!\left(\frac k 2 \right), </math>
where <math>\psi(x)</math> is the [[digamma function]].
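As an illustrative check (not taken from the cited sources), the following Python sketch compares the closed-form entropy above with SciPy's built-in differential entropy for the chi-squared distribution; the choice <math>k=5</math> is arbitrary.

<syntaxhighlight lang="python">
# Minimal numerical check of the closed-form differential entropy (illustrative only).
import numpy as np
from scipy.special import gammaln, digamma
from scipy.stats import chi2

k = 5  # degrees of freedom (arbitrary illustrative value)

# Closed form: h = k/2 + ln(2*Gamma(k/2)) + (1 - k/2) * psi(k/2)
h_formula = k / 2 + np.log(2) + gammaln(k / 2) + (1 - k / 2) * digamma(k / 2)

# SciPy's differential entropy of the chi-squared distribution with k degrees of freedom
h_scipy = chi2(df=k).entropy()

print(h_formula, h_scipy)  # the two values agree to floating-point precision
</syntaxhighlight>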
The chi-squared distribution is the [[maximum entropy probability distribution]] for a random variate <math>X</math> for which <math>\operatorname{E}(X)=k</math> and <math>\operatorname{E}(\ln(X))=\psi(k/2)+\ln(2)</math> are fixed. Since the chi-squared distribution is in the family of gamma distributions, this can be derived by substituting appropriate values in the [[gamma distribution#Logarithmic expectation and variance|expectation of the log moment of the gamma distribution]]. For a derivation from more basic principles, see the derivation in [[exponential family#Moment-generating function of the sufficient statistic|moment-generating function of the sufficient statistic]].

=== Noncentral moments ===
The noncentral moments (raw moments) of a chi-squared distribution with <math>k</math> degrees of freedom are given by<ref>[http://mathworld.wolfram.com/Chi-SquaredDistribution.html Chi-squared distribution], from [[MathWorld]], retrieved Feb. 11, 2009</ref><ref>M. K. Simon, ''Probability Distributions Involving Gaussian Random Variables'', New York: Springer, 2002, eq. (2.35), {{ISBN|978-0-387-34657-1}}</ref>
: <math> \operatorname{E}(X^m) = k (k+2) (k+4) \cdots (k+2m-2) = 2^m \frac{\Gamma\left(m+\frac{k}{2}\right)}{\Gamma\left(\frac{k}{2}\right)}. </math>

=== Cumulants ===
The [[cumulant]]s are readily obtained by a [[power series]] expansion of the logarithm of the characteristic function:
: <math>\kappa_n = 2^{n-1}(n-1)!\,k,</math>
with [[cumulant generating function]] <math>\ln \operatorname{E}[e^{tX}] = - \frac k2 \ln(1-2t).</math>

=== Concentration ===
The chi-squared distribution exhibits strong concentration around its mean. The standard Laurent–Massart<ref>{{Cite journal |last1=Laurent |first1=B. |last2=Massart |first2=P. |date=2000-10-01 |title=Adaptive estimation of a quadratic functional by model selection |journal=The Annals of Statistics |volume=28 |issue=5 |doi=10.1214/aos/1015957395 |s2cid=116945590 |issn=0090-5364|doi-access=free }}</ref> bounds are:
: <math>\operatorname{P}(X - k \ge 2 \sqrt{k x} + 2x) \le \exp(-x)</math>
: <math>\operatorname{P}(k - X \ge 2 \sqrt{k x}) \le \exp(-x)</math>

One consequence is that, if <math>Z \sim N(0, 1)^k</math> is a Gaussian random vector in <math>\R^k</math>, then as the dimension <math>k</math> grows, the squared length of the vector is concentrated tightly around <math>k</math> with a width <math>k^{1/2 + \alpha}</math>:
<math display="block">\operatorname{P}(\|Z\|^2 \in [k - 2k^{1/2+\alpha}, k + 2k^{1/2+\alpha} + 2k^{\alpha}]) \geq 1-e^{-k^\alpha},</math>
where the exponent <math>\alpha</math> can be chosen as any value in <math>\R</math>.

Since the cumulant generating function for <math>\chi^2(k)</math> is <math>K(t) = -\frac k2 \ln(1-2t)</math>, and its [[Convex conjugate|convex dual]] is <math>K^*(q) = \frac 12 \left(q-k + k\ln\frac kq\right)</math>, the standard [[Chernoff bound]] yields
<math display="block">\begin{aligned} \ln \operatorname{P}(X \geq (1 + \epsilon) k) &\leq -\frac k2 ( \epsilon - \ln(1+\epsilon)), \\ \ln \operatorname{P}(X \leq (1 - \epsilon) k) &\leq -\frac k2 ( -\epsilon - \ln(1-\epsilon)), \end{aligned}</math>
where <math>0< \epsilon < 1</math>.
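These tail bounds can be illustrated empirically. The following Python sketch (an illustrative Monte Carlo check, not taken from the cited sources; the choices <math>k=20</math> and <math>x=1.5</math> are arbitrary) estimates the upper-tail probability by simulation and compares it with the Laurent–Massart bound <math>e^{-x}</math>.

<syntaxhighlight lang="python">
# Monte Carlo sketch of the Laurent-Massart upper-tail bound
# P(X - k >= 2*sqrt(k*x) + 2*x) <= exp(-x); parameter values are illustrative only.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
k, x, n_samples = 20, 1.5, 1_000_000  # degrees of freedom, bound parameter, sample size

samples = chi2.rvs(df=k, size=n_samples, random_state=rng)
empirical_tail = np.mean(samples - k >= 2 * np.sqrt(k * x) + 2 * x)

print(empirical_tail, np.exp(-x))  # the empirical tail probability stays below exp(-x)
</syntaxhighlight>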
Applying the union bound to the two inequalities above gives
<math display="block">\operatorname{P}(X \in (1\pm \epsilon ) k ) \geq 1 - 2e^{-\frac k2 (\frac 12 \epsilon^2 - \frac 13 \epsilon^3)}. </math>
This result is used in proving the [[Johnson–Lindenstrauss lemma]].<ref>[https://ocw.mit.edu/courses/18-s096-topics-in-mathematics-of-data-science-fall-2015/f9261308512f6b90e284599f94055bb4_MIT18_S096F15_Ses15_16.pdf MIT 18.S096 (Fall 2015): Topics in Mathematics of Data Science, Lecture 5, Johnson-Lindenstrauss Lemma and Gordons Theorem]</ref>

=== Asymptotic properties ===
[[File:Chi-square median approx.png|thumb|upright=1.818|Approximate formula for median (from the Wilson–Hilferty transformation) compared with numerical quantile (top); and difference ({{font color|blue|blue}}) and relative difference ({{font color|red|red}}) between numerical quantile and approximate formula (bottom). For the chi-squared distribution, only the positive integer numbers of degrees of freedom (circles) are meaningful.]]

By the [[central limit theorem]], because the chi-squared distribution is the sum of <math>k</math> independent random variables with finite mean and variance, it converges to a normal distribution for large <math>k</math>. For many practical purposes, for <math>k>50</math> the distribution is sufficiently close to a [[normal distribution]] that the difference can be ignored.<ref>{{cite book|title=Statistics for experimenters|author=Box, Hunter and Hunter|publisher=Wiley|year=1978|isbn=978-0-471-09315-2|page=[https://archive.org/details/statisticsforexp00geor/page/118 118]|url-access=registration|url=https://archive.org/details/statisticsforexp00geor/page/118}}</ref> Specifically, if <math>X \sim \chi^2(k)</math>, then as <math>k</math> tends to infinity, the distribution of <math>(X-k)/\sqrt{2k}</math> [[convergence of random variables#Convergence in distribution|tends]] to a standard normal distribution. However, convergence is slow, as the [[skewness]] is <math>\sqrt{8/k}</math> and the [[excess kurtosis]] is <math>12/k</math>.

The sampling distribution of <math>\ln(\chi^2)</math> converges to normality much faster than the sampling distribution of <math>\chi^2</math>,<ref>{{cite journal |first1=M. S. |last1=Bartlett |first2=D. G. |last2=Kendall |title=The Statistical Analysis of Variance-Heterogeneity and the Logarithmic Transformation |journal=Supplement to the Journal of the Royal Statistical Society |volume=8 |issue=1 |year=1946 |pages=128–138 |jstor=2983618 |doi=10.2307/2983618 }}</ref> as the [[logarithmic transformation|logarithmic transform]] removes much of the asymmetry.<ref name="Pillai-2016">{{Cite journal|last=Pillai|first=Natesh S.|year=2016|title=An unexpected encounter with Cauchy and Lévy|journal=[[Annals of Statistics]]|volume=44|issue=5|pages=2089–2097|doi=10.1214/15-aos1407|arxiv=1505.01957|s2cid=31582370}}</ref>

Other functions of the chi-squared distribution converge more rapidly to a normal distribution. Some examples are:
* If <math>X \sim \chi^2(k)</math> then <math>\sqrt{2X}</math> is approximately normally distributed with mean <math>\sqrt{2k-1}</math> and unit variance (1922, by [[R. A. Fisher]]; see (18.23), p. 426 of Johnson<ref name="Johnson-1994" />).
* If <math>X \sim \chi^2(k)</math> then <math>\sqrt[3]{X/k}</math> is approximately normally distributed with mean <math> 1-\frac{2}{9k}</math> and variance <math>\frac{2}{9k}.</math><ref>{{cite journal |last1=Wilson |first1=E. B. |last2=Hilferty |first2=M. M. |year=1931 |title=The distribution of chi-squared |journal=[[Proc. Natl. Acad. Sci. USA]] |volume=17 |issue=12 |pages=684–688 |bibcode=1931PNAS...17..684W |doi=10.1073/pnas.17.12.684 |pmid=16577411 |pmc=1076144 |doi-access=free }}</ref> This is known as the '''Wilson–Hilferty transformation'''; see (18.24), p. 426 of Johnson.<ref name="Johnson-1994" />
** This [[Data transformation (statistics)#Transforming to normality|normalizing transformation]] leads directly to the commonly used median approximation <math>k\bigg(1-\frac{2}{9k}\bigg)^3\;</math> by back-transforming from the mean, which is also the median, of the normal distribution; a numerical comparison is sketched below.
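The quality of this median approximation can be inspected directly. The following Python sketch (illustrative only; the chosen values of <math>k</math> are arbitrary) compares the Wilson–Hilferty approximation with the numerically computed median.

<syntaxhighlight lang="python">
# Sketch comparing the Wilson-Hilferty median approximation k*(1 - 2/(9k))**3
# with the exact median (0.5 quantile) of the chi-squared distribution.
from scipy.stats import chi2

for k in (1, 2, 5, 10, 50):               # illustrative degrees of freedom
    approx = k * (1 - 2 / (9 * k)) ** 3   # back-transformed median approximation
    exact = chi2.ppf(0.5, df=k)           # numerical median
    print(k, approx, exact, approx - exact)
</syntaxhighlight>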