== Properties ==

=== Cochran's theorem ===
{{Main|Cochran's theorem}}

The following is a special case of Cochran's theorem.

'''Theorem.''' If <math>Z_1,...,Z_n</math> are [[independence (probability theory)|independent]], identically distributed (i.i.d.) [[standard normal]] random variables, then
<math>\sum_{t=1}^n(Z_t - \bar Z)^2 \sim \chi^2_{n-1}</math>
where <math>\bar Z = \frac{1}{n} \sum_{t=1}^n Z_t.</math>

{{hidden begin|style=width:100%|ta1=center|border=1px #aaa solid|title=[Proof]}}
'''Proof.''' Let <math>Z\sim\mathcal{N}(\bar 0,1\!\!1)</math> be a vector of <math>n</math> independent normally distributed random variables, and <math>\bar Z</math> their average. Then
<math> \sum_{t=1}^n(Z_t-\bar Z)^2 ~=~ \sum_{t=1}^n Z_t^2 -n\bar Z^2 ~=~ Z^\top[1\!\!1 -{\textstyle\frac1n}\bar 1\bar 1^\top]Z ~=:~ Z^\top\!M Z </math>
where <math>1\!\!1</math> is the identity matrix and <math>\bar 1</math> the all-ones vector. <math>M</math> has one eigenvector <math>b_1:={\textstyle\frac{1}{\sqrt{n}}} \bar 1</math> with eigenvalue <math>0</math>, and <math>n-1</math> eigenvectors <math>b_2,...,b_n</math> (all orthogonal to <math>b_1</math>) with eigenvalue <math>1</math>, which can be chosen so that <math>Q:=(b_1,...,b_n)</math> is an orthogonal matrix. Since also <math>X:=Q^\top\!Z\sim\mathcal{N}(\bar 0,Q^\top\!1\!\!1 Q) =\mathcal{N}(\bar 0,1\!\!1)</math>, we have
<math> \sum_{t=1}^n(Z_t-\bar Z)^2 ~=~ Z^\top\!M Z ~=~ X^\top\!Q^\top\!M Q X ~=~ X_2^2+...+X_n^2 ~\sim~ \chi^2_{n-1}, </math>
which proves the claim.
{{hidden end}}

=== Additivity ===
It follows from the definition of the chi-squared distribution that the sum of independent chi-squared variables is also chi-squared distributed. Specifically, if <math>X_1, \ldots, X_n</math> are independent chi-squared variables with <math>k_1, \ldots, k_n</math> degrees of freedom, respectively, then <math>Y = X_1 + \cdots + X_n</math> is chi-squared distributed with <math>k_1 + \cdots + k_n</math> degrees of freedom.

=== Sample mean ===
The sample mean of <math>n</math> [[i.i.d.]] chi-squared variables of degree <math>k</math> is distributed according to a gamma distribution with shape <math>\alpha</math> and scale <math>\theta</math> parameters:
:<math> \overline X = \frac{1}{n} \sum_{i=1}^n X_i \sim \operatorname{Gamma}\left(\alpha=n\, k /2, \theta= 2/n \right) \qquad \text{where } X_i \sim \chi^2(k).</math>

[[#Asymptotic properties|Asymptotically]], given that for a shape parameter <math> \alpha </math> going to infinity a gamma distribution converges towards a normal distribution with expectation <math> \mu = \alpha\cdot \theta </math> and variance <math> \sigma^2 = \alpha\, \theta^2 </math>, the sample mean converges towards
:<math> \overline X \xrightarrow{n \to \infty} N(\mu = k, \sigma^2 = 2\, k /n ). </math>

Note that the same result follows from the [[central limit theorem]], since each chi-squared variable of degree <math>k</math> has expectation <math> k </math> and variance <math> 2\,k </math>, so the variance of the sample mean <math> \overline{X}</math> is <math> \sigma^2 = \frac{2k}{n} </math>.

=== Entropy ===
The [[differential entropy]] is given by
: <math> h = -\int_{0}^\infty f(x;\,k)\ln f(x;\,k) \, dx = \frac k 2 + \ln \left[2\,\Gamma \left(\frac k 2 \right)\right] + \left(1-\frac k 2 \right)\, \psi\!\left(\frac k 2 \right), </math>
where <math>\psi(x)</math> is the [[digamma function]].
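As an illustrative check (not taken from the cited sources), the following Python sketch compares the closed-form entropy above with SciPy's built-in differential entropy for the chi-squared distribution; the choice <math>k=5</math> is arbitrary.

<syntaxhighlight lang="python">
# Minimal numerical check of the closed-form differential entropy (illustrative only).
import numpy as np
from scipy.special import gammaln, digamma
from scipy.stats import chi2

k = 5  # degrees of freedom (arbitrary illustrative value)

# Closed form: h = k/2 + ln(2*Gamma(k/2)) + (1 - k/2) * psi(k/2)
h_formula = k / 2 + np.log(2) + gammaln(k / 2) + (1 - k / 2) * digamma(k / 2)

# SciPy's differential entropy of the chi-squared distribution with k degrees of freedom
h_scipy = chi2(df=k).entropy()

print(h_formula, h_scipy)  # the two values agree to floating-point precision
</syntaxhighlight>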
The chi-squared distribution is the [[maximum entropy probability distribution]] for a random variate <math>X</math> for which <math>\operatorname{E}(X)=k</math> and <math>\operatorname{E}(\ln(X))=\psi(k/2)+\ln(2)</math> are fixed. Since the chi-squared distribution is in the family of gamma distributions, this can be derived by substituting appropriate values in the [[gamma distribution#Logarithmic expectation and variance|expectation of the log moment of the gamma distribution]]. For a derivation from more basic principles, see the derivation in [[exponential family#Moment-generating function of the sufficient statistic|moment-generating function of the sufficient statistic]].

=== Noncentral moments ===
The noncentral moments (raw moments) of a chi-squared distribution with <math>k</math> degrees of freedom are given by<ref>[http://mathworld.wolfram.com/Chi-SquaredDistribution.html Chi-squared distribution], from [[MathWorld]], retrieved Feb. 11, 2009</ref><ref>M. K. Simon, ''Probability Distributions Involving Gaussian Random Variables'', New York: Springer, 2002, eq. (2.35), {{ISBN|978-0-387-34657-1}}</ref>
: <math> \operatorname{E}(X^m) = k (k+2) (k+4) \cdots (k+2m-2) = 2^m \frac{\Gamma\left(m+\frac{k}{2}\right)}{\Gamma\left(\frac{k}{2}\right)}. </math>

=== Cumulants ===
The [[cumulant]]s are readily obtained by a [[power series]] expansion of the logarithm of the characteristic function:
: <math>\kappa_n = 2^{n-1}(n-1)!\,k,</math>
with [[cumulant generating function]] <math>\ln \operatorname{E}[e^{tX}] = - \frac k2 \ln(1-2t).</math>

=== Concentration ===
The chi-squared distribution exhibits strong concentration around its mean. The standard Laurent–Massart<ref>{{Cite journal |last1=Laurent |first1=B. |last2=Massart |first2=P. |date=2000-10-01 |title=Adaptive estimation of a quadratic functional by model selection |journal=The Annals of Statistics |volume=28 |issue=5 |doi=10.1214/aos/1015957395 |s2cid=116945590 |issn=0090-5364|doi-access=free }}</ref> bounds are:
: <math>\operatorname{P}(X - k \ge 2 \sqrt{k x} + 2x) \le \exp(-x)</math>
: <math>\operatorname{P}(k - X \ge 2 \sqrt{k x}) \le \exp(-x)</math>

One consequence is that, if <math>Z \sim N(0, 1)^k</math> is a Gaussian random vector in <math>\R^k</math>, then as the dimension <math>k</math> grows, the squared length of the vector is concentrated tightly around <math>k</math> with a width <math>k^{1/2 + \alpha}</math>:
<math display="block">\operatorname{P}(\|Z\|^2 \in [k - 2k^{1/2+\alpha}, k + 2k^{1/2+\alpha} + 2k^{\alpha}]) \geq 1-e^{-k^\alpha},</math>
where the exponent <math>\alpha</math> can be chosen as any value in <math>\R</math>.

Since the cumulant generating function for <math>\chi^2(k)</math> is <math>K(t) = -\frac k2 \ln(1-2t)</math>, and its [[Convex conjugate|convex dual]] is <math>K^*(q) = \frac 12 \left(q-k + k\ln\frac kq\right)</math>, the standard [[Chernoff bound]] yields
<math display="block">\begin{aligned} \ln \operatorname{P}(X \geq (1 + \epsilon) k) &\leq -\frac k2 ( \epsilon - \ln(1+\epsilon)), \\ \ln \operatorname{P}(X \leq (1 - \epsilon) k) &\leq -\frac k2 ( -\epsilon - \ln(1-\epsilon)), \end{aligned}</math>
where <math>0< \epsilon < 1</math>.
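These tail bounds can be illustrated empirically. The following Python sketch (an illustrative Monte Carlo check, not taken from the cited sources; the choices <math>k=20</math> and <math>x=1.5</math> are arbitrary) estimates the upper-tail probability by simulation and compares it with the Laurent–Massart bound <math>e^{-x}</math>.

<syntaxhighlight lang="python">
# Monte Carlo sketch of the Laurent-Massart upper-tail bound
# P(X - k >= 2*sqrt(k*x) + 2*x) <= exp(-x); parameter values are illustrative only.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
k, x, n_samples = 20, 1.5, 1_000_000  # degrees of freedom, bound parameter, sample size

samples = chi2.rvs(df=k, size=n_samples, random_state=rng)
empirical_tail = np.mean(samples - k >= 2 * np.sqrt(k * x) + 2 * x)

print(empirical_tail, np.exp(-x))  # the empirical tail probability stays below exp(-x)
</syntaxhighlight>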
Applying the union bound to the two inequalities above gives
<math display="block">\operatorname{P}(X \in (1\pm \epsilon ) k ) \geq 1 - 2e^{-\frac k2 (\frac 12 \epsilon^2 - \frac 13 \epsilon^3)}. </math>
This result is used in proving the [[Johnson–Lindenstrauss lemma]].<ref>[https://ocw.mit.edu/courses/18-s096-topics-in-mathematics-of-data-science-fall-2015/f9261308512f6b90e284599f94055bb4_MIT18_S096F15_Ses15_16.pdf MIT 18.S096 (Fall 2015): Topics in Mathematics of Data Science, Lecture 5, Johnson-Lindenstrauss Lemma and Gordons Theorem]</ref>

=== Asymptotic properties ===
[[File:Chi-square median approx.png|thumb|upright=1.818|Approximate formula for median (from the Wilson–Hilferty transformation) compared with numerical quantile (top); and difference ({{font color|blue|blue}}) and relative difference ({{font color|red|red}}) between numerical quantile and approximate formula (bottom). For the chi-squared distribution, only the positive integer numbers of degrees of freedom (circles) are meaningful.]]

By the [[central limit theorem]], because the chi-squared distribution is the sum of <math>k</math> independent random variables with finite mean and variance, it converges to a normal distribution for large <math>k</math>. For many practical purposes, for <math>k>50</math> the distribution is sufficiently close to a [[normal distribution]] that the difference can be ignored.<ref>{{cite book|title=Statistics for experimenters|author=Box, Hunter and Hunter|publisher=Wiley|year=1978|isbn=978-0-471-09315-2|page=[https://archive.org/details/statisticsforexp00geor/page/118 118]|url-access=registration|url=https://archive.org/details/statisticsforexp00geor/page/118}}</ref> Specifically, if <math>X \sim \chi^2(k)</math>, then as <math>k</math> tends to infinity, the distribution of <math>(X-k)/\sqrt{2k}</math> [[convergence of random variables#Convergence in distribution|tends]] to a standard normal distribution. However, convergence is slow, as the [[skewness]] is <math>\sqrt{8/k}</math> and the [[excess kurtosis]] is <math>12/k</math>.

The sampling distribution of <math>\ln(\chi^2)</math> converges to normality much faster than the sampling distribution of <math>\chi^2</math>,<ref>{{cite journal |first1=M. S. |last1=Bartlett |first2=D. G. |last2=Kendall |title=The Statistical Analysis of Variance-Heterogeneity and the Logarithmic Transformation |journal=Supplement to the Journal of the Royal Statistical Society |volume=8 |issue=1 |year=1946 |pages=128–138 |jstor=2983618 |doi=10.2307/2983618 }}</ref> as the [[logarithmic transformation|logarithmic transform]] removes much of the asymmetry.<ref name="Pillai-2016">{{Cite journal|last=Pillai|first=Natesh S.|year=2016|title=An unexpected encounter with Cauchy and Lévy|journal=[[Annals of Statistics]]|volume=44|issue=5|pages=2089–2097|doi=10.1214/15-aos1407|arxiv=1505.01957|s2cid=31582370}}</ref>

Other functions of the chi-squared distribution converge more rapidly to a normal distribution. Some examples are:
* If <math>X \sim \chi^2(k)</math> then <math>\sqrt{2X}</math> is approximately normally distributed with mean <math>\sqrt{2k-1}</math> and unit variance (1922, by [[R. A. Fisher]]; see (18.23), p. 426 of Johnson<ref name="Johnson-1994" />).
* If <math>X \sim \chi^2(k)</math> then <math>\sqrt[3]{X/k}</math> is approximately normally distributed with mean <math> 1-\frac{2}{9k}</math> and variance <math>\frac{2}{9k}.</math><ref>{{cite journal |last1=Wilson |first1=E. B. |last2=Hilferty |first2=M. M. |year=1931 |title=The distribution of chi-squared |journal=[[Proc. Natl. Acad. Sci. USA]] |volume=17 |issue=12 |pages=684–688 |bibcode=1931PNAS...17..684W |doi=10.1073/pnas.17.12.684 |pmid=16577411 |pmc=1076144 |doi-access=free }}</ref> This is known as the '''Wilson–Hilferty transformation'''; see (18.24), p. 426 of Johnson.<ref name="Johnson-1994" />
** This [[Data transformation (statistics)#Transforming to normality|normalizing transformation]] leads directly to the commonly used median approximation <math>k\bigg(1-\frac{2}{9k}\bigg)^3\;</math> by back-transforming from the mean, which is also the median, of the normal distribution; a numerical comparison is sketched below.
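The quality of this median approximation can be inspected directly. The following Python sketch (illustrative only; the chosen values of <math>k</math> are arbitrary) compares the Wilson–Hilferty approximation with the numerically computed median.

<syntaxhighlight lang="python">
# Sketch comparing the Wilson-Hilferty median approximation k*(1 - 2/(9k))**3
# with the exact median (0.5 quantile) of the chi-squared distribution.
from scipy.stats import chi2

for k in (1, 2, 5, 10, 50):               # illustrative degrees of freedom
    approx = k * (1 - 2 / (9 * k)) ** 3   # back-transformed median approximation
    exact = chi2.ppf(0.5, df=k)           # numerical median
    print(k, approx, exact, approx - exact)
</syntaxhighlight>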