
=== Introduction ===

The chi-squared distribution is used primarily in hypothesis testing, and to a lesser extent for confidence intervals for population variance when the underlying distribution is normal. Unlike more widely known distributions such as the [[normal distribution]] and the [[exponential distribution]], the chi-squared distribution is less often applied in the direct modeling of natural phenomena. It arises in the following hypothesis tests, among others:
* [[Pearson's chi-squared test|Chi-squared test]] of independence in [[contingency tables]]
* [[Pearson's chi-squared test|Chi-squared test]] of goodness of fit of observed data to hypothetical distributions
* [[Likelihood-ratio test]] for nested models
* [[Log-rank test]] in survival analysis
* [[Cochran–Mantel–Haenszel test]] for stratified contingency tables
* [[Wald test]]
* [[Score test]]

It is also a component of the definition of the [[Student's t-distribution|''t''-distribution]] and the [[F-distribution|''F''-distribution]] used in ''t''-tests, analysis of variance, and regression analysis.

The primary reason the chi-squared distribution is used so extensively in hypothesis testing is its relationship to the normal distribution. Many hypothesis tests use a test statistic, such as the [[t-statistic|''t''-statistic]] in a ''t''-test. For these hypothesis tests, as the sample size, {{mvar|n}}, increases, the [[sampling distribution]] of the test statistic approaches the normal distribution ([[central limit theorem]]). Because the test statistic (such as {{mvar|t}}) is asymptotically normally distributed, the distribution used for hypothesis testing may be approximated by a normal distribution, provided the sample size is sufficiently large. Testing hypotheses using a normal distribution is well understood and relatively easy. The simplest chi-squared distribution is the distribution of the square of a standard normal random variable. So wherever a normal distribution could be used for a hypothesis test, a chi-squared distribution could be used.

Suppose that <math>Z</math> is a random variable sampled from the standard normal distribution, where the mean is <math>0</math> and the variance is <math>1</math>: <math>Z \sim N(0,1)</math>. Now, consider the random variable <math>X = Z^2</math>. The distribution of the random variable <math>X</math> is an example of a chi-squared distribution: <math>X \sim \chi^2_1</math>. The subscript 1 indicates that this particular chi-squared distribution is constructed from only one standard normal random variable. A chi-squared distribution constructed by squaring a single standard normal random variable is said to have 1 degree of freedom. Thus, as the sample size for a hypothesis test increases and the distribution of the test statistic approaches a normal distribution, the distribution of the squared test statistic approaches a chi-squared distribution with 1 degree of freedom. Just as extreme values of the normal distribution have low probability (and give small p-values), extreme values of the chi-squared distribution have low probability.
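The relationship between <math>Z</math> and <math>X = Z^2</math> can be checked numerically. The following is a minimal sketch in Python using only the standard library; the function name <code>chi2_1_cdf</code> is an illustrative choice, not part of any library. It relies on the fact that the event <math>Z^2 \le x</math> is the same as <math>-\sqrt{x} \le Z \le \sqrt{x}</math>.

```python
import math
from statistics import NormalDist


def chi2_1_cdf(x: float) -> float:
    """P(Z**2 <= x) for Z ~ N(0, 1): the CDF of chi-squared with 1 degree of freedom."""
    if x <= 0:
        return 0.0
    root = math.sqrt(x)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Z**2 <= x is the same event as -sqrt(x) <= Z <= sqrt(x)
    return std_normal.cdf(root) - std_normal.cdf(-root)


# Two-sided 5% cut-off for Z, and the matching cut-off for X = Z**2
z_crit = NormalDist().inv_cdf(0.975)
x_crit = z_crit ** 2

print(round(z_crit, 3), round(x_crit, 3), round(chi2_1_cdf(x_crit), 3))
```

Under these assumptions, the familiar two-sided normal cut-off of about 1.96 corresponds to a chi-squared cut-off of about 3.84 with 1 degree of freedom, and both leave 5% of the probability in the rejection region.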

An additional reason that the chi-squared distribution is widely used is that it turns up as the large-sample distribution of generalized [[Likelihood-ratio test|likelihood ratio tests]] (LRT).<ref name="Westfall-2013">{{cite book|last1=Westfall|first1=Peter H.|title=Understanding Advanced Statistical Methods|date=2013|publisher=CRC Press|location=Boca Raton, FL|isbn=978-1-4665-1210-8}}</ref> LRTs have several desirable properties; in particular, simple LRTs commonly provide the highest power to reject the null hypothesis ([[Neyman–Pearson lemma]]), and this also leads to optimality properties of generalized LRTs. However, the normal and chi-squared approximations are only valid asymptotically. For this reason, for a small sample size it is preferable to use the ''t''-distribution rather than the normal approximation or the chi-squared approximation. Similarly, in analyses of contingency tables, the chi-squared approximation will be poor for a small sample size, and it is preferable to use [[Fisher's exact test]]. Ramsey shows that the exact [[binomial test]] is always more powerful than the normal approximation.<ref name="Ramsey-1988">{{cite journal|last1=Ramsey|first1=PH|title=Evaluating the Normal Approximation to the Binomial Test|journal=Journal of Educational Statistics|date=1988|volume=13|issue=2|pages=173–82|doi=10.2307/1164752|jstor=1164752}}</ref>

Lancaster shows the connections among the binomial, normal, and chi-squared distributions, as follows.<ref name="Lancaster-1969">{{Citation
|last=Lancaster
|first=H.O.
|title=The Chi-squared Distribution
|year=1969
|publisher=Wiley
}}</ref> De Moivre and Laplace established that a binomial distribution could be approximated by a normal distribution. Specifically, they showed the asymptotic normality of the random variable

:<math> \chi = {m - Np \over \sqrt{Npq}} </math>

where <math>m</math> is the observed number of successes in <math>N</math> trials, where the probability of success is <math>p</math>, and <math>q = 1 - p</math>.

Squaring both sides of the equation gives

:<math> \chi^2 = {(m - Np)^2\over Npq} </math>

Using <math>N = Np + N(1 - p)</math>, <math>N = m + (N - m)</math>, and <math>q = 1 - p</math>, this equation can be rewritten as

:<math> \chi^2 = {(m - Np)^2\over Np} + {(N - m - Nq)^2\over Nq} </math>

The expression on the right has the form that [[Karl Pearson]] would generalize to

:<math> \chi^2 = \sum_{i=1}^n \frac{(O_i - E_i)^2}{E_i} </math>

where

* <math>\chi^2</math> = Pearson's cumulative test statistic, which asymptotically approaches a <math>\chi^2</math> distribution;
* <math>O_i</math> = the number of observations of type <math>i</math>;
* <math>E_i = N p_i</math> = the expected (theoretical) frequency of type <math>i</math>, asserted by the null hypothesis that the fraction of type <math>i</math> in the population is <math>p_i</math>; and
* <math>n</math> = the number of cells in the table.{{cn|date=November 2023}}
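
As an illustration, the statistic can be computed directly from its definition. The following is a minimal sketch in Python; the function name <code>pearson_statistic</code> and the three-category counts are hypothetical, chosen only for this example. Exact fractions are used so that the two-cell (binomial) case can be compared with the squared de Moivre–Laplace variable <math>(m - Np)^2 / (Npq)</math> without rounding error.

```python
from fractions import Fraction


def pearson_statistic(observed, probs):
    """Sum of (O_i - E_i)**2 / E_i over all cells, with E_i = N * p_i."""
    if len(observed) != len(probs):
        raise ValueError("observed and probs must have the same length")
    total = sum(observed)  # N, the total number of observations
    stat = 0
    for o_i, p_i in zip(observed, probs):
        e_i = total * p_i  # expected count under the null hypothesis
        stat += (o_i - e_i) ** 2 / e_i
    return stat


# Binomial special case: two cells (successes, failures)
N, m, p = 10, 1, Fraction(1, 2)
q = 1 - p
two_cell = pearson_statistic([m, N - m], [p, q])
squared_z = (m - N * p) ** 2 / (N * p * q)
print(two_cell, squared_z)  # prints "32/5 32/5"

# Hypothetical three-category counts, tested against equal proportions
three_cell = pearson_statistic([18, 22, 20], [Fraction(1, 3)] * 3)
print(three_cell)  # prints "2/5"
```

The agreement of the two printed values in the binomial case is the identity derived above; the three-category call shows the same function applied to a table with more than two cells.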

In the case of a binomial outcome (flipping a coin), the binomial distribution may be approximated by a normal distribution (for sufficiently large <math>N</math>). Because the square of a standard normal random variable follows the chi-squared distribution with one degree of freedom, the probability of a result such as 1 head in 10 trials can be approximated either by using the normal distribution directly, or by using the chi-squared distribution for the normalized, squared difference between the observed and expected values. However, many problems involve more than the two possible outcomes of a binomial, and instead require three or more categories, which leads to the multinomial distribution. Just as de Moivre and Laplace sought and found the normal approximation to the binomial, Pearson sought and found a degenerate multivariate normal approximation to the multinomial distribution (the numbers in each category add up to the total sample size, which is considered fixed). Pearson showed that the chi-squared distribution arose from such a multivariate normal approximation to the multinomial distribution, taking careful account of the statistical dependence (negative correlations) between numbers of observations in different categories.<ref name="Lancaster-1969" />
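
The coin example can be worked through numerically. The following is a minimal sketch in Python using only the standard library, assuming a fair coin (<math>p = 0.5</math>) and a two-sided test; the variable names are illustrative. It computes the p-value for 1 head in 10 trials from the normal approximation, from the chi-squared approximation with one degree of freedom, and from the exact binomial distribution.

```python
import math

N, m, p = 10, 1, 0.5  # trials, observed heads, null probability (assumed fair coin)
q = 1 - p

# Normal approximation: standardized difference between observed and expected
z = (m - N * p) / math.sqrt(N * p * q)
p_normal = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail, 2 * (1 - Phi(|z|))

# Chi-squared approximation: squared standardized difference, 1 degree of freedom
chi2 = z ** 2
p_chi2 = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-squared with 1 d.f.


def binom_pmf(k: int) -> float:
    """Probability of exactly k heads in N trials under the null hypothesis."""
    return math.comb(N, k) * p ** k * q ** (N - k)


# Exact two-sided test: add up all outcomes no more probable than the observed one
p_exact = sum(
    binom_pmf(k) for k in range(N + 1) if binom_pmf(k) <= binom_pmf(m) * (1 + 1e-9)
)

print(p_normal, p_chi2, p_exact)
```

The normal and chi-squared p-values coincide (roughly 0.011), as expected from the relationship between the two distributions, while the exact binomial p-value is 22/1024, about 0.021. With only 10 trials the approximation is noticeably off, which is consistent with the preference for exact tests at small sample sizes.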