{{Short description|Statistical theorem in the analysis of variance}}
In [[statistics]], '''Cochran's theorem''', devised by [[William G. Cochran]],<ref name="Cochran">{{cite journal|last=Cochran|first=W. G.|author-link=William Gemmell Cochran|title=The distribution of quadratic forms in a normal system, with applications to the analysis of covariance|journal=[[Mathematical Proceedings of the Cambridge Philosophical Society]]|date=April 1934|volume=30|issue=2|pages=178–191|doi=10.1017/S0305004100016595|bibcode=1934PCPS...30..178C }}</ref> is a [[theorem]] used to justify results relating to the [[probability distribution]]s of statistics that are used in the [[analysis of variance]].<ref>{{cite book |author= Bapat, R. B.|title=Linear Algebra and Linear Models|edition=Second|publisher= Springer |year=2000|isbn=978-0-387-98871-9}}</ref>

== Examples ==

=== Sample mean and sample variance ===
If ''X''<sub>1</sub>, ..., ''X''<sub>''n''</sub> are independent normally distributed random variables with mean ''μ'' and standard deviation ''σ'', then

:<math>U_i = \frac{X_i-\mu}{\sigma}</math>

is [[standard normal]] for each ''i''. Note that the total ''Q'' is equal to the sum of the squared ''U''s (here the ''B''<sup>(''i'')</sup> and ''Q''<sub>''i''</sub> are as defined in the statement of the theorem below):

:<math>\sum_i Q_i=\sum_{ijk} U_j B_{jk}^{(i)} U_k = \sum_{jk} U_j U_k \sum_i B_{jk}^{(i)} = \sum_{jk} U_j U_k\delta_{jk} = \sum_{j} U_j^2,</math>

which stems from the assumption that <math>B^{(1)} + B^{(2)} + \cdots = I</math>. So instead we will calculate this quantity and later separate it into ''Q''<sub>''i''</sub>'s. It is possible to write

:<math> \sum_{i=1}^n U_i^2=\sum_{i=1}^n\left(\frac{X_i-\overline{X}}{\sigma}\right)^2 + n\left(\frac{\overline{X}-\mu}{\sigma}\right)^2 </math>

(here <math>\overline{X}</math> is the [[Arithmetic mean|sample mean]]). To see this identity, multiply throughout by <math>\sigma^2</math> and note that

:<math> \sum(X_i-\mu)^2= \sum(X_i-\overline{X}+\overline{X}-\mu)^2 </math>

and expand to give

:<math> \sum(X_i-\mu)^2= \sum(X_i-\overline{X})^2+\sum(\overline{X}-\mu)^2+ 2\sum(X_i-\overline{X})(\overline{X}-\mu). </math>

The third term is zero because it is equal to a constant times

:<math>\sum(\overline{X}-X_i)=0,</math>

and the second term is just ''n'' identical terms added together. Thus

:<math> \sum(X_i-\mu)^2 = \sum(X_i-\overline{X})^2+n(\overline{X}-\mu)^2 , </math>

and hence

:<math> \sum\left(\tfrac{X_i-\mu}{\sigma}\right)^2= \sum\left(\tfrac{X_i-\overline{X}}{\sigma}\right)^2 +n\left(\tfrac{\overline{X}-\mu}{\sigma}\right)^2= \overbrace{\sum_i\left(U_i-\tfrac{1}{n}\sum_j{U_j}\right)^2}^{Q_1} +\overbrace{\tfrac{1}{n}\left(\sum_j{U_j}\right)^2}^{Q_2}= Q_1+Q_2. </math>

Now <math>B^{(2)}=\frac{J_n}{n}</math> with <math>J_n</math> the [[matrix of ones]], which has rank 1. In turn <math>B^{(1)}= I_n-\frac{J_n}{n}</math>, given that <math>I_n=B^{(1)}+B^{(2)}</math>. This expression can also be obtained by expanding <math>Q_1</math> in matrix notation. It can be shown that the rank of <math>B^{(1)}</math> is <math>n-1</math>, as the sum of all its rows is equal to zero. Thus the conditions for Cochran's theorem are met.

Cochran's theorem then states that ''Q''<sub>1</sub> and ''Q''<sub>2</sub> are independent, with chi-squared distributions with ''n'' − 1 and 1 degree of freedom respectively. This shows that the sample mean and [[sample variance]] are independent. This can also be shown by [[Basu's theorem]], and in fact this property ''characterizes'' the normal distribution – for no other distribution are the sample mean and sample variance independent.<ref>{{cite journal |doi=10.2307/2983669 |first=R.C. |last=Geary |author-link=Roy C. Geary |year=1936 |title=The Distribution of "Student's" Ratio for Non-Normal Samples |journal=Supplement to the Journal of the Royal Statistical Society |volume=3 |issue=2 |pages=178–184 |jfm=63.1090.03 |jstor=2983669 }}</ref>
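The decomposition above can be checked numerically. The following is a minimal sketch (not part of the cited sources; the sample size, mean and standard deviation are arbitrary illustrative choices) that draws repeated normal samples, verifies the identity ''Q'' = ''Q''<sub>1</sub> + ''Q''<sub>2</sub>, and checks that the sample mean and the sum of squared deviations are empirically uncorrelated:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
n, mu, sigma, reps = 10, 2.0, 3.0, 50_000   # illustrative choices

X = rng.normal(mu, sigma, size=(reps, n))
U = (X - mu) / sigma                         # standardised observations
Q = (U ** 2).sum(axis=1)                     # total sum of squares of the U_i

Xbar = X.mean(axis=1)
Q1 = (((X - Xbar[:, None]) / sigma) ** 2).sum(axis=1)  # within-sample part
Q2 = n * ((Xbar - mu) / sigma) ** 2                    # part due to the mean

print(np.allclose(Q, Q1 + Q2))               # True: Q = Q1 + Q2 holds exactly
print(np.corrcoef(Xbar, Q1)[0, 1])           # ~ 0: mean and variance uncorrelated
print(Q1.mean(), Q2.mean())                  # ~ n - 1 and ~ 1 (chi-squared means)
</syntaxhighlight>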
=== Distributions ===
The result for the distributions is written symbolically as

:<math> \sum\left(X_i-\overline{X}\right)^2 \sim \sigma^2 \chi^2_{n-1}, </math>
:<math> n(\overline{X}-\mu)^2\sim \sigma^2 \chi^2_1. </math>

Both of these random variables are proportional to the true but unknown variance ''σ''<sup>2</sup>. Thus their ratio does not depend on ''σ''<sup>2</sup> and, because they are statistically independent, the distribution of their ratio is given by

:<math> \frac{n\left(\overline{X}-\mu\right)^2} {\frac{1}{n-1}\sum\left(X_i-\overline{X}\right)^2}\sim \frac{\chi^2_1}{\frac{1}{n-1}\chi^2_{n-1}} \sim F_{1,n-1} </math>

where ''F''<sub>1,''n'' − 1</sub> is the [[F-distribution]] with 1 and ''n'' − 1 degrees of freedom (see also [[Student's t-distribution]]). The final step here is effectively the definition of a random variable having the F-distribution.

=== Estimation of variance ===
To estimate the variance ''σ''<sup>2</sup>, one estimator that is sometimes used is the [[maximum likelihood]] estimator of the variance of a normal distribution

:<math> \widehat{\sigma}^2= \frac{1}{n}\sum\left( X_i-\overline{X}\right)^2. </math>

Cochran's theorem shows that

:<math> \frac{n\widehat{\sigma}^2}{\sigma^2}\sim\chi^2_{n-1} </math>

and the properties of the chi-squared distribution show that

:<math>\begin{align} E \left(\frac{n \widehat{\sigma}^2}{\sigma^2}\right) &= E \left(\chi^2_{n-1}\right) \\ \frac{n}{\sigma^2}E \left(\widehat{\sigma}^2\right) &= (n-1) \\ E \left(\widehat{\sigma}^2\right) &= \frac{\sigma^2 (n-1)}{n} \end{align}</math>

so the maximum likelihood estimator <math>\widehat{\sigma}^2</math> is a biased estimator of ''σ''<sup>2</sup>.

== Alternative formulation ==
The following version is often seen when considering linear regression.<ref>{{Cite web|url=https://yangfengstat.github.io/assets/teaching/cochran's-theorem.pdf|title=Cochran's Theorem (A quick tutorial)}}</ref> Suppose that <math>Y\sim N_n(0,\sigma^2I_n)</math> is a [[Multivariate normal distribution|multivariate normal]] [[random vector]] (here <math>I_n</math> denotes the ''n''-by-''n'' [[identity matrix]]), and that <math>A_1,\ldots,A_k</math> are ''n''-by-''n'' [[symmetric matrices]] with <math>\sum_{i=1}^kA_i=I_n</math>. Then, on defining <math>r_i= \operatorname{Rank}(A_i)</math>, any one of the following conditions implies the other two:
* <math>\sum_{i=1}^kr_i=n ,</math>
* <math>Y^TA_iY\sim\sigma^2\chi^2_{r_i}</math> (thus the <math>A_i</math> are [[Positive-semidefinite matrix|positive semidefinite]]),
* <math>Y^TA_iY</math> is independent of <math>Y^TA_jY</math> for <math>i\neq j .</math>
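As an illustration of this formulation, the following sketch (the design matrix and dimensions are hypothetical choices, not taken from the cited sources) uses the hat matrix of a linear model as <math>A_1</math> and its complement as <math>A_2</math>, and checks the rank condition and the behaviour of the two quadratic forms:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
n, p, sigma, reps = 20, 3, 2.0, 20_000             # illustrative choices

Xd = rng.normal(size=(n, p))                        # hypothetical design matrix
A1 = Xd @ np.linalg.inv(Xd.T @ Xd) @ Xd.T           # hat (projection) matrix
A2 = np.eye(n) - A1

r1 = np.linalg.matrix_rank(A1)
r2 = np.linalg.matrix_rank(A2)
print(r1, r2, r1 + r2 == n)                         # p, n - p, True

Y = rng.normal(0.0, sigma, size=(reps, n))          # Y ~ N_n(0, sigma^2 I_n)
Q1 = np.einsum('ri,ij,rj->r', Y, A1, Y)             # Y^T A1 Y for each draw
Q2 = np.einsum('ri,ij,rj->r', Y, A2, Y)             # Y^T A2 Y for each draw
print(Q1.mean() / sigma**2, Q2.mean() / sigma**2)   # ~ p and ~ n - p
print(np.corrcoef(Q1, Q2)[0, 1])                    # ~ 0, consistent with independence
</syntaxhighlight>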
== Statement ==
Let ''U''<sub>1</sub>, ..., ''U''<sub>''N''</sub> be i.i.d. standard [[normal distribution|normally distributed]] [[random variable]]s, and let <math>U = [U_1, ..., U_N]^T</math>. Let <math>B^{(1)},B^{(2)},\ldots, B^{(k)}</math> be [[symmetric matrices]]. Define ''r''<sub>''i''</sub> to be the [[rank (linear algebra)|rank]] of <math>B^{(i)}</math>. Define <math>Q_i=U^T B^{(i)}U</math>, so that the ''Q''<sub>''i''</sub> are [[quadratic form]]s. Further assume <math>\sum_i Q_i = U^T U</math>.

'''Cochran's theorem''' states that the following are equivalent:
* <math>r_1+\cdots +r_k=N</math>,
* the ''Q''<sub>''i''</sub> are [[independence (probability)|independent]],
* each ''Q''<sub>''i''</sub> has a [[chi-squared distribution]] with ''r''<sub>''i''</sub> [[degrees of freedom (statistics)|degrees of freedom]].<ref name="Cochran" /><ref>{{Citation |title=Cochran's theorem |date=2008-01-01 |url=https://www.oxfordreference.com/view/10.1093/acref/9780199541454.001.0001/acref-9780199541454-e-294 |work=A Dictionary of Statistics |publisher=Oxford University Press |language=en |doi=10.1093/acref/9780199541454.001.0001 |isbn=978-0-19-954145-4 |access-date=2022-05-18 |last1=Upton |first1=Graham |last2=Cook |first2=Ian |url-access=subscription }}</ref>

The theorem is often stated with the condition <math>\sum_i A_i = A</math>, where <math>A</math> is idempotent, and with <math>\sum_i r_i = N</math> replaced by <math>\sum_i r_i = \operatorname{rank}(A)</math>. But after an orthogonal transformation, <math>A = \operatorname{diag}(I_M, 0)</math>, and so this reduces to the theorem above.

=== Proof ===
'''Claim''': Let <math>X</math> be a standard Gaussian in <math>\R^n</math>. Then for any symmetric matrices <math>Q, Q'</math>, if <math>X^T Q X</math> and <math>X^T Q' X</math> have the same distribution, then <math>Q, Q'</math> have the same eigenvalues (up to multiplicity).

{{Math proof|title=Proof|proof=
Let the eigenvalues of <math>Q</math> be <math>\lambda_1, ..., \lambda_n</math>, then calculate the [[Characteristic function (probability theory)|characteristic function]] of <math>X^T Q X</math>. It comes out to be

:<math>\phi(t) =\left(\prod_j (1-2i \lambda_j t)\right)^{-1/2}</math>

(To calculate it, first diagonalize <math>Q</math>, change into that frame, then use the fact that the characteristic function of the sum of independent variables is the product of their characteristic functions.)

For <math>X^T Q X</math> and <math>X^T Q' X</math> to have the same distribution, their characteristic functions must be equal, so <math>Q, Q'</math> have the same eigenvalues (up to multiplicity).
}}
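The characteristic-function formula in this claim can be checked numerically. The following is a rough sketch (the matrix <math>Q</math> and the value of ''t'' are arbitrary illustrative choices) comparing the empirical characteristic function of <math>X^T Q X</math> with <math>\prod_j (1-2i\lambda_j t)^{-1/2}</math>:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)
n, reps, t = 4, 200_000, 0.3                         # illustrative choices

M = rng.normal(size=(n, n))
Q = (M + M.T) / 2                                    # arbitrary symmetric matrix
lam = np.linalg.eigvalsh(Q)                          # its eigenvalues

X = rng.normal(size=(reps, n))
quad = np.einsum('ri,ij,rj->r', X, Q, X)             # X^T Q X for each draw

empirical = np.mean(np.exp(1j * t * quad))           # Monte Carlo characteristic function
theoretical = np.prod((1 - 2j * lam * t) ** (-0.5))  # product over eigenvalues
print(empirical, theoretical)                        # should agree up to Monte Carlo error
</syntaxhighlight>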
'''Claim''': <math>I = \sum_i B^{(i)}</math>.

{{Math proof|title=Proof|proof=
By assumption, <math>U^T (I - \sum_i B^{(i)}) U = 0</math>. Since <math>I - \sum_i B^{(i)}</math> is symmetric, and <math>U^T (I - \sum_i B^{(i)}) U \overset{d}{=} U^T 0 U</math>, the previous claim shows that <math>I - \sum_i B^{(i)}</math> has the same eigenvalues as the zero matrix, hence it is the zero matrix.
}}

'''Lemma''': If <math>\sum_i M_i = I</math>, where each <math>M_i</math> is symmetric with eigenvalues 0 and 1, then the <math>M_i</math> are simultaneously diagonalizable.

{{Math proof|title=Proof|proof=
Fix <math>i</math>, and consider the eigenvectors <math>v</math> of <math>M_i</math> such that <math>M_i v = v</math>. Then we have <math>v^T v = v^T I v = v^T v + \sum_{j\neq i} v^T M_j v</math>, so all <math>v^T M_j v = 0</math>; since each <math>M_j</math> is positive semidefinite, this forces <math>M_j v = 0</math>. Thus we obtain a decomposition of <math>\R^N</math> into <math>V\oplus V^\perp</math>, such that <math>V</math> is the 1-eigenspace of <math>M_i</math> and lies in the 0-eigenspaces of all other <math>M_j</math>. Now induct by restricting to <math>V^\perp</math>.
}}

Now we prove the original theorem. We prove that the three conditions are equivalent by proving that each one implies the next in a cycle (<math>2 \to 3 \to 1 \to 2</math>).

{{Math proof|title=Proof|proof=
'''Case''': All <math>Q_i</math> are independent.

Fix some <math>i</math>, define <math>C^{(i)} = I - B^{(i)} = \sum_{j\neq i} B^{(j)}</math>, and diagonalize <math>B^{(i)}</math> by an orthogonal transform <math>O</math>. Then consider <math>O C^{(i)} O^T = I - O B^{(i)} O^T</math>. It is diagonalized as well.

Let <math>W = OU</math>; then it is also standard Gaussian. Then we have

:<math>Q_i = W^T (OB^{(i)} O^T) W; \quad \sum_{j\neq i} Q_j = W^T (I - OB^{(i)} O^T) W.</math>

Inspecting their diagonal entries, we see that <math>Q_i \perp \sum_{j\neq i} Q_j</math> implies that their nonzero diagonal entries are disjoint. Thus all eigenvalues of <math>B^{(i)}</math> are 0 or 1, so <math>Q_i</math> has a <math>\chi^2</math> distribution with <math>r_i</math> degrees of freedom.

'''Case''': Each <math>Q_i</math> is a <math>\chi^2(r_i)</math> distribution.

Fix any <math>i</math>, diagonalize <math>B^{(i)}</math> by an orthogonal transform <math>O</math>, and reindex so that <math>O B^{(i)} O^T = \operatorname{diag}(\lambda_1, ..., \lambda_{r_i}, 0, ..., 0)</math>. Then <math>Q_i = \sum_j \lambda_j {U'}_j^2</math>, where <math>U' = OU</math> is again standard Gaussian. Since <math>Q_i\sim \chi^2(r_i)</math>, we get all <math>\lambda_j = 1</math>. So all <math>B^{(i)}\succeq 0</math>, and they have eigenvalues 0 and 1. By the lemma they can be simultaneously diagonalized; since they sum to the identity, counting the unit diagonal entries gives <math>\sum_i r_i = N</math>.

'''Case''': <math>r_1+\cdots +r_k=N</math>.

We first show that the matrices ''B''<sup>(''i'')</sup> can be [[Matrix_diagonalization#Simultaneous_diagonalization|simultaneously diagonalized]] by an orthogonal matrix and that their non-zero [[eigenvalue]]s are all equal to +1. Once that is shown, take this orthogonal transform to the simultaneous [[eigenbasis]], in which the random vector <math>[U_1, ..., U_N]^T</math> becomes <math>[U'_1, ..., U'_N]^T</math>, but all <math>U_i'</math> are still independent and standard Gaussian. Then the result follows.

Each of the matrices ''B''<sup>(''i'')</sup> has [[rank (linear algebra)|rank]] ''r''<sub>''i''</sub> and thus ''r''<sub>''i''</sub> non-zero [[eigenvalue]]s. For each ''i'', the sum <math>C^{(i)} \equiv \sum_{j\ne i}B^{(j)}</math> has at most rank <math>\sum_{j\ne i}r_j = N-r_i</math>. Since <math>B^{(i)}+C^{(i)} = I_{N \times N}</math>, it follows that ''C''<sup>(''i'')</sup> has exactly rank ''N'' − ''r''<sub>''i''</sub>.

Therefore ''B''<sup>(''i'')</sup> and ''C''<sup>(''i'')</sup> can be [[Matrix_diagonalization#Simultaneous_diagonalization|simultaneously diagonalized]]. This can be shown by first diagonalizing ''B''<sup>(''i'')</sup>, by the [[spectral theorem]]. In this basis, it is of the form:

:<math>\begin{bmatrix} \lambda_1 & 0 & 0 & \cdots & \cdots & & 0 \\ 0 & \lambda_2 & 0 & \cdots & \cdots & & 0 \\ 0 & 0 & \ddots & & & & \vdots \\ \vdots & \vdots & & \lambda_{r_i} & & \\ \vdots & \vdots & & & 0 & \\ 0 & \vdots & & & & \ddots \\ 0 & 0 & \ldots & & & & 0 \end{bmatrix}.</math>

Thus the lower <math>(N-r_i)</math> rows are zero. Since <math>C^{(i)} = I - B^{(i)}</math>, it follows that these rows of ''C''<sup>(''i'')</sup> in this basis contain a right block which is an <math>(N-r_i)\times(N-r_i)</math> identity matrix, with zeros in the rest of these rows. But since ''C''<sup>(''i'')</sup> has rank ''N'' − ''r''<sub>''i''</sub>, it must be zero elsewhere. Thus it is diagonal in this basis as well. It follows that all the non-zero [[eigenvalue]]s of both ''B''<sup>(''i'')</sup> and ''C''<sup>(''i'')</sup> are +1. This argument applies for all ''i'', thus all ''B''<sup>(''i'')</sup> are positive semidefinite.

Moreover, the above analysis can be repeated in the diagonal basis for <math>C^{(1)} = B^{(2)} + \sum_{j>2}B^{(j)}</math>. In this basis <math>C^{(1)}</math> is the identity matrix on an <math>(N-r_1)</math>-dimensional subspace, so it follows that both ''B''<sup>(2)</sup> and <math>\sum_{j>2}B^{(j)}</math> are simultaneously diagonalizable in this subspace (and hence also together with ''B''<sup>(1)</sup>). By iteration it follows that all the ''B''<sup>(''i'')</sup> are simultaneously diagonalizable.

Thus there exists an [[orthogonal matrix]] <math>S</math> such that for all <math>i</math>, <math> S^\mathrm{T}B^{(i)} S \equiv B^{(i)\prime} </math> is diagonal, where any entry <math> B^{(i)\prime}_{x,y} </math> with indices <math>x = y</math>, <math> \sum_{j=1}^{i-1} r_j < x = y \le \sum_{j=1}^i r_j </math>, is equal to 1, while any entry with other indices is equal to 0.
}}
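The simultaneous-diagonalization step can be illustrated on the matrices from the sample-mean and sample-variance example, <math>B^{(1)} = I_n - J_n/n</math> and <math>B^{(2)} = J_n/n</math>. The following is a small illustrative check (not the general construction in the proof): an orthogonal eigenbasis of <math>B^{(1)}</math> also diagonalizes <math>B^{(2)}</math>, with the unit diagonal entries occupying disjoint positions.

<syntaxhighlight lang="python">
import numpy as np

n = 5                                    # illustrative sample size
J = np.ones((n, n))
B2 = J / n                               # B^(2) = J_n / n, rank 1
B1 = np.eye(n) - B2                      # B^(1) = I_n - J_n / n, rank n - 1

_, S = np.linalg.eigh(B1)                # orthogonal matrix of eigenvectors of B1
D1 = S.T @ B1 @ S
D2 = S.T @ B2 @ S

print(np.allclose(D1, np.diag(np.diag(D1))))          # True: B1 is diagonalized
print(np.allclose(D2, np.diag(np.diag(D2))))          # True: the same S diagonalizes B2
print(np.round(np.diag(D1)), np.round(np.diag(D2)))   # complementary 0/1 patterns
</syntaxhighlight>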
<!-- Cochran's theorem is the converse of [[Fisher's theorem]]. -->

== See also ==
* [[Cramér’s decomposition theorem|Cramér's theorem]], on decomposing the normal distribution
* [[Infinite divisibility (probability)]]

{{refimprove|date=July 2011}}

== References ==
<references/>

{{Experimental design|state=expanded}}

{{DEFAULTSORT:Cochran's Theorem}}
[[Category:Theorems in statistics]]
[[Category:Characterization of probability distributions]]