
==Proof==
Let <math>\tilde\beta = Cy</math> be another linear estimator of <math> \beta </math> with <math>C = (X^\operatorname{T}X)^{-1}X^\operatorname{T} + D </math>, where <math>D</math> is a <math>K \times n</math> non-zero matrix. Because we restrict attention to ''unbiased'' estimators, minimum mean squared error implies minimum variance. The goal is therefore to show that such an estimator has a variance no smaller than that of <math> \widehat\beta,</math> the OLS estimator. We calculate:

:<math>
\begin{align}
\operatorname{E} \left[ \tilde\beta \right] &= \operatorname{E}[Cy] \\
&= \operatorname{E} \left [\left ((X^\operatorname{T}X)^{-1}X^\operatorname{T} + D \right )(X\beta + \varepsilon) \right ]\\
&= \left ((X^\operatorname{T}X)^{-1}X^\operatorname{T} + D \right )X\beta    + \left ((X^\operatorname{T}X)^{-1}X^\operatorname{T} + D \right ) \operatorname{E}[\varepsilon] \\
&= \left ((X^\operatorname{T}X)^{-1}X^\operatorname{T} + D \right )X\beta  && \operatorname{E}[\varepsilon] =0 \\
&= (X^\operatorname{T}X)^{-1}X^\operatorname{T}X\beta + DX\beta  \\
&= (I_K + DX)\beta. \\
\end{align}
</math>

Since <math>\beta</math> is '''un'''observable, the estimator must be unbiased for every possible value of <math>\beta</math>; therefore <math> \tilde\beta </math> is unbiased if and only if <math> DX = 0 </math>. Then:

:<math>
\begin{align}
 \operatorname{Var}\left(\tilde\beta\right) &=  \operatorname{Var}(Cy) \\
&= C \operatorname{Var}(y) C^\operatorname{T} \\
&= \sigma^2 CC^\operatorname{T} \\
&= \sigma^2 \left ((X^\operatorname{T}X)^{-1}X^\operatorname{T} + D \right ) \left (X(X^\operatorname{T}X)^{-1} + D^\operatorname{T} \right ) \\
&= \sigma^2 \left ((X^\operatorname{T}X)^{-1}X^\operatorname{T}X(X^\operatorname{T}X)^{-1} + (X^\operatorname{T}X)^{-1}X^\operatorname{T}D^\operatorname{T} + DX(X^\operatorname{T}X)^{-1} + DD^\operatorname{T} \right) \\
&= \sigma^2(X^\operatorname{T}X)^{-1} + \sigma^2(X^\operatorname{T}X)^{-1} (DX)^\operatorname{T} + \sigma^2 DX  (X^\operatorname{T}X)^{-1} + \sigma^2DD^\operatorname{T} \\
&= \sigma^2(X^\operatorname{T}X)^{-1}+ \sigma^2DD^\operatorname{T} && DX =0 \\
&=  \operatorname{Var}\left(\widehat\beta\right)  + \sigma^2DD^\operatorname{T}  && \sigma^2(X^\operatorname{T}X)^{-1} =   \operatorname{Var}\left(\widehat\beta\right)
\end{align}
</math>

Since <math>DD^\operatorname{T}</math> is a positive semidefinite matrix, <math>\operatorname{Var}\left( \tilde \beta \right) </math> exceeds <math>\operatorname{Var}\left(\widehat\beta\right) </math> by a positive semidefinite matrix.
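The two calculations above can be checked on a small example. The following is a minimal sketch, not part of the proof: the design matrix <code>X</code>, the matrix <code>A</code>, and the choice <math>\sigma^2 = 1</math> are illustrative assumptions, and <math>D</math> is built as <math>D = A(I_n - X(X^\operatorname{T}X)^{-1}X^\operatorname{T})</math>, one convenient way to obtain <math>DX = 0</math>. Exact rational arithmetic is used so that the identities hold without rounding error.

```python
from fractions import Fraction as F

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def inv2(M):
    """Inverse of a 2x2 matrix (enough for K = 2 in this sketch)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Assumed design matrix: n = 4 observations, K = 2 (intercept and one regressor).
X = [[F(1), F(0)], [F(1), F(1)], [F(1), F(2)], [F(1), F(3)]]
n, K = len(X), len(X[0])
Xt = transpose(X)
XtX_inv = inv2(matmul(Xt, X))
ols = matmul(XtX_inv, Xt)                      # (X^T X)^{-1} X^T, a K x n matrix

# Assumed construction of D: D = A (I - H), with H = X (X^T X)^{-1} X^T,
# so that D X = 0 for any K x n matrix A.
H = matmul(X, ols)
I_minus_H = [[F(int(i == j)) - H[i][j] for j in range(n)] for i in range(n)]
A = [[F(1), F(2), F(0), F(-1)], [F(0), F(1), F(1), F(3)]]   # arbitrary choice
D = matmul(A, I_minus_H)
C = [[o + d for o, d in zip(ro, rd)] for ro, rd in zip(ols, D)]

DX = matmul(D, X)
CX = matmul(C, X)                              # equals I_K + D X
DDt = matmul(D, transpose(D))

# With sigma^2 = 1: Var(OLS) = (X^T X)^{-1} and Var(C y) = C C^T.
var_ols = XtX_inv
var_tilde = matmul(C, transpose(C))
diff = [[var_tilde[i][j] - var_ols[i][j] for j in range(K)] for i in range(K)]

# A symmetric 2x2 matrix is positive semidefinite when its diagonal entries
# and its determinant are all non-negative.
det_DDt = DDt[0][0] * DDt[1][1] - DDt[0][1] * DDt[1][0]
is_psd = DDt[0][0] >= 0 and DDt[1][1] >= 0 and det_DDt >= 0

print(DX == [[0, 0], [0, 0]])   # D X = 0
print(CX == [[1, 0], [0, 1]])   # C X = I_K, so C y is unbiased
print(diff == DDt)              # Var(C y) - Var(OLS) = D D^T
print(is_psd)                   # D D^T is positive semidefinite
```

Any other <code>A</code> with two rows and four columns gives the same identities, since they depend only on <math>DX = 0</math>.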

===Remarks on the proof===
As stated before, the condition that <math> \operatorname{Var} \left( \tilde \beta \right)- \operatorname{Var} \left(\widehat\beta\right)</math> is a positive semidefinite matrix is equivalent to the property that the best linear unbiased estimator of <math> \ell^\operatorname{T}\beta </math> is <math> \ell^\operatorname{T}\widehat\beta </math> (best in the sense that it has minimum variance). To see this, let <math> \ell^\operatorname{T}\tilde\beta </math> be another linear unbiased estimator of <math> \ell^\operatorname{T}\beta </math>.

:<math>
\begin{align}
\operatorname{Var}\left(\ell^\operatorname{T}\tilde\beta\right) &= \ell^\operatorname{T} \operatorname{Var} \left(\tilde\beta\right) \ell \\
&=\sigma^2 \ell^\operatorname{T} (X^\operatorname{T}X)^{-1}\ell+\sigma^2\ell^\operatorname{T}DD^\operatorname{T}\ell \\
&= \operatorname{Var}\left(\ell^\operatorname{T}\widehat\beta\right)+\sigma^2(D^\operatorname{T}\ell)^\operatorname{T}(D^\operatorname{T}\ell) && \sigma^2 \ell^\operatorname{T} (X^\operatorname{T}X)^{-1}\ell = \operatorname{Var}\left(\ell^\operatorname{T}\widehat\beta\right) \\
&= \operatorname{Var}\left(\ell^\operatorname{T}\widehat\beta\right) +\sigma^2\|D^\operatorname{T}\ell\|^2\\
& \geq \operatorname{Var}\left(\ell^\operatorname{T}\widehat\beta\right)
\end{align}
</math>

Moreover, equality holds if and only if <math> D^\operatorname{T}\ell=0 </math>. We calculate

:<math>
\begin{align}
\ell^\operatorname{T}\tilde\beta &= \ell^\operatorname{T} \left (((X^\operatorname{T}X)^{-1}X^\operatorname{T} + D) y \right ) && \text{ from above}\\
&= \ell^\operatorname{T}(X^\operatorname{T}X)^{-1}X^\operatorname{T}y + \ell^\operatorname{T}Dy \\
&= \ell^\operatorname{T}\widehat\beta +(D^\operatorname{T}\ell)^\operatorname{T} y \\
&=\ell^\operatorname{T}\widehat\beta && D^\operatorname{T}\ell = 0
\end{align}
</math>

This proves that equality holds if and only if <math> \ell^\operatorname{T}\tilde\beta=\ell^\operatorname{T}\widehat\beta </math>, which gives the uniqueness of the OLS estimator as a BLUE.
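A one-dimensional instance may make the scalar inequality concrete. The sketch below assumes the simplest model, <math>y_i = \beta + \varepsilon_i</math> with <math>X</math> a column of ones and <math>\ell = 1</math>, so that the OLS estimator is the sample mean and any other linear unbiased estimator is a weighted mean whose weights sum to one. The particular weights and <math>\sigma^2 = 1</math> are illustrative assumptions.

```python
from fractions import Fraction as F

n = 4
sigma2 = F(1)                                   # assumed error variance
ols_w = [F(1, n)] * n                           # OLS weights: the sample mean
alt_w = [F(1, 2), F(1, 4), F(1, 8), F(1, 8)]    # assumed weights, summing to 1

# The row vector D is the difference between the two sets of weights.
d = [a - o for a, o in zip(alt_w, ols_w)]

def variance(w):
    """Variance of sum(w_i * y_i) for uncorrelated errors with variance sigma2."""
    return sigma2 * sum(wi * wi for wi in w)

excess = variance(alt_w) - variance(ols_w)
norm_term = sigma2 * sum(di * di for di in d)   # sigma^2 * ||D^T l||^2 with l = 1

print(sum(d))              # 0, i.e. D X = 0 (unbiasedness)
print(variance(ols_w))     # 1/4
print(variance(alt_w))     # 11/32
print(excess)              # 3/32
print(norm_term)           # 3/32
```

The excess variance equals <math>\sigma^2\|D^\operatorname{T}\ell\|^2</math> and vanishes only when every weight equals <math>1/n</math>, that is, when the weighted mean coincides with the OLS estimator.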