=== Remark ===
A proof that the OLS estimator indeed ''minimizes'' the sum of squared residuals may proceed by computing the [[Hessian matrix]] and showing that it is positive definite.

The function we want to minimize, the sum of squared residuals, is 
<math display="block">f(\beta_0,\beta_1,\dots,\beta_p) = \sum_{i=1}^n (y_i-\beta_0-\beta_1x_{i1}-\dots-\beta_px_{ip})^2</math>
for a multiple regression model with ''p'' explanatory variables. Setting its gradient to zero gives the first-order condition 
<math display="block">\begin{aligned}
\frac{d}{d\boldsymbol{\beta}}f &= -2X^\operatorname{T} \left(\mathbf{y}-X\boldsymbol{\beta}\right)\\
&=-2\begin{bmatrix}
\sum_{i=1}^{n} (y_i - \dots - \beta_px_{ip})\\
\sum_{i=1}^nx_{i1} (y_i-\dots-\beta_px_{ip})\\
\vdots\\ 
\sum_{i=1}^nx_{ip} (y_i-\dots-\beta_px_{ip})
\end{bmatrix}\\
&= \mathbf{0}_{p+1},
\end{aligned}</math>
where <math>X</math> is the design matrix 
<math display="block">X=\begin{bmatrix}
1 & x_{11} & \cdots & x_{1p}\\
1 & x_{21} & \cdots & x_{2p}\\
\vdots & \vdots & \ddots & \vdots\\
1 & x_{n1} & \cdots & x_{np}
\end{bmatrix}\in \R^{n\times(p+1)}; \qquad n\geq p+1</math>
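
The first-order condition can be checked numerically. The following is a minimal sketch, assuming a made-up data set of four points and a single regressor (''p'' = 1); the data and the helper name <code>gradient</code> are illustrative only. It evaluates the two components of <math>-2X^\operatorname{T}(\mathbf{y}-X\boldsymbol{\beta})</math> at the closed-form least-squares estimates and confirms that both vanish up to rounding.

```python
# Illustrative data (an assumption, not taken from the article): n = 4, p = 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 4.0]
n = len(xs)

# Closed-form OLS estimates for an intercept plus one regressor.
x_bar = sum(xs) / n
y_bar = sum(ys) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
beta1 = s_xy / s_xx
beta0 = y_bar - beta1 * x_bar


def gradient(b0, b1):
    """Gradient of f, i.e. -2 X^T (y - X beta), written out component-wise."""
    residuals = [y - b0 - b1 * x for x, y in zip(xs, ys)]
    return (
        -2 * sum(residuals),                               # row for the intercept
        -2 * sum(x * r for x, r in zip(xs, residuals)),    # row for x_{i1}
    )


g0, g1 = gradient(beta0, beta1)
print("estimates:", beta0, beta1)
print("gradient at the estimates:", g0, g1)
```

Both printed gradient components should be zero up to floating-point rounding, whereas evaluating <code>gradient</code> at any other point, for example <code>gradient(0.0, 0.0)</code>, gives a nonzero vector.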

The [[Hessian matrix]] of second derivatives is 
<math display="block">\mathcal{H} = 2\begin{bmatrix}
n & \sum_{i=1}^n x_{i1} & \cdots & \sum_{i=1}^n x_{ip} \\
\sum_{i=1}^n x_{i1}& \sum_{i=1}^n x_{i1}^2 & \cdots & \sum_{i=1}^nx_{i1}x_{ip}\\
\vdots & \vdots &\ddots & \vdots \\
\sum_{i=1}^n x_{ip} & \sum_{i=1}^n x_{ip}x_{i1}& \cdots & \sum_{i=1}^n x_{ip}^2
\end{bmatrix} = 2X^\operatorname{T}X</math>
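
As a sanity check on the formula <math>\mathcal{H} = 2X^\operatorname{T}X</math>, the sketch below compares it with second-order central differences of <math>f</math>. It assumes the same kind of made-up data (four points, one regressor); the function names <code>hessian_formula</code> and <code>hessian_numeric</code> and the step size are hypothetical choices for illustration. Because <math>f</math> is quadratic, the finite differences agree with the formula up to rounding at any evaluation point.

```python
# Illustrative data (an assumption, not taken from the article).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 4.0]
X = [[1.0, x] for x in xs]  # design matrix with an intercept column


def f(beta):
    """Sum of squared residuals for the coefficient vector beta."""
    return sum(
        (y - sum(b * xij for b, xij in zip(beta, row))) ** 2
        for row, y in zip(X, ys)
    )


def hessian_formula():
    """2 X^T X, computed entry by entry from the sums of products."""
    d = len(X[0])
    return [[2 * sum(row[j] * row[k] for row in X) for k in range(d)]
            for j in range(d)]


def hessian_numeric(beta, h=1e-2):
    """Second-order central differences of f at beta."""
    d = len(beta)
    H = [[0.0] * d for _ in range(d)]
    for j in range(d):
        for k in range(d):
            def shifted(sj, sk):
                b = list(beta)
                b[j] += sj * h
                b[k] += sk * h
                return f(b)
            H[j][k] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)
    return H


H_exact = hessian_formula()
H_approx = hessian_numeric([0.0, 0.0])
print(H_exact)
print(H_approx)
```

For these data the formula gives the entries 2''n'' = 8, 2Σ''x'' = 20 and 2Σ''x''² = 60, and the numerical approximation should match them closely.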

Assume that the columns of <math>X</math> are linearly independent, so that <math>X^\operatorname{T} X</math> is invertible. Writing <math>X=\begin{bmatrix}\mathbf{v_1}& \mathbf{v_2}& \cdots & \mathbf{v}_{p+1}\end{bmatrix}</math> in terms of its columns, linear independence means that 
<math display="block">k_1\mathbf{v_1} + \dots + k_{p+1} \mathbf{v}_{p+1} = \mathbf 0\iff k_1= \dots =k_{p+1}=0</math>

Now let <math>\mathbf{k} = (k_1,\dots,k_{p+1})^\operatorname{T} \in \R^{(p+1)\times 1}</math> be an eigenvector of <math>\mathcal{H}</math>. Since an eigenvector is nonzero, the linear independence of the columns gives 

<math display="block">\mathbf{k} \ne \mathbf{0} \implies \left\|k_1\mathbf{v_1}+\dots+k_{p+1}\mathbf{v}_{p+1}\right\|^2 > 0</math>

In terms of matrix multiplication, and recalling that <math>\mathcal{H} = 2X^\operatorname{T}X</math>, this means 
<math display="block">\begin{bmatrix} k_1 & \cdots & k_{p+1} \end{bmatrix}
\begin{bmatrix}\mathbf{v_1}^\operatorname{T} \\ \vdots \\ \mathbf{v}_{p+1}^\operatorname{T}\end{bmatrix}
\begin{bmatrix}\mathbf{v_1} & \cdots & \mathbf{v}_{p+1}\end{bmatrix}
\begin{bmatrix}k_1 \\ \vdots\\ k_{p+1}\end{bmatrix}
= \mathbf{k}^\operatorname{T}X^\operatorname{T}X\mathbf{k} = \tfrac{1}{2}\mathbf{k}^\operatorname{T}\mathcal{H}\mathbf{k} = \tfrac{\lambda}{2} \mathbf{k}^\operatorname{T}\mathbf{k}>0</math>
where <math>\lambda</math> is the eigenvalue corresponding to <math>\mathbf{k}</math>. Moreover, 
<math display="block">\mathbf{k}^\operatorname{T}\mathbf{k} = \sum_{i=1}^{p+1}k_i^2 > 0 \implies \lambda > 0</math>

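Finally, since the eigenvector <math>\mathbf{k}</math> was arbitrary, all eigenvalues of <math>\mathcal{H}</math> are positive; as <math>\mathcal{H}</math> is symmetric, it is therefore positive definite. Because the Hessian does not depend on <math>\boldsymbol{\beta}</math>, the function <math>f</math> is strictly convex, and the stationary point 
<math display="block">\boldsymbol{\beta} = \left(X^\operatorname{T}X\right)^{-1}X^\operatorname{T}\mathbf{y}</math>
is indeed the unique global minimizer.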
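
The conclusion can be illustrated on a small example. The sketch below assumes made-up data with a single regressor, so that the Hessian is a symmetric 2 × 2 matrix whose eigenvalues follow from its trace and determinant; the perturbation vectors are arbitrary choices. It checks that both eigenvalues are positive and that moving away from the least-squares estimates in several directions increases <math>f</math>.

```python
import math

# Illustrative data (an assumption, not taken from the article).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 4.0]
n = len(xs)

# Hessian 2 X^T X = [[a, b], [b, c]] for an intercept plus one regressor.
a = 2 * n
b = 2 * sum(xs)
c = 2 * sum(x * x for x in xs)

# Eigenvalues of a symmetric 2x2 matrix from its trace and determinant.
trace = a + c
det = a * c - b * b
root = math.sqrt(trace * trace - 4 * det)
eigenvalues = ((trace - root) / 2, (trace + root) / 2)


def f(b0, b1):
    """Sum of squared residuals."""
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))


# Closed-form OLS estimates for this two-parameter model.
x_bar = sum(xs) / n
y_bar = sum(ys) / n
beta1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
beta0 = y_bar - beta1 * x_bar

# Arbitrary nonzero perturbations of the estimates.
deltas = [(0.1, 0.0), (-0.1, 0.05), (0.0, -0.2), (1.0, -0.4)]
increases = [f(beta0 + d0, beta1 + d1) - f(beta0, beta1) for d0, d1 in deltas]

print("eigenvalues:", eigenvalues)
print("increase in f for each perturbation:", increases)
```

Every printed increase should be strictly positive. A finite list of perturbations is only an illustration; the eigenvalue argument above is what covers all directions at once.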

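Alternatively, note that for every vector <math>\mathbf{v}</math>, <math>\mathbf{v}^\operatorname{T} X^\operatorname{T} X \mathbf{v} = \|X\mathbf{v}\|^2 \ge 0</math>, with equality only if <math>X\mathbf{v} = \mathbf{0}</math>. If <math>X</math> has full column rank, this forces <math>\mathbf{v} = \mathbf{0}</math>, so the Hessian is positive definite.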
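
The role of the rank condition can be seen in a short sketch. The two design matrices below are made-up examples, and <code>quad_form</code> and <code>norm_sq</code> are hypothetical helper names. The first matrix has linearly independent columns; in the second, the second column is twice the first, so a nonzero vector is mapped to zero and the quadratic form is only positive semidefinite.

```python
def quad_form(X, v):
    """v^T (X^T X) v, computed by forming X^T X explicitly."""
    d = len(v)
    gram = [[sum(row[j] * row[k] for row in X) for k in range(d)]
            for j in range(d)]
    return sum(v[j] * gram[j][k] * v[k] for j in range(d) for k in range(d))


def norm_sq(X, v):
    """||X v||^2, computed directly from the product X v."""
    return sum(sum(xij * vj for xij, vj in zip(row, v)) ** 2 for row in X)


X_full = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]        # independent columns
X_deficient = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]   # second column = 2 * first

v = [2.0, -1.0]  # nonzero vector with X_deficient v = 0

print(quad_form(X_full, v), norm_sq(X_full, v))            # prints 2.0 2.0
print(quad_form(X_deficient, v), norm_sq(X_deficient, v))  # prints 0.0 0.0
```

In both cases the two ways of computing the quadratic form agree, but only the full-rank matrix gives a strictly positive value for this nonzero <code>v</code>.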