==Special case of covariance matrices==
An empirical [[covariance matrix]] <math>M</math> can be represented as the product <math>A'A</math> of the [[data matrix (multivariate statistics)|data matrix]] <math>A</math> pre-multiplied by its transpose <math>A'</math>. Being a positive semi-definite matrix, <math>M</math> has non-negative eigenvalues, and orthogonal (or orthogonalisable) eigenvectors, which can be demonstrated as follows.

Firstly, that the eigenvalues <math>\lambda_i</math> are non-negative:
<math display="block">\begin{align}
&M v_i = A' A v_i = \lambda_i v_i \\
\Rightarrow{}& v_i' A' A v_i = v_i' \lambda_i v_i \\
\Rightarrow{}& \left\| A v_i \right\|^2 = \lambda_i \left\| v_i \right\|^2 \\
\Rightarrow{}& \lambda_i = \frac{\left\| A v_i \right\|^2}{\left\| v_i \right\|^2} \geq 0.
\end{align}</math>

Secondly, that the eigenvectors <math>v_i</math> are orthogonal to one another:
<math display="block">\begin{align}
&M v_i = \lambda_i v_i \\
\Rightarrow{}& v_j' M v_i = v_j' \lambda_i v_i \\
\Rightarrow{}& \left(M v_j \right)' v_i = \lambda_j v_j' v_i \\
\Rightarrow{}& \lambda_j v_j' v_i = \lambda_i v_j' v_i \\
\Rightarrow{}& \left(\lambda_j - \lambda_i \right) v_j' v_i = 0 \\
\Rightarrow{}& v_j' v_i = 0
\end{align}</math>
if the eigenvalues are different; in the case of multiplicity, the basis can be orthogonalized.

To now establish that the Rayleigh quotient is maximized by the eigenvector with the largest eigenvalue, consider decomposing an arbitrary vector <math>x</math> on the basis of the eigenvectors <math>v_i</math>:
<math display="block">x = \sum_{i=1}^n \alpha_i v_i,</math>
where
<math display="block">\alpha_i = \frac{x' v_i}{v_i' v_i} = \frac{\langle x,v_i\rangle}{\left\| v_i \right\|^2}</math>
is the coordinate of <math>x</math> orthogonally projected onto <math>v_i</math>. Therefore, we have:
<math display="block">\begin{align}
R(M,x) &= \frac{x' A' A x}{x' x} \\
&= \frac{ \Bigl( \sum_{j=1}^n \alpha_j v_j \Bigr)' \left( A' A \right) \Bigl(\sum_{i=1}^n \alpha_i v_i \Bigr)}{ \Bigl( \sum_{j=1}^n \alpha_j v_j \Bigr)' \Bigl( \sum_{i=1}^n \alpha_i v_i \Bigr)} \\
&= \frac{ \Bigl( \sum_{j=1}^n \alpha_j v_j \Bigr)'\Bigl(\sum_{i=1}^n \alpha_i (A' A) v_i \Bigr)}{ \Bigl( \sum_{i=1}^n \alpha_i^2 {v_i}'{v_i} \Bigr)} \\
&= \frac{ \Bigl( \sum_{j=1}^n \alpha_j v_j \Bigr)'\Bigl(\sum_{i=1}^n \alpha_i \lambda_i v_i \Bigr)}{ \Bigl( \sum_{i=1}^n \alpha_i^2 \|{v_i}\|^2 \Bigr)}
\end{align}</math>
which, by [[orthonormality]] of the eigenvectors, becomes:
<math display="block">\begin{align}
R(M,x) &= \frac{\sum_{i=1}^n \alpha_i^2 \lambda_i}{\sum_{i=1}^n \alpha_i^2} \\
&= \sum_{i=1}^n \lambda_i \frac{(x'v_i)^2}{(x'x)(v_i' v_i)^2} \\
&= \sum_{i=1}^n \lambda_i \frac{(x'v_i)^2}{(x'x)}
\end{align}</math>

The last representation establishes that the Rayleigh quotient is the sum of the squared cosines of the angles formed by the vector <math>x</math> and each eigenvector <math>v_i</math>, weighted by corresponding eigenvalues.
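These statements can be illustrated numerically. The following is a minimal sketch using the NumPy library; the matrix dimensions, random seed, and data matrix are arbitrary illustrative choices, not part of the derivation above. It checks the non-negativity of the eigenvalues, the orthonormality of the eigenvectors, the squared-cosine expansion of the Rayleigh quotient, and that the quotient is maximized by the leading eigenvector.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 4))      # arbitrary data matrix A
M = A.T @ A                           # empirical covariance-type matrix M = A'A

# Eigendecomposition of the symmetric matrix M (columns of V are the v_i).
eigenvalues, V = np.linalg.eigh(M)

# Eigenvalues of A'A are non-negative.
assert np.all(eigenvalues >= -1e-12)

# Eigenvectors are orthonormal: V'V = I.
assert np.allclose(V.T @ V, np.eye(4))

# For an arbitrary x, the Rayleigh quotient equals the eigenvalue-weighted
# sum of squared cosines of the angles between x and the eigenvectors.
x = rng.standard_normal(4)
rayleigh = (x @ M @ x) / (x @ x)
weighted_cosines = np.sum(eigenvalues * (V.T @ x) ** 2) / (x @ x)
assert np.isclose(rayleigh, weighted_cosines)

# The quotient is maximized by the eigenvector of the largest eigenvalue
# (eigh returns eigenvalues in ascending order, so it is the last column).
v_max = V[:, -1]
assert np.isclose((v_max @ M @ v_max) / (v_max @ v_max), eigenvalues[-1])
assert rayleigh <= eigenvalues[-1] + 1e-12
</syntaxhighlight>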
If a vector <math>x</math> maximizes <math>R(M,x)</math>, then any non-zero scalar multiple <math>kx</math> also maximizes <math>R</math>, so the problem can be reduced to the [[Lagrange multipliers|Lagrange problem]] of maximizing <math display="inline">\sum_{i=1}^n \alpha_i^2 \lambda_i</math> under the constraint that <math display="inline">\sum_{i=1}^n \alpha_i^2 = 1</math>. Define <math>\beta_i = \alpha_i^2</math>; maximizing <math display="inline">\sum_{i=1}^n \beta_i \lambda_i</math> subject to <math display="inline">\sum_{i=1}^n \beta_i = 1</math> and <math>\beta_i \geq 0</math> is then a [[linear program]], which always attains its maximum at one of the corners of the domain. A maximum point has <math>\beta_1 = 1</math>, that is <math>\alpha_1 = \pm 1</math> and <math>\alpha_i = 0</math> for all <math>i > 1</math> (when the eigenvalues are ordered by decreasing magnitude). Thus, the Rayleigh quotient is maximized by the eigenvector with the largest eigenvalue.

===Formulation using Lagrange multipliers===
Alternatively, this result can be arrived at by the method of [[Lagrange multipliers]]. The first part is to show that the quotient is constant under scaling <math>x \to cx</math>, where <math>c</math> is a scalar:
<math display="block">R(M,cx) = \frac{(cx)^{*} M (cx)}{(cx)^{*} (cx)} = \frac{c^{*} c}{c^{*} c} \frac{x^{*} M x}{x^{*} x} = R(M,x).</math>

Because of this invariance, it is sufficient to study the special case <math>\|x\|^2 = x^\mathsf{T} x = 1</math>. The problem is then to find the [[critical point (mathematics)|critical points]] of the function
<math display="block">R(M,x) = x^\mathsf{T} M x,</math>
subject to the constraint <math>\|x\|^2 = x^\mathsf{T} x = 1.</math> In other words, it is to find the critical points of
<math display="block">\mathcal{L}(x) = x^\mathsf{T} M x - \lambda \left(x^\mathsf{T} x - 1\right),</math>
where <math>\lambda</math> is a Lagrange multiplier. The stationary points of <math>\mathcal{L}(x)</math> occur at
<math display="block">\begin{align}
&\frac{d\mathcal{L}(x)}{dx} = 0 \\
\Rightarrow{}& 2x^\mathsf{T} M - 2\lambda x^\mathsf{T} = 0 \\
\Rightarrow{}& 2Mx - 2\lambda x = 0 \text{ (taking the transpose of both sides and noting that } M \text{ is Hermitian)} \\
\Rightarrow{}& M x = \lambda x
\end{align}</math>
and
<math display="block">\therefore R(M,x) = \frac{x^\mathsf{T} M x}{x^\mathsf{T} x} = \lambda \frac{x^\mathsf{T} x}{x^\mathsf{T} x} = \lambda.</math>

Therefore, the eigenvectors <math>x_1, \ldots, x_n</math> of <math>M</math> are the critical points of the Rayleigh quotient, and their corresponding eigenvalues <math>\lambda_1, \ldots, \lambda_n</math> are the stationary values of <math>\mathcal{L}</math>. This property is the basis for [[principal components analysis]] and [[canonical correlation]].
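The Lagrangian characterisation can also be checked numerically. The following sketch (again NumPy, with an arbitrary random matrix chosen only for illustration) verifies that each unit eigenvector satisfies the stationarity condition <math>2Mx - 2\lambda x = 0</math> and that the Rayleigh quotient at that critical point equals the corresponding eigenvalue.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((10, 4))
M = A.T @ A                                      # real symmetric (Hermitian) matrix

eigenvalues, V = np.linalg.eigh(M)
for lam, v in zip(eigenvalues, V.T):             # rows of V.T are unit eigenvectors
    # The constraint x'x = 1 holds for each eigenvector returned by eigh.
    assert np.isclose(v @ v, 1.0)
    # Stationarity of L(x) = x'Mx - lam (x'x - 1): its gradient 2Mv - 2 lam v vanishes.
    assert np.allclose(2 * M @ v - 2 * lam * v, 0.0, atol=1e-9)
    # The value of the Rayleigh quotient at this critical point is the eigenvalue.
    assert np.isclose((v @ M @ v) / (v @ v), lam)
</syntaxhighlight>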