Rayleigh quotient


In mathematics, the Rayleigh quotient<ref>Also known as the Rayleigh–Ritz ratio; named after Walther Ritz and Lord Rayleigh.</ref> for a given complex Hermitian matrix <math>M</math> and nonzero vector <math>x</math> is defined as:<ref>Template:Cite book</ref><ref>Template:Cite book</ref><math display="block">R(M,x) = {x^{*} M x \over x^{*} x}.</math>For real matrices and vectors, the condition of being Hermitian reduces to that of being symmetric, and the conjugate transpose <math>x^{*}</math> to the usual transpose <math>x'</math>. Note that <math>R(M, c x) = R(M,x)</math> for any non-zero scalar <math>c</math>. Recall that a Hermitian (or real symmetric) matrix is diagonalizable with only real eigenvalues. It can be shown that, for a given matrix, the Rayleigh quotient reaches its minimum value <math>\lambda_\min</math> (the smallest eigenvalue of <math>M</math>) when <math>x</math> is <math>v_\min</math> (the corresponding eigenvector).<ref>{{#invoke:citation/CS1|citation |CitationClass=web }}</ref> Similarly, <math>R(M, x) \leq \lambda_\max</math> and <math>R(M, v_\max) = \lambda_\max</math>.
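A minimal numerical sketch of the definition and these bounds (assuming NumPy; the matrix and test vector below are arbitrary illustrative choices):

<syntaxhighlight lang="python">
import numpy as np

# Arbitrary 3x3 real symmetric (hence Hermitian) matrix, used only for illustration.
M = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])

def rayleigh_quotient(M, x):
    """R(M, x) = (x* M x) / (x* x) for a nonzero vector x."""
    x = np.asarray(x, dtype=complex)
    return (x.conj() @ M @ x).real / (x.conj() @ x).real

eigenvalues, eigenvectors = np.linalg.eigh(M)     # eigenvalues in ascending order

x = np.array([1.0, -2.0, 0.5])                    # arbitrary nonzero test vector
r = rayleigh_quotient(M, x)

# The quotient lies between the smallest and largest eigenvalues,
# and the bounds are attained at the corresponding eigenvectors.
assert eigenvalues[0] <= r <= eigenvalues[-1]
assert np.isclose(rayleigh_quotient(M, eigenvectors[:, 0]), eigenvalues[0])
assert np.isclose(rayleigh_quotient(M, eigenvectors[:, -1]), eigenvalues[-1])
</syntaxhighlight>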

The Rayleigh quotient is used in the min-max theorem to get exact values of all eigenvalues. It is also used in eigenvalue algorithms (such as Rayleigh quotient iteration) to obtain an eigenvalue approximation from an eigenvector approximation.
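A sketch of Rayleigh quotient iteration in its usual form (shift by the current quotient, solve, normalize), assuming NumPy; the starting vector, tolerance, and example matrix are illustrative choices:

<syntaxhighlight lang="python">
import numpy as np

def rayleigh_quotient_iteration(M, x0, tol=1e-12, max_iter=50):
    """Approximate an eigenpair of the Hermitian matrix M starting from the guess x0."""
    x = x0 / np.linalg.norm(x0)
    mu = x.conj() @ M @ x                         # Rayleigh quotient of the current iterate
    for _ in range(max_iter):
        try:
            # Inverse iteration with the current quotient as shift.
            y = np.linalg.solve(M - mu * np.eye(len(x)), x)
        except np.linalg.LinAlgError:
            break                                 # the shift is (numerically) an exact eigenvalue
        x = y / np.linalg.norm(y)
        mu_next = x.conj() @ M @ x
        if abs(mu_next - mu) < tol:
            mu = mu_next
            break
        mu = mu_next
    return mu, x

# Example: for Hermitian M the iteration converges rapidly to an eigenpair.
M = np.array([[2.0, 1.0], [1.0, 3.0]])
mu, v = rayleigh_quotient_iteration(M, np.array([1.0, 0.0]))
</syntaxhighlight>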

The range of the Rayleigh quotient (for any matrix, not necessarily Hermitian) is called the numerical range and contains its spectrum. When the matrix is Hermitian, the numerical radius is equal to the spectral norm. In functional analysis, <math>\lambda_\max</math> is known as the spectral radius. In the context of <math>C^\star</math>-algebras or algebraic quantum mechanics, the function that associates to <math>M</math> the Rayleigh–Ritz quotient <math>R(M, x)</math>, for a fixed <math>x</math> and with <math>M</math> varying through the algebra, is referred to as a vector state of the algebra.

In quantum mechanics, the Rayleigh quotient gives the expectation value of the observable corresponding to the operator <math>M</math> for a system whose state is given by <math>x</math>.

If we fix the complex matrix <math>M</math>, then the resulting Rayleigh quotient map (considered as a function of <math>x</math>) completely determines <math>M</math> via the polarization identity; indeed, this remains true even if we allow <math>M</math> to be non-Hermitian. However, if we restrict the field of scalars to the real numbers, then the Rayleigh quotient only determines the symmetric part of <math>M</math>.
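Explicitly, one standard form of the complex polarization identity recovers every matrix element from the quadratic form <math>x \mapsto x^* M x</math>: <math display="block">y^{*} M x = \frac{1}{4}\sum_{k=0}^{3} i^{k}\,\left(x + i^{k} y\right)^{*} M \left(x + i^{k} y\right),</math> so knowing <math>x^* M x</math> for all <math>x</math> (equivalently, knowing <math>R(M,x)</math> together with <math>x^* x</math>) determines <math>M</math> entirely.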

Bounds for Hermitian M

As stated in the introduction, for any nonzero vector x, one has <math>R(M,x) \in \left[\lambda_\min, \lambda_\max \right]</math>, where <math>\lambda_\min, \lambda_\max</math> are respectively the smallest and largest eigenvalues of <math>M</math>. This is immediate after observing that the Rayleigh quotient is a weighted average of eigenvalues of M: <math display="block">R(M,x) = {x^{*} M x \over x^{*} x} = \frac{\sum_{i=1}^n \lambda_i |y_i|^2}{\sum_{i=1}^n |y_i|^2}</math> where <math>(\lambda_i, v_i)</math> is the <math>i</math>-th eigenpair after orthonormalization and <math>y_i = v_i^* x</math> is the <math>i</math>-th coordinate of x in the eigenbasis. It is then easy to verify that the bounds are attained at the corresponding eigenvectors <math>v_\min, v_\max</math>.

The fact that the quotient is a weighted average of the eigenvalues can be used to identify the second, the third, ... largest eigenvalues. Let <math>\lambda_{\max} = \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n = \lambda_{\min} </math> be the eigenvalues in decreasing order. If <math>x</math> is constrained to be orthogonal to <math>v_1</math>, in which case <math>y_1 = v_1^*x = 0 </math>, then <math>R(M,x)</math> has maximum value <math>\lambda_2</math>, which is achieved when <math>x = v_2</math>.
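A numerical sketch of this deflation idea (assuming NumPy; the matrix is an arbitrary symmetric example): restricting the quotient to the orthogonal complement of the top eigenvector never exceeds, and is attained at, <math>\lambda_2</math>.

<syntaxhighlight lang="python">
import numpy as np

M = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])                   # arbitrary symmetric example
eigenvalues, V = np.linalg.eigh(M)                # ascending order
v1 = V[:, -1]                                     # eigenvector of the largest eigenvalue

# Evaluate the quotient on random vectors forced to be orthogonal to v1.
P = np.eye(3) - np.outer(v1, v1)                  # projector onto the orthogonal complement of v1
rng = np.random.default_rng(0)
samples = [P @ z for z in rng.normal(size=(1000, 3))]
quotients = [s @ M @ s / (s @ s) for s in samples]

lambda_2 = eigenvalues[-2]
assert max(quotients) <= lambda_2 + 1e-9              # the constrained quotient never exceeds lambda_2
assert np.isclose(V[:, -2] @ M @ V[:, -2], lambda_2)  # and it attains lambda_2 at x = v_2
</syntaxhighlight>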

Special case of covariance matrices

An empirical covariance matrix <math>M</math> can be represented as the product <math>A'A</math> of the data matrix <math>A</math> pre-multiplied by its transpose <math>A'</math>. Being a positive semi-definite matrix, <math>M</math> has non-negative eigenvalues and orthogonal (or orthogonalizable) eigenvectors, which can be demonstrated as follows.

Firstly, that the eigenvalues <math>\lambda_i</math> are non-negative: <math display="block">\begin{align} &M v_i = A' A v_i = \lambda_i v_i \\ \Rightarrow{}& v_i' A' A v_i = v_i' \lambda_i v_i \\ \Rightarrow{}& \left\| A v_i \right\|^2 = \lambda_i \left\| v_i \right\|^2 \\ \Rightarrow{}& \lambda_i = \frac{\left\| A v_i \right\|^2}{\left\| v_i \right\|^2} \geq 0. \end{align}</math>

Secondly, that the eigenvectors <math>v_i</math> are orthogonal to one another: <math display="block">\begin{align} &M v_i = \lambda _i v_i \\ \Rightarrow{}& v_j' M v_i = v_j' \lambda _i v_i \\ \Rightarrow{}& \left (M v_j \right )' v_i = \lambda_j v_j' v_i \\ \Rightarrow{}& \lambda_j v_j ' v_i = \lambda _i v_j' v_i \\ \Rightarrow{}& \left (\lambda_j - \lambda_i \right ) v_j ' v_i = 0 \\ \Rightarrow{}& v_j ' v_i = 0 \end{align}</math> if the eigenvalues are different – in the case of multiplicity, the basis can be orthogonalized.

To now establish that the Rayleigh quotient is maximized by the eigenvector with the largest eigenvalue, consider decomposing an arbitrary vector <math>x</math> on the basis of the eigenvectors <math>v_i</math>: <math display="block">x = \sum _{i=1} ^n \alpha _i v_i,</math> where <math display="block">\alpha_i = \frac{x' v_i}{v_i' v_i} = \frac{\langle x,v_i\rangle}{\left\| v_i \right\| ^2}</math> is the coordinate of <math>x</math> orthogonally projected onto <math>v_i</math>. Therefore, we have: <math display="block">\begin{align} R(M,x) &= \frac{x' A' A x}{x' x} \\ &= \frac{ \Bigl( \sum _{j=1} ^n \alpha _j v_j \Bigr)' \left ( A' A \right ) \Bigl(\sum _{i=1} ^n \alpha _i v_i \Bigr)}{ \Bigl( \sum _{j=1} ^n \alpha _j v_j \Bigr)' \Bigl( \sum _{i=1} ^n \alpha _i v_i \Bigr)} \\ &= \frac{ \Bigl( \sum _{j=1} ^n \alpha _j v_j \Bigr)'\Bigl(\sum _{i=1} ^n \alpha _i (A' A) v_i \Bigr)}{ \Bigl( \sum _{i=1}^n \alpha _i^2 {v_i}'{v_i} \Bigr)} \\ &= \frac{ \Bigl( \sum _{j=1} ^n \alpha _j v_j \Bigr)'\Bigl(\sum _{i=1} ^n \alpha _i \lambda_i v_i \Bigr)}{ \Bigl( \sum _{i=1}^n \alpha _i^2 \|{v_i}\|^2 \Bigr)} \end{align}</math> which, by orthonormality of the eigenvectors, becomes: <math display="block">\begin{align} R(M,x) &= \frac{\sum _{i=1} ^n \alpha_i^2 \lambda _i}{\sum _{i=1} ^n \alpha_i^2} \\ &= \sum_{i=1}^n \lambda_i \frac{(x'v_i)^2}{ (x'x)( v_i' v_i)^2} \\ &= \sum_{i=1}^n \lambda_i \frac{(x'v_i)^2}{ (x'x)} \end{align}</math>

The last representation establishes that the Rayleigh quotient is the sum of the squared cosines of the angles formed by the vector <math>x</math> and each eigenvector <math>v_i</math>, weighted by corresponding eigenvalues.

If a vector <math>x</math> maximizes <math>R(M,x)</math>, then any non-zero scalar multiple <math>kx</math> also maximizes <math>R</math>, so the problem can be reduced to the Lagrange problem of maximizing <math display="inline">\sum _{i=1}^n \alpha_i^2 \lambda _i</math> under the constraint that <math display="inline">\sum _{i=1} ^n \alpha _i ^2 = 1</math>.

Define <math>\beta_i = \alpha_i^2</math>. The problem then becomes that of maximizing the linear objective <math display="inline">\sum_{i=1}^n \lambda_i \beta_i</math> subject to <math display="inline">\beta_i \ge 0</math> and <math display="inline">\sum_{i=1}^n \beta_i = 1</math>, a linear program, which always attains its maximum at one of the corners of the domain. A maximum point has <math>\alpha_1 = \pm 1</math> and <math>\alpha _i = 0</math> for all <math>i > 1</math> (when the eigenvalues are ordered by decreasing value).

Thus, the Rayleigh quotient is maximized by the eigenvector with the largest eigenvalue.
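The squared-cosine representation above can be checked numerically (a sketch assuming NumPy; the data matrix and test vector are arbitrary illustrative choices):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(10, 4))                      # arbitrary data matrix
M = A.T @ A                                       # positive semi-definite "covariance-like" matrix
eigenvalues, V = np.linalg.eigh(M)                # columns of V are orthonormal eigenvectors

x = rng.normal(size=4)                            # arbitrary nonzero vector
R = x @ M @ x / (x @ x)

# R(M, x) equals the eigenvalue-weighted sum of squared cosines between x and the v_i.
squared_cosines = (V.T @ x) ** 2 / (x @ x)
assert np.isclose(R, np.sum(eigenvalues * squared_cosines))
</syntaxhighlight>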

Formulation using Lagrange multipliers

Alternatively, this result can be arrived at by the method of Lagrange multipliers. The first part is to show that the quotient is constant under scaling <math>x \to cx</math>, where <math>c</math> is a nonzero scalar: <math display="block">R(M,cx) = \frac {(cx)^{*} M cx} {(cx)^{*} cx} = \frac {c^{*} c} {c^{*} c} \frac {x^{*} M x} {x^{*} x} = R(M,x).</math>

Because of this invariance, it is sufficient to study the special case <math>\|x\|^2 = x^Tx = 1</math>. The problem is then to find the critical points of the function <math display="block">R(M,x) = x^\mathsf{T} M x ,</math> subject to the constraint <math>\|x\|^2 = x^Tx = 1.</math> In other words, it is to find the critical points of <math display="block">\mathcal{L}(x) = x^\mathsf{T} M x -\lambda \left (x^\mathsf{T} x - 1 \right), </math> where <math>\lambda</math> is a Lagrange multiplier. The stationary points of <math>\mathcal{L}(x)</math> occur at <math display="block">\begin{align} &\frac{d\mathcal{L}(x)}{dx} = 0 \\ \Rightarrow{}& 2x^\mathsf{T}M - 2\lambda x^\mathsf{T} = 0 \\ \Rightarrow{}& 2Mx - 2\lambda x = 0 \text{ (taking the transpose of both sides and noting that }M\text{ is Hermitian)}\\ \Rightarrow{}& M x = \lambda x \end{align} </math> and <math display="block"> \therefore R(M,x) = \frac{x^\mathsf{T} M x}{x^\mathsf{T} x} = \lambda \frac{x^\mathsf{T}x}{x^\mathsf{T} x} = \lambda.</math>

Therefore, the eigenvectors <math>x_1, \ldots, x_n</math> of <math>M</math> are the critical points of the Rayleigh quotient and their corresponding eigenvalues <math>\lambda_1, \ldots, \lambda_n</math> are its stationary values. This property is the basis for principal component analysis and canonical correlation.
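This stationarity can be verified numerically: the gradient of the Rayleigh quotient, <math>\nabla R(M,x) = 2(Mx - R(M,x)\,x)/(x^\mathsf{T}x)</math> in the real symmetric case, vanishes exactly at the eigenvectors (a sketch assuming NumPy; the matrix is an arbitrary symmetric example):

<syntaxhighlight lang="python">
import numpy as np

M = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])                   # arbitrary symmetric example
eigenvalues, V = np.linalg.eigh(M)

def grad_rayleigh(M, x):
    """Gradient of R(M, x) in the real symmetric case: 2(Mx - R x) / (x'x)."""
    r = x @ M @ x / (x @ x)
    return 2.0 * (M @ x - r * x) / (x @ x)

for i, lam in enumerate(eigenvalues):
    v = V[:, i]
    assert np.allclose(grad_rayleigh(M, v), 0.0)  # eigenvectors are critical points
    assert np.isclose(v @ M @ v / (v @ v), lam)   # the stationary values are the eigenvalues
</syntaxhighlight>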

Use in Sturm–Liouville theory

Sturm–Liouville theory concerns the action of the linear operator <math display="block">L(y) = \frac{1}{w(x)}\left(-\frac{d}{dx}\left[p(x)\frac{dy}{dx}\right] + q(x)y\right)</math> on the inner product space defined by <math display="block">\langle{y_1,y_2}\rangle = \int_a^b w(x)y_1(x)y_2(x) \, dx</math> of functions satisfying some specified boundary conditions at a and b. In this case the Rayleigh quotient is <math display="block">\frac{\langle{y,Ly}\rangle}{\langle{y,y}\rangle} = \frac{\int_a^b y(x)\left(-\frac{d}{dx}\left[p(x)\frac{dy}{dx}\right] + q(x)y(x)\right)dx}{\int_a^b{w(x)y(x)^2}dx}.</math>

This is sometimes presented in an equivalent form, obtained by separating the integral in the numerator and using integration by parts: <math display="block">\begin{align} \frac{\langle{y,Ly}\rangle}{\langle{y,y}\rangle} &= \frac{ \left \{ \int_a^b y(x)\left(-\frac{d}{dx}\left[p(x)y'(x)\right]\right) dx \right \} + \left \{\int_a^b{q(x)y(x)^2} \, dx \right \}}{\int_a^b{w(x)y(x)^2} \, dx} \\ &= \frac{ \left \{\left. -y(x)\left[p(x)y'(x)\right] \right |_a^b \right \} + \left \{\int_a^b y'(x)\left[p(x)y'(x)\right] \, dx \right \} + \left \{\int_a^b{q(x)y(x)^2} \, dx \right \}}{\int_a^b w(x)y(x)^2 \, dx}\\ &= \frac{ \left \{ \left. -p(x)y(x)y'(x) \right |_a^b \right \} + \left \{ \int_a^b \left [p(x)y'(x)^2 + q(x)y(x)^2 \right] \, dx \right \} } {\int_a^b{w(x)y(x)^2} \, dx}. \end{align}</math>
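For example (a standard illustrative estimate, assuming the problem <math>-y'' = \lambda y</math> on <math>[0, \pi]</math> with <math>y(0) = y(\pi) = 0</math>, i.e. <math>p = w = 1</math> and <math>q = 0</math>), the trial function <math>y(x) = x(\pi - x)</math> gives <math display="block">\frac{\langle{y,Ly}\rangle}{\langle{y,y}\rangle} = \frac{\int_0^\pi y'(x)^2\,dx}{\int_0^\pi y(x)^2\,dx} = \frac{\pi^3/3}{\pi^5/30} = \frac{10}{\pi^2} \approx 1.013,</math> an upper bound close to the lowest eigenvalue <math>\lambda_1 = 1</math> (eigenfunction <math>\sin x</math>); the Rayleigh quotient of any admissible trial function bounds <math>\lambda_1</math> from above.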

Generalizations

  1. For a given pair (A, B) of matrices, and a given non-zero vector x, the generalized Rayleigh quotient is defined as: <math display="block">R(A,B; x) := \frac{x^* A x}{x^* B x}.</math> The generalized Rayleigh quotient can be reduced to the Rayleigh quotient <math>R(D, C^*x)</math> through the transformation <math>D = C^{-1} A {C^*}^{-1}</math>, where <math>CC^*</math> is the Cholesky decomposition of the Hermitian positive-definite matrix B (see the numerical sketch following this list).
  2. For a given pair (x, y) of non-zero vectors, and a given Hermitian matrix H, the generalized Rayleigh quotient or sometimes two-sided Rayleigh quotient can be defined as: <math display="block">R(H; x,y) := \frac{y^* H x}{\sqrt{y^*y \cdot x^*x}}</math> which coincides with R(H,x) when x = y. In quantum mechanics, this quantity is called a "matrix element" or sometimes a "transition amplitude".
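A numerical sketch of the Cholesky reduction in item 1 (assuming NumPy; the matrices and vector are arbitrary illustrative choices):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.normal(size=(n, n)); A = (A + A.T) / 2             # arbitrary Hermitian (real symmetric) A
B = rng.normal(size=(n, n)); B = B @ B.T + n * np.eye(n)   # Hermitian positive-definite B
x = rng.normal(size=n)                                     # arbitrary nonzero vector

C = np.linalg.cholesky(B)                          # B = C C*
D = np.linalg.solve(C, np.linalg.solve(C, A.T).T)  # D = C^{-1} A C^{-*}
y = C.T @ x                                        # y = C* x

generalized = x @ A @ x / (x @ B @ x)              # R(A, B; x)
reduced = y @ D @ y / (y @ y)                      # R(D, C* x)
assert np.isclose(generalized, reduced)
</syntaxhighlight>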

See also

References

<references/>

Further reading