
Arthur Cayley, F.R.S. (1821–1895), is widely regarded as Britain's leading pure mathematician of the 19th century. In 1848, Cayley went to Dublin to attend lectures on quaternions by Hamilton, their discoverer. Cayley later impressed Hamilton by being the second to publish work on them.<ref name=Crilly_1>Template:Harvnb</ref> Cayley stated the theorem for matrices of dimension 3 or less, and published a proof for the two-dimensional case.
William Rowan Hamilton (1805–1865), Irish physicist, astronomer, and mathematician, first foreign member of the American National Academy of Sciences. While maintaining an opposing position about how geometry should be studied, Hamilton always remained on the best terms with Cayley.<ref name=Crilly_1/>

Hamilton proved that for a linear function of quaternions there exists a certain equation, depending on the linear function, that is satisfied by the linear function itself.<ref name=Hamilton_1864a/><ref name=Hamilton_1864b/><ref name=Hamilton_1862/>

In linear algebra, the Cayley–Hamilton theorem (named after the mathematicians Arthur Cayley and William Rowan Hamilton) states that every square matrix over a commutative ring (such as the real or complex numbers or the integers) satisfies its own characteristic equation.

The characteristic polynomial of an n × n matrix A is defined as<ref>Template:Harvnb</ref> <math>p_A(\lambda)=\det(\lambda I_n-A)</math>, where det is the determinant operation, λ is a variable scalar element of the base ring, and <math>I_n</math> is the n × n identity matrix. Since each entry of the matrix <math>(\lambda I_n-A)</math> is either constant or linear in λ, the determinant of <math>(\lambda I_n-A)</math> is a degree-n monic polynomial in λ, so it can be written as <math display="block">p_A(\lambda) = \lambda^n + c_{n-1}\lambda^{n-1} + \cdots + c_1\lambda + c_0.</math> By replacing the scalar variable λ with the matrix A, one can define an analogous matrix polynomial expression, <math display="block">p_A(A) = A^n + c_{n-1}A^{n-1} + \cdots + c_1A + c_0I_n.</math> (Here, <math>A</math> is the given matrix—not a variable, unlike <math>\lambda</math>—so <math>p_A(A)</math> is a constant rather than a function.) The Cayley–Hamilton theorem states that this polynomial expression is equal to the zero matrix, which is to say that <math>p_A(A) = \mathbf 0;</math> that is, the characteristic polynomial <math>p_A</math> is an annihilating polynomial for <math>A.</math>
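
As an illustration (a numerical check added here, not part of the statement; it assumes NumPy is available and tolerates floating-point round-off), one can evaluate <math>p_A(A)</math> by Horner's rule from the coefficients returned by <code>numpy.poly</code>:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))   # a random 4 x 4 real matrix

# numpy.poly returns the monic characteristic coefficients
# [1, c_{n-1}, ..., c_1, c_0] of det(lambda*I - A).
coeffs = np.poly(A)

# Evaluate p(A) = A^n + c_{n-1} A^{n-1} + ... + c_0 I by Horner's rule.
n = A.shape[0]
P = np.zeros_like(A)
for c in coeffs:
    P = P @ A + c * np.eye(n)

print(np.allclose(P, 0))          # True, up to round-off
</syntaxhighlight>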

One use for the Cayley–Hamilton theorem is that it allows <math>A^n</math> to be expressed as a linear combination of the lower matrix powers of A: <math display="block">A^n = -c_{n-1}A^{n-1} - \cdots - c_1A - c_0I_n.</math> When the ring is a field, the Cayley–Hamilton theorem is equivalent to the statement that the minimal polynomial of a square matrix divides its characteristic polynomial.

A special case of the theorem was first proved by Hamilton in 1853<ref name=Hamilton_1853>Template:Harvnb</ref> in terms of inverses of linear functions of quaternions.<ref name=Hamilton_1864a>Template:Harvnb</ref><ref name=Hamilton_1864b>Template:Harvnb</ref><ref name=Hamilton_1862>Template:Harvnb</ref> This corresponds to the special case of certain 4 × 4 real or 2 × 2 complex matrices. Cayley in 1858 stated the result for 3 × 3 and smaller matrices, but only published a proof for the 2 × 2 case.<ref>Template:Harvnb</ref><ref>Template:Harvnb</ref> As for n × n matrices, Cayley stated “..., I have not thought it necessary to undertake the labor of a formal proof of the theorem in the general case of a matrix of any degree”. The general case was first proved by Ferdinand Frobenius in 1878.<ref name="Frobenius 1878">Template:Harvnb</ref>

Examples

1 × 1 matrices

For a 1 × 1 matrix <math>A = (a)</math>, the characteristic polynomial is given by <math>p(\lambda)=\lambda-a</math>, and so <math>p(A)=(a)-aI_1=(0)</math> is trivial.

2 × 2 matrices

As a concrete example, let <math display="block">A = \begin{pmatrix}1&2\\3&4\end{pmatrix}.</math> Its characteristic polynomial is given by <math display="block"> \begin{align} p(\lambda) &= \det(\lambda I_2-A) = \det\!\begin{pmatrix}\lambda-1&-2\\-3&\lambda-4\end{pmatrix} \\ &=(\lambda-1)(\lambda-4)-(-2)(-3)=\lambda^2-5\lambda-2. \end{align} </math>

The Cayley–Hamilton theorem claims that, if we define <math display="block">p(X) = X^2 - 5X - 2I_2,</math> then <math display="block">p(A) = A^2-5A-2I_2 = \begin{pmatrix}0&0\\0&0\\\end{pmatrix}.</math> We can verify by computation that indeed, <math display="block">A^2-5A-2I_2 = \begin{pmatrix}7&10\\15&22\\\end{pmatrix} - \begin{pmatrix}5&10\\15&20\\\end{pmatrix} - \begin{pmatrix}2&0\\0&2\\\end{pmatrix} = \begin{pmatrix}0&0\\0&0\\\end{pmatrix}.</math>

For a generic 2 × 2 matrix, <math display="block">A=\begin{pmatrix}a&b\\c&d\\\end{pmatrix} ,</math>

the characteristic polynomial is given by <math>p(\lambda)=\lambda^2-(a+d)\lambda+(ad-bc)</math>, so the Cayley–Hamilton theorem states that <math display="block">p(A) = A^2-(a+d)A+(ad-bc)I_2 = \begin{pmatrix}0&0\\0&0\end{pmatrix};</math> which is indeed always the case, as is evident by working out the entries of <math>A^2</math>.
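
As an illustration (a symbolic check added here, assuming SymPy is available), the generic 2 × 2 case can be verified by expanding the entries of <math>p(A)</math>:

<syntaxhighlight lang="python">
import sympy as sp

a, b, c, d = sp.symbols('a b c d')
A = sp.Matrix([[a, b], [c, d]])

# p(A) = A^2 - (tr A) A + (det A) I_2 for the generic 2 x 2 matrix
P = A**2 - (a + d)*A + (a*d - b*c)*sp.eye(2)
print(P.applyfunc(sp.expand))   # the 2 x 2 zero matrix
</syntaxhighlight>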


Applications

Determinant and inverse matrix

For a general n × n invertible matrix A, i.e., one with nonzero determinant, <math>A^{-1}</math> can thus be written as an <math>(n-1)</math>-th order polynomial expression in A: as indicated, the Cayley–Hamilton theorem amounts to the identity <math display="block">p(A) = A^n + c_{n-1}A^{n-1} + \cdots + c_1 A + c_0 I_n = O.</math>

The coefficients <math>c_i</math> are given by the elementary symmetric polynomials of the eigenvalues of A. Using Newton's identities, the elementary symmetric polynomials can in turn be expressed in terms of the power sum symmetric polynomials of the eigenvalues: <math display="block">s_k = \sum_{i=1}^n \lambda_i^k = \operatorname{tr}(A^k),</math> where <math>\operatorname{tr}(A^k)</math> is the trace of the matrix <math>A^k</math>. Thus, we can express the <math>c_i</math> in terms of the traces of powers of A.

In general, the formula for the coefficients <math>c_i</math> is given in terms of complete exponential Bell polynomials as<ref group="nb">See Sect. 2 of Template:Harvtxt. An explicit expression for the coefficients <math>c_i</math> is provided by Template:Harvtxt: <math display="block">c_i = \sum_{ k_1,k_2,\ldots ,k_n}\prod_{l=1}^{n} \frac{(-1)^{k_l+1}}{l^{k_l} k_l!}\operatorname{tr}(A^l)^{k_l},</math> where the sum is taken over the sets of all integer partitions <math>k_l \geq 0</math> satisfying the equation <math display="block">\sum_{l=1}^{n}lk_{l} = n-i.</math></ref> <math display="block">c_{n-k} = \frac{(-1)^{k}}{k!} B_k(s_1, -1! s_2, 2! s_3, \ldots, (-1)^{k-1}(k-1)! s_k).</math>

In particular, the determinant of A equals <math>(-1)^n c_0</math>. Thus, the determinant can be written as the trace identity: <math display="block">\det(A) = \frac{1}{n!} B_n(s_1, -1! s_2, 2! s_3, \ldots, (-1)^{n-1}(n-1)! s_n).</math>
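
As a sketch of this trace-based computation (an illustration assuming NumPy; it uses Newton's identities in the equivalent recursive form <math>k\,c_{n-k} = -\sum_{i=1}^{k} c_{n-k+i} s_i</math> with <math>c_n = 1</math>, rather than the Bell polynomials directly):

<syntaxhighlight lang="python">
import numpy as np

def charpoly_coeffs(A):
    """Coefficients [c_{n-1}, ..., c_0] of det(lambda*I - A), computed
    from the power sums s_k = tr(A^k) via Newton's identities."""
    n = A.shape[0]
    s = []                        # s[k-1] = tr(A^k)
    Ak = np.eye(n)
    for _ in range(n):
        Ak = Ak @ A
        s.append(np.trace(Ak))
    c = {n: 1.0}                  # monic: c_n = 1
    for k in range(1, n + 1):
        c[n - k] = -sum(c[n - k + i] * s[i - 1] for i in range(1, k + 1)) / k
    return [c[i] for i in range(n - 1, -1, -1)]

A = np.array([[1., 2.], [3., 4.]])
print(charpoly_coeffs(A))   # [-5.0, -2.0], i.e. p(l) = l^2 - 5l - 2
print(np.poly(A))           # [ 1. -5. -2.], NumPy's reference values
</syntaxhighlight>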

Likewise, the characteristic polynomial can be written as <math display="block">-(-1)^n\det(A)I_n = A(A^{n-1}+c_{n-1}A^{n-2}+\cdots+c_{1}I_n),</math> and, by multiplying both sides by <math>A^{-1}</math> (note <math>c_0 = (-1)^n\det(A)</math>), one is led to an expression for the inverse of A as a trace identity, <math display="block"> \begin{align} A^{-1} & = \frac{(-1)^{n-1}}{\det A}(A^{n-1}+c_{n-1}A^{n-2}+\cdots+c_{1}I_n), \\[5pt] & = \frac{1}{\det A}\sum_{k=0}^{n-1} (-1)^{n+k-1}\frac{A^{n-k-1}}{k!} B_k(s_1, -1! s_2, 2! s_3, \ldots, (-1)^{k-1}(k-1)! s_k). \end{align} </math>

Another method for obtaining these coefficients <math>c_k</math> for a general n × n matrix, provided no root is zero, relies on the following alternative expression for the determinant, <math display="block">p(\lambda)= \det (\lambda I_n -A) = \lambda^n \exp (\operatorname{tr} (\log (I_n - A/\lambda))).</math> Hence, by virtue of the Mercator series, <math display="block">p(\lambda)= \lambda^n \exp \left( -\operatorname{tr} \sum_{m=1}^\infty {({A\over\lambda})^m \over m} \right),</math> where the exponential only needs be expanded to order <math>\lambda^{-n}</math>, since <math>p(\lambda)</math> is of order <math>\lambda^n</math>, the net negative powers of λ automatically vanishing by the C–H theorem. (Again, this requires a ring containing the rational numbers.) Differentiation of this expression with respect to λ allows one to express the coefficients of the characteristic polynomial for general n as determinants of m × m matrices,<ref group="nb">See, e.g., p. 54 of Template:Harvnb, which solves Jacobi's formula, <math display="block">\frac{\partial p(\lambda)}{\partial \lambda} = p(\lambda) \sum^\infty _{m=0}\lambda ^{-(m+1)} \operatorname{tr}A^m = p(\lambda) ~ \operatorname{tr} \frac{I}{\lambda I -A}\equiv\operatorname{tr} B~, </math> where B is the adjugate matrix of the next section.

There also exists an equivalent, related recursive algorithm introduced by Urbain Le Verrier and Dmitry Konstantinovich Faddeev—the Faddeev–LeVerrier algorithm, which reads

<math display="block">\begin{align} M_0 &\equiv O & c_n &= 1 \qquad &(k=0) \\[5pt] M_k &\equiv AM_{k-1} + c_{n-k+1} I \qquad \qquad & c_{n-k} &= -\frac 1 k \operatorname{tr}(AM_k) \qquad &k=1,\ldots ,n ~. \end{align}</math>

(see, e.g., Template:Harvnb.) Observe <math>A^{-1} = -M_n/c_0</math> as the recursion terminates. See the algebraic proof in the following section, which relies on the modes of the adjugate, <math>B_k \equiv M_{n-k}</math>. Specifically, <math>(\lambda I-A) B = I p(\lambda)</math>, and the above derivative of p, when one traces it, yields

<math display="block">\lambda p' -n p =\operatorname{tr} (AB)~,</math> (Template:Harvnb), and the above recursions, in turn.</ref> <math display="block">c_{n-m} = \frac{(-1)^m}{m!} \begin{vmatrix} \operatorname{tr}A & m-1 & 0 & \cdots \\ \operatorname{tr}A^2 &\operatorname{tr}A & m-2 &\cdots \\ \vdots & \vdots & & & \vdots \\ \operatorname{tr}A^{m-1} &\operatorname{tr}A^{m-2}& \cdots & \cdots & 1 \\ \operatorname{tr}A^m &\operatorname{tr}A^{m-1}& \cdots & \cdots & \operatorname{tr}A \end{vmatrix} ~.</math>
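
The Faddeev–LeVerrier recursion quoted in the note above is short to implement; the following sketch (an illustration assuming NumPy, with floating-point arithmetic) also recovers the inverse from <math>A^{-1} = -M_n/c_0</math>:

<syntaxhighlight lang="python">
import numpy as np

def faddeev_leverrier(A):
    """M_0 = 0, c_n = 1; then M_k = A M_{k-1} + c_{n-k+1} I and
    c_{n-k} = -tr(A M_k)/k for k = 1, ..., n.  Returns the coefficients
    [c_n, c_{n-1}, ..., c_0] and A^{-1} = -M_n/c_0 (when c_0 != 0)."""
    n = A.shape[0]
    I = np.eye(n)
    M = np.zeros_like(A)
    c = [1.0]                        # c_n
    for k in range(1, n + 1):
        M = A @ M + c[-1] * I        # uses c_{n-k+1}, the last one found
        c.append(-np.trace(A @ M) / k)
    return c, -M / c[-1]             # coefficients and -M_n/c_0

A = np.array([[1., 2.], [3., 4.]])
c, A_inv = faddeev_leverrier(A)
print(c)              # [1.0, -5.0, -2.0]
print(A_inv @ A)      # the identity matrix, up to round-off
</syntaxhighlight>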

Examples

For instance, the first few complete Bell polynomials are <math>B_0 = 1</math>, <math>B_1(x_1) = x_1</math>, <math>B_2(x_1, x_2) = x_1^2 + x_2</math>, and <math>B_3(x_1, x_2, x_3) = x_1^3 + 3 x_1 x_2 + x_3</math>.

Using these to specify the coefficients <math>c_i</math> of the characteristic polynomial of a 2 × 2 matrix yields

<math display="block">\begin{align} c_2 = B_0 = 1, \\[4pt] c_1 = \frac{-1}{1!} B_1(s_1) = - s_1 = - \operatorname{tr}(A), \\[4pt] c_0 = \frac{1}{2!} B_2(s_1, -1! s_2) = \frac{1}{2}(s_1^2 - s_2) = \frac{1}{2}((\operatorname{tr}(A))^2 - \operatorname{tr}(A^2)). \end{align}</math>

The coefficient <math>c_0</math> gives the determinant of the 2 × 2 matrix, and <math>c_1</math> gives minus its trace, while its inverse is given by <math display="block"> A^{-1} = \frac{-1}{\det A}(A + c_1 I_2) = \frac{-2(A - \operatorname{tr}(A) I_2)}{(\operatorname{tr}(A))^2 - \operatorname{tr}(A^2)}.</math>

It is apparent from the general formula for <math>c_{n-k}</math>, expressed in terms of Bell polynomials, that the expressions <math display="block">-\operatorname{tr}(A)\quad \text{and} \quad \tfrac 1 2 (\operatorname{tr}(A)^2 - \operatorname{tr}(A^2))</math>

always give the coefficients <math>c_{n-1}</math> of <math>\lambda^{n-1}</math> and <math>c_{n-2}</math> of <math>\lambda^{n-2}</math> in the characteristic polynomial of any n × n matrix, respectively. So, for a 3 × 3 matrix A, the statement of the Cayley–Hamilton theorem can also be written as <math display="block">A^3- (\operatorname{tr}A)A^2+\frac{1}{2}\left((\operatorname{tr}A)^2-\operatorname{tr}(A^2)\right)A-\det(A)I_3=O,</math> where the right-hand side designates a 3 × 3 matrix with all entries reduced to zero. Likewise, this determinant in the n = 3 case is now <math display="block">\begin{align} \det(A) &= \frac{1}{3!} B_3(s_1, -1! s_2, 2! s_3) = \frac{1}{6}(s_1^3 + 3 s_1 (-s_2) + 2 s_3) \\[5pt] &= \frac{1}{6} \left [ (\operatorname{tr}A)^3-3\operatorname{tr}(A^2)(\operatorname{tr}A)+2\operatorname{tr}(A^3) \right ]. \end{align}</math> This expression gives the negative of the coefficient <math>c_{n-3}</math> of <math>\lambda^{n-3}</math> in the general case, as seen below.

Similarly, one can write for a 4 × 4 matrix A, <math display="block"> A^4-(\operatorname{tr}A)A^3 + \tfrac{1}{2}\left[(\operatorname{tr}A)^2-\operatorname{tr}(A^2)\right]A^2 - \tfrac{1}{6}\left[ (\operatorname{tr}A)^3-3\operatorname{tr}(A^2)(\operatorname{tr}A)+2\operatorname{tr}(A^3)\right] A + \det(A)I_4 = O,</math>

where, now, the determinant is <math>\det A</math>,

<math display="block">\tfrac{1}{24}\! \left [ (\operatorname{tr}A)^4-6 \operatorname{tr}(A^2)(\operatorname{tr}A)^2 + 3\left(\operatorname{tr}(A^2)\right)^2 + 8\operatorname{tr}(A^3)\operatorname{tr}(A) -6\operatorname{tr}(A^4) \right ],</math>

and so on for larger matrices. The increasingly complex expressions for the coefficients <math>c_k</math> are deducible from Newton's identities or the Faddeev–LeVerrier algorithm.
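
Both displayed trace identities are easy to spot-check numerically; a small sketch (assuming NumPy) compares them with <code>numpy.linalg.det</code> on random matrices:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
tr = lambda M, k: np.trace(np.linalg.matrix_power(M, k))

A = rng.standard_normal((3, 3))
det3 = (tr(A, 1)**3 - 3*tr(A, 2)*tr(A, 1) + 2*tr(A, 3)) / 6
print(np.isclose(det3, np.linalg.det(A)))   # True

B = rng.standard_normal((4, 4))
det4 = (tr(B, 1)**4 - 6*tr(B, 2)*tr(B, 1)**2 + 3*tr(B, 2)**2
        + 8*tr(B, 3)*tr(B, 1) - 6*tr(B, 4)) / 24
print(np.isclose(det4, np.linalg.det(B)))   # True
</syntaxhighlight>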

n-th power of matrix

The Cayley–Hamilton theorem always provides a relationship between the powers of A (though not always the simplest one), which allows one to simplify expressions involving such powers, and evaluate them without having to compute the power <math>A^n</math> or any higher powers of A.

As an example, for <math>A = \begin{pmatrix}1&2\\3&4\end{pmatrix}</math> the theorem gives <math display="block">A^2=5A+2I_2\, .</math>

Then, to calculate <math>A^4</math>, observe <math display="block">\begin{align} A^3&=(5A+2I_2)A=5A^2+2A=5(5A+2I_2)+2A=27A+10I_2, \\[1ex] A^4&=A^3A=(27A+10I_2)A=27A^2+10A=27(5A+2I_2)+10A=145A+54I_2\, . \end{align}</math> Likewise, <math display="block">\begin{align} A^{-1} &= \frac{1}{2}\left(A-5I_2\right)~. \\[1ex] A^{-2} &= A^{-1} A^{-1} = \frac{1}{4} \left(A^2-10A+25I_2\right) = \frac{1}{4} \left((5A+2I_2)-10A+25I_2\right) = \frac{1}{4} \left(-5A+27I_2\right)~. \end{align}</math>

Notice that we have been able to write the matrix power as the sum of two terms. In fact, a matrix power of any order k can be written as a matrix polynomial of degree at most n − 1, where n is the size of the square matrix. This is an instance where the Cayley–Hamilton theorem can be used to express a matrix function, which we will discuss below systematically.
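
For this particular matrix the reduction can be driven entirely by a two-term recurrence on the coefficients: writing <math>A^k = x_k A + y_k I_2</math> and using <math>A^2 = 5A + 2I_2</math> gives <math>x_{k+1} = 5x_k + y_k</math> and <math>y_{k+1} = 2x_k</math>. A sketch (an illustration assuming NumPy, specific to this example):

<syntaxhighlight lang="python">
import numpy as np

def power_coeffs(k):
    """Coefficients (x, y) with A^k = x*A + y*I for A = [[1,2],[3,4]]."""
    x, y = 0, 1                  # A^0 = 0*A + 1*I
    for _ in range(k):
        x, y = 5*x + y, 2*x      # A^{m+1} = (5x + y) A + 2x I
    return x, y

A = np.array([[1, 2], [3, 4]])
x, y = power_coeffs(4)
print(x, y)   # 145 54, matching A^4 = 145 A + 54 I_2 above
print(np.array_equal(x*A + y*np.eye(2, dtype=int),
                     np.linalg.matrix_power(A, 4)))   # True
</syntaxhighlight>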

Matrix functions

Given an analytic function <math display="block">f(x) = \sum_{k=0}^\infty a_k x^k</math> and the characteristic polynomial <math>p(x)</math> of degree n of an n × n matrix A, the function can be expressed using long division as <math display="block">f(x) = q(x) p(x) + r(x),</math> where <math>q(x)</math> is some quotient polynomial and <math>r(x)</math> is a remainder polynomial such that <math>0 \le \deg r(x) < n</math>.

By the Cayley–Hamilton theorem, replacing x by the matrix A gives <math>p(A) = 0</math>, so one has <math display="block">f(A) = r(A). </math>

Thus, the analytic function of the matrix A can be expressed as a matrix polynomial of degree less than n.

Let the remainder polynomial be <math display="block">r(x) = c_0 + c_1 x + \cdots + c_{n-1} x^{n-1}.</math> Since <math>p(\lambda_i) = 0</math>, evaluating the function <math>f(x)</math> at the n eigenvalues of A yields <math display="block"> f(\lambda_i) = r(\lambda_i) = c_0 + c_1 \lambda_i + \cdots + c_{n-1} \lambda_i^{n-1}, \qquad \text{for } i=1,2,...,n.</math> This amounts to a system of n linear equations, which can be solved to determine the coefficients <math>c_i</math>. Thus, one has <math display="block">f(A) = \sum_{k=0}^{n-1} c_k A^k.</math>

When the eigenvalues are repeated, that is <math>\lambda_i = \lambda_j</math> for some i ≠ j, two or more equations are identical, and hence the system cannot be solved uniquely. For such cases, for an eigenvalue λ with multiplicity m, the first m − 1 derivatives of <math>p(x)</math> vanish at that eigenvalue. This leads to the extra m − 1 linearly independent equations <math display="block">\left.\frac{\mathrm{d}^k f(x)}{\mathrm{d}x^k}\right|_{x=\lambda} = \left.\frac{\mathrm{d}^k r(x)}{\mathrm{d}x^k}\right|_{x=\lambda}\qquad \text{for } k = 1, 2, \ldots, m-1,</math> which, combined with the others, yield the required n equations to solve for the <math>c_i</math>.

Finding a polynomial that passes through the points <math>(\lambda_i, f(\lambda_i))</math> is essentially an interpolation problem, and can be solved using Lagrange or Newton interpolation techniques, leading to Sylvester's formula.
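
A compact sketch of this interpolation approach (an illustration assuming NumPy and distinct eigenvalues, so that the Vandermonde system below is invertible; repeated eigenvalues would require the derivative conditions described above) reproduces the worked examples that follow:

<syntaxhighlight lang="python">
import numpy as np

def matrix_function(A, f):
    """Evaluate f(A) as a polynomial of degree < n in A by solving
    f(lambda_i) = c_0 + c_1 lambda_i + ... + c_{n-1} lambda_i^{n-1}."""
    lam = np.linalg.eigvals(A)
    n = len(lam)
    V = np.vander(lam, n, increasing=True)   # rows: 1, lam_i, lam_i^2, ...
    c = np.linalg.solve(V, f(lam))
    return sum(ck * np.linalg.matrix_power(A + 0j, k)
               for k, ck in enumerate(c))

A = np.array([[1., 2.], [0., 3.]])
E = matrix_function(A, lambda lam: np.exp(lam * 1.0))   # e^{At} at t = 1
print(E.real)   # [[e, e^3 - e], [0, e^3]], as in the example below
</syntaxhighlight>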

For example, suppose the task is to find the polynomial representation of <math display="block">f(A) = e^{At} \qquad \mathrm{where} \qquad A = \begin{pmatrix}1&2\\0&3\end{pmatrix}.</math>

The characteristic polynomial is <math>p(t) = (t-1)(t-3) = t^2 - 4t + 3</math>, and the eigenvalues are <math>\lambda = 1, 3</math>. Let <math>r(x) = c_0 + c_1 x</math>. Evaluating <math>f(\lambda) = e^{\lambda t}</math> at the eigenvalues, one obtains two linear equations, <math>e^t = c_0 + c_1</math> and <math>e^{3t} = c_0 + 3c_1</math>.

Solving the equations yields <math>c_0 = (3e^t - e^{3t})/2</math> and <math>c_1 = (e^{3t} - e^t)/2</math>. Thus, it follows that <math display="block">e^{At} = c_0 I_2 + c_1 A = \begin{pmatrix}c_0 + c_1 & 2 c_1\\ 0 & c_0 + 3 c_1\end{pmatrix} = \begin{pmatrix}e^{t} & e^{3t} - e^{t} \\ 0 & e^{3t}\end{pmatrix}. </math>

If, instead, the function were <math>f(A) = \sin At</math>, then the coefficients would have been <math>c_0 = (3\sin t - \sin 3t)/2</math> and <math>c_1 = (\sin 3t - \sin t)/2</math>; hence <math display="block">\sin(At) = c_0 I_2 + c_1 A = \begin{pmatrix}\sin t & \sin 3t - \sin t \\ 0 & \sin 3t\end{pmatrix}.</math>

As a further example, when considering <math display="block">f(A) = e^{At} \qquad \mathrm{where} \qquad A = \begin{pmatrix}0 & 1\\-1 & 0\end{pmatrix},</math> then the characteristic polynomial is <math>p(t) = t^2 + 1</math>, and the eigenvalues are <math>\lambda = \pm i</math>.

As before, evaluating the function at the eigenvalues gives us the linear equations <math>e^{it} = c_0 + i c_1</math> and <math>e^{-it} = c_0 - i c_1</math>; solving these gives <math>c_0 = \tfrac{1}{2}(e^{it} + e^{-it}) = \cos t</math> and <math>c_1 = \tfrac{1}{2i}(e^{it} - e^{-it}) = \sin t</math>. Thus, for this case, <math display="block">e^{At} = (\cos t) I_2 + (\sin t) A = \begin{pmatrix}\cos t & \sin t\\ -\sin t & \cos t \end{pmatrix},</math> which is a rotation matrix.

A standard example of such usage is the exponential map from the Lie algebra of a matrix Lie group into the group. It is given by a matrix exponential, <math display="block">\exp: \mathfrak g \rightarrow G; \qquad tX \mapsto e^{tX} = \sum_{n=0}^\infty \frac{t^nX^n}{n!} = I + tX + \frac{t^2X^2}{2} + \cdots, \quad t \in \mathbb R,\ X \in \mathfrak g .</math> Such expressions have long been known for SU(2), <math display="block">e^{i(\theta/2)(\hat\mathbf n \cdot \sigma)} = I_2 \cos \frac \theta 2 + i (\hat\mathbf n \cdot \sigma) \sin \frac \theta 2,</math> where the <math>\sigma_k</math> are the Pauli matrices, and for SO(3), <math display="block">e^{i\theta(\hat\mathbf n \cdot \mathbf J)} = I_3 + i(\hat\mathbf n \cdot \mathbf J) \sin \theta + (\hat\mathbf n \cdot \mathbf J)^2 (\cos \theta - 1),</math> which is Rodrigues' rotation formula. For the notation, see 3D rotation group#A note on Lie algebras.
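
Rodrigues' formula is easy to spot-check; in the sketch below (an illustration assuming NumPy and SciPy are available), the real antisymmetric cross-product matrix <math>K</math> stands in for <math>i(\hat\mathbf n \cdot \mathbf J)</math>, so the identity reads <math>e^{\theta K} = I + K\sin\theta + K^2(1 - \cos\theta)</math>:

<syntaxhighlight lang="python">
import numpy as np
from scipy.linalg import expm

n_hat = np.array([1.0, 2.0, 2.0]) / 3.0      # a unit vector
K = np.array([[0, -n_hat[2], n_hat[1]],      # cross-product matrix
              [n_hat[2], 0, -n_hat[0]],
              [-n_hat[1], n_hat[0], 0]])
theta = 0.7
R = np.eye(3) + np.sin(theta)*K + (1 - np.cos(theta))*(K @ K)
print(np.allclose(R, expm(theta * K)))       # True
</syntaxhighlight>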

More recently, expressions have appeared for other groups, like the Lorentz group SO(3, 1),<ref>Template:Harvnb</ref> O(4, 2)<ref>Template:Harvnb</ref> and SU(2, 2),<ref>Template:Harvnb</ref> as well as <math>GL(n,\mathbb{R})</math>.<ref>Template:Harvnb</ref> The group O(4, 2) is the conformal group of spacetime, SU(2, 2) its simply connected cover (to be precise, the simply connected cover of the connected component SO⁺(4, 2) of O(4, 2)). The expressions obtained apply to the standard representation of these groups. They require knowledge of (some of) the eigenvalues of the matrix to exponentiate. For SU(2) (and hence for SO(3)), closed expressions have been obtained for all irreducible representations, i.e. of any spin.<ref>Template:Harvnb</ref>

Ferdinand Georg Frobenius (1849–1917), German mathematician. His main interests were elliptic functions, differential equations, and later group theory.
In 1878 he gave the first full proof of the Cayley–Hamilton theorem.<ref name="Frobenius 1878"/>

Algebraic number theory

The Cayley–Hamilton theorem is an effective tool for computing the minimal polynomial of algebraic integers. For example, given a finite extension <math>\mathbb{Q}[\alpha_1,\ldots,\alpha_k]</math> of <math>\mathbb{Q}</math> and an algebraic integer <math>\alpha \in \mathbb{Q}[\alpha_1,\ldots,\alpha_k]</math> which is a non-zero linear combination of the <math>\alpha_1^{n_1}\cdots\alpha_k^{n_k}</math>, we can compute the minimal polynomial of <math>\alpha</math> by finding a matrix representing the <math>\mathbb{Q}</math>-linear transformation <math display="block">\cdot \alpha : \mathbb{Q}[\alpha_1,\ldots,\alpha_k] \to \mathbb{Q}[\alpha_1,\ldots,\alpha_k].</math> If we call this transformation matrix <math>A</math>, then we can find the minimal polynomial by applying the Cayley–Hamilton theorem to <math>A</math>.<ref>Template:Cite book</ref>
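
As a sketch of such a computation (an illustrative example added here, assuming SymPy; the choice <math>\alpha = \sqrt{2} + \sqrt{3}</math> and the basis <math>(1, \sqrt{2}, \sqrt{3}, \sqrt{6})</math> of <math>\mathbb{Q}[\sqrt{2},\sqrt{3}]</math> are assumptions of the example), the characteristic polynomial of the multiplication-by-<math>\alpha</math> matrix turns out to be irreducible, so here it is itself the minimal polynomial:

<syntaxhighlight lang="python">
import sympy as sp

# Multiplication by alpha = sqrt(2) + sqrt(3) on Q[sqrt2, sqrt3], in the
# basis (1, sqrt2, sqrt3, sqrt6); columns are images of the basis vectors.
M = sp.Matrix([
    [0, 2, 3, 0],
    [1, 0, 0, 3],
    [1, 0, 0, 2],
    [0, 1, 1, 0],
])
x = sp.symbols('x')
print(sp.factor(M.charpoly(x).as_expr()))   # x**4 - 10*x**2 + 1
</syntaxhighlight>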

Proofs

The Cayley–Hamilton theorem is an immediate consequence of the existence of the Jordan normal form for matrices over algebraically closed fields; see the article on the Jordan normal form. In this section, direct proofs are presented.

As the examples above show, obtaining the statement of the Cayley–Hamilton theorem for an n × n matrix <math display="block">A = \left(a_{ij}\right)_{i,j=1}^n</math> requires two steps: first the coefficients <math>c_i</math> of the characteristic polynomial are determined by development as a polynomial in t of the determinant

<math display="block">\begin{align} p(t) & = \det(t I_n - A) = \begin{vmatrix}t-a_{1,1}&-a_{1,2}&\cdots&-a_{1,n} \\ -a_{2,1}&t-a_{2,2}&\cdots&-a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ -a_{n,1}&-a_{n,2}& \cdots& t-a_{n,n} \end{vmatrix} \\[5pt] & = t^n+c_{n-1}t^{n-1}+\cdots+c_1t+c_0, \end{align}</math>

and then these coefficients are used in a linear combination of powers of A that is equated to the n × n zero matrix: <math display="block">A^n+c_{n-1}A^{n-1} + \cdots + c_1 A + c_0 I_n = \begin{pmatrix} 0 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 0 \end{pmatrix}.</math>

The left-hand side can be worked out to an n × n matrix whose entries are (enormous) polynomial expressions in the set of entries <math>a_{ij}</math> of A, so the Cayley–Hamilton theorem states that each of these <math>n^2</math> expressions equals 0. For any fixed value of n, these identities can be obtained by tedious but straightforward algebraic manipulations. None of these computations, however, can show why the Cayley–Hamilton theorem should be valid for matrices of all possible sizes n, so a uniform proof for all n is needed.

Preliminaries

If a vector v of size n is an eigenvector of A with eigenvalue λ, in other words if <math>Av = \lambda v</math>, then <math display="block">\begin{align} p(A)\cdot v & = A^n\cdot v+c_{n-1}A^{n-1}\cdot v+\cdots+c_1A\cdot v+c_0I_n\cdot v \\[6pt] & = \lambda^nv+c_{n-1}\lambda^{n-1}v+\cdots+c_1\lambda v+c_0 v=p(\lambda)v, \end{align}</math> which is the zero vector since <math>p(\lambda) = 0</math> (the eigenvalues of A are precisely the roots of <math>p(t)</math>). This holds for all possible eigenvalues λ, so the two matrices equated by the theorem certainly give the same (null) result when applied to any eigenvector. Now if A admits a basis of eigenvectors, in other words if A is diagonalizable, then the Cayley–Hamilton theorem must hold for A, since two matrices that give the same values when applied to each element of a basis must be equal. <math display="block">A=XDX^{-1}, \quad D=\operatorname{diag}(\lambda_i), \quad i=1,2,...,n</math> <math display="block">p_A(\lambda)=|\lambda I-A|=\prod_{i=1}^n (\lambda-\lambda_i)\equiv \sum_{k=0}^n c_k\lambda^k</math> <math display="block">p_A(A)=\sum c_k A^k=X p_A(D)X^{-1}=X C X^{-1} </math> <math display="block">C_{ii}=\sum_{k=0}^n c_k\lambda_i^k=\prod_{j=1}^n(\lambda_i-\lambda_j)=0, \qquad C_{i,j\neq i}=0</math> <math display="block">\therefore p_A(A)=XCX^{-1}=O .</math>

Consider now the function <math>e\colon M_n \to M_n</math> which maps n × n matrices to n × n matrices, given by the formula <math>e(A)=p_A(A)</math>, i.e. which takes a matrix <math>A</math> and plugs it into its own characteristic polynomial. Not all matrices are diagonalizable, but for matrices with complex coefficients many of them are: the set <math>D</math> of diagonalizable complex square matrices of a given size is dense in the set of all such square matrices<ref>Template:Harvnb</ref> (for a matrix to be diagonalizable it suffices for instance that its characteristic polynomial not have any multiple roots). Now viewed as a function <math>e\colon \C^{n^2}\to \C ^{n^2}</math> (since matrices have <math>n^2</math> entries), we see that this function is continuous, because the entries of the image of a matrix are given by polynomials in the entries of the matrix. Since <math display="block">e(D) = \left\{\begin{pmatrix} 0 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 0 \end{pmatrix}\right\}</math>

and since the set <math>D</math> is dense, by continuity this function must map the entire set of n × n complex matrices to the zero matrix. Therefore, the Cayley–Hamilton theorem is true for complex numbers, and hence also holds for <math>\Q</math>- or <math>\R</math>-valued matrices.

While this provides a valid proof, the argument is not very satisfactory, since the identities represented by the theorem do not in any way depend on the nature of the matrix (diagonalizable or not), nor on the kind of entries allowed (for matrices with real entries the diagonalizable ones do not form a dense set, and it seems strange one would have to consider complex matrices to see that the Cayley–Hamilton theorem holds for them). We shall therefore now consider only arguments that prove the theorem directly for any matrix using algebraic manipulations only; these also have the benefit of working for matrices with entries in any commutative ring.

There is a great variety of such proofs of the Cayley–Hamilton theorem, of which several will be given here. They vary in the amount of abstract algebraic notions required to understand the proof. The simplest proofs use just those notions needed to formulate the theorem (matrices, polynomials with numeric entries, determinants), but involve technical computations that render somewhat mysterious the fact that they lead precisely to the correct conclusion. It is possible to avoid such details, but at the price of involving more subtle algebraic notions: polynomials with coefficients in a non-commutative ring, or matrices with unusual kinds of entries.

Adjugate matrices

All proofs below use the notion of the adjugate matrix <math>\operatorname{adj}(M)</math> of an n × n matrix M, the transpose of its cofactor matrix. This is a matrix whose coefficients are given by polynomial expressions in the coefficients of M (in fact, by certain (n − 1) × (n − 1) determinants), in such a way that the following fundamental relations hold, <math display="block">\operatorname{adj}(M)\cdot M=\det(M)I_n=M\cdot\operatorname{adj}(M)~.</math> These relations are a direct consequence of the basic properties of determinants: evaluation of the (i, j) entry of the matrix product on the left gives the expansion by column i of the determinant of the matrix obtained from M by replacing column i by a copy of column j, which is <math>\det(M)</math> if i = j and zero otherwise; the matrix product on the right is similar, but for expansions by rows.

Being a consequence of just algebraic expression manipulation, these relations are valid for matrices with entries in any commutative ring (commutativity must be assumed for determinants to be defined in the first place). This is important to note here, because these relations will be applied below for matrices with non-numeric entries such as polynomials.

A direct algebraic proof

This proof uses just the kind of objects needed to formulate the Cayley–Hamilton theorem: matrices with polynomials as entries. The matrix <math>tI_n - A</math> whose determinant is the characteristic polynomial of A is such a matrix, and since polynomials form a commutative ring, it has an adjugate <math display="block">B=\operatorname{adj}(tI_n-A).</math> Then, according to the right-hand fundamental relation of the adjugate, one has <math display="block">(t I_n - A)B = \det(t I_n - A) I_n = p(t) I_n.</math>

Since B is also a matrix with polynomials in t as entries, one can, for each i, collect the coefficients of <math>t^i</math> in each entry to form a matrix <math>B_i</math> of numbers, such that one has <math display="block">B = \sum_{i = 0}^{n - 1} t^i B_i.</math> (The way the entries of B are defined makes clear that no powers higher than <math>t^{n-1}</math> occur.) While this looks like a polynomial with matrices as coefficients, we shall not consider such a notion; it is just a way to write a matrix with polynomial entries as a linear combination of n constant matrices, and the coefficient <math>t^i</math> has been written to the left of the matrix to stress this point of view.

Now, one can expand the matrix product in our equation by bilinearity: <math display="block">\begin{align} p(t) I_n &= (t I_n - A)B \\ &=(t I_n - A)\sum_{i = 0}^{n - 1} t^i B_i \\ &=\sum_{i = 0}^{n - 1} tI_n\cdot t^i B_i - \sum_{i = 0}^{n - 1} A\cdot t^i B_i \\ &=\sum_{i = 0}^{n - 1} t^{i + 1} B_i- \sum_{i = 0}^{n - 1} t^i AB_i \\ &=t^n B_{n - 1} + \sum_{i = 1}^{n - 1} t^i(B_{i - 1} - AB_i) - AB_0. \end{align}</math>

Writing <math display="block">p(t)I_n=t^nI_n+t^{n-1}c_{n-1}I_n+\cdots+tc_1I_n+c_0I_n,</math> one obtains an equality of two matrices with polynomial entries, written as linear combinations of constant matrices with powers of t as coefficients.

Such an equality can hold only if in any matrix position the entry that is multiplied by a given power <math>t^i</math> is the same on both sides; it follows that the constant matrices with coefficient <math>t^i</math> in both expressions must be equal. Writing these equations then for i from n down to 0, one finds <math display="block">B_{n - 1} = I_n, \qquad B_{i - 1} - AB_i = c_i I_n\quad \text{for }1 \leq i \leq n-1, \qquad -A B_0 = c_0 I_n.</math>

Finally, multiply the equation of the coefficients of <math>t^i</math> from the left by <math>A^i</math>, and sum up:

<math display="block">A^n B_{n-1} + \sum\limits_{i=1}^{n-1}\left( A^i B_{i-1} - A^{i+1}B_i\right) -A B_0 = A^n+c_{n-1} A^{n-1} + \cdots + c_1A + c_0I_n.</math>

The left-hand sides form a telescoping sum and cancel completely; the right-hand sides add up to <math>p(A)</math>: <math display="block">0 = p(A).</math> This completes the proof.
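
The matrices <math>B_i</math> and the relations just derived can also be checked symbolically; a minimal sketch with SymPy (an illustration for the 2 × 2 example used earlier) extracts the <math>B_i</math> from <math>\operatorname{adj}(tI_2 - A)</math> and verifies the three displayed equations:

<syntaxhighlight lang="python">
import sympy as sp

t = sp.symbols('t')
A = sp.Matrix([[1, 2], [3, 4]])
n = 2
B = (t*sp.eye(n) - A).adjugate()             # B = adj(t I - A)
Bi = [B.applyfunc(lambda e: sp.Poly(e, t).coeff_monomial(t**i))
      for i in range(n)]                     # constant matrices B_0, B_1
p = (t*sp.eye(n) - A).det().as_poly(t)
c = [p.coeff_monomial(t**i) for i in range(n + 1)]   # c_0, c_1, c_2 = 1

print(Bi[1] == sp.eye(n))                    # B_{n-1} = I_n
print(Bi[0] - A*Bi[1] == c[1]*sp.eye(n))     # B_0 - A B_1 = c_1 I_n
print(-A*Bi[0] == c[0]*sp.eye(n))            # -A B_0 = c_0 I_n
</syntaxhighlight>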

A proof using polynomials with matrix coefficients

This proof is similar to the first one, but tries to give meaning to the notion of polynomial with matrix coefficients that was suggested by the expressions occurring in that proof. This requires considerable care, since it is somewhat unusual to consider polynomials with coefficients in a non-commutative ring, and not all reasoning that is valid for commutative polynomials can be applied in this setting.

Notably, while arithmetic of polynomials over a commutative ring models the arithmetic of polynomial functions, this is not the case over a non-commutative ring (in fact there is no obvious notion of polynomial function in this case that is closed under multiplication). So when considering polynomials in t with matrix coefficients, the variable t must not be thought of as an "unknown", but as a formal symbol that is to be manipulated according to given rules; in particular one cannot just set t to a specific value. (Addition, at least, is unproblematic: for any value x, <math display="block">(f+g)(x) = \sum_i \left (f_i+g_i \right )x^i = \sum_i{f_i x^i} + \sum_i{g_i x^i} = f(x) + g(x);</math> the difficulty lies with products.)

Let <math>M(n,R)</math> be the ring of n × n matrices with entries in some ring R (such as the real or complex numbers) that has A as an element. Matrices whose entries are polynomials in t, such as <math>t I_n - A</math> or its adjugate B in the first proof, are elements of <math>M(n,R[t])</math>.

By collecting like powers of t, such matrices can be written as "polynomials" in t with constant matrices as coefficients; write <math>M(n,R)[t]</math> for the set of such polynomials. Since this set is in bijection with <math>M(n,R[t])</math>, one defines arithmetic operations on it correspondingly, in particular multiplication is given by <math display="block">\left( \sum_i M_i t^i \right) \!\!\left( \sum_j N_j t^j \right) = \sum_{i,j} (M_i N_j) t^{i+j},</math> respecting the order of the coefficient matrices from the two operands; obviously this gives a non-commutative multiplication.

Thus, the identity <math display="block">(t I_n - A)B = p(t) I_n.</math> from the first proof can be viewed as one involving a multiplication of elements in <math>M(n,R)[t]</math>.

At this point, it is tempting to simply set t equal to the matrix A, which makes the first factor on the left equal to the zero matrix, and the right hand side equal to <math>p(A)</math>; however, this is not an allowed operation when coefficients do not commute. It is possible to define a "right-evaluation map" <math>\operatorname{ev}_A : M[t] \to M</math>, which replaces each <math>t^i</math> by the matrix power <math>A^i</math> of A, where one stipulates that the power is always to be multiplied on the right to the corresponding coefficient. But this map is not a ring homomorphism: the right-evaluation of a product differs in general from the product of the right-evaluations. This is so because multiplication of polynomials with matrix coefficients does not model multiplication of expressions containing unknowns: a product <math>Mt^i Nt^j = (M\cdot N) t^{i+j}</math> is defined assuming that t commutes with N, but this may fail if t is replaced by the matrix A.

One can work around this difficulty in the particular situation at hand, since the above right-evaluation map does become a ring homomorphism if the matrix A is in the center of the ring of coefficients, so that it commutes with all the coefficients of the polynomials (the argument proving this is straightforward, exactly because commuting t with coefficients is now justified after evaluation).

Now, A is not always in the center of <math>M(n,R)</math>, but we may replace <math>M(n,R)</math> with a smaller ring provided it contains all the coefficients of the polynomials in question: <math>I_n</math>, A, and the coefficients <math>B_i</math> of the polynomial B. The obvious choice for such a subring is the centralizer Z of A, the subring of all matrices that commute with A; by definition A is in the center of Z.

This centralizer obviously contains <math>I_n</math> and A, but one has to show that it contains the matrices <math>B_i</math>. To do this, one combines the two fundamental relations for adjugates, writing out the adjugate B as a polynomial: <math display="block">\begin{align} \left(\sum_{i = 0}^m B_i t^i\right)\!(t I_n - A) &= (tI_n - A) \sum_{i = 0}^m B_i t^i \\ \sum_{i = 0}^m B_i t^{i + 1} - \sum_{i = 0}^m B_i A t^i &= \sum_{i = 0}^m B_i t^{i + 1} - \sum_{i = 0}^m A B_i t^i \\ \sum_{i = 0}^m B_i A t^i &= \sum_{i = 0}^m A B_i t^i . \end{align}</math>

Equating the coefficients shows that for each i, we have <math>AB_i = B_iA</math> as desired. Having found the proper setting in which <math>\operatorname{ev}_A</math> is indeed a homomorphism of rings, one can complete the proof as suggested above: <math display="block">\begin{align} \operatorname{ev}_A\left(p(t)I_n\right) &= \operatorname{ev}_A((tI_n-A)B) \\[5pt] p(A) &= \operatorname{ev}_A(tI_n - A)\cdot \operatorname{ev}_A(B) \\[5pt] p(A) &= (AI_n-A) \cdot \operatorname{ev}_A(B) = O\cdot\operatorname{ev}_A(B)=O. \end{align}</math>

This completes the proof.

A synthesis of the first two proofs

In the first proof, one was able to determine the coefficients <math>B_i</math> of B based on the right-hand fundamental relation for the adjugate only. In fact the first n equations derived can be interpreted as determining the quotient B of the Euclidean division of the polynomial <math>p(t)I_n</math> on the left by the monic polynomial <math>tI_n - A</math>, while the final equation expresses the fact that the remainder is zero. This division is performed in the ring of polynomials with matrix coefficients. Indeed, even over a non-commutative ring, Euclidean division by a monic polynomial P is defined, and always produces a unique quotient and remainder with the same degree condition as in the commutative case, provided it is specified at which side one wishes P to be a factor (here that is to the left).

To see that quotient and remainder are unique (which is the important part of the statement here), it suffices to write <math>PQ+r = PQ'+r'</math> as <math>P(Q-Q') = r'-r</math> and observe that since P is monic, <math>P(Q-Q')</math> cannot have a degree less than that of P, unless <math>Q = Q'</math>.

But the dividend <math>p(t)I_n</math> and divisor <math>tI_n - A</math> used here both lie in the subring <math>(R[A])[t]</math>, where <math>R[A]</math> is the subring of the matrix ring <math>M(n,R)</math> generated by A: the R-linear span of all powers of A. Therefore, the Euclidean division can in fact be performed within that commutative polynomial ring, and of course it then gives the same quotient B and remainder 0 as in the larger ring; in particular this shows that B in fact lies in <math>(R[A])[t]</math>.

But, in this commutative setting, it is valid to set t to A in the equation

<math display="block">p(t)I_n=(tI_n-A)B;</math>

in other words, to apply the evaluation map

<math display="block">\operatorname{ev}_A:(R[A])[t]\to R[A]</math>

which is a ring homomorphism, giving

<math display="block">p(A)=0\cdot\operatorname{ev}_A(B)=0</math>

just like in the second proof, as desired.

In addition to proving the theorem, the above argument tells us that the coefficients <math>B_i</math> of B are polynomials in A, while from the second proof we only knew that they lie in the centralizer Z of A; in general Z is a larger subring than <math>R[A]</math>, and not necessarily commutative. In particular the constant term <math>B_0 = \operatorname{adj}(-A)</math> lies in <math>R[A]</math>. Since A is an arbitrary square matrix, this proves that <math>\operatorname{adj}(A)</math> can always be expressed as a polynomial in A (with coefficients that depend on A).

In fact, the equations found in the first proof allow successively expressing <math>B_{n-1}, \ldots, B_1, B_0</math> as polynomials in A, which leads to the identity <math display="block">\operatorname{adj}(tI_n-A) = \sum_{i=0}^{n-1}\left(\sum_{j=i+1}^{n} c_j A^{j-i-1}\right)t^i, \qquad c_n = 1,</math> valid for all n × n matrices, where <math display="block">p(t)=t^n+c_{n-1}t^{n-1}+\cdots+c_1t+c_0</math> is the characteristic polynomial of A.

Note that this identity also implies the statement of the Cayley–Hamilton theorem: one may move <math>\operatorname{adj}(-A)</math> to the right hand side, multiply the resulting equation (on the left or on the right) by A, and use the fact that <math display="block">-A\cdot \operatorname{adj}(-A) = \operatorname{adj}(-A)\cdot (-A) = \det(-A) I_n = c_0I_n.</math>


A proof using matrices of endomorphisms

As was mentioned above, the matrix p(A) in the statement of the theorem is obtained by first evaluating the determinant and then substituting the matrix A for t; doing that substitution into the matrix <math>t I_n - A</math> before evaluating the determinant is not meaningful. Nevertheless, it is possible to give an interpretation where p(A) is obtained directly as the value of a certain determinant, but this requires a more complicated setting, one of matrices over a ring in which one can interpret both the entries <math>A_{i,j}</math> of A, and all of A itself. One could take for this the ring <math>M(n,R)</math> of n × n matrices over R, where the entry <math>A_{i,j}</math> is realised as <math>A_{i,j} I_n</math>, and A as itself. But considering matrices with matrices as entries might cause confusion with block matrices, which is not intended, as that gives the wrong notion of determinant (recall that the determinant of a matrix is defined as a sum of products of its entries, and in the case of a block matrix this is generally not the same as the corresponding sum of products of its blocks!). It is clearer to distinguish A from the endomorphism <math>\varphi</math> of an n-dimensional vector space V (or free R-module if R is not a field) defined by it in a basis <math>e_1, \ldots, e_n</math>, and to take matrices over the ring <math>\operatorname{End}(V)</math> of all such endomorphisms. Then <math>\varphi \in \operatorname{End}(V)</math> is a possible matrix entry, while A designates the element of <math>M(n,\operatorname{End}(V))</math> whose (i, j) entry is the endomorphism of scalar multiplication by <math>A_{i,j}</math>; similarly <math>I_n</math> will be interpreted as an element of <math>M(n,\operatorname{End}(V))</math>. However, since <math>\operatorname{End}(V)</math> is not a commutative ring, no determinant is defined on <math>M(n,\operatorname{End}(V))</math>; this can only be done for matrices over a commutative subring of <math>\operatorname{End}(V)</math>. Now the entries of the matrix <math>\varphi I_n-A</math> all lie in the subring <math>R[\varphi]</math> generated by the identity and <math>\varphi</math>, which is commutative. Then a determinant map <math>M(n,R[\varphi]) \to R[\varphi]</math> is defined, and <math>\det(\varphi I_n - A)</math> evaluates to the value <math>p(\varphi)</math> of the characteristic polynomial of A at <math>\varphi</math> (this holds independently of the relation between A and <math>\varphi</math>); the Cayley–Hamilton theorem states that <math>p(\varphi)</math> is the null endomorphism.

In this form, the following proof can be obtained from that of Template:Harvtxt (which in fact is the more general statement related to the Nakayama lemma; one takes for the ideal in that proposition the whole ring R). The fact that A is the matrix of <math>\varphi</math> in the basis <math>e_1, \ldots, e_n</math> means that <math display="block">\varphi(e_i) = \sum_{j = 1}^n A_{j,i} e_j \quad\text{for }i=1,\ldots,n.</math> One can interpret these as n components of one equation in <math>V^n</math>, whose members can be written using the matrix-vector product <math>M \cdot v</math> that is defined as usual, but with individual entries <math>\psi \in \operatorname{End}(V)</math> and <math>v \in V</math> being "multiplied" by forming <math>\psi(v)</math>; this gives: <math display="block">\varphi I_n \cdot E = A^\operatorname{tr}\cdot E,</math> where <math>E\in V^n</math> is the element whose component i is <math>e_i</math> (in other words it is the basis <math>e_1, \ldots, e_n</math> of V written as a column of vectors). Writing this equation as <math display="block">(\varphi I_n-A^\operatorname{tr})\cdot E = 0\in V^n</math> one recognizes the transpose of the matrix <math>\varphi I_n-A</math> considered above, and its determinant (as element of <math>M(n,R[\varphi])</math>) is also <math>p(\varphi)</math>. To derive from this equation that <math>p(\varphi) = 0 \in \operatorname{End}(V)</math>, one left-multiplies by the adjugate matrix of <math>\varphi I_n-A^\operatorname{tr}</math>, which is defined in the matrix ring <math>M(n,R[\varphi])</math>, giving <math display="block">\begin{align} 0 &= \operatorname{adj}(\varphi I_n-A^\operatorname{tr}) \cdot \left((\varphi I_n-A^\operatorname{tr})\cdot E\right) \\[1ex] &= \left(\operatorname{adj}(\varphi I_n-A^\operatorname{tr}) \cdot (\varphi I_n-A^\operatorname{tr})\right) \cdot E \\[1ex] &= \left(\det(\varphi I_n - A^\operatorname{tr})I_n\right) \cdot E \\[1ex] &= (p(\varphi)I_n)\cdot E; \end{align}</math> the associativity of matrix-matrix and matrix-vector multiplication used in the first step is a purely formal property of those operations, independent of the nature of the entries. Now component i of this equation says that <math>p(\varphi)(e_i) = 0 \in V</math>; thus <math>p(\varphi)</math> vanishes on all <math>e_i</math>, and since these elements generate V, it follows that <math>p(\varphi) = 0 \in \operatorname{End}(V)</math>, completing the proof.

One additional fact that follows from this proof is that the matrix A whose characteristic polynomial is taken need not be identical to the value <math>\varphi</math> substituted into that polynomial; it suffices that <math>\varphi</math> be an endomorphism of V satisfying the initial equations

<math display="block">\varphi(e_i) = \sum_j A_{j,i} e_j</math> for some sequence of elements <math>e_1, \ldots, e_n</math> that generate V (which space might have smaller dimension than n, or in case the ring R is not a field it might not be a free module at all).

A bogus "proof": <math>p(A) = \det(AI_n - A) = 0</math>

One persistent elementary but incorrect argument<ref>Template:Harvnb</ref> for the theorem is to "simply" take the definition <math display="block">p(\lambda) = \det(\lambda I_n - A)</math> and substitute A for λ, obtaining <math display="block">p(A)=\det(A I_n - A) = \det(A - A) = \det(\mathbf{0}) = 0.</math>

There are many ways to see why this argument is wrong. First, in the Cayley–Hamilton theorem, p(A) is an n × n matrix. However, the right hand side of the above equation is the value of a determinant, which is a scalar. So they cannot be equated unless n = 1 (i.e. A is just a scalar). Second, in the expression <math>\det(\lambda I_n - A)</math>, the variable λ actually occurs at the diagonal entries of the matrix <math>\lambda I_n - A</math>. To illustrate, consider the characteristic polynomial in the previous example again:

<math display="block">\det\!\begin{pmatrix}\lambda-1&-2\\-3&\lambda-4\end{pmatrix}.</math>

If one substitutes the entire matrix A for λ in those positions, one obtains

<math display="block">\det\!\begin{pmatrix} \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} - 1 & -2 \\ -3 &\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} - 4\end{pmatrix},</math>

in which the "matrix" expression is simply not a valid one. Note, however, that if scalar multiples of identity matrices instead of scalars are subtracted in the above, i.e. if the substitution is performed as

<math display="block"> \det\!\begin{pmatrix} \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} - I_2 & -2I_2 \\ -3I_2 & \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} - 4I_2 \end{pmatrix},</math>

then the determinant is indeed zero, but the expanded matrix in question does not evaluate to <math>A I_n-A</math>; nor can its determinant (a scalar) be compared to p(A) (a matrix). So the argument that <math>p(A) = \det(A I_n - A) = 0</math> still does not apply.

Actually, if such an argument held, it should also hold when other multilinear forms instead of the determinant are used. For instance, if we consider the permanent function and define <math>q(\lambda) = \operatorname{perm}(\lambda I_n - A)</math>, then by the same argument, we should be able to "prove" that <math>q(A) = 0</math>. But this statement is demonstrably wrong: in the 2-dimensional case, for instance, the permanent of a matrix is given by

<math display="block">\operatorname{perm}\!\begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad + bc.</math>

So, for the matrix A in the previous example,

<math display="block">\begin{align} q(\lambda) & = \operatorname{perm}(\lambda I_2 - A) = \operatorname{perm}\!\begin{pmatrix} \lambda - 1 & -2 \\ -3 & \lambda-4 \end{pmatrix} \\[6pt] & = (\lambda - 1)(\lambda - 4) + (-2)(-3) = \lambda^2 - 5\lambda + 10. \end{align}</math>

Yet one can verify that

<math display="block">q(A) = A^2 - 5A + 10I_2 = 12I_2 \neq 0.</math>
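
This is quick to confirm numerically (a check added here, assuming NumPy):

<syntaxhighlight lang="python">
import numpy as np

A = np.array([[1, 2], [3, 4]])
Q = A @ A - 5*A + 10*np.eye(2, dtype=int)   # q(A) for q(l) = l^2 - 5l + 10
print(Q)   # [[12  0]
           #  [ 0 12]], i.e. 12*I_2, not the zero matrix
</syntaxhighlight>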

One of the proofs for the Cayley–Hamilton theorem above bears some similarity to the argument that <math>p(A) = \det(AI_n-A) = 0</math>. By introducing a matrix with non-numeric coefficients, one can actually let A live inside a matrix entry, but then <math>A I_n</math> is not equal to A, and the conclusion is reached differently.

Proofs using methods of abstract algebra

Basic properties of Hasse–Schmidt derivations on the exterior algebra <math display="inline">A = \bigwedge M</math> of some R-module M (supposed to be free and of finite rank) have been used by Template:Harvtxt to prove the Cayley–Hamilton theorem. See also Template:Harvtxt.

A combinatorial proof

A proof based on developing the Leibniz formula for the characteristic polynomial was given by Straubing,<ref>Template:Cite journal</ref> and a generalization was given using the trace monoid theory of Foata and Cartier.

Abstraction and generalizations

The above proofs show that the Cayley–Hamilton theorem holds for matrices with entries in any commutative ring R, and that <math>p(\varphi) = 0</math> will hold whenever <math>\varphi</math> is an endomorphism of an R-module generated by elements <math>e_1, \ldots, e_n</math> that satisfies

<math display="block">\varphi(e_j)=\sum a_{ij}e_i, \qquad j =1, \ldots, n.</math>

This more general version of the theorem is the source of the celebrated Nakayama lemma in commutative algebra and algebraic geometry.

The Cayley–Hamilton theorem also holds for matrices over the quaternions, a noncommutative ring.<ref>Template:Harvnb</ref><ref group=nb>Due to the non-commutative nature of the multiplication operation for quaternions and related constructions, care needs to be taken with definitions, most notably in this context, for the determinant. The theorem holds as well for the slightly less well-behaved split-quaternions, see Template:Harvtxt. The rings of quaternions and split-quaternions can both be represented by certain 2 × 2 complex matrices. (When restricted to unit norm, these are the groups SU(2) and SU(1, 1) respectively.) Therefore it is not surprising that the theorem holds.
There is no such matrix representation for the octonions, since the multiplication operation is not associative in this case. However, a modified Cayley–Hamilton theorem still holds for the octonions, see Template:Harvtxt.</ref>

See also

Remarks

Template:Reflist

Notes

Template:Reflist

References

External links