== Applications ==

=== Determinant and inverse matrix ===
{{see also|Determinant#Relation to eigenvalues and trace|Characteristic polynomial#Properties}}

For a general {{math|''n'' × ''n''}} [[invertible matrix]] {{mvar|A}}, i.e., one with nonzero determinant, {{mvar|A}}<sup>−1</sup> can thus be written as an {{nowrap|{{math|(''n'' − 1)}}-th}} order [[polynomial expression]] in {{mvar|A}}: as indicated, the Cayley–Hamilton theorem amounts to the identity

{{Equation box 1
|indent =::
|equation = <math>p(A)=A^n+c_{n-1}A^{n-1}+\cdots+c_1A+(-1)^n\det(A)I_n =0.</math>
|cellpadding= 6
|border
|border colour = #0073CF
|bgcolor=#F9FFF7}}

The coefficients {{math|''c''<sub>''i''</sub>}} are given by the [[elementary symmetric polynomial]]s of the [[eigenvalue]]s of {{mvar|A}}. Using [[Newton identities]], the elementary symmetric polynomials can in turn be expressed in terms of [[power sum symmetric polynomial]]s of the eigenvalues:
<math display="block">s_k = \sum_{i=1}^n \lambda_i^k = \operatorname{tr}(A^k),</math>
where {{math|tr(''A''<sup>''k''</sup>)}} is the [[Trace (linear algebra)|trace]] of the matrix {{math|''A''<sup>''k''</sup>}}. Thus, we can express {{math|''c''<sub>''i''</sub>}} in terms of the traces of powers of {{mvar|A}}.

In general, the formula for the coefficients {{math|''c''<sub>''i''</sub>}} is given in terms of complete exponential [[Bell polynomials]] as<ref group="nb">See Sect. 2 of {{harvtxt|Krivoruchenko|2016}}. An explicit expression for the coefficients {{math|''c''<sub>''i''</sub>}} is provided by {{harvtxt|Kondratyuk|Krivoruchenko|1992}}:
<math display="block">c_i = \sum_{k_1,k_2,\ldots ,k_n}\prod_{l=1}^{n} \frac{(-1)^{k_l+1}}{l^{k_l} k_l!}\operatorname{tr}(A^l)^{k_l},</math>
where the sum is taken over the sets of all integer partitions {{math|''k''<sub>''l''</sub> ≥ 0}} satisfying the equation
<math display="block">\sum_{l=1}^{n}lk_{l} = n-i.</math></ref>
<math display="block">c_{n-k} = \frac{(-1)^{k}}{k!} B_k(s_1, -1! s_2, 2! s_3, \ldots, (-1)^{k-1}(k-1)! s_k).</math>

In particular, the determinant of {{mvar|A}} equals {{math|(−1)<sup>''n''</sup>''c''<sub>0</sub>}}. Thus, the determinant can be written as the [[trace identity]]:
<math display="block">\det(A) = \frac{1}{n!} B_n(s_1, -1! s_2, 2! s_3, \ldots, (-1)^{n-1}(n-1)! s_n).</math>

Likewise, the characteristic polynomial can be written as
<math display="block">-(-1)^n\det(A)I_n = A(A^{n-1}+c_{n-1}A^{n-2}+\cdots+c_{1}I_n),</math>
and, by multiplying both sides by {{math|''A''<sup>−1</sup>}} (note {{math|1=−(−1)<sup>''n''</sup> = (−1)<sup>''n''−1</sup>}}), one is led to an expression for the [[inverse matrix|inverse]] of {{mvar|A}} as a trace identity,
<math display="block">\begin{align}
A^{-1} & = \frac{(-1)^{n-1}}{\det A}(A^{n-1}+c_{n-1}A^{n-2}+\cdots+c_{1}I_n) \\[5pt]
& = \frac{1}{\det A}\sum_{k=0}^{n-1} (-1)^{n+k-1}\frac{A^{n-k-1}}{k!} B_k(s_1, -1! s_2, 2! s_3, \ldots, (-1)^{k-1}(k-1)! s_k).
\end{align}</math>
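These identities translate directly into computation: the power-sum traces {{math|''s<sub>k</sub>''}} determine the coefficients {{math|''c<sub>i</sub>''}} through Newton's identities, which then give both {{math|1=''p''(''A'') = 0}} and the inverse. The following is a minimal numerical sketch in Python (the helper name and test matrix are illustrative, not from any cited source):

<syntaxhighlight lang="python">
import numpy as np

def charpoly_coeffs(A):
    """Coefficients c_0, ..., c_n of p(x) = x^n + c_{n-1} x^{n-1} + ... + c_0,
    obtained from the power sums s_k = tr(A^k) via Newton's identities."""
    n = A.shape[0]
    s = [0.0]                      # s[k] = tr(A^k); s[0] is unused
    Ak = np.eye(n)
    for _ in range(n):
        Ak = Ak @ A
        s.append(np.trace(Ak))
    e = [1.0]                      # e[k] = k-th elementary symmetric polynomial
    for k in range(1, n + 1):
        e.append(sum((-1) ** (i - 1) * e[k - i] * s[i] for i in range(1, k + 1)) / k)
    return [(-1) ** (n - i) * e[n - i] for i in range(n + 1)]  # c_0, ..., c_n

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
c = charpoly_coeffs(A)             # [-8.0, 14.0, -7.0, 1.0]
# Cayley-Hamilton: p(A) = A^3 + c_2 A^2 + c_1 A + c_0 I vanishes
pA = sum(c[k] * np.linalg.matrix_power(A, k) for k in range(4))
print(np.allclose(pA, 0))                                # True
# Inverse as a polynomial in A: A^{-1} = -(A^2 + c_2 A + c_1 I) / c_0
Ainv = -(A @ A + c[2] * A + c[1] * np.eye(3)) / c[0]
print(np.allclose(Ainv @ A, np.eye(3)))                  # True
</syntaxhighlight>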
Another method for obtaining these coefficients {{math|''c''<sub>''k''</sub>}} for a general {{math|''n'' × ''n''}} matrix, provided no eigenvalue is zero, relies on the following alternative [[Matrix exponential#The determinant of the matrix exponential|expression for the determinant]],
<math display="block">p(\lambda)= \det (\lambda I_n -A) = \lambda^n \exp (\operatorname{tr} (\log (I_n - A/\lambda))).</math>
Hence, by virtue of the [[Mercator series]],
<math display="block">p(\lambda)= \lambda^n \exp \left( -\operatorname{tr} \sum_{m=1}^\infty \frac{(A/\lambda)^m}{m} \right),</math>
where the exponential need only be expanded to order {{math|''λ''<sup>−''n''</sup>}}, since {{math|''p''(''λ'')}} is of order {{mvar|n}}; the net negative powers of {{mvar|λ}} vanish automatically by the C–H theorem. (Again, this requires a ring containing the [[rational number]]s.) [[Derivative|Differentiation]] of this expression with respect to {{mvar|λ}} allows one to express the coefficients of the characteristic polynomial for general {{mvar|n}} as determinants of {{math|''m'' × ''m''}} matrices,<ref group="nb">See, e.g., p. 54 of {{harvnb|Brown|1994}}, which solves [[Jacobi's formula]],
<math display="block">\frac{\partial p(\lambda)}{\partial \lambda} = p(\lambda) \sum^\infty _{m=0}\lambda ^{-(m+1)} \operatorname{tr}A^m = p(\lambda) ~ \operatorname{tr} \frac{I}{\lambda I -A}\equiv\operatorname{tr} B~,</math>
where {{mvar|B}} is the adjugate matrix of the next section. There also exists an equivalent, related recursive algorithm introduced by [[Urbain Le Verrier]] and [[Dmitry Konstantinovich Faddeev]], the [[Faddeev–LeVerrier algorithm]], which reads
<math display="block">\begin{align} M_0 &\equiv O & c_n &= 1 \qquad &(k=0) \\[5pt] M_k &\equiv AM_{k-1} + c_{n-k+1} I \qquad \qquad & c_{n-k} &= -\frac 1 k \operatorname{tr}(AM_k) \qquad &k=1,\ldots ,n ~. \end{align}</math>
(see, e.g., {{harvnb|Gantmacher|1960|p=88}}.) Observe {{math|''A''<sup>−1</sup> {{=}} −''M''<sub>''n''</sub>/''c''<sub>0</sub>}} as the recursion terminates. See the algebraic proof in the following section, which relies on the modes of the adjugate, {{math|''B''<sub>''k''</sub> ≡ ''M''<sub>''n''−''k''</sub>}}. Specifically, <math>(\lambda I-A) B = I p(\lambda)</math>; combining this with the above derivative of {{mvar|p}} yields
<math display="block">\lambda p' -n p =\operatorname{tr} (AB)~,</math>
({{harvnb|Hou|1998}}), and the above recursions, in turn.</ref>
<math display="block">c_{n-m} = \frac{(-1)^m}{m!}
\begin{vmatrix}
\operatorname{tr}A & m-1 & 0 & \cdots \\
\operatorname{tr}A^2 & \operatorname{tr}A & m-2 & \cdots \\
\vdots & \vdots & & & \vdots \\
\operatorname{tr}A^{m-1} & \operatorname{tr}A^{m-2} & \cdots & \cdots & 1 \\
\operatorname{tr}A^m & \operatorname{tr}A^{m-1} & \cdots & \cdots & \operatorname{tr}A
\end{vmatrix} ~.</math>
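The Faddeev–LeVerrier recursion in the footnote transcribes almost line by line into code. A short Python sketch (the function name and test matrix are illustrative):

<syntaxhighlight lang="python">
import numpy as np

def faddeev_leverrier(A):
    """Faddeev-LeVerrier recursion: starting from M_0 = O and c_n = 1,
    M_k = A M_{k-1} + c_{n-k+1} I  and  c_{n-k} = -tr(A M_k)/k.
    Returns the coefficients c_0..c_n and the final matrix M_n."""
    n = A.shape[0]
    c = np.zeros(n + 1)
    c[n] = 1.0
    M = np.zeros_like(A)
    for k in range(1, n + 1):
        M = A @ M + c[n - k + 1] * np.eye(n)
        c[n - k] = -np.trace(A @ M) / k
    return c, M

A = np.array([[1.0, 2.0], [3.0, 4.0]])
c, M = faddeev_leverrier(A)
print(c)                                         # [-2. -5.  1.]: p(x) = x^2 - 5x - 2
print(np.allclose(np.linalg.inv(A), -M / c[0]))  # A^{-1} = -M_n/c_0: True
</syntaxhighlight>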
; Examples
For instance, the first few Bell polynomials are {{math|1=''B''<sub>0</sub> = 1}}, {{math|1=''B''<sub>1</sub>(''x''<sub>1</sub>) = ''x''<sub>1</sub>}}, {{math|1=''B''<sub>2</sub>(''x''<sub>1</sub>, ''x''<sub>2</sub>) = ''x''{{su|b=1|p=2}} + ''x''<sub>2</sub>}}, and {{math|1=''B''<sub>3</sub>(''x''<sub>1</sub>, ''x''<sub>2</sub>, ''x''<sub>3</sub>) = ''x''{{su|b=1|p=3}} + 3 ''x''<sub>1</sub>''x''<sub>2</sub> + ''x''<sub>3</sub>}}.

Using these to specify the coefficients {{math|''c<sub>i</sub>''}} of the characteristic polynomial of a {{math|2 × 2}} matrix yields
<math display="block">\begin{align}
c_2 &= B_0 = 1, \\[4pt]
c_1 &= \frac{-1}{1!} B_1(s_1) = -s_1 = -\operatorname{tr}(A), \\[4pt]
c_0 &= \frac{1}{2!} B_2(s_1, -1! s_2) = \frac{1}{2}(s_1^2 - s_2) = \frac{1}{2}\left((\operatorname{tr}(A))^2 - \operatorname{tr}(A^2)\right).
\end{align}</math>
The coefficient {{math|''c''<sub>0</sub>}} gives the determinant of the {{math|2 × 2}} matrix and {{math|''c''<sub>1</sub>}} the negative of its trace, while its inverse is given by
<math display="block">A^{-1} = \frac{-1}{\det A}(A + c_1 I_2) = \frac{-2(A - \operatorname{tr}(A) I_2)}{(\operatorname{tr}(A))^2 - \operatorname{tr}(A^2)}.</math>

It is apparent from the general formula for {{math|''c''<sub>''n''−''k''</sub>}}, expressed in terms of Bell polynomials, that the expressions
<math display="block">-\operatorname{tr}(A)\quad \text{and} \quad \tfrac 1 2 \left(\operatorname{tr}(A)^2 - \operatorname{tr}(A^2)\right)</math>
always give the coefficients {{math|''c''<sub>''n''−1</sub>}} of {{math|''λ''<sup>''n''−1</sup>}} and {{math|''c''<sub>''n''−2</sub>}} of {{math|''λ''<sup>''n''−2</sup>}} in the characteristic polynomial of any {{math|''n'' × ''n''}} matrix, respectively. So, for a {{math|3 × 3}} matrix {{mvar|A}}, the statement of the Cayley–Hamilton theorem can also be written as
<math display="block">A^3- (\operatorname{tr}A)A^2+\frac{1}{2}\left((\operatorname{tr}A)^2-\operatorname{tr}(A^2)\right)A-\det(A)I_3=O,</math>
where the right-hand side designates a {{math|3 × 3}} zero matrix. Likewise, the determinant in the {{math|1=''n'' = 3}} case is now
<math display="block">\begin{align}
\det(A) &= \frac{1}{3!} B_3(s_1, -1! s_2, 2! s_3) = \frac{1}{6}\left(s_1^3 + 3 s_1 (-s_2) + 2 s_3\right) \\[5pt]
&= \frac{1}{6} \left[ (\operatorname{tr}A)^3-3\operatorname{tr}(A^2)(\operatorname{tr}A)+2\operatorname{tr}(A^3) \right].
\end{align}</math>
This expression gives the negative of the coefficient {{math|''c''<sub>''n''−3</sub>}} of {{math|''λ''<sup>''n''−3</sup>}} in the general case, as seen below.

Similarly, one can write for a {{math|4 × 4}} matrix {{mvar|A}},
<math display="block">A^4-(\operatorname{tr}A)A^3 + \tfrac{1}{2}\left[(\operatorname{tr}A)^2-\operatorname{tr}(A^2)\right]A^2 - \tfrac{1}{6}\left[ (\operatorname{tr}A)^3-3\operatorname{tr}(A^2)(\operatorname{tr}A)+2\operatorname{tr}(A^3)\right] A + \det(A)I_4 = O,</math>
where, now, the determinant is {{math|''c''<sub>''n''−4</sub>}},
<math display="block">\tfrac{1}{24}\left[ (\operatorname{tr}A)^4-6 \operatorname{tr}(A^2)(\operatorname{tr}A)^2 + 3\left(\operatorname{tr}(A^2)\right)^2 + 8\operatorname{tr}(A^3)\operatorname{tr}(A) -6\operatorname{tr}(A^4) \right],</math>
and so on for larger matrices. The increasingly complex expressions for the coefficients {{math|''c''<sub>''k''</sub>}} are deducible from [[Newton's identities]] or the [[Faddeev–LeVerrier algorithm]].
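These trace identities are easy to spot-check numerically, for instance in Python (the random test matrix is arbitrary):

<syntaxhighlight lang="python">
import numpy as np

# Spot-check of the 3x3 trace identities above on an arbitrary matrix.
A = np.random.default_rng(0).standard_normal((3, 3))
t1 = np.trace(A)
t2 = np.trace(A @ A)
t3 = np.trace(A @ A @ A)

# det(A) = [ (tr A)^3 - 3 tr(A^2) tr(A) + 2 tr(A^3) ] / 6
det_from_traces = (t1**3 - 3 * t2 * t1 + 2 * t3) / 6
print(np.isclose(det_from_traces, np.linalg.det(A)))  # True

# Cayley-Hamilton in the 3x3 trace form:
lhs = (np.linalg.matrix_power(A, 3) - t1 * np.linalg.matrix_power(A, 2)
       + 0.5 * (t1**2 - t2) * A - np.linalg.det(A) * np.eye(3))
print(np.allclose(lhs, 0))  # True
</syntaxhighlight>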
=== ''n''-th power of matrix ===
The Cayley–Hamilton theorem always provides a relationship between the powers of {{mvar|A}} (though not always the simplest one), which allows one to simplify expressions involving such powers, and evaluate them without having to compute the power {{math|''A''<sup>''n''</sup>}} or any higher powers of {{mvar|A}}.

As an example, for <math>A = \begin{pmatrix}1&2\\3&4\end{pmatrix}</math> the theorem gives
<math display="block">A^2=5A+2I_2\, .</math>
Then, to calculate {{math|''A''<sup>4</sup>}}, observe
<math display="block">\begin{align} A^3&=(5A+2I_2)A=5A^2+2A=5(5A+2I_2)+2A=27A+10I_2, \\[1ex] A^4&=A^3A=(27A+10I_2)A=27A^2+10A=27(5A+2I_2)+10A=145A+54I_2\, . \end{align}</math>
Likewise,
<math display="block">\begin{align} A^{-1} &= \frac{1}{2}\left(A-5I_2\right)~, \\[1ex] A^{-2} &= A^{-1} A^{-1} = \frac{1}{4} \left(A^2-10A+25I_2\right) = \frac{1}{4} \left((5A+2I_2)-10A+25I_2\right) = \frac{1}{4} \left(-5A+27I_2\right)~. \end{align}</math>
Notice that we have been able to write the matrix power as a linear combination of {{mvar|A}} and the identity. In fact, a matrix power of any order {{mvar|k}} can be written as a matrix polynomial of degree at most {{math|''n'' − 1}}, where {{mvar|n}} is the size of the square matrix. This is an instance where the Cayley–Hamilton theorem can be used to express a matrix function, which we will discuss systematically below.
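The reduction used above can be iterated mechanically: if {{math|1=''A''<sup>''k''</sup> = ''aA'' + ''bI''<sub>2</sub>}}, then {{math|1=''A''<sup>''k''+1</sup> = ''aA''<sup>2</sup> + ''bA'' = (5''a'' + ''b'')''A'' + 2''aI''<sub>2</sub>}}. A small Python sketch, specific to this example matrix (the helper name is illustrative):

<syntaxhighlight lang="python">
import numpy as np

def power_as_linear(k):
    """Return (a, b) with A^k = a*A + b*I for A = [[1, 2], [3, 4]],
    using A^2 = 5A + 2I: if A^k = aA + bI, then A^(k+1) = (5a+b)A + 2aI."""
    a, b = 0, 1  # A^0 = I
    for _ in range(k):
        a, b = 5 * a + b, 2 * a
    return a, b

A = np.array([[1, 2], [3, 4]])
a, b = power_as_linear(4)
print(a, b)  # 145 54, matching A^4 = 145A + 54I_2 above
print(np.array_equal(a * A + b * np.eye(2, dtype=int),
                     np.linalg.matrix_power(A, 4)))  # True
</syntaxhighlight>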
=== Matrix functions ===
Given an [[analytic function]]
<math display="block">f(x) = \sum_{k=0}^\infty a_k x^k</math>
and the characteristic polynomial {{math|''p''(''x'')}} of degree {{mvar|n}} of an {{math|''n'' × ''n''}} matrix {{mvar|A}}, the function can be expressed using long division as
<math display="block">f(x) = q(x) p(x) + r(x),</math>
where {{math|''q''(''x'')}} is some quotient polynomial and {{math|''r''(''x'')}} is a remainder polynomial such that {{math|0 ≤ deg ''r''(''x'') < ''n''}}. By the Cayley–Hamilton theorem, replacing {{mvar|x}} by the matrix {{mvar|A}} gives {{math|1=''p''(''A'') = 0}}, so one has
<math display="block">f(A) = r(A).</math>
Thus, the analytic function of the matrix {{mvar|A}} can be expressed as a matrix polynomial of degree less than {{mvar|n}}.

Let the remainder polynomial be
<math display="block">r(x) = c_0 + c_1 x + \cdots + c_{n-1} x^{n-1}.</math>
Since {{math|1=''p''(''λ<sub>i</sub>'') = 0}} at each eigenvalue, evaluating the function {{math|''f''(''x'')}} at the {{mvar|n}} eigenvalues of {{mvar|A}} yields
<math display="block"> f(\lambda_i) = r(\lambda_i) = c_0 + c_1 \lambda_i + \cdots + c_{n-1} \lambda_i^{n-1}, \qquad \text{for } i=1,2,\ldots,n.</math>
This amounts to a system of {{mvar|n}} [[linear equation]]s, which can be solved to determine the coefficients {{math|''c<sub>i</sub>''}}. Thus, one has
<math display="block">f(A) = \sum_{k=0}^{n-1} c_k A^k.</math>

When the eigenvalues are repeated, that is {{math|1=''λ<sub>i</sub>'' = ''λ<sub>j</sub>''}} for some {{math|''i'' ≠ ''j''}}, two or more equations are identical, and the system cannot be solved uniquely. For such cases, for an eigenvalue {{mvar|λ}} with multiplicity {{mvar|m}}, the first {{math|''m'' − 1}} derivatives of {{math|''p''(''x'')}} vanish at that eigenvalue, so differentiating {{math|1=''f''(''x'') = ''q''(''x'')''p''(''x'') + ''r''(''x'')}} there leaves the extra {{math|''m'' − 1}} linearly independent equations
<math display="block">\left.\frac{\mathrm{d}^k f(x)}{\mathrm{d}x^k}\right|_{x=\lambda} = \left.\frac{\mathrm{d}^k r(x)}{\mathrm{d}x^k}\right|_{x=\lambda}\qquad \text{for } k = 1, 2, \ldots, m-1,</math>
which, combined with the others, yield the required {{mvar|n}} equations to solve for {{math|''c<sub>i</sub>''}}.

Finding a polynomial that passes through the points {{math|(''λ<sub>i</sub>'', ''f''(''λ<sub>i</sub>''))}} is essentially an [[polynomial interpolation|interpolation problem]], and can be solved using [[Lagrange interpolation|Lagrange]] or [[Newton polynomial|Newton interpolation]] techniques, leading to [[Sylvester's formula]].

For example, suppose the task is to find the polynomial representation of
<math display="block">f(A) = e^{At} \qquad \text{where} \qquad A = \begin{pmatrix}1&2\\0&3\end{pmatrix}.</math>
The characteristic polynomial is {{math|1=''p''(''x'') = (''x'' − 1)(''x'' − 3) = ''x''<sup>2</sup> − 4''x'' + 3}}, and the eigenvalues are {{math|1=''λ'' = 1, 3}}. Let {{math|1=''r''(''x'') = ''c''<sub>0</sub> + ''c''<sub>1</sub>''x''}}. Evaluating {{math|1=''f''(''λ'') = ''r''(''λ'')}} at the eigenvalues, one obtains two linear equations, {{math|1=''e''<sup>''t''</sup> = ''c''<sub>0</sub> + ''c''<sub>1</sub>}} and {{math|1=''e''<sup>3''t''</sup> = ''c''<sub>0</sub> + 3''c''<sub>1</sub>}}. Solving the equations yields {{math|1=''c''<sub>0</sub> = (3''e''<sup>''t''</sup> − ''e''<sup>3''t''</sup>)/2}} and {{math|1=''c''<sub>1</sub> = (''e''<sup>3''t''</sup> − ''e''<sup>''t''</sup>)/2}}. Thus, it follows that
<math display="block">e^{At} = c_0 I_2 + c_1 A = \begin{pmatrix}c_0 + c_1 & 2 c_1\\ 0 & c_0 + 3 c_1\end{pmatrix} = \begin{pmatrix}e^{t} & e^{3t} - e^{t} \\ 0 & e^{3t}\end{pmatrix}.</math>

If, instead, the function were {{math|1=''f''(''A'') = sin ''At''}}, then the coefficients would have been {{math|1=''c''<sub>0</sub> = (3 sin ''t'' − sin 3''t'')/2}} and {{math|1=''c''<sub>1</sub> = (sin 3''t'' − sin ''t'')/2}}; hence
<math display="block">\sin(At) = c_0 I_2 + c_1 A = \begin{pmatrix}\sin t & \sin 3t - \sin t \\ 0 & \sin 3t\end{pmatrix}.</math>

As a further example, when considering
<math display="block">f(A) = e^{At} \qquad \text{where} \qquad A = \begin{pmatrix}0 & 1\\-1 & 0\end{pmatrix},</math>
then the characteristic polynomial is {{math|1=''p''(''x'') = ''x''<sup>2</sup> + 1}}, and the eigenvalues are {{math|1=''λ'' = ±''i''}}. As before, evaluating the function at the eigenvalues gives the linear equations {{math|1=''e''<sup>''it''</sup> = ''c''<sub>0</sub> + ''ic''<sub>1</sub>}} and {{math|1=''e''<sup>−''it''</sup> = ''c''<sub>0</sub> − ''ic''<sub>1</sub>}}, whose solution gives {{math|1=''c''<sub>0</sub> = (''e''<sup>''it''</sup> + ''e''<sup>−''it''</sup>)/2 = cos ''t''}} and {{math|1=''c''<sub>1</sub> = (''e''<sup>''it''</sup> − ''e''<sup>−''it''</sup>)/2''i'' = sin ''t''}}. Thus, for this case,
<math display="block">e^{At} = (\cos t) I_2 + (\sin t) A = \begin{pmatrix}\cos t & \sin t\\ -\sin t & \cos t \end{pmatrix},</math>
which is a [[rotation matrix]].
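The same interpolation can be carried out numerically. The following Python sketch recomputes the first example above, {{math|1=''f''(''A'') = ''e''<sup>''At''</sup>}} with distinct eigenvalues, and checks it against SciPy's matrix exponential (the value of {{mvar|t}} is arbitrary):

<syntaxhighlight lang="python">
import numpy as np
from scipy.linalg import expm

# f(A) = exp(At) for A = [[1, 2], [0, 3]] via f(lam_i) = c_0 + c_1 lam_i
# at the (distinct) eigenvalues lam = 1, 3.
A = np.array([[1.0, 2.0], [0.0, 3.0]])
t = 0.7                      # arbitrary parameter value
lam = np.array([1.0, 3.0])
# Vandermonde system: rows (1, lam_i), right-hand side f(lam_i) = exp(lam_i t)
c = np.linalg.solve(np.vander(lam, 2, increasing=True), np.exp(lam * t))
f_A = c[0] * np.eye(2) + c[1] * A
print(np.allclose(f_A, expm(A * t)))  # True
</syntaxhighlight>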
A standard example of such usage is the [[exponential map (Lie theory)|exponential map]] from the [[Lie algebra]] of a [[matrix Lie group]] into the group. It is given by a [[matrix exponential]],
<math display="block">\exp: \mathfrak g \rightarrow G; \qquad tX \mapsto e^{tX} = \sum_{n=0}^\infty \frac{t^nX^n}{n!} = I + tX + \frac{t^2X^2}{2} + \cdots, \qquad t \in \mathbb R,\ X \in \mathfrak g.</math>
Such expressions have long been known for {{math|SU(2)}},
<math display="block">e^{i(\theta/2)(\hat\mathbf n \cdot \sigma)} = I_2 \cos \frac \theta 2 + i (\hat\mathbf n \cdot \sigma) \sin \frac \theta 2,</math>
where the {{mvar|σ}} are the [[Pauli matrices]], and for {{math|SO(3)}},
<math display="block">e^{i\theta(\hat\mathbf n \cdot \mathbf J)} = I_3 + i(\hat\mathbf n \cdot \mathbf J) \sin \theta + (\hat\mathbf n \cdot \mathbf J)^2 (\cos \theta - 1),</math>
which is [[Rodrigues' rotation formula]]. For the notation, see [[3D rotation group#A note on Lie algebras]].

More recently, expressions have appeared for other groups, like the [[Lorentz group]] {{math|SO(3, 1)}},<ref>{{harvnb|Zeni|Rodrigues|1992}}</ref> {{math|O(4, 2)}}<ref>{{harvnb|Barut|Zeni|Laufer|1994a}}</ref> and {{math|SU(2, 2)}},<ref>{{harvnb|Barut|Zeni|Laufer|1994b}}</ref> as well as {{math|GL(''n'', '''R''')}}.<ref>{{harvnb|Laufer|1997}}</ref> The group {{math|O(4, 2)}} is the [[conformal group]] of [[spacetime]], {{math|SU(2, 2)}} its [[simply connected]] cover (to be precise, the simply connected cover of the [[connected component (topology)|connected component]] {{math|SO<sup>+</sup>(4, 2)}} of {{math|O(4, 2)}}). The expressions obtained apply to the standard representation of these groups. They require knowledge of (some of) the [[eigenvalue]]s of the matrix to exponentiate. For {{math|SU(2)}} (and hence for {{math|SO(3)}}), closed expressions have been obtained for ''all'' irreducible representations, i.e. of any spin.<ref>{{harvnb|Curtright|Fairlie|Zachos|2014}}</ref>

[[File:GeorgFrobenius.jpg|220px|thumb|right|[[Ferdinand Georg Frobenius]] (1849–1917), German mathematician. His main interests were [[elliptic function]]s, [[differential equation]]s, and later [[group theory]].<br/>In 1878 he gave the first full proof of the Cayley–Hamilton theorem.<ref name="Frobenius 1878"/>]]

=== Algebraic number theory ===
The Cayley–Hamilton theorem is an effective tool for computing the minimal polynomial of [[algebraic integer]]s. For example, given a finite extension <math>\mathbb{Q}[\alpha_1,\ldots,\alpha_k]</math> of <math>\mathbb{Q}</math> and an algebraic integer <math>\alpha \in \mathbb{Q}[\alpha_1,\ldots,\alpha_k]</math> which is a non-zero linear combination of the <math>\alpha_1^{n_1}\cdots\alpha_k^{n_k}</math>, we can compute the minimal polynomial of <math>\alpha</math> by finding a matrix representing the <math>\mathbb{Q}</math>-[[linear transformation]]
<math display="block">\cdot \alpha : \mathbb{Q}[\alpha_1,\ldots,\alpha_k] \to \mathbb{Q}[\alpha_1,\ldots,\alpha_k].</math>
If we call this transformation matrix <math>A</math>, then we can find the minimal polynomial by applying the Cayley–Hamilton theorem to <math>A</math>: the characteristic polynomial of <math>A</math> annihilates <math>\alpha</math>, so the minimal polynomial of <math>\alpha</math> is its irreducible factor having <math>\alpha</math> as a root.<ref>{{cite book|last1=Stein|first1=William|title=Algebraic Number Theory, a Computational Approach|page=29|url=http://wstein.org/books/ant/ant.pdf}}</ref>
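As a concrete illustration, consider <math>\alpha = \sqrt{2} + \sqrt{3}</math> in <math>\mathbb{Q}[\sqrt{2}, \sqrt{3}]</math>, with basis <math>(1, \sqrt{2}, \sqrt{3}, \sqrt{6})</math>. A sketch using SymPy (the example matrix is ours, not taken from the cited source; here the characteristic polynomial is already irreducible, so it coincides with the minimal polynomial):

<syntaxhighlight lang="python">
from sympy import Matrix, symbols, factor

x = symbols('x')
# Matrix of multiplication by alpha = sqrt(2) + sqrt(3) on Q(sqrt(2), sqrt(3)),
# in the basis (1, sqrt(2), sqrt(3), sqrt(6)); column j holds the coordinates
# of alpha times the j-th basis element:
#   alpha*1 = sqrt(2) + sqrt(3),   alpha*sqrt(2) = 2 + sqrt(6),
#   alpha*sqrt(3) = 3 + sqrt(6),   alpha*sqrt(6) = 3*sqrt(2) + 2*sqrt(3).
A = Matrix([[0, 2, 3, 0],
            [1, 0, 0, 3],
            [1, 0, 0, 2],
            [0, 1, 1, 0]])
p = A.charpoly(x).as_expr()
print(p)          # x**4 - 10*x**2 + 1
print(factor(p))  # unchanged: irreducible over Q, hence the minimal polynomial
</syntaxhighlight>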