== Applications ==

=== Determinant and inverse matrix ===
{{see also|Determinant#Relation to eigenvalues and trace|Characteristic polynomial#Properties}}

For a general {{math|''n'' × ''n''}} [[invertible matrix]] {{mvar|A}}, i.e., one with nonzero determinant, {{mvar|A}}<sup>−1</sup> can thus be written as an {{nowrap|{{math|(''n'' − 1)}}-th}} order [[polynomial expression]] in {{mvar|A}}: as indicated, the Cayley–Hamilton theorem amounts to the identity

{{Equation box 1
|indent =::
|equation = <math>p(A)=A^n+c_{n-1}A^{n-1}+\cdots+c_1A+(-1)^n\det(A)I_n =0.</math>
|cellpadding= 6
|border
|border colour = #0073CF
|bgcolor=#F9FFF7}}

The coefficients {{math|''c''<sub>''i''</sub>}} are given by the [[elementary symmetric polynomial]]s of the [[eigenvalue]]s of {{mvar|A}}. Using [[Newton identities]], the elementary symmetric polynomials can in turn be expressed in terms of [[power sum symmetric polynomial]]s of the eigenvalues:
<math display="block">s_k = \sum_{i=1}^n \lambda_i^k = \operatorname{tr}(A^k),</math>
where {{math|tr(''A''<sup>''k''</sup>)}} is the [[Trace (linear algebra)|trace]] of the matrix {{math|''A''<sup>''k''</sup>}}. Thus, we can express {{math|''c''<sub>''i''</sub>}} in terms of the traces of powers of {{mvar|A}}.

In general, the formula for the coefficients {{math|''c''<sub>''i''</sub>}} is given in terms of complete exponential [[Bell polynomials]] as<ref group="nb">See Sect. 2 of {{harvtxt|Krivoruchenko|2016}}. An explicit expression for the coefficients {{math|''c''<sub>''i''</sub>}} is provided by {{harvtxt|Kondratyuk|Krivoruchenko|1992}}:
<math display="block">c_i = \sum_{k_1,k_2,\ldots ,k_n}\prod_{l=1}^{n} \frac{(-1)^{k_l+1}}{l^{k_l} k_l!}\operatorname{tr}(A^l)^{k_l},</math>
where the sum is taken over the sets of all integer partitions {{math|''k''<sub>''l''</sub> ≥ 0}} satisfying the equation
<math display="block">\sum_{l=1}^{n}lk_{l} = n-i.</math></ref>
<math display="block">c_{n-k} = \frac{(-1)^{k}}{k!} B_k(s_1, -1! s_2, 2! s_3, \ldots, (-1)^{k-1}(k-1)! s_k).</math>

In particular, the determinant of {{mvar|A}} equals {{math|(−1)<sup>''n''</sup>''c''<sub>0</sub>}}. Thus, the determinant can be written as the [[trace identity]]:
<math display="block">\det(A) = \frac{1}{n!} B_n(s_1, -1! s_2, 2! s_3, \ldots, (-1)^{n-1}(n-1)! s_n).</math>

Likewise, the characteristic polynomial can be written as
<math display="block">-(-1)^n\det(A)I_n = A(A^{n-1}+c_{n-1}A^{n-2}+\cdots+c_{1}I_n),</math>
and, by multiplying both sides by {{math|''A''<sup>−1</sup>}} (note {{math|1=−(−1)<sup>''n''</sup> = (−1)<sup>''n''−1</sup>}}), one is led to an expression for the [[inverse matrix|inverse]] of {{mvar|A}} as a trace identity,
<math display="block">\begin{align}
A^{-1} & = \frac{(-1)^{n-1}}{\det A}(A^{n-1}+c_{n-1}A^{n-2}+\cdots+c_{1}I_n) \\[5pt]
& = \frac{1}{\det A}\sum_{k=0}^{n-1} (-1)^{n+k-1}\frac{A^{n-k-1}}{k!} B_k(s_1, -1! s_2, 2! s_3, \ldots, (-1)^{k-1}(k-1)! s_k).
\end{align}</math>
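These identities translate directly into computation: the power-sum traces {{math|''s<sub>k</sub>''}} determine the coefficients {{math|''c<sub>i</sub>''}} through Newton's identities, which then give both {{math|1=''p''(''A'') = 0}} and the inverse. The following is a minimal numerical sketch in Python (the helper name and test matrix are illustrative, not from any cited source):

<syntaxhighlight lang="python">
import numpy as np

def charpoly_coeffs(A):
    """Coefficients c_0, ..., c_n of p(x) = x^n + c_{n-1} x^{n-1} + ... + c_0,
    obtained from the power sums s_k = tr(A^k) via Newton's identities."""
    n = A.shape[0]
    s = [0.0]                      # s[k] = tr(A^k); s[0] is unused
    Ak = np.eye(n)
    for _ in range(n):
        Ak = Ak @ A
        s.append(np.trace(Ak))
    e = [1.0]                      # e[k] = k-th elementary symmetric polynomial
    for k in range(1, n + 1):
        e.append(sum((-1) ** (i - 1) * e[k - i] * s[i] for i in range(1, k + 1)) / k)
    return [(-1) ** (n - i) * e[n - i] for i in range(n + 1)]  # c_0, ..., c_n

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
c = charpoly_coeffs(A)             # [-8.0, 14.0, -7.0, 1.0]
# Cayley-Hamilton: p(A) = A^3 + c_2 A^2 + c_1 A + c_0 I vanishes
pA = sum(c[k] * np.linalg.matrix_power(A, k) for k in range(4))
print(np.allclose(pA, 0))                                # True
# Inverse as a polynomial in A: A^{-1} = -(A^2 + c_2 A + c_1 I) / c_0
Ainv = -(A @ A + c[2] * A + c[1] * np.eye(3)) / c[0]
print(np.allclose(Ainv @ A, np.eye(3)))                  # True
</syntaxhighlight>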
Another method for obtaining these coefficients {{math|''c''<sub>''k''</sub>}} for a general {{math|''n'' × ''n''}} matrix, provided no eigenvalue is zero, relies on the following alternative [[Matrix exponential#The determinant of the matrix exponential|expression for the determinant]],
<math display="block">p(\lambda)= \det (\lambda I_n -A) = \lambda^n \exp (\operatorname{tr} (\log (I_n - A/\lambda))).</math>
Hence, by virtue of the [[Mercator series]],
<math display="block">p(\lambda)= \lambda^n \exp \left( -\operatorname{tr} \sum_{m=1}^\infty \frac{(A/\lambda)^m}{m} \right),</math>
where the exponential need only be expanded to order {{math|''λ''<sup>−''n''</sup>}}, since {{math|''p''(''λ'')}} is of order {{mvar|n}}; the net negative powers of {{mvar|λ}} vanish automatically by the C–H theorem. (Again, this requires a ring containing the [[rational number]]s.) [[Derivative|Differentiation]] of this expression with respect to {{mvar|λ}} allows one to express the coefficients of the characteristic polynomial for general {{mvar|n}} as determinants of {{math|''m'' × ''m''}} matrices,<ref group="nb">See, e.g., p. 54 of {{harvnb|Brown|1994}}, which solves [[Jacobi's formula]],
<math display="block">\frac{\partial p(\lambda)}{\partial \lambda} = p(\lambda) \sum^\infty _{m=0}\lambda ^{-(m+1)} \operatorname{tr}A^m = p(\lambda) ~ \operatorname{tr} \frac{I}{\lambda I -A}\equiv\operatorname{tr} B~,</math>
where {{mvar|B}} is the adjugate matrix of the next section. There also exists an equivalent, related recursive algorithm introduced by [[Urbain Le Verrier]] and [[Dmitry Konstantinovich Faddeev]], the [[Faddeev–LeVerrier algorithm]], which reads
<math display="block">\begin{align} M_0 &\equiv O & c_n &= 1 \qquad &(k=0) \\[5pt] M_k &\equiv AM_{k-1} + c_{n-k+1} I \qquad \qquad & c_{n-k} &= -\frac 1 k \operatorname{tr}(AM_k) \qquad &k=1,\ldots ,n ~. \end{align}</math>
(see, e.g., {{harvnb|Gantmacher|1960|p=88}}.) Observe {{math|''A''<sup>−1</sup> {{=}} −''M''<sub>''n''</sub>/''c''<sub>0</sub>}} as the recursion terminates. See the algebraic proof in the following section, which relies on the modes of the adjugate, {{math|''B''<sub>''k''</sub> ≡ ''M''<sub>''n''−''k''</sub>}}. Specifically, <math>(\lambda I-A) B = I p(\lambda)</math>; combining this with the above derivative of {{mvar|p}} yields
<math display="block">\lambda p' -n p =\operatorname{tr} (AB)~,</math>
({{harvnb|Hou|1998}}), and the above recursions, in turn.</ref>
<math display="block">c_{n-m} = \frac{(-1)^m}{m!}
\begin{vmatrix}
\operatorname{tr}A & m-1 & 0 & \cdots \\
\operatorname{tr}A^2 & \operatorname{tr}A & m-2 & \cdots \\
\vdots & \vdots & & & \vdots \\
\operatorname{tr}A^{m-1} & \operatorname{tr}A^{m-2} & \cdots & \cdots & 1 \\
\operatorname{tr}A^m & \operatorname{tr}A^{m-1} & \cdots & \cdots & \operatorname{tr}A
\end{vmatrix} ~.</math>
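The Faddeev–LeVerrier recursion in the footnote transcribes almost line by line into code. A short Python sketch (the function name and test matrix are illustrative):

<syntaxhighlight lang="python">
import numpy as np

def faddeev_leverrier(A):
    """Faddeev-LeVerrier recursion: starting from M_0 = O and c_n = 1,
    M_k = A M_{k-1} + c_{n-k+1} I  and  c_{n-k} = -tr(A M_k)/k.
    Returns the coefficients c_0..c_n and the final matrix M_n."""
    n = A.shape[0]
    c = np.zeros(n + 1)
    c[n] = 1.0
    M = np.zeros_like(A)
    for k in range(1, n + 1):
        M = A @ M + c[n - k + 1] * np.eye(n)
        c[n - k] = -np.trace(A @ M) / k
    return c, M

A = np.array([[1.0, 2.0], [3.0, 4.0]])
c, M = faddeev_leverrier(A)
print(c)                                         # [-2. -5.  1.]: p(x) = x^2 - 5x - 2
print(np.allclose(np.linalg.inv(A), -M / c[0]))  # A^{-1} = -M_n/c_0: True
</syntaxhighlight>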
; Examples
For instance, the first few Bell polynomials are {{math|1=''B''<sub>0</sub> = 1}}, {{math|1=''B''<sub>1</sub>(''x''<sub>1</sub>) = ''x''<sub>1</sub>}}, {{math|1=''B''<sub>2</sub>(''x''<sub>1</sub>, ''x''<sub>2</sub>) = ''x''{{su|b=1|p=2}} + ''x''<sub>2</sub>}}, and {{math|1=''B''<sub>3</sub>(''x''<sub>1</sub>, ''x''<sub>2</sub>, ''x''<sub>3</sub>) = ''x''{{su|b=1|p=3}} + 3 ''x''<sub>1</sub>''x''<sub>2</sub> + ''x''<sub>3</sub>}}.

Using these to specify the coefficients {{math|''c<sub>i</sub>''}} of the characteristic polynomial of a {{math|2 × 2}} matrix yields
<math display="block">\begin{align}
c_2 &= B_0 = 1, \\[4pt]
c_1 &= \frac{-1}{1!} B_1(s_1) = -s_1 = -\operatorname{tr}(A), \\[4pt]
c_0 &= \frac{1}{2!} B_2(s_1, -1! s_2) = \frac{1}{2}(s_1^2 - s_2) = \frac{1}{2}\left((\operatorname{tr}(A))^2 - \operatorname{tr}(A^2)\right).
\end{align}</math>
The coefficient {{math|''c''<sub>0</sub>}} gives the determinant of the {{math|2 × 2}} matrix and {{math|''c''<sub>1</sub>}} the negative of its trace, while its inverse is given by
<math display="block">A^{-1} = \frac{-1}{\det A}(A + c_1 I_2) = \frac{-2(A - \operatorname{tr}(A) I_2)}{(\operatorname{tr}(A))^2 - \operatorname{tr}(A^2)}.</math>

It is apparent from the general formula for {{math|''c''<sub>''n''−''k''</sub>}}, expressed in terms of Bell polynomials, that the expressions
<math display="block">-\operatorname{tr}(A)\quad \text{and} \quad \tfrac 1 2 \left(\operatorname{tr}(A)^2 - \operatorname{tr}(A^2)\right)</math>
always give the coefficients {{math|''c''<sub>''n''−1</sub>}} of {{math|''λ''<sup>''n''−1</sup>}} and {{math|''c''<sub>''n''−2</sub>}} of {{math|''λ''<sup>''n''−2</sup>}} in the characteristic polynomial of any {{math|''n'' × ''n''}} matrix, respectively. So, for a {{math|3 × 3}} matrix {{mvar|A}}, the statement of the Cayley–Hamilton theorem can also be written as
<math display="block">A^3- (\operatorname{tr}A)A^2+\frac{1}{2}\left((\operatorname{tr}A)^2-\operatorname{tr}(A^2)\right)A-\det(A)I_3=O,</math>
where the right-hand side designates a {{math|3 × 3}} zero matrix. Likewise, the determinant in the {{math|1=''n'' = 3}} case is now
<math display="block">\begin{align}
\det(A) &= \frac{1}{3!} B_3(s_1, -1! s_2, 2! s_3) = \frac{1}{6}\left(s_1^3 + 3 s_1 (-s_2) + 2 s_3\right) \\[5pt]
&= \frac{1}{6} \left[ (\operatorname{tr}A)^3-3\operatorname{tr}(A^2)(\operatorname{tr}A)+2\operatorname{tr}(A^3) \right].
\end{align}</math>
This expression gives the negative of the coefficient {{math|''c''<sub>''n''−3</sub>}} of {{math|''λ''<sup>''n''−3</sup>}} in the general case, as seen below.

Similarly, one can write for a {{math|4 × 4}} matrix {{mvar|A}},
<math display="block">A^4-(\operatorname{tr}A)A^3 + \tfrac{1}{2}\left[(\operatorname{tr}A)^2-\operatorname{tr}(A^2)\right]A^2 - \tfrac{1}{6}\left[ (\operatorname{tr}A)^3-3\operatorname{tr}(A^2)(\operatorname{tr}A)+2\operatorname{tr}(A^3)\right] A + \det(A)I_4 = O,</math>
where, now, the determinant is {{math|''c''<sub>''n''−4</sub>}},
<math display="block">\tfrac{1}{24}\left[ (\operatorname{tr}A)^4-6 \operatorname{tr}(A^2)(\operatorname{tr}A)^2 + 3\left(\operatorname{tr}(A^2)\right)^2 + 8\operatorname{tr}(A^3)\operatorname{tr}(A) -6\operatorname{tr}(A^4) \right],</math>
and so on for larger matrices. The increasingly complex expressions for the coefficients {{math|''c''<sub>''k''</sub>}} are deducible from [[Newton's identities]] or the [[Faddeev–LeVerrier algorithm]].
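These trace identities are easy to spot-check numerically, for instance in Python (the random test matrix is arbitrary):

<syntaxhighlight lang="python">
import numpy as np

# Spot-check of the 3x3 trace identities above on an arbitrary matrix.
A = np.random.default_rng(0).standard_normal((3, 3))
t1 = np.trace(A)
t2 = np.trace(A @ A)
t3 = np.trace(A @ A @ A)

# det(A) = [ (tr A)^3 - 3 tr(A^2) tr(A) + 2 tr(A^3) ] / 6
det_from_traces = (t1**3 - 3 * t2 * t1 + 2 * t3) / 6
print(np.isclose(det_from_traces, np.linalg.det(A)))  # True

# Cayley-Hamilton in the 3x3 trace form:
lhs = (np.linalg.matrix_power(A, 3) - t1 * np.linalg.matrix_power(A, 2)
       + 0.5 * (t1**2 - t2) * A - np.linalg.det(A) * np.eye(3))
print(np.allclose(lhs, 0))  # True
</syntaxhighlight>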
=== ''n''-th power of matrix ===
The Cayley–Hamilton theorem always provides a relationship between the powers of {{mvar|A}} (though not always the simplest one), which allows one to simplify expressions involving such powers, and evaluate them without having to compute the power {{math|''A''<sup>''n''</sup>}} or any higher powers of {{mvar|A}}.

As an example, for <math>A = \begin{pmatrix}1&2\\3&4\end{pmatrix}</math> the theorem gives
<math display="block">A^2=5A+2I_2\, .</math>
Then, to calculate {{math|''A''<sup>4</sup>}}, observe
<math display="block">\begin{align} A^3&=(5A+2I_2)A=5A^2+2A=5(5A+2I_2)+2A=27A+10I_2, \\[1ex] A^4&=A^3A=(27A+10I_2)A=27A^2+10A=27(5A+2I_2)+10A=145A+54I_2\, . \end{align}</math>
Likewise,
<math display="block">\begin{align} A^{-1} &= \frac{1}{2}\left(A-5I_2\right)~, \\[1ex] A^{-2} &= A^{-1} A^{-1} = \frac{1}{4} \left(A^2-10A+25I_2\right) = \frac{1}{4} \left((5A+2I_2)-10A+25I_2\right) = \frac{1}{4} \left(-5A+27I_2\right)~. \end{align}</math>
Notice that we have been able to write the matrix power as a linear combination of {{mvar|A}} and the identity. In fact, a matrix power of any order {{mvar|k}} can be written as a matrix polynomial of degree at most {{math|''n'' − 1}}, where {{mvar|n}} is the size of the square matrix. This is an instance where the Cayley–Hamilton theorem can be used to express a matrix function, which we will discuss systematically below.
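The reduction used above can be iterated mechanically: if {{math|1=''A''<sup>''k''</sup> = ''aA'' + ''bI''<sub>2</sub>}}, then {{math|1=''A''<sup>''k''+1</sup> = ''aA''<sup>2</sup> + ''bA'' = (5''a'' + ''b'')''A'' + 2''aI''<sub>2</sub>}}. A small Python sketch, specific to this example matrix (the helper name is illustrative):

<syntaxhighlight lang="python">
import numpy as np

def power_as_linear(k):
    """Return (a, b) with A^k = a*A + b*I for A = [[1, 2], [3, 4]],
    using A^2 = 5A + 2I: if A^k = aA + bI, then A^(k+1) = (5a+b)A + 2aI."""
    a, b = 0, 1  # A^0 = I
    for _ in range(k):
        a, b = 5 * a + b, 2 * a
    return a, b

A = np.array([[1, 2], [3, 4]])
a, b = power_as_linear(4)
print(a, b)  # 145 54, matching A^4 = 145A + 54I_2 above
print(np.array_equal(a * A + b * np.eye(2, dtype=int),
                     np.linalg.matrix_power(A, 4)))  # True
</syntaxhighlight>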
=== Matrix functions ===
Given an [[analytic function]]
<math display="block">f(x) = \sum_{k=0}^\infty a_k x^k</math>
and the characteristic polynomial {{math|''p''(''x'')}} of degree {{mvar|n}} of an {{math|''n'' × ''n''}} matrix {{mvar|A}}, the function can be expressed using long division as
<math display="block">f(x) = q(x) p(x) + r(x),</math>
where {{math|''q''(''x'')}} is some quotient polynomial and {{math|''r''(''x'')}} is a remainder polynomial such that {{math|0 ≤ deg ''r''(''x'') < ''n''}}. By the Cayley–Hamilton theorem, replacing {{mvar|x}} by the matrix {{mvar|A}} gives {{math|1=''p''(''A'') = 0}}, so one has
<math display="block">f(A) = r(A).</math>
Thus, the analytic function of the matrix {{mvar|A}} can be expressed as a matrix polynomial of degree less than {{mvar|n}}.

Let the remainder polynomial be
<math display="block">r(x) = c_0 + c_1 x + \cdots + c_{n-1} x^{n-1}.</math>
Since {{math|1=''p''(''λ<sub>i</sub>'') = 0}} at each eigenvalue, evaluating the function {{math|''f''(''x'')}} at the {{mvar|n}} eigenvalues of {{mvar|A}} yields
<math display="block"> f(\lambda_i) = r(\lambda_i) = c_0 + c_1 \lambda_i + \cdots + c_{n-1} \lambda_i^{n-1}, \qquad \text{for } i=1,2,\ldots,n.</math>
This amounts to a system of {{mvar|n}} [[linear equation]]s, which can be solved to determine the coefficients {{math|''c<sub>i</sub>''}}. Thus, one has
<math display="block">f(A) = \sum_{k=0}^{n-1} c_k A^k.</math>

When the eigenvalues are repeated, that is {{math|1=''λ<sub>i</sub>'' = ''λ<sub>j</sub>''}} for some {{math|''i'' ≠ ''j''}}, two or more equations are identical, and the system cannot be solved uniquely. For such cases, for an eigenvalue {{mvar|λ}} with multiplicity {{mvar|m}}, the first {{math|''m'' − 1}} derivatives of {{math|''p''(''x'')}} vanish at that eigenvalue, so differentiating {{math|1=''f''(''x'') = ''q''(''x'')''p''(''x'') + ''r''(''x'')}} there leaves the extra {{math|''m'' − 1}} linearly independent equations
<math display="block">\left.\frac{\mathrm{d}^k f(x)}{\mathrm{d}x^k}\right|_{x=\lambda} = \left.\frac{\mathrm{d}^k r(x)}{\mathrm{d}x^k}\right|_{x=\lambda}\qquad \text{for } k = 1, 2, \ldots, m-1,</math>
which, combined with the others, yield the required {{mvar|n}} equations to solve for {{math|''c<sub>i</sub>''}}.

Finding a polynomial that passes through the points {{math|(''λ<sub>i</sub>'', ''f''(''λ<sub>i</sub>''))}} is essentially an [[polynomial interpolation|interpolation problem]], and can be solved using [[Lagrange interpolation|Lagrange]] or [[Newton polynomial|Newton interpolation]] techniques, leading to [[Sylvester's formula]].

For example, suppose the task is to find the polynomial representation of
<math display="block">f(A) = e^{At} \qquad \text{where} \qquad A = \begin{pmatrix}1&2\\0&3\end{pmatrix}.</math>
The characteristic polynomial is {{math|1=''p''(''x'') = (''x'' − 1)(''x'' − 3) = ''x''<sup>2</sup> − 4''x'' + 3}}, and the eigenvalues are {{math|1=''λ'' = 1, 3}}. Let {{math|1=''r''(''x'') = ''c''<sub>0</sub> + ''c''<sub>1</sub>''x''}}. Evaluating {{math|1=''f''(''λ'') = ''r''(''λ'')}} at the eigenvalues, one obtains two linear equations, {{math|1=''e''<sup>''t''</sup> = ''c''<sub>0</sub> + ''c''<sub>1</sub>}} and {{math|1=''e''<sup>3''t''</sup> = ''c''<sub>0</sub> + 3''c''<sub>1</sub>}}. Solving the equations yields {{math|1=''c''<sub>0</sub> = (3''e''<sup>''t''</sup> − ''e''<sup>3''t''</sup>)/2}} and {{math|1=''c''<sub>1</sub> = (''e''<sup>3''t''</sup> − ''e''<sup>''t''</sup>)/2}}. Thus, it follows that
<math display="block">e^{At} = c_0 I_2 + c_1 A = \begin{pmatrix}c_0 + c_1 & 2 c_1\\ 0 & c_0 + 3 c_1\end{pmatrix} = \begin{pmatrix}e^{t} & e^{3t} - e^{t} \\ 0 & e^{3t}\end{pmatrix}.</math>

If, instead, the function were {{math|1=''f''(''A'') = sin ''At''}}, then the coefficients would have been {{math|1=''c''<sub>0</sub> = (3 sin ''t'' − sin 3''t'')/2}} and {{math|1=''c''<sub>1</sub> = (sin 3''t'' − sin ''t'')/2}}; hence
<math display="block">\sin(At) = c_0 I_2 + c_1 A = \begin{pmatrix}\sin t & \sin 3t - \sin t \\ 0 & \sin 3t\end{pmatrix}.</math>

As a further example, when considering
<math display="block">f(A) = e^{At} \qquad \text{where} \qquad A = \begin{pmatrix}0 & 1\\-1 & 0\end{pmatrix},</math>
then the characteristic polynomial is {{math|1=''p''(''x'') = ''x''<sup>2</sup> + 1}}, and the eigenvalues are {{math|1=''λ'' = ±''i''}}. As before, evaluating the function at the eigenvalues gives the linear equations {{math|1=''e''<sup>''it''</sup> = ''c''<sub>0</sub> + ''ic''<sub>1</sub>}} and {{math|1=''e''<sup>−''it''</sup> = ''c''<sub>0</sub> − ''ic''<sub>1</sub>}}, whose solution gives {{math|1=''c''<sub>0</sub> = (''e''<sup>''it''</sup> + ''e''<sup>−''it''</sup>)/2 = cos ''t''}} and {{math|1=''c''<sub>1</sub> = (''e''<sup>''it''</sup> − ''e''<sup>−''it''</sup>)/2''i'' = sin ''t''}}. Thus, for this case,
<math display="block">e^{At} = (\cos t) I_2 + (\sin t) A = \begin{pmatrix}\cos t & \sin t\\ -\sin t & \cos t \end{pmatrix},</math>
which is a [[rotation matrix]].
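The same interpolation can be carried out numerically. The following Python sketch recomputes the first example above, {{math|1=''f''(''A'') = ''e''<sup>''At''</sup>}} with distinct eigenvalues, and checks it against SciPy's matrix exponential (the value of {{mvar|t}} is arbitrary):

<syntaxhighlight lang="python">
import numpy as np
from scipy.linalg import expm

# f(A) = exp(At) for A = [[1, 2], [0, 3]] via f(lam_i) = c_0 + c_1 lam_i
# at the (distinct) eigenvalues lam = 1, 3.
A = np.array([[1.0, 2.0], [0.0, 3.0]])
t = 0.7                      # arbitrary parameter value
lam = np.array([1.0, 3.0])
# Vandermonde system: rows (1, lam_i), right-hand side f(lam_i) = exp(lam_i t)
c = np.linalg.solve(np.vander(lam, 2, increasing=True), np.exp(lam * t))
f_A = c[0] * np.eye(2) + c[1] * A
print(np.allclose(f_A, expm(A * t)))  # True
</syntaxhighlight>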
A standard example of such usage is the [[exponential map (Lie theory)|exponential map]] from the [[Lie algebra]] of a [[matrix Lie group]] into the group. It is given by a [[matrix exponential]],
<math display="block">\exp: \mathfrak g \rightarrow G; \qquad tX \mapsto e^{tX} = \sum_{n=0}^\infty \frac{t^nX^n}{n!} = I + tX + \frac{t^2X^2}{2} + \cdots, \qquad t \in \mathbb R,\ X \in \mathfrak g.</math>
Such expressions have long been known for {{math|SU(2)}},
<math display="block">e^{i(\theta/2)(\hat\mathbf n \cdot \sigma)} = I_2 \cos \frac \theta 2 + i (\hat\mathbf n \cdot \sigma) \sin \frac \theta 2,</math>
where the {{mvar|σ}} are the [[Pauli matrices]], and for {{math|SO(3)}},
<math display="block">e^{i\theta(\hat\mathbf n \cdot \mathbf J)} = I_3 + i(\hat\mathbf n \cdot \mathbf J) \sin \theta + (\hat\mathbf n \cdot \mathbf J)^2 (\cos \theta - 1),</math>
which is [[Rodrigues' rotation formula]]. For the notation, see [[3D rotation group#A note on Lie algebras]].

More recently, expressions have appeared for other groups, like the [[Lorentz group]] {{math|SO(3, 1)}},<ref>{{harvnb|Zeni|Rodrigues|1992}}</ref> {{math|O(4, 2)}}<ref>{{harvnb|Barut|Zeni|Laufer|1994a}}</ref> and {{math|SU(2, 2)}},<ref>{{harvnb|Barut|Zeni|Laufer|1994b}}</ref> as well as {{math|GL(''n'', '''R''')}}.<ref>{{harvnb|Laufer|1997}}</ref> The group {{math|O(4, 2)}} is the [[conformal group]] of [[spacetime]], {{math|SU(2, 2)}} its [[simply connected]] cover (to be precise, the simply connected cover of the [[connected component (topology)|connected component]] {{math|SO<sup>+</sup>(4, 2)}} of {{math|O(4, 2)}}). The expressions obtained apply to the standard representation of these groups. They require knowledge of (some of) the [[eigenvalue]]s of the matrix to exponentiate. For {{math|SU(2)}} (and hence for {{math|SO(3)}}), closed expressions have been obtained for ''all'' irreducible representations, i.e. of any spin.<ref>{{harvnb|Curtright|Fairlie|Zachos|2014}}</ref>

[[File:GeorgFrobenius.jpg|220px|thumb|right|[[Ferdinand Georg Frobenius]] (1849–1917), German mathematician. His main interests were [[elliptic function]]s, [[differential equation]]s, and later [[group theory]].<br/>In 1878 he gave the first full proof of the Cayley–Hamilton theorem.<ref name="Frobenius 1878"/>]]

=== Algebraic number theory ===
The Cayley–Hamilton theorem is an effective tool for computing the minimal polynomial of [[algebraic integer]]s. For example, given a finite extension <math>\mathbb{Q}[\alpha_1,\ldots,\alpha_k]</math> of <math>\mathbb{Q}</math> and an algebraic integer <math>\alpha \in \mathbb{Q}[\alpha_1,\ldots,\alpha_k]</math> which is a non-zero linear combination of the <math>\alpha_1^{n_1}\cdots\alpha_k^{n_k}</math>, we can compute the minimal polynomial of <math>\alpha</math> by finding a matrix representing the <math>\mathbb{Q}</math>-[[linear transformation]]
<math display="block">\cdot \alpha : \mathbb{Q}[\alpha_1,\ldots,\alpha_k] \to \mathbb{Q}[\alpha_1,\ldots,\alpha_k].</math>
If we call this transformation matrix <math>A</math>, then we can find the minimal polynomial by applying the Cayley–Hamilton theorem to <math>A</math>: the characteristic polynomial of <math>A</math> annihilates <math>\alpha</math>, so the minimal polynomial of <math>\alpha</math> is its irreducible factor having <math>\alpha</math> as a root.<ref>{{cite book|last1=Stein|first1=William|title=Algebraic Number Theory, a Computational Approach|page=29|url=http://wstein.org/books/ant/ant.pdf}}</ref>
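As a concrete illustration, consider <math>\alpha = \sqrt{2} + \sqrt{3}</math> in <math>\mathbb{Q}[\sqrt{2}, \sqrt{3}]</math>, with basis <math>(1, \sqrt{2}, \sqrt{3}, \sqrt{6})</math>. A sketch using SymPy (the example matrix is ours, not taken from the cited source; here the characteristic polynomial is already irreducible, so it coincides with the minimal polynomial):

<syntaxhighlight lang="python">
from sympy import Matrix, symbols, factor

x = symbols('x')
# Matrix of multiplication by alpha = sqrt(2) + sqrt(3) on Q(sqrt(2), sqrt(3)),
# in the basis (1, sqrt(2), sqrt(3), sqrt(6)); column j holds the coordinates
# of alpha times the j-th basis element:
#   alpha*1 = sqrt(2) + sqrt(3),   alpha*sqrt(2) = 2 + sqrt(6),
#   alpha*sqrt(3) = 3 + sqrt(6),   alpha*sqrt(6) = 3*sqrt(2) + 2*sqrt(3).
A = Matrix([[0, 2, 3, 0],
            [1, 0, 0, 3],
            [1, 0, 0, 2],
            [0, 1, 1, 0]])
p = A.charpoly(x).as_expr()
print(p)          # x**4 - 10*x**2 + 1
print(factor(p))  # unchanged: irreducible over Q, hence the minimal polynomial
</syntaxhighlight>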