=== Preliminaries ===
If a vector {{math|''v''}} of size {{math|''n''}} is an [[eigenvector]] of {{math|''A''}} with eigenvalue {{math|''λ''}}, in other words if {{math|1=''A''⋅''v'' = ''λv''}}, then
<math display="block">\begin{align}
p(A)\cdot v & = A^n\cdot v+c_{n-1}A^{n-1}\cdot v+\cdots+c_1A\cdot v+c_0I_n\cdot v \\[6pt]
& = \lambda^nv+c_{n-1}\lambda^{n-1}v+\cdots+c_1\lambda v+c_0 v=p(\lambda)v,
\end{align}</math>
which is the zero vector since {{math|1=''p''(''λ'') = 0}} (the eigenvalues of {{math|''A''}} are precisely the [[root of a function|root]]s of {{math|''p''(''t'')}}). This holds for all possible eigenvalues {{math|''λ''}}, so the two matrices equated by the theorem certainly give the same (null) result when applied to any eigenvector. Now if {{math|''A''}} admits a [[basis (linear algebra)|basis]] of eigenvectors, in other words if {{math|''A''}} is [[diagonalizable]], then the Cayley–Hamilton theorem must hold for {{math|''A''}}, since two matrices that give the same values when applied to each element of a basis must be equal. 
<math display="block">A=XDX^{-1}, \quad D=\operatorname{diag}(\lambda_i), \quad i=1,2,\ldots,n</math>
<math display="block">p_A(\lambda)=|\lambda I-A|=\prod_{i=1}^n (\lambda-\lambda_i)\equiv \sum_{k=0}^n c_k\lambda^k</math>
<math display="block">p_A(A)=\sum_{k=0}^n c_k A^k=X p_A(D)X^{-1}=X C X^{-1} </math>
<math display="block">C_{ii}=\sum_{k=0}^n c_k\lambda_i^k=\prod_{j=1}^n(\lambda_i-\lambda_j)=0, \qquad C_{i,j\neq i}=0</math>
<math display="block">\therefore p_A(A)=XCX^{-1}=O .</math>
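The diagonalizable case can be checked by hand or by machine on a small example. The following is a minimal sketch, not part of the proof: it assumes a {{math|2&thinsp;×&thinsp;2}} matrix, for which the characteristic polynomial is {{math|1=''p''(''t'') = ''t''<sup>2</sup> − tr(''A'')''t'' + det(''A'')}}, and the helper names <code>matmul</code> and <code>char_poly_at_matrix_2x2</code> are illustrative choices only.

```python
def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def char_poly_at_matrix_2x2(A):
    """Evaluate p_A(A) = A^2 - tr(A) A + det(A) I for a 2x2 matrix A."""
    (a, b), (c, d) = A
    trace, det = a + d, a * d - b * c
    A2 = matmul(A, A)
    identity = [[1, 0], [0, 1]]
    return [[A2[i][j] - trace * A[i][j] + det * identity[i][j]
             for j in range(2)] for i in range(2)]


# Symmetric matrix with eigenvalues 1 and 3, eigenvectors (1, -1) and (1, 1),
# so it is diagonalizable.
A = [[2, 1], [1, 2]]
result = char_poly_at_matrix_2x2(A)
print(result)  # [[0, 0], [0, 0]]
```

Here {{math|1=''p''(''t'') = ''t''<sup>2</sup> − 4''t'' + 3 = (''t'' − 1)(''t'' − 3)}}, and integer arithmetic keeps the check exact.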

Consider now the function <math>e\colon M_n \to M_n</math> on {{math|''n''&thinsp;×&thinsp;''n''}} matrices given by the formula <math>e(A)=p_A(A)</math>, that is, the function which takes a matrix <math>A</math> and plugs it into its own characteristic polynomial. Not all matrices are diagonalizable, but for matrices with [[complex number|complex]] coefficients many of them are: the set <math>D</math> of diagonalizable complex square matrices of a given size is [[dense set|dense]] in the set of all such square matrices<ref>{{harvnb|Bhatia|1997|p=7}}</ref> (for a matrix to be diagonalizable it suffices, for instance, that its characteristic polynomial not have any [[multiple root]]s). Viewed as a function <math>e\colon \C^{n^2}\to \C^{n^2}</math> (since matrices have <math>n^2</math> entries), this function is [[Continuous function|continuous]], because the entries of the image of a matrix are given by polynomials in the entries of the matrix. Since
<math display="block">e(D) = \left\{\begin{pmatrix} 0 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 0 \end{pmatrix}\right\}</math>

and since the set <math>D</math> is dense, by continuity this function must map the entire set of {{math|''n''&thinsp;×&thinsp;''n''}} matrices to the zero matrix. Therefore, the Cayley–Hamilton theorem holds for all complex matrices, and hence also for <math>\Q</math>- or <math>\R</math>-valued matrices, which are in particular complex matrices.
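The density argument can be illustrated on a matrix that is not diagonalizable. The following is a minimal sketch under stated assumptions: it treats only {{math|2&thinsp;×&thinsp;2}} matrices, it uses exact rational arithmetic from Python's standard <code>fractions</code> module, and the name <code>e</code> mirrors the function defined above while <code>matmul</code> and <code>perturbed</code> are illustrative names only.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def e(A):
    """Evaluate p_A(A) = A^2 - tr(A) A + det(A) I for a 2x2 matrix A."""
    (a, b), (c, d) = A
    trace, det = a + d, a * d - b * c
    A2 = matmul(A, A)
    identity = [[1, 0], [0, 1]]
    return [[A2[i][j] - trace * A[i][j] + det * identity[i][j]
             for j in range(2)] for i in range(2)]


ZERO = [[0, 0], [0, 0]]

# Jordan block: double eigenvalue 1 with a one-dimensional eigenspace,
# so it is not diagonalizable.
J = [[Fraction(1), Fraction(1)], [Fraction(0), Fraction(1)]]

# Perturbations with distinct eigenvalues 1 and 1 + eps (hence diagonalizable)
# that approach J as eps = 10**-k tends to 0.
perturbed = [[[Fraction(1), Fraction(1)],
              [Fraction(0), 1 + Fraction(1, 10 ** k)]]
             for k in range(1, 6)]

images = [e(P) for P in perturbed]
limit_image = e(J)
print(all(img == ZERO for img in images), limit_image == ZERO)  # True True
```

Such a check only illustrates the argument on particular matrices; the argument itself rests on the continuity of <math>e</math> and the density of <math>D</math>.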

While this provides a valid proof, the argument is not very satisfactory, since the identities represented by the theorem do not in any way depend on the nature of the matrix (diagonalizable or not), nor on the kind of entries allowed (for matrices with real entries the diagonalizable ones do not form a dense set, and it seems strange one would have to consider complex matrices to see that the Cayley–Hamilton theorem holds for them). We shall therefore now consider only arguments that prove the theorem directly for any matrix using algebraic manipulations only; these also have the benefit of working for matrices with entries in any [[commutative ring]].

There is a great variety of such proofs of the Cayley–Hamilton theorem, of which several will be given here. They vary in the amount of [[abstract algebra]]ic notions required to understand the proof. The simplest proofs use just those notions needed to formulate the theorem (matrices, polynomials with numeric coefficients, determinants), but involve technical computations that render somewhat mysterious the fact that they lead precisely to the correct conclusion. It is possible to avoid such details, but at the price of involving more subtle algebraic notions: polynomials with coefficients in a non-commutative ring, or matrices with unusual kinds of entries.

==== Adjugate matrices ====
All proofs below use the notion of the [[adjugate matrix]] {{math|adj(''M'')}} of an {{math|''n''&thinsp;×&thinsp;''n''}} matrix {{math|''M''}}, the [[transpose]] of its [[Minor (linear algebra)|cofactor matrix]]. This is a matrix whose coefficients are given by polynomial expressions in the coefficients of {{math|''M''}} (in fact, by certain {{math|(''n'' − 1)&thinsp;×&thinsp;(''n'' − 1)}} determinants), in such a way that the following fundamental relations hold,
<math display="block">\operatorname{adj}(M)\cdot M=\det(M)I_n=M\cdot\operatorname{adj}(M)~.</math>
These relations are a direct consequence of the basic properties of determinants: evaluation of the {{math|(''i'', ''j'')}} entry of the matrix product on the left gives the expansion along column {{math|''i''}} of the determinant of the matrix obtained from {{math|''M''}} by replacing column {{math|''i''}} by a copy of column {{math|''j''}}, which is {{math|det(''M'')}} if {{math|''i'' {{=}} ''j''}} and zero otherwise (a matrix with two equal columns has zero determinant); the matrix product on the right is similar, but for expansions by rows.

Being a consequence of just algebraic expression manipulation, these relations are valid for matrices with entries in any commutative ring (commutativity must be assumed for determinants to be defined in the first place). This is important to note here, because these relations will be applied below for matrices with non-numeric entries such as polynomials.
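The fundamental relations for the adjugate can be verified directly from the cofactor definition. The following is a minimal sketch, assuming a small integer matrix stored as a list of rows; the function names <code>minor</code>, <code>det</code>, <code>adjugate</code> and <code>matmul</code> are illustrative, and the recursive Laplace expansion is chosen for clarity rather than efficiency.

```python
def minor(M, i, j):
    """Matrix M with row i and column j removed."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(M) if k != i]


def det(M):
    """Determinant by Laplace expansion along the first row."""
    if len(M) == 0:
        return 1  # determinant of the empty (0 x 0) matrix
    return sum((-1) ** j * M[0][j] * det(minor(M, 0, j))
               for j in range(len(M)))


def adjugate(M):
    """Transpose of the cofactor matrix: entry (i, j) is the (j, i) cofactor."""
    n = len(M)
    return [[(-1) ** (i + j) * det(minor(M, j, i)) for j in range(n)]
            for i in range(n)]


def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


M = [[2, 0, 1], [1, 3, 2], [1, 1, 2]]
d = det(M)
scaled_identity = [[d if i == j else 0 for j in range(3)] for i in range(3)]

left = matmul(adjugate(M), M)    # adj(M) . M
right = matmul(M, adjugate(M))   # M . adj(M)
print(left == scaled_identity, right == scaled_identity)  # True True
```

Only additions and multiplications of entries are used and no division occurs, in line with the remark that the relations do not require the entries to lie in a field.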