Companion matrix
{{Short description|Square matrix constructed from a monic polynomial}} In [[linear algebra]], the [[Ferdinand Georg Frobenius|Frobenius]] '''companion matrix''' of the [[monic polynomial]] <math display="block"> p(x)=c_0 + c_1 x + \cdots + c_{n-1}x^{n-1} + x^n </math> is the [[square matrix]] defined as <math display="block">C(p)=\begin{bmatrix} 0 & 0 & \dots & 0 & -c_0 \\ 1 & 0 & \dots & 0 & -c_1 \\ 0 & 1 & \dots & 0 & -c_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \dots & 1 & -c_{n-1} \end{bmatrix}.</math> Some authors use the [[transpose]] of this matrix, <math> C(p)^T </math>, which is more convenient for some purposes such as linear [[recurrence relation]]s ([[#Linear recursive sequences|see below]]). <math> C(p) </math> is defined from the coefficients of <math>p(x)</math>, while the [[characteristic polynomial]] as well as the [[minimal polynomial (linear algebra)|minimal polynomial]] of <math> C(p) </math> are equal to <math>p(x) </math>.<ref> {{Cite book |last=Horn |first=Roger A. |url=https://books.google.com/books?id=f6_r93Of544C&dq=%22companion+matrix%22&pg=PA147 |title=Matrix Analysis |author2=Charles R. Johnson |publisher=Cambridge University Press |year=1985 |isbn=0-521-30586-1 |location=Cambridge, UK |pages=146–147 |accessdate=2010-02-10}}</ref> In this sense, the matrix <math> C(p) </math> and the polynomial <math>p(x)</math> are "companions". ==Similarity to companion matrix== Any matrix {{mvar|A}} with entries in a [[field (mathematics)|field]] {{mvar|F}} has characteristic polynomial <math> p(x) = \det(xI - A) </math>, which in turn has companion matrix <math> C(p) </math>. These matrices are related as follows. The following statements are equivalent: * ''A'' is [[similar (linear algebra)|similar]] over ''F'' to <math> C(p) </math>, i.e. 
''A'' can be conjugated to its companion matrix by matrices in GL''<sub>n</sub>''(''F''); * the characteristic polynomial <math> p(x) </math> coincides with the minimal polynomial of ''A'', i.e. the minimal polynomial has degree ''n''; * the linear mapping <math>A:F^n\to F^n</math> makes <math>F^n</math> a [[Cyclic module|cyclic]] <math>F[A]</math>-module, having a basis of the form <math>\{v,Av,\ldots,A^{n-1}v\} </math>; or equivalently <math>F^n \cong F[x]/(p(x))</math> as <math>F[A]</math>-modules. If the above hold, one says that ''A'' is ''non-derogatory''. Not every square matrix is similar to a companion matrix, but every square matrix is similar to a [[block diagonal]] matrix made of companion matrices. If we also demand that the polynomial of each diagonal block divides the next one, they are uniquely determined by ''A'', and this gives the [[Frobenius normal form|rational canonical form]] of ''A''. ==Diagonalizability== The roots of the characteristic polynomial <math>p(x)</math> are the [[eigenvalue]]s of <math> C(p) </math>. If there are ''n'' distinct eigenvalues <math>\lambda_1,\ldots,\lambda_n</math>, then <math> C(p) </math> is [[diagonalizable]] as <math> C(p) = V^{-1}\!D V</math>, where ''D'' is the diagonal matrix and ''V'' is the [[Vandermonde matrix]] corresponding to the {{mvar|λ}}'s: <math display="block"> D = \begin{bmatrix} \lambda_1 & 0 & \!\!\!\cdots\!\!\! & 0\\ 0 & \lambda_2 & \!\!\!\cdots\!\!\! & 0\\ 0 & 0 & \!\!\!\cdots\!\!\! & \lambda_n \end{bmatrix}, \qquad V = \begin{bmatrix} 1 & \lambda_1 & \lambda_1^2 & \!\!\!\cdots\!\!\! & \lambda_1^{n-1}\\ 1 & \lambda_2 & \lambda_2^2 & \!\!\!\cdots\!\!\! & \lambda_2^{n-1}\\[-1em] \vdots & \vdots & \vdots & \!\!\!\ddots\!\!\! &\vdots \\ 1 & \lambda_n & \lambda_n^2 & \!\!\!\cdots\!\!\! & \lambda_n^{n-1} \end{bmatrix}. 
</math> Indeed, a straightforward computation shows that the transpose <math> C(p)^T </math> has eigenvectors <math> v_i=(1,\lambda_i,\ldots,\lambda_i ^{n-1}) </math> with <math> C(p)^T\!(v_i) = \lambda_i v_i </math>, which follows from <math> p(\lambda_i)=c_0+c_1\lambda_i+\cdots+c_{n-1}\lambda_i^{n-1}+\lambda_i ^n = 0 </math>. Thus, its diagonalizing [[change of basis]] matrix is <math> V^T = [v_1^T \ldots v_n^T] </math>, meaning <math> C(p)^T = V^T D\, (V^T) ^{-1} </math>, and taking the transpose of both sides gives <math> C(p) = V^{-1}\!D V</math>. We can read the eigenvectors of <math> C(p) </math> with <math> C(p)(w_i) = \lambda_i w_i </math> from the equation <math> C(p) = V^{-1}\!D V</math>: they are the column vectors of the [[Vandermonde matrix#Inverse Vandermonde Matrix|inverse Vandermonde matrix]] <math> V^{-1} = [w_1^T \cdots w_n^T] </math>. This matrix is known explicitly, giving the eigenvectors <math> w_i = (L_{0i}, \ldots, L_{(n-1)i}) </math>, with coordinates equal to the coefficients of the [[Lagrange polynomial|Lagrange polynomials]] <math display="block"> L_i(x) = L_{0i} + L_{1i}x + \cdots + L_{(n-1)i}x^{n-1} = \prod_{j\neq i} \frac{x-\lambda_j}{\lambda_i-\lambda_j} = \frac{p(x)}{(x-\lambda_i)\,p'(\lambda_i)}. </math> Alternatively, the scaled eigenvectors <math> \tilde w_i = p'\!(\lambda_i)\,w_i </math> have simpler coefficients. If <math> p(x) </math> has multiple roots, then <math> C(p) </math> is not diagonalizable. Rather, the [[Jordan canonical form]] of <math> C(p) </math> contains one [[Jordan block]] for each distinct root; if the multiplicity of the root is ''m'', then the block is an ''m'' × ''m'' matrix with <math> \lambda </math> on the diagonal and 1 in the entries just above the diagonal. In this case, ''V'' becomes a [[Vandermonde matrix#Confluent Vandermonde matrices|confluent Vandermonde matrix]].<ref> {{cite book | last1 = Turnbull | first1 = H. W. | last2 = Aitken | first2 = A. C. 
| date = 1961 | title = An Introduction to the Theory of Canonical Matrices | location = New York | publisher = Dover | page = 60 | isbn = 978-0486441689 }} </ref> ==Linear recursive sequences== A [[linear recursive sequence]] defined by <math>a_{k+n} = - c_0 a_k - c_1 a_{k+1} - \cdots - c_{n-1} a_{k+n-1}</math> for <math>k \geq 0</math> has the characteristic polynomial <math>p(x)=c_0 + c_1 x + \cdots + c_{n-1}x^{n-1} + x^n </math>, whose transpose companion matrix <math> C(p)^T </math> generates the sequence: <math display="block">\begin{bmatrix}a_{k+1}\\ a_{k+2}\\ \vdots \\ a_{k+n-1}\\ a_{k+n} \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0\\ 0 & 0 & 1 & \cdots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1\\ -c_0 & -c_1 & -c_2 & \cdots & -c_{n-1} \end{bmatrix} \begin{bmatrix}a_k\\ a_{k+1}\\ \vdots \\ a_{k+n-2}\\ a_{k+n-1} \end{bmatrix} .</math> The vector <math>v=(1,\lambda,\lambda^2,\ldots,\lambda^{n-1})</math> is an eigenvector of this matrix, where the eigenvalue <math>\lambda</math> is a root of <math>p(x)</math>. Setting the initial values of the sequence equal to this vector produces a geometric sequence <math>a_k = \lambda^k</math> which satisfies the recurrence. In the case of ''n'' distinct eigenvalues, an arbitrary solution <math>a_k </math> can be written as a linear combination of such geometric solutions, and the eigenvalues of largest modulus give an [[asymptotic approximation]]. ==From linear ODE to first-order linear ODE system== Similarly to the above case of linear recursions, consider a homogeneous [[linear ODE]] of order ''n'' for the scalar function <math>y = y(t)</math>: <math display="block"> y^{(n)} + c_{n-1}y^{(n-1)} + \dots + c_{1}y^{(1)} + c_0 y = 0 . 
</math> This can be equivalently described as a coupled system of homogeneous linear ODEs of order 1 for the vector function <math>z(t) = (y(t),y'(t),\ldots,y^{(n-1)}(t))</math>: <math display="block">z' = C(p)^T z</math> where <math>C(p)^T</math> is the transpose companion matrix for the characteristic polynomial <math display="block">p(x)=x^n + c_{n-1}x^{n-1} + \cdots + c_1 x + c_0 .</math> Here the coefficients <math>c_i = c_i(t)</math> may also be functions, not just constants. If <math>C(p)^T</math> is diagonalizable, then a diagonalizing change of basis will transform this into a decoupled system equivalent to one scalar homogeneous first-order linear ODE in each coordinate. An inhomogeneous equation <math display="block"> y^{(n)} + c_{n-1}y^{(n-1)} + \dots + c_{1}y^{(1)} + c_0 y = f(t) </math> is equivalent to the system: <math display="block"> z' = C(p)^T z + F(t) </math> with the inhomogeneity term <math>F(t) = (0,\ldots,0,f(t)) </math>. Again, a diagonalizing change of basis will transform this into a decoupled system of scalar inhomogeneous first-order linear ODEs. == Cyclic shift matrix == In the case of <math>p(x)=x^n-1</math>, when the eigenvalues are the complex [[Root of unity|roots of unity]], the companion matrix and its transpose both reduce to Sylvester's cyclic [[Generalizations of Pauli matrices#Construction: The clock and shift matrices|shift matrix]], a [[circulant matrix]]. == Multiplication map on a simple field extension == Consider a polynomial <math>p(x)=x^n + c_{n-1}x^{n-1} + \cdots + c_1 x + c_0 </math> with coefficients in a [[Field (mathematics)|field]] <math>F</math>, and suppose <math>p(x)</math> is [[Irreducible polynomial|irreducible]] in the [[polynomial ring]] <math>F[x]</math>. 
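The recurrence construction of the earlier sections can be checked with a short numerical sketch (an illustrative Python/NumPy example, not part of the article): the transpose companion matrix of <math>p(x)=x^2-x-1</math> advances the Fibonacci recurrence <math>a_{k+2}=a_{k+1}+a_k</math> one step per multiplication, and its eigenvalue of largest modulus is the golden ratio governing the asymptotic growth.

```python
# Illustrative sketch: the transpose companion matrix of p(x) = x^2 - x - 1
# (so c0 = c1 = -1) generates the Fibonacci sequence.
import numpy as np

CT = np.array([[0, 1],    # first row shifts the state window
               [1, 1]])   # last row is (-c0, -c1) = (1, 1), the recurrence

state = np.array([0, 1])  # initial values (a_0, a_1)
seq = []
for _ in range(8):
    seq.append(int(state[0]))
    state = CT @ state    # (a_k, a_{k+1}) -> (a_{k+1}, a_{k+2})
print(seq)                # [0, 1, 1, 2, 3, 5, 8, 13]

# The eigenvalue of largest modulus is the golden ratio (1 + sqrt(5))/2,
# the root of p(x) that dominates the asymptotic growth of the sequence.
phi = max(np.linalg.eigvals(CT).real)
assert abs(phi - (1 + 5**0.5) / 2) < 1e-12
```

The same loop with the n × n transpose companion matrix advances any order-''n'' linear recurrence with constant coefficients.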
Then [[Simple extension|adjoining a root]] <math>\lambda</math> of <math>p(x)</math> produces a [[field extension]] <math>K=F(\lambda) \cong F[x]/(p(x))</math>, which is also a vector space over <math>F</math> with standard basis <math>\{1,\lambda,\lambda^2,\ldots,\lambda^{n-1}\} </math>. Then the <math>F</math>-linear multiplication mapping {{ block indent | em = 1.5 | text = <math>m_{\lambda}:K\to K</math> defined by <math>m_\lambda(\alpha) = \lambda\alpha</math> }} has an ''n'' × ''n'' matrix <math>[m_\lambda]</math> with respect to the standard basis. Since <math>m_\lambda(\lambda^i) = \lambda^{i+1}</math> and <math>m_\lambda(\lambda^{n-1}) = \lambda^n = -c_0-\cdots-c_{n-1}\lambda^{n-1}</math>, this is the companion matrix of <math>p(x)</math>: <math display="block">[m_\lambda] = C(p).</math> Assuming this extension is [[Separable extension|separable]] (for example if <math>F</math> has [[characteristic zero]] or is a [[finite field]]), <math>p(x)</math> has distinct roots <math>\lambda_1,\ldots,\lambda_n </math> with <math>\lambda_1 = \lambda</math>, so that <math display="block">p(x)=(x-\lambda_1)\cdots (x-\lambda_n),</math> and it has [[splitting field]] <math>L = F(\lambda_1,\ldots,\lambda_n)</math>. Now <math>m_\lambda</math> is not diagonalizable over <math>F</math>; rather, we must [[Extension of scalars|extend]] it to an <math>L</math>-linear map on <math>L^n \cong L\otimes_F K</math>, a vector space over <math>L</math> with standard basis <math>\{1{\otimes} 1, \, 1{\otimes}\lambda, \, 1{\otimes}\lambda^2,\ldots,1{\otimes}\lambda^{n-1}\} </math>, containing vectors <math>w = (\beta_1,\ldots,\beta_n) = \beta_1{\otimes} 1+\cdots+\beta_n{\otimes} \lambda^{n-1} </math>. The extended mapping is defined by <math>m_\lambda(\beta\otimes\alpha) = \beta\otimes(\lambda\alpha)</math>. 
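As a concrete numerical illustration (a Python/NumPy sketch, not part of the article's formalism): for <math>p(x)=x^2-2</math>, irreducible over <math>\mathbb{Q}</math>, the matrix of multiplication by <math>\lambda=\sqrt{2}</math> in the basis <math>\{1,\lambda\}</math> equals <math>C(p)</math>, and over the splitting field (here realized numerically in <math>\mathbb{R}</math>) it is diagonalized by the Vandermonde matrix of the roots <math>\pm\sqrt{2}</math>.

```python
# Illustrative sketch for p(x) = x^2 - 2, lam = sqrt(2).
import numpy as np

lam = np.sqrt(2.0)
# Column j of [m_lam] holds the coordinates of lam * lam**j in the basis {1, lam}:
#   m(1)   = lam        = 0*1 + 1*lam
#   m(lam) = lam**2 = 2 = 2*1 + 0*lam
M = np.array([[0.0, 2.0],
              [1.0, 0.0]])

# Companion matrix of p(x) = x^2 + 0*x - 2, i.e. c0 = -2, c1 = 0:
C = np.array([[0.0, -(-2.0)],
              [1.0, -0.0]])
print(np.array_equal(M, C))   # True: [m_lam] = C(p)

# Diagonalization C(p) = V^{-1} D V by the Vandermonde matrix of the roots:
roots = np.array([lam, -lam])
V = np.vander(roots, increasing=True)   # rows (1, lam_i)
assert np.allclose(C, np.linalg.inv(V) @ np.diag(roots) @ V)
```

The same check works for any monic polynomial with distinct roots, with `np.vander(roots, increasing=True)` supplying the matrix ''V'' of the Diagonalizability section.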
The matrix <math>[m_\lambda] = C(p)</math> is unchanged, but as above, it can be diagonalized by matrices with entries in <math>L</math>: <math display="block">[m_\lambda]=C(p)= V^{-1}\! D V,</math> for the diagonal matrix <math>D=\operatorname{diag}(\lambda_1,\ldots,\lambda_n)</math> and the [[Vandermonde matrix]] ''V'' corresponding to <math>\lambda_1,\ldots,\lambda_n\in L </math>. The explicit formula for the eigenvectors (the scaled column vectors of the [[Vandermonde matrix#Inverse Vandermonde matrix|inverse Vandermonde matrix]] <math>V^{-1}</math>) can be written as: <math display="block">\tilde w_i = \beta_{0i} {\otimes} 1+\beta_{1i} {\otimes} \lambda +\cdots + \beta_{(n-1)i} {\otimes} \lambda^{n-1} = \prod_{j\neq i} (1{\otimes}\lambda - \lambda_j{\otimes} 1) </math> where <math>\beta_{ij}\in L </math> are the coefficients of the scaled Lagrange polynomial <math display="block">\frac{p(x)}{x-\lambda_i} = \prod_{j\neq i} (x - \lambda_j) = \beta_{0i} + \beta_{1i} x + \cdots + \beta_{(n-1)i} x^{n-1}.</math> ==See also== *[[Frobenius endomorphism]] *[[Cayley–Hamilton theorem]] *[[Krylov subspace]] ==Notes== {{reflist}} {{Matrix classes}} {{DEFAULTSORT:Companion Matrix}} [[Category:Matrices (mathematics)]] [[Category:Matrix theory]] [[Category:Matrix normal forms]]