== Algorithm ==

The first goal is to find invertible square matrices <math>S</math> and <math>T</math> such that the product <math>S A T</math> is diagonal. This is the hardest part of the algorithm. Once diagonality is achieved, it becomes relatively easy to put the matrix into Smith normal form. Phrased more abstractly, the goal is to show that, thinking of <math>A</math> as a map from <math>R^n</math> (the free <math>R</math>-module of rank <math>n</math>) to <math>R^m</math> (the free <math>R</math>-module of rank <math>m</math>), there are [[isomorphism]]s <math>S:R^m \to R^m</math> and <math>T:R^n \to R^n</math> such that <math>S \cdot A \cdot T</math> has the simple form of a diagonal matrix.

The matrices <math>S</math> and <math>T</math> can be found by starting out with identity matrices of the appropriate size, and modifying <math>S</math> each time a row operation is performed on <math>A</math> in the algorithm by the same row operation (for example, if row <math>i</math> is added to row <math>j</math> of <math>A</math>, then row <math>i</math> should also be added to row <math>j</math> of <math>S</math> to retain the product invariant), and similarly modifying <math>T</math> for each column operation performed. Since row operations are left-multiplications and column operations are right-multiplications, this preserves the invariant <math>A'=S'\cdot A\cdot T'</math>, where <math>A',S',T'</math> denote current values and <math>A</math> denotes the original matrix; eventually the matrices in this invariant become diagonal. Only invertible row and column operations are performed, which ensures that <math>S</math> and <math>T</math> remain invertible matrices.

For <math>a \in R\setminus \{0\}</math>, write <math>\delta(a)</math> for the number of prime factors of <math>a</math> (these exist and are unique since any PID is also a [[unique factorization domain]]). In particular, <math>R</math> is also a [[Bézout domain]], so it is a [[gcd domain]] and the gcd of any two elements satisfies [[Bézout's identity]].

To put a matrix into Smith normal form, one can repeatedly apply the following, where <math>t</math> loops from 1 to <math>m</math>.

===Step I: Choosing a pivot===
Choose <math>j_t</math> to be the smallest column index of <math>A</math> with a non-zero entry, starting the search at column index <math>j_{t-1}+1</math> if <math>t> 1</math>. We wish to have <math>a_{t,j_t}\neq0</math>; if this is the case this step is complete; otherwise there is by assumption some <math>k</math> with <math>a_{k,j_t} \neq 0</math>, and we can exchange rows <math>t</math> and <math>k</math>, thereby obtaining <math>a_{t,j_t}\neq0</math>. Our chosen pivot is now at position <math>(t, j_t)</math>.
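For concreteness, the bookkeeping of <math>S</math> and <math>T</math> and the pivot choice of Step I can be sketched over the integers (a PID). This is a minimal illustration, not a standard implementation; the function names (<code>swap_rows</code>, <code>find_pivot</code>, and so on) are chosen for this sketch only, and the search is restricted to rows <math>\ge t</math>, matching the recursive treatment of the lower-right block described in Step III.

<syntaxhighlight lang="python">
# Minimal sketch over R = Z. Matrices are lists of rows; S and T start
# as identity matrices, and every row operation applied to the working
# matrix is mirrored on S (every column operation on T), preserving the
# invariant: current == S * A_original * T.

def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def swap_rows(M, i, k):
    M[i], M[k] = M[k], M[i]

def swap_cols(M, i, k):
    for row in M:
        row[i], row[k] = row[k], row[i]

def find_pivot(A, t, start):
    """Step I: smallest column index >= start holding a non-zero entry
    in some row >= t; returns (column, row) or None if none exists."""
    for j in range(start, len(A[0])):
        for k in range(t, len(A)):
            if A[k][j] != 0:
                return j, k
    return None

# Step I with mirrored bookkeeping: if the non-zero entry sits in row
# k != t, exchange rows t and k of A and apply the same swap to S:
#   swap_rows(A, t, k); swap_rows(S, t, k)
</syntaxhighlight>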
===Step II: Improving the pivot===
If there is an entry at position (''k'',''j''<sub>''t''</sub>) such that <math>a_{t,j_t} \nmid a_{k,j_t}</math>, then, letting <math>\beta =\gcd\left(a_{t,j_t}, a_{k,j_t}\right)</math>, we know by the Bézout property that there exist σ, τ in ''R'' such that
:<math> a_{t,j_t} \cdot \sigma + a_{k,j_t} \cdot \tau=\beta. </math>
By left-multiplication with an appropriate invertible matrix ''L'', it can be achieved that row ''t'' of the matrix product is the sum of σ times the original row ''t'' and τ times the original row ''k'', that row ''k'' of the product is another [[linear combination]] of those original rows, and that all other rows are unchanged.

Explicitly, if σ and τ satisfy the above equation, then for <math>\alpha=a_{t,j_t}/\beta</math> and <math>\gamma=a_{k,j_t}/\beta</math> (these divisions are possible by the definition of β) one has
:<math> \sigma\cdot \alpha + \tau \cdot \gamma=1, </math>
so that the matrix
:<math> L_0= \begin{pmatrix} \sigma & \tau \\ -\gamma & \alpha \\ \end{pmatrix} </math>
is invertible, with inverse
:<math> \begin{pmatrix} \alpha & -\tau \\ \gamma & \sigma \\ \end{pmatrix} .</math>
Now ''L'' can be obtained by fitting <math>L_0</math> into rows and columns ''t'' and ''k'' of the [[identity matrix]]. By construction the matrix obtained after left-multiplying by ''L'' has entry β at position (''t'',''j''<sub>''t''</sub>) (and due to our choice of α and γ it also has an entry 0 at position (''k'',''j''<sub>''t''</sub>), which is useful though not essential for the algorithm). This new entry β divides the entry <math>a_{t,j_t}</math> that was there before, and so in particular <math>\delta(\beta) < \delta(a_{t,j_t})</math>; therefore repeating these steps must eventually terminate. One ends up with a matrix having an entry at position (''t'',''j''<sub>''t''</sub>) that divides all entries in column ''j''<sub>''t''</sub>.

===Step III: Eliminating entries===
Finally, by adding appropriate multiples of row ''t'', it can be achieved that all entries in column ''j''<sub>''t''</sub> except for that at position (''t'',''j''<sub>''t''</sub>) are zero. This can be achieved by left-multiplication with an appropriate matrix. However, to make the matrix fully diagonal we need to eliminate the nonzero entries in the row of position (''t'',''j''<sub>''t''</sub>) as well. This can be achieved by repeating the steps in Step II for columns instead of rows, using multiplication on the right by the [[transpose]] of the obtained matrix ''L''. In general this will result in the zero entries from the prior application of Step III becoming nonzero again. However, notice that each application of Step II for either rows or columns must continue to reduce the value of <math>\delta(a_{t,j_t})</math>, and so the process must eventually stop after some number of iterations, leading to a matrix where the entry at position (''t'',''j''<sub>''t''</sub>) is the only non-zero entry in both its row and column. At this point, only the block of ''A'' to the lower right of (''t'',''j''<sub>''t''</sub>) needs to be diagonalized, and conceptually the algorithm can be applied recursively, treating this block as a separate matrix. In other words, we can increment ''t'' by one and go back to Step I.
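Continuing the integer sketch above, Step II can be realised with the extended Euclidean algorithm, which produces the Bézout coefficients σ and τ; the 2×2 row transformation below is exactly the block <math>L_0</math>. The helper names are again illustrative only, and entries are assumed to be plain Python integers.

<syntaxhighlight lang="python">
def extended_gcd(a, b):
    """Return (g, sigma, tau) with sigma*a + tau*b == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    return g, y, x - (a // b) * y

def improve_pivot(A, S, t, k, j):
    """Step II: one improvement of the pivot A[t][j] against A[k][j].
    Applies L0 = [[sigma, tau], [-gamma, alpha]] (determinant 1) to rows
    t and k, putting beta = gcd at (t, j) and 0 at (k, j); the same row
    operation is mirrored on S."""
    beta, sigma, tau = extended_gcd(A[t][j], A[k][j])
    alpha, gamma = A[t][j] // beta, A[k][j] // beta
    for M in (A, S):
        row_t = [sigma * x + tau * y for x, y in zip(M[t], M[k])]
        row_k = [alpha * y - gamma * x for x, y in zip(M[t], M[k])]
        M[t], M[k] = row_t, row_k

def eliminate_column(A, S, t, j):
    """Step III (column part): once A[t][j] divides every entry in
    column j, subtract multiples of row t to clear the other entries."""
    for k in range(len(A)):
        if k != t and A[k][j] != 0:
            c = A[k][j] // A[t][j]   # exact by the divisibility from Step II
            A[k] = [x - c * y for x, y in zip(A[k], A[t])]
            S[k] = [x - c * y for x, y in zip(S[k], S[t])]
</syntaxhighlight>

The row of the pivot is cleared in the same way, with the analogous column operations mirrored on <math>T</math>; as argued above, alternating between the two directions terminates because <math>\delta(a_{t,j_t})</math> strictly decreases with each application of Step II.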
===Final step===
Applying the steps described above to the remaining non-zero columns of the resulting matrix (if any), we get an <math>m \times n</math>-matrix with column indices <math>j_1 < \ldots < j_r</math>, where <math>r \le \min(m,n)</math>. The entries at positions <math>(l,j_l)</math> are non-zero, and every other entry is zero.

Now we can move the null columns of this matrix to the right, so that the nonzero entries are at positions <math>(i,i)</math> for <math>1 \le i\le r</math>. For brevity, write <math>\alpha_i</math> for the entry at position <math>(i,i)</math>. The condition of divisibility of diagonal entries might not be satisfied. For any index <math>i<r</math> for which <math>\alpha_i\nmid\alpha_{i+1}</math>, one can repair this shortcoming by operations on rows and columns <math>i</math> and <math>i+1</math> only: first add column <math>i+1</math> to column <math>i</math> to get an entry <math>\alpha_{i+1}</math> in column ''i'' without disturbing the entry <math>\alpha_i</math> at position <math>(i,i)</math>, and then apply a row operation to make the entry at position <math>(i,i)</math> equal to <math>\beta=\gcd(\alpha_i,\alpha_{i+1})</math> as in Step II; finally proceed as in Step III to make the matrix diagonal again. Since the new entry at position <math>(i+1,i+1)</math> is a linear combination of the original <math>\alpha_i,\alpha_{i+1}</math>, it is divisible by β.

The value <math>\delta(\alpha_1)+\cdots+\delta(\alpha_r)</math> is not changed by the above operation (it is δ of the determinant of the upper <math>r\times r</math> submatrix), but the operation does diminish (by moving prime factors to the right) the value of
:<math>\sum_{j=1}^r(r-j)\delta(\alpha_j).</math>
So after finitely many applications of this operation no further application is possible, which means that we have obtained <math>\alpha_1\mid\alpha_2\mid\cdots\mid\alpha_r</math> as desired.

Since all row and column manipulations involved in the process are invertible, this shows that there exist invertible <math>m \times m</math> and <math>n \times n</math>-matrices ''S'', ''T'' so that the product ''S A T'' satisfies the definition of a Smith normal form. In particular, this shows that the Smith normal form exists, which was assumed without proof in the definition.
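The effect of the repair operation on the diagonal can be summarised numerically: up to units, it replaces the pair <math>(\alpha_i, \alpha_{i+1})</math> by <math>(\gcd(\alpha_i,\alpha_{i+1}), \operatorname{lcm}(\alpha_i,\alpha_{i+1}))</math>, since β is the gcd and the determinant of the 2×2 block is preserved. The following sketch (assuming positive integer entries <math>\alpha_1,\dots,\alpha_r</math>; the function name is illustrative) shows how repeating the repair yields the divisibility chain:

<syntaxhighlight lang="python">
from math import gcd

def fix_divisibility(diag):
    """Final step, restricted to the diagonal: repeatedly replace an
    offending pair (a, b) with a not dividing b by (gcd(a, b), lcm(a, b))
    until each entry divides the next.  Terminates because the weighted
    sum  sum((r - j) * delta(d[j]))  strictly decreases at each repair."""
    d = list(diag)
    changed = True
    while changed:
        changed = False
        for i in range(len(d) - 1):
            a, b = d[i], d[i + 1]
            if b % a != 0:
                d[i], d[i + 1] = gcd(a, b), a * b // gcd(a, b)
                changed = True
    return d

print(fix_divisibility([4, 6, 9]))  # -> [1, 6, 36]; the product 216 is preserved
</syntaxhighlight>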