== Algorithm ==
[[Image:Strassen algorithm.svg|thumb|800px|centre|The left column visualizes the calculations necessary to determine the result of a 2×2 [[matrix multiplication]]. Naïve matrix multiplication requires one multiplication for each "1" of the left column. Each of the other columns (M1-M7) represents a single one of the 7 multiplications in the Strassen algorithm. The sum of the columns M1-M7 gives the same result as the full matrix multiplication on the left.]]

Let <math>A</math>, <math>B</math> be two [[square matrix|square matrices]] over a [[ring (mathematics)|ring]] <math>\mathcal{R}</math>, for example matrices whose entries are integers or real numbers. The goal of matrix multiplication is to calculate the matrix product <math>C = AB</math>. The following exposition of the algorithm assumes that all of these matrices have sizes that are powers of two (i.e., <math>A, \, B, \, C \in \operatorname{Mat}_{2^n \times 2^n} (\mathcal{R})</math>), but this is only conceptually necessary: if the matrices <math>A</math>, <math>B</math> are not of type <math>2^n \times 2^n</math>, the "missing" rows and columns can be filled with zeros to obtain matrices with sizes that are powers of two, though real implementations of the algorithm do not do this in practice.

The Strassen algorithm partitions <math>A</math>, <math>B</math> and <math>C</math> into equally sized [[block matrix|block matrices]]
:<math>
A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}, \quad
B = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}, \quad
C = \begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix},
</math>
with <math>A_{ij}, B_{ij}, C_{ij} \in \operatorname{Mat}_{2^{n-1} \times 2^{n-1}} (\mathcal{R})</math>. The naive algorithm would be:
:<math>
\begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix} =
\begin{bmatrix}
A_{11} {\color{red}\times} B_{11} + A_{12} {\color{red}\times} B_{21} \quad & A_{11} {\color{red}\times} B_{12} + A_{12} {\color{red}\times} B_{22} \\
A_{21} {\color{red}\times} B_{11} + A_{22} {\color{red}\times} B_{21} \quad & A_{21} {\color{red}\times} B_{12} + A_{22} {\color{red}\times} B_{22}
\end{bmatrix}.
</math>
This construction does not reduce the number of multiplications: 8 multiplications of matrix blocks are still needed to calculate the <math>C_{ij}</math> matrices, the same number of multiplications needed when using standard matrix multiplication.

The Strassen algorithm instead defines new values:
:<math>\begin{align}
M_1 &= (A_{11} + A_{22}) {\color{red}\times} (B_{11} + B_{22}); \\
M_2 &= (A_{21} + A_{22}) {\color{red}\times} B_{11}; \\
M_3 &= A_{11} {\color{red}\times} (B_{12} - B_{22}); \\
M_4 &= A_{22} {\color{red}\times} (B_{21} - B_{11}); \\
M_5 &= (A_{11} + A_{12}) {\color{red}\times} B_{22}; \\
M_6 &= (A_{21} - A_{11}) {\color{red}\times} (B_{11} + B_{12}); \\
M_7 &= (A_{12} - A_{22}) {\color{red}\times} (B_{21} + B_{22}),
\end{align}</math>
using only 7 multiplications (one for each <math>M_k</math>) instead of 8. We may now express the <math>C_{ij}</math> in terms of the <math>M_k</math>:
:<math>
\begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix} =
\begin{bmatrix}
M_1 + M_4 - M_5 + M_7 \quad & M_3 + M_5 \\
M_2 + M_4 \quad & M_1 - M_2 + M_3 + M_6
\end{bmatrix}.
</math>
We recursively iterate this division process until the [[submatrices]] degenerate into numbers (elements of the ring <math>\mathcal{R}</math>).
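The recursion just described can be written down directly. The following is a minimal illustrative sketch (not a reference implementation from the article's sources), assuming both inputs are NumPy arrays whose dimensions are the same power of two, and recursing all the way down to 1×1 blocks exactly as in the exposition:

<syntaxhighlight lang="python">
import numpy as np

def strassen(A, B):
    """Strassen multiplication of two 2^n x 2^n matrices (illustrative sketch)."""
    n = A.shape[0]
    if n == 1:
        # The submatrices have degenerated into ring elements.
        return A * B

    k = n // 2
    A11, A12, A21, A22 = A[:k, :k], A[:k, k:], A[k:, :k], A[k:, k:]
    B11, B12, B21, B22 = B[:k, :k], B[:k, k:], B[k:, :k], B[k:, k:]

    # The seven block products M1, ..., M7 defined above.
    M1 = strassen(A11 + A22, B11 + B22)
    M2 = strassen(A21 + A22, B11)
    M3 = strassen(A11, B12 - B22)
    M4 = strassen(A22, B21 - B11)
    M5 = strassen(A11 + A12, B22)
    M6 = strassen(A21 - A11, B11 + B12)
    M7 = strassen(A12 - A22, B21 + B22)

    # Reassemble the four blocks of C from the M_k, as in the block formula above.
    return np.block([[M1 + M4 - M5 + M7, M3 + M5],
                     [M2 + M4,           M1 - M2 + M3 + M6]])

# Example: the result agrees with the standard product for a random 8 x 8 pair.
A = np.random.randint(0, 10, (8, 8))
B = np.random.randint(0, 10, (8, 8))
assert np.array_equal(strassen(A, B), A @ B)
</syntaxhighlight>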
If, as mentioned above, the original matrices had sizes that were not powers of 2, then the resulting product will have zero rows and columns just like <math>A</math> and <math>B</math>; these are stripped at this point to obtain the (smaller) matrix <math>C</math> we really wanted.

Practical implementations of Strassen's algorithm switch to standard methods of matrix multiplication for small enough submatrices, for which those algorithms are more efficient. The particular crossover point at which Strassen's algorithm becomes more efficient depends on the specific implementation and hardware. Earlier authors estimated that Strassen's algorithm is faster for matrices with widths from 32 to 128 for optimized implementations.<ref>{{Citation | last1=Skiena | first1=Steven S. | title=The Algorithm Design Manual | publisher=[[Springer-Verlag]] | location=Berlin, New York | isbn=978-0-387-94860-7 | year=1998 | chapter=§8.2.3 Matrix multiplication}}.</ref> However, this crossover point has been increasing in recent years: a 2010 study found that even a single step of Strassen's algorithm is often not beneficial on current architectures, compared to a highly optimized traditional multiplication, until matrix sizes exceed 1000, and that even for matrices of several thousand rows the benefit is typically marginal at best (around 10% or less).<ref name="dalberto"/> A more recent study (2016) observed a benefit of around 20% for matrices as small as 512.<ref name="huang et al."/> A sketch of how padding and a crossover size fit together is given below.
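As a rough illustration of these practical points, the hypothetical sketch below (not taken from the cited sources) zero-pads the inputs up to the next power of two, recurses only while the blocks are larger than a crossover size, and strips the padding at the end. It follows the conceptual padding approach described above, which production implementations avoid, and the cutoff value of 128 is an arbitrary placeholder, since the best crossover depends on the implementation and hardware:

<syntaxhighlight lang="python">
import numpy as np

CUTOFF = 128  # placeholder crossover size; the optimal value is implementation- and hardware-dependent

def strassen_hybrid(A, B):
    """Strassen recursion on power-of-two blocks, falling back to standard multiplication below CUTOFF."""
    n = A.shape[0]
    if n <= CUTOFF:
        return A @ B  # conventional multiplication is more efficient for small blocks
    k = n // 2
    A11, A12, A21, A22 = A[:k, :k], A[:k, k:], A[k:, :k], A[k:, k:]
    B11, B12, B21, B22 = B[:k, :k], B[:k, k:], B[k:, :k], B[k:, k:]
    M1 = strassen_hybrid(A11 + A22, B11 + B22)
    M2 = strassen_hybrid(A21 + A22, B11)
    M3 = strassen_hybrid(A11, B12 - B22)
    M4 = strassen_hybrid(A22, B21 - B11)
    M5 = strassen_hybrid(A11 + A12, B22)
    M6 = strassen_hybrid(A21 - A11, B11 + B12)
    M7 = strassen_hybrid(A12 - A22, B21 + B22)
    return np.block([[M1 + M4 - M5 + M7, M3 + M5],
                     [M2 + M4,           M1 - M2 + M3 + M6]])

def multiply(A, B):
    """Zero-pad to the next power of two, multiply, then strip the zero rows and columns."""
    rows, cols = A.shape[0], B.shape[1]
    size = 1
    while size < max(A.shape[0], A.shape[1], B.shape[1]):
        size *= 2
    Ap = np.zeros((size, size))
    Ap[:A.shape[0], :A.shape[1]] = A
    Bp = np.zeros((size, size))
    Bp[:B.shape[0], :B.shape[1]] = B
    return strassen_hybrid(Ap, Bp)[:rows, :cols]
</syntaxhighlight>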