== Asymptotic complexity ==
The outline of the algorithm above showed that one can get away with just seven, instead of the traditional eight, matrix-matrix multiplications for the sub-blocks of the matrix. On the other hand, one has to do additions and subtractions of blocks, though this is of no concern for the overall complexity: adding matrices of size <math>N/2</math> requires only <math>(N/2)^2</math> operations, whereas multiplication is substantially more expensive (traditionally <math>2 (N/2)^3</math> addition or multiplication operations).

The question then is exactly how many operations Strassen's algorithm needs, and how this compares with the standard matrix multiplication, which takes approximately <math>2 N^3</math> (where <math>N = 2^n</math>) arithmetic operations, i.e. an asymptotic complexity of <math>\Theta (N^3)</math>.

The number of additions and multiplications required in the Strassen algorithm can be calculated as follows: let <math>f(n)</math> be the number of operations for a <math>2^n \times 2^n</math> matrix. Then by recursive application of the Strassen algorithm, we see that <math>f(n) = 7 f(n-1) + l 4^n</math>, for some constant <math>l</math> that depends on the number of additions performed at each application of the algorithm. Hence <math>f(n) = (7 + o(1))^n</math>, i.e., the asymptotic complexity for multiplying matrices of size <math>N = 2^n</math> using the Strassen algorithm is <math>O([7+o(1)]^n) = O(N^{\log_{2}7+o(1)}) \approx O(N^{2.8074})</math>.

The reduction in the number of arithmetic operations, however, comes at the price of a somewhat reduced [[numerical stability]],<ref>{{cite journal |last=Miller |first=Webb |title=Computational complexity and numerical stability |journal=SIAM J. Comput. |year=1975 |volume=4 |issue=2 |pages=97–107 |doi=10.1137/0204009}}</ref> and the algorithm also requires significantly more memory compared to the naive algorithm. Both initial matrices must have their dimensions expanded to the next power of 2, which results in storing up to four times as many elements, and the seven auxiliary matrices each contain a quarter of the elements in the expanded ones.

Strassen's algorithm needs to be compared to the "naive" way of doing the matrix multiplication, which would require 8 instead of 7 multiplications of sub-blocks. This would then give rise to the complexity one expects from the standard approach: <math>O(8^n) = O(N^{\log_{2}8}) = O(N^3)</math>. The comparison of these two algorithms shows that ''asymptotically'', Strassen's algorithm is faster: there exists a size <math>N_\text{threshold}</math> such that matrices larger than this are more efficiently multiplied with Strassen's algorithm than the "traditional" way. However, the asymptotic statement does not imply that Strassen's algorithm is ''always'' faster, even for small matrices, and in practice this is in fact not the case: for small matrices, the cost of the additional additions of matrix blocks outweighs the savings in the number of multiplications. There are also other factors not captured by the analysis above, such as the difference in cost on today's hardware between loading data from memory onto processors vs. the cost of actually doing operations on this data. As a consequence of these sorts of considerations, Strassen's algorithm is typically only used on "large" matrices.
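To make this discussion concrete, here is a minimal Python sketch of the recursion (an illustration for this section, not code from the cited sources). It assumes square matrices whose dimension is a power of two, i.e. that the padding described above has already been applied; the crossover size <code>THRESHOLD</code> is a placeholder, since the real value is hardware-dependent:

<syntaxhighlight lang="python">
import numpy as np

THRESHOLD = 64  # assumed crossover size; the real N_threshold is hardware-dependent


def strassen(A, B):
    """Multiply two 2^n x 2^n matrices with Strassen's seven-product recursion."""
    n = A.shape[0]
    if n <= THRESHOLD:  # below the crossover, the extra additions are not worth it
        return A @ B
    k = n // 2
    A11, A12, A21, A22 = A[:k, :k], A[:k, k:], A[k:, :k], A[k:, k:]
    B11, B12, B21, B22 = B[:k, :k], B[:k, k:], B[k:, :k], B[k:, k:]
    # Seven recursive multiplications instead of the traditional eight ...
    M1 = strassen(A11 + A22, B11 + B22)
    M2 = strassen(A21 + A22, B11)
    M3 = strassen(A11, B12 - B22)
    M4 = strassen(A22, B21 - B11)
    M5 = strassen(A11 + A12, B22)
    M6 = strassen(A21 - A11, B11 + B12)
    M7 = strassen(A12 - A22, B21 + B22)
    # ... at the cost of a constant number of block additions (the l*4^n term).
    C = np.empty_like(A)
    C[:k, :k] = M1 + M4 - M5 + M7
    C[:k, k:] = M3 + M5
    C[k:, :k] = M2 + M4
    C[k:, k:] = M1 - M2 + M3 + M6
    return C


rng = np.random.default_rng(0)
A, B = rng.random((256, 256)), rng.random((256, 256))
assert np.allclose(strassen(A, B), A @ B)
</syntaxhighlight>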
This kind of effect is even more pronounced with alternative algorithms such as the one by [[Coppersmith-Winograd algorithm|Coppersmith and Winograd]]: while ''asymptotically'' even faster, the cross-over point <math>N_\text{threshold}</math> is so large that the algorithm is not generally used on matrices one encounters in practice.

=== Rank or bilinear complexity ===
The bilinear complexity or '''rank''' of a [[bilinear map]] is an important concept in the asymptotic complexity of matrix multiplication. The rank of a bilinear map <math>\phi:\mathbf A \times \mathbf B \rightarrow \mathbf C</math> over a field '''F''' is defined as (somewhat of an [[abuse of notation]])

:<math>R(\phi/\mathbf F) = \min \left\{r\left|\exists f_i\in \mathbf A^*,g_i\in\mathbf B^*,w_i\in\mathbf C , \forall \mathbf a\in\mathbf A, \mathbf b\in\mathbf B, \phi(\mathbf a,\mathbf b) = \sum_{i=1}^r f_i(\mathbf a)g_i(\mathbf b)w_i \right.\right\}</math>

In other words, the rank of a bilinear map is the length of its shortest bilinear computation.<ref>{{cite book |first1=Peter |last1=Bürgisser |first2=Michael |last2=Clausen |first3=M. Amin |last3=Shokrollahi |title=Algebraic Complexity Theory |publisher=Springer-Verlag |year=1997 |isbn=3-540-60582-7}}</ref> The existence of Strassen's algorithm shows that the rank of <math>2 \times 2</math> matrix multiplication is no more than seven. To see this, let us express this algorithm (alongside the standard algorithm) as such a bilinear computation. In the case of matrices, the [[dual space]]s '''A'''* and '''B'''* consist of maps into the field '''F''' induced by a scalar '''[[Dyadics#Product of dyadic and dyadic|double-dot product]]''' (i.e. in this case the sum of all the entries of a [[Hadamard product (matrices)|Hadamard product]]).

{| class="wikitable"
| || colspan="3" | Standard algorithm || || colspan="3" | Strassen algorithm
|-
| <math>i</math> || <math>f_i (\mathbf{a})</math> || <math>g_i (\mathbf{b})</math> || <math>w_i</math> || || <math>f_i (\mathbf{a})</math> || <math>g_i (\mathbf{b})</math> || <math>w_i</math>
|-
| 1 || <math>\begin{bmatrix}1&0\\0&0\end{bmatrix}:\mathbf a</math> || <math>\begin{bmatrix}1&0\\0&0\end{bmatrix}:\mathbf b</math> || <math>\begin{bmatrix}1&0\\0&0\end{bmatrix}</math> || || <math>\begin{bmatrix}1&0\\0&1\end{bmatrix}:\mathbf a</math> || <math>\begin{bmatrix}1&0\\0&1\end{bmatrix}:\mathbf b</math> || <math>\begin{bmatrix}1&0\\0&1\end{bmatrix}</math>
|-
| 2 || <math>\begin{bmatrix}0&1\\0&0\end{bmatrix}:\mathbf a</math> || <math>\begin{bmatrix}0&0\\1&0\end{bmatrix}:\mathbf b</math> || <math>\begin{bmatrix}1&0\\0&0\end{bmatrix}</math> || || <math>\begin{bmatrix}0&0\\1&1\end{bmatrix}:\mathbf a</math> || <math>\begin{bmatrix}1&0\\0&0\end{bmatrix}:\mathbf b</math> || <math>\begin{bmatrix}0&0\\1&-1\end{bmatrix}</math>
|-
| 3 || <math>\begin{bmatrix}1&0\\0&0\end{bmatrix}:\mathbf a</math> || <math>\begin{bmatrix}0&1\\0&0\end{bmatrix}:\mathbf b</math> || <math>\begin{bmatrix}0&1\\0&0\end{bmatrix}</math> || || <math>\begin{bmatrix}1&0\\0&0\end{bmatrix}:\mathbf a</math> || <math>\begin{bmatrix}0&1\\0&-1\end{bmatrix}:\mathbf b</math> || <math>\begin{bmatrix}0&1\\0&1\end{bmatrix}</math>
|-
| 4 || <math>\begin{bmatrix}0&1\\0&0\end{bmatrix}:\mathbf a</math> || <math>\begin{bmatrix}0&0\\0&1\end{bmatrix}:\mathbf b</math> || <math>\begin{bmatrix}0&1\\0&0\end{bmatrix}</math> || || <math>\begin{bmatrix}0&0\\0&1\end{bmatrix}:\mathbf a</math> || <math>\begin{bmatrix}-1&0\\1&0\end{bmatrix}:\mathbf b</math> || <math>\begin{bmatrix}1&0\\1&0\end{bmatrix}</math>
|-
| 5 || <math>\begin{bmatrix}0&0\\1&0\end{bmatrix}:\mathbf a</math> || <math>\begin{bmatrix}1&0\\0&0\end{bmatrix}:\mathbf b</math> || <math>\begin{bmatrix}0&0\\1&0\end{bmatrix}</math> || || <math>\begin{bmatrix}1&1\\0&0\end{bmatrix}:\mathbf a</math> || <math>\begin{bmatrix}0&0\\0&1\end{bmatrix}:\mathbf b</math> || <math>\begin{bmatrix}-1&1\\0&0\end{bmatrix}</math>
|-
| 6 || <math>\begin{bmatrix}0&0\\0&1\end{bmatrix}:\mathbf a</math> || <math>\begin{bmatrix}0&0\\1&0\end{bmatrix}:\mathbf b</math> || <math>\begin{bmatrix}0&0\\1&0\end{bmatrix}</math> || || <math>\begin{bmatrix}-1&0\\1&0\end{bmatrix}:\mathbf a</math> || <math>\begin{bmatrix}1&1\\0&0\end{bmatrix}:\mathbf b</math> || <math>\begin{bmatrix}0&0\\0&1\end{bmatrix}</math>
|-
| 7 || <math>\begin{bmatrix}0&0\\1&0\end{bmatrix}:\mathbf a</math> || <math>\begin{bmatrix}0&1\\0&0\end{bmatrix}:\mathbf b</math> || <math>\begin{bmatrix}0&0\\0&1\end{bmatrix}</math> || || <math>\begin{bmatrix}0&1\\0&-1\end{bmatrix}:\mathbf a</math> || <math>\begin{bmatrix}0&0\\1&1\end{bmatrix}:\mathbf b</math> || <math>\begin{bmatrix}1&0\\0&0\end{bmatrix}</math>
|-
| 8 || <math>\begin{bmatrix}0&0\\0&1\end{bmatrix}:\mathbf a</math> || <math>\begin{bmatrix}0&0\\0&1\end{bmatrix}:\mathbf b</math> || <math>\begin{bmatrix}0&0\\0&1\end{bmatrix}</math> || || colspan="3" |
|-
| || colspan="3" | <math>\mathbf a\mathbf b = \sum_{i=1}^8 f_i(\mathbf a)g_i(\mathbf b)w_i</math> || || colspan="3" | <math>\mathbf a\mathbf b = \sum_{i=1}^7 f_i(\mathbf a)g_i(\mathbf b)w_i</math>
|}

It can be shown that the total number of elementary multiplications <math>L</math> required for matrix multiplication is asymptotically tied to the rank <math>R</math>, i.e. <math>L = \Theta(R)</math>, or more specifically, since the constants are known, <math>R / 2 \le L \le R</math>. One useful property of the rank is that it is submultiplicative for [[tensor product]]s, and this enables one to show that <math>2^n \times 2^n \times 2^n</math> matrix multiplication can be accomplished with no more than <math>7^n</math> elementary multiplications for any <math>n</math>. (This <math>n</math>-fold tensor product of the <math>2 \times 2 \times 2</math> matrix multiplication map with itself, its <math>n</math>-th tensor power, is realized by the recursive step in the algorithm shown above.)
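As a sanity check, the following short Python snippet (again an illustration, not from the cited sources) encodes the seven <math>(f_i, g_i, w_i)</math> triples from the Strassen column of the table and verifies the rank-7 identity <math>\mathbf a\mathbf b = \sum_{i=1}^7 f_i(\mathbf a)g_i(\mathbf b)w_i</math> on random <math>2 \times 2</math> matrices, implementing the double-dot product as the sum of the entries of a Hadamard product:

<syntaxhighlight lang="python">
import numpy as np

# The seven (f_i, g_i, w_i) triples from the Strassen column of the table above.
F = [[[1, 0], [0, 1]], [[0, 0], [1, 1]], [[1, 0], [0, 0]], [[0, 0], [0, 1]],
     [[1, 1], [0, 0]], [[-1, 0], [1, 0]], [[0, 1], [0, -1]]]
G = [[[1, 0], [0, 1]], [[1, 0], [0, 0]], [[0, 1], [0, -1]], [[-1, 0], [1, 0]],
     [[0, 0], [0, 1]], [[1, 1], [0, 0]], [[0, 0], [1, 1]]]
W = [[[1, 0], [0, 1]], [[0, 0], [1, -1]], [[0, 1], [0, 1]], [[1, 0], [1, 0]],
     [[-1, 1], [0, 0]], [[0, 0], [0, 1]], [[1, 0], [0, 0]]]


def ddot(X, Y):
    """Double-dot product: the sum of the entries of the Hadamard product."""
    return np.sum(np.asarray(X) * np.asarray(Y))


rng = np.random.default_rng(0)
a, b = rng.random((2, 2)), rng.random((2, 2))
ab = sum(ddot(f, a) * ddot(g, b) * np.asarray(w) for f, g, w in zip(F, G, W))
assert np.allclose(ab, a @ b)  # the rank-7 bilinear computation reproduces ab
</syntaxhighlight>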
=== Cache behavior ===
Strassen's algorithm is [[Cache-oblivious algorithm|cache oblivious]]. Analysis of the algorithm's [[CPU cache|cache]] behavior has shown it to incur

:<math>\Theta \left(1 + \frac{n^2}{b} + \frac{n^{\log_2 7}}{b\sqrt{M}} \right)</math>

cache misses during its execution, assuming an idealized cache of size <math>M</math> (i.e. with <math>M / b</math> lines of length <math>b</math>).<ref name="prokop">{{cite conference |first1=M. |last1=Frigo |first2=C. E. |last2=Leiserson |authorlink2=Charles Leiserson |first3=H. |last3=Prokop |authorlink3=Harald Prokop |first4=S. |last4=Ramachandran |title=Cache-oblivious algorithms |conference=Proc. IEEE Symp. on Foundations of Computer Science (FOCS) |pages=285–297 |year=1999 |url=http://supertech.csail.mit.edu/papers/FrigoLePr99.pdf}}</ref>{{rp|13}}
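To see where this bound comes from (a sketch under the idealized-cache model of the cited paper, not its full argument), let <math>Q(n)</math> be the number of cache misses incurred on an <math>n \times n</math> problem. Each application of the algorithm performs seven half-size multiplications plus a constant number of block additions and subtractions, which scan <math>\Theta(n^2)</math> elements and therefore cost <math>\Theta(1 + n^2/b)</math> misses:

:<math>Q(n) = 7\,Q(n/2) + \Theta\left(1 + \frac{n^2}{b}\right).</math>

The recursion bottoms out once a subproblem and its operands fit entirely in cache, at <math>n = \Theta(\sqrt{M})</math>, after which only the misses needed to load the data are incurred. Solving this recurrence, as done in the cited paper, yields the bound displayed above.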