{{short description|Multiplication algorithm}}{{More citations needed|date=October 2024}} [[File:Integer multiplication by FFT.svg|thumb|350px|The Schönhage–Strassen algorithm is based on the [[Multiplication algorithm#Fourier transform method|fast Fourier transform (FFT) method of integer multiplication]]. This figure demonstrates multiplying 1234 × 5678 = 7006652 using the simple FFT method. Base 10 is used in place of base 2<sup>''w''</sup> for illustrative purposes. ]] [[File:StrassenSchonhage1979 MFO8655.jpg|thumb|Schönhage (on the right) and Strassen (on the left) playing chess in [[Mathematical Research Institute of Oberwolfach|Oberwolfach]], 1979]] The '''Schönhage–Strassen algorithm''' is an asymptotically fast [[multiplication algorithm]] for large [[integer]]s, published by [[Arnold Schönhage]] and [[Volker Strassen]] in 1971.<ref name="schönhage">{{cite journal |last1=Schönhage |first1=Arnold |author1-link=Arnold Schönhage |last2=Strassen |first2=Volker |author2-link=Volker Strassen |year=1971 |title=Schnelle Multiplikation großer Zahlen |language=de |trans-title=Fast multiplication of large numbers |journal=Computing |volume=7 |issue=3–4 |pages=281–292 |doi=10.1007/BF02242355 |s2cid=9738629 }}</ref> It works by recursively applying [[fast Fourier transform]] (FFT) over [[Modular arithmetic|the integers modulo]] <math>2^n + 1</math>. The run-time [[bit complexity]] to multiply two {{mvar|n}}-digit numbers using the algorithm is <math>O(n \cdot \log n \cdot \log \log n)</math> in [[big O notation|big {{mvar|O}} notation]]. The Schönhage–Strassen algorithm was the [[multiplication algorithm#Computational complexity of multiplication|asymptotically fastest multiplication method]] known from 1971 until 2007. It is asymptotically faster than older methods such as [[Karatsuba multiplication|Karatsuba]] and [[Toom–Cook multiplication]], and starts to outperform them in practice for numbers beyond about 10,000 to 100,000 decimal digits.<ref> Karatsuba multiplication has asymptotic complexity of about <math>O(n^{1.58})</math> and Toom–Cook multiplication has asymptotic complexity of about <math>O(n^{1.46}).</math> {{pb}} {{cite journal |first1=Rodney |last1=Van Meter |first2=Kohei M. |last2=Itoh |title=Fast Quantum Modular Exponentiation |journal=Physical Review |volume=71 |year=2005 |issue=5 |pages= 052320|doi=10.1103/PhysRevA.71.052320 |arxiv=quant-ph/0408006 |bibcode=2005PhRvA..71e2320V |s2cid=14983569 }} {{pb}} A discussion of practical crossover points between various algorithms can be found in: [http://magma.maths.usyd.edu.au/magma/Features/node86.html Overview of Magma V2.9 Features, arithmetic section] {{webarchive|url=https://web.archive.org/web/20060820053803/http://magma.maths.usyd.edu.au/magma/Features/node86.html |date=2006-08-20 }} {{pb}} Luis Carlos Coronado García, "[http://www.cdc.informatik.tu-darmstadt.de/~coronado/Vortrag/MoraviaCrypt-talk-s.pdf Can Schönhage multiplication speed up the RSA encryption or decryption?] [https://web.archive.org/web/20110719095830/http://www.cdc.informatik.tu-darmstadt.de/~coronado/Vortrag/MoraviaCrypt-talk-s.pdf Archived]", ''University of Technology, Darmstadt'' (2005) {{pb}} The [[GNU Multi-Precision Library]] uses it for values of at least 1728 to 7808 64-bit words (33,000 to 150,000 decimal digits), depending on architecture. 
See: {{pb}} {{Cite web|title=FFT Multiplication (GNU MP 6.2.1)|url=https://gmplib.org/manual/FFT-Multiplication|access-date=2021-07-20|website=gmplib.org}} {{pb}} {{cite web|title=MUL_FFT_THRESHOLD|url=http://gmplib.org/devel/MUL_FFT_THRESHOLD.html|work=GMP developers' corner|access-date=3 November 2011|url-status=dead|archive-url=https://web.archive.org/web/20101124062232/http://gmplib.org/devel/MUL_FFT_THRESHOLD.html|archive-date=24 November 2010}} {{pb}} {{Cite web|title=MUL_FFT_THRESHOLD|url=https://gmplib.org/devel/thres/MUL_FFT_THRESHOLD|access-date=2021-07-20|website=gmplib.org}} </ref> In 2007, Martin Fürer published [[Fürer's algorithm|an algorithm]] with faster asymptotic complexity.<ref> Fürer's algorithm has asymptotic complexity <math display=inline>O\bigl(n \cdot \log n \cdot 2^{\Theta(\log^* n)}\bigr).</math> {{pb}} {{cite conference |last=Fürer |first=Martin |year=2007 |title=Faster Integer Multiplication |book-title=Proc. STOC '07 |conference=Symposium on Theory of Computing, San Diego, Jun 2007 |pages=57–66 |url=http://www.cse.psu.edu/~furer/Papers/mult.pdf |url-status=dead |archive-url=https://web.archive.org/web/20070305200654id_/http://www.cse.psu.edu/~furer/Papers/mult.pdf |archive-date=2007-03-05 }}{{pb}}{{Cite journal |last=Fürer |first=Martin |year=2009 |title=Faster Integer Multiplication |url=http://dx.doi.org/10.1137/070711761 |journal=SIAM Journal on Computing |volume=39 |issue=3 |pages=979–1005 |doi=10.1137/070711761 |issn=0097-5397}} {{pb}} Fürer's algorithm is used in the Basic Polynomial Algebra Subprograms (BPAS) open source library. See: {{Cite book |last1=Covanov |first1=Svyatoslav |last2=Mohajerani |first2=Davood |last3=Moreno Maza |first3=Marc |last4=Wang |first4=Linxiao |title=Proceedings of the 2019 on International Symposium on Symbolic and Algebraic Computation |chapter=Big Prime Field FFT on Multi-core Processors |date=2019-07-08 |chapter-url=https://dl.acm.org/doi/10.1145/3326229.3326273 |language=en |location=Beijing China |publisher=ACM |pages=106–113 |doi=10.1145/3326229.3326273 |isbn=978-1-4503-6084-5|s2cid=195848601 |url=https://hal.inria.fr/hal-02191652/file/cmmw-2019.ISSAC.sigconf.pdf }} </ref> In 2019, David Harvey and [[Joris van der Hoeven]] demonstrated that multi-digit multiplication has theoretical <math>O(n\log n)</math> complexity; however, their algorithm has constant factors which make it impossibly slow for any conceivable practical problem (see [[galactic algorithm]]).<ref>{{cite journal | last1 = Harvey | first1 = David | last2 = van der Hoeven | first2 = Joris | author2-link = Joris van der Hoeven | doi = 10.4007/annals.2021.193.2.4 | issue = 2 | journal = [[Annals of Mathematics]] | mr = 4224716 | pages = 563–617 | series = Second Series | title = Integer multiplication in time <math>O(n \log n)</math> | volume = 193 | year = 2021| s2cid = 109934776 | url = https://hal.archives-ouvertes.fr/hal-02070778v2/file/nlogn.pdf }}</ref> Applications of the Schönhage–Strassen algorithm include large computations done for their own sake such as the [[Great Internet Mersenne Prime Search]] and [[Approximations of π|approximations of {{mvar|π}}]], as well as practical applications such as [[Lenstra elliptic curve factorization]] via [[Kronecker substitution]], which reduces polynomial multiplication to integer multiplication.<ref>This method is used in INRIA's [https://gitlab.inria.fr/zimmerma/ecm ECM] library.</ref><ref>{{Cite web |title=ECMNET |url=https://members.loria.fr/PZimmermann/records/ecmnet.html |access-date=2023-04-09 
|website=members.loria.fr}}</ref>

== Description ==
This section describes a simplified version of the algorithm, showing how to compute the product <math>ab</math> of two natural numbers <math>a,b</math>, modulo a number of the form <math>2^n+1</math>, where <math>n=2^kM</math> is some fixed number. The integers <math>a,b</math> are to be divided into <math>D=2^k</math> blocks of <math>M</math> bits, so in practical implementations, it is important to strike the right balance between the parameters <math>M,k</math>. In any case, this algorithm will provide a way to multiply two positive integers, provided <math>n</math> is chosen so that <math>ab < 2^n+1</math>.

Let <math>n=DM</math> be the number of bits in the signals <math>a</math> and <math>b</math>, where <math>D=2^k</math> is a power of two. Divide the signals <math>a</math> and <math>b</math> into <math>D</math> blocks of <math>M</math> bits each, storing the resulting blocks as arrays <math>A,B</math> (whose entries we shall consider for simplicity as arbitrary precision integers).

We now select a modulus for the Fourier transform, as follows. Let <math>M'</math> be such that <math>DM'\ge 2M+k</math>. Also put <math>n'=DM'</math>, and regard the elements of the arrays <math>A,B</math> as (arbitrary precision) integers modulo <math>2^{n'}+1</math>. Observe that since <math>2^{n'} + 1 \ge 2^{2M+k} + 1 = D2^{2M}+1</math>, the modulus is large enough to accommodate any carries that can result from multiplying <math>a</math> and <math>b</math>. Thus, the product <math>ab</math> (modulo <math>2^n+1</math>) can be calculated by evaluating the convolution of <math>A,B</math>. Also, with <math>g=2^{2M'}</math>, we have <math>g^{D/2}\equiv -1\pmod{2^{n'}+1}</math>, and so <math>g</math> is a primitive <math>D</math>th root of unity modulo <math>2^{n'}+1</math>.

We now take the discrete Fourier transform of the arrays <math>A,B</math> in the ring <math>\mathbb Z/(2^{n'}+1)\mathbb Z</math>, using the root of unity <math>g</math> for the Fourier basis, giving the transformed arrays <math>\widehat A,\widehat B</math>. Because <math>D=2^k</math> is a power of two, this can be achieved in <math>O(D\log D)</math> ring operations using a [[fast Fourier transform]].

Let <math>\widehat C_i=\widehat A_i\widehat B_i</math> (pointwise product), and compute the inverse transform <math>C</math> of the array <math>\widehat C</math>, again using the root of unity <math>g</math>. The array <math>C</math> is now the convolution of the arrays <math>A,B</math>. Finally, the product <math>ab\pmod{2^n+1}</math> is given by evaluating <math display="block">ab\equiv \sum_j C_j2^{Mj}\mod{2^n+1}.</math>

This basic algorithm can be improved in several ways. Firstly, it is not necessary to store the digits of <math>a,b</math> to arbitrary precision, but rather only up to <math>n'+1</math> bits, which gives a more efficient machine representation of the arrays <math>A,B</math>. Secondly, the multiplications in the forward transforms are simple bit shifts, and with some care the inverse transform can also be computed using only shifts. It is thus possible to eliminate all true multiplications from the algorithm except for the pointwise products <math>\widehat C_i=\widehat A_i\widehat B_i</math>.
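The simplified procedure above can be made concrete with a short program. The following Python sketch follows the steps of this section under simplifying assumptions: the function and parameter names (<code>ssa_multiply_mod</code>, <code>ntt</code>, <code>k</code>, <code>M</code>) are illustrative rather than taken from any implementation, the pointwise products use Python's built-in big integers instead of recursing, and the inputs are assumed small enough that the length-<math>D</math> cyclic convolution contains the full product without wrap-around.

<syntaxhighlight lang="python">
def ntt(vec, root, mod):
    """Radix-2 Cooley-Tukey transform of vec over Z/(mod)Z.
    root must be a principal len(vec)-th root of unity modulo mod."""
    D = len(vec)
    if D == 1:
        return vec[:]
    even = ntt(vec[0::2], root * root % mod, mod)
    odd = ntt(vec[1::2], root * root % mod, mod)
    out, w = [0] * D, 1
    for i in range(D // 2):
        t = w * odd[i] % mod
        out[i] = (even[i] + t) % mod
        out[i + D // 2] = (even[i] - t) % mod
        w = w * root % mod
    return out

def ssa_multiply_mod(a, b, k, M):
    """Multiply a and b modulo 2^n + 1 with n = 2^k * M, following the
    simplified scheme above (illustrative sketch, not the full algorithm)."""
    D = 1 << k                       # number of blocks
    n = D * M                        # a and b are treated as n-bit numbers
    # Simplification: require that no wrap-around occurs in the convolution.
    assert a.bit_length() + b.bit_length() <= n

    # Split a and b into D blocks of M bits each.
    mask = (1 << M) - 1
    A = [(a >> (M * i)) & mask for i in range(D)]
    B = [(b >> (M * i)) & mask for i in range(D)]

    # Transform modulus 2^{n'} + 1 with n' = D*M' >= 2M + k, so that the
    # convolution coefficients are recovered exactly.
    Mp = -((2 * M + k) // -D)        # ceil((2M + k) / D)
    mod = (1 << (D * Mp)) + 1
    g = 1 << (2 * Mp)                # g^(D/2) = 2^{n'} = -1, a primitive D-th root

    # Forward transforms, pointwise products, inverse transform.
    Ah, Bh = ntt(A, g, mod), ntt(B, g, mod)
    Ch = [x * y % mod for x, y in zip(Ah, Bh)]
    C = ntt(Ch, pow(g, -1, mod), mod)            # inverse uses g^{-1} ...
    C = [c * pow(D, -1, mod) % mod for c in C]   # ... and a division by D

    # Recombine: ab = sum_j C_j * 2^(M*j), then reduce modulo 2^n + 1.
    return sum(c << (M * j) for j, c in enumerate(C)) % ((1 << n) + 1)
</syntaxhighlight>

For example, <code>ssa_multiply_mod(1234, 5678, 3, 8)</code> splits the operands into <math>D=8</math> blocks of 8 bits, works modulo <math>2^{24}+1</math> with <math>g=2^6</math>, and returns 7006652, matching the example in the figure above. In the full algorithm the pointwise products are themselves large numbers and are handled recursively, which is what the parameter choices discussed below are about.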
It is therefore advantageous to select the parameters <math>D,M</math> so that this pointwise product can be performed efficiently, either because it is a single machine word or because an optimized algorithm exists for multiplying integers of a small number of words. Selecting the parameters <math>D,M</math> is thus an important area for further optimization of the method.

== Details ==
Every number in base <math>B</math> can be written as a polynomial:
:<math> X = \sum_{i=0}^N {a_iB^i} </math>
Furthermore, multiplication of two numbers can be thought of as a product of two polynomials:
:<math>XY = \left(\sum_{i=0}^N {a_iB^i}\right)\left(\sum_{j=0}^N {b_jB^j}\right) </math>
Because the coefficient of <math> B^k </math> in the product is <math>c_k =\sum_{(i,j):i+j=k} {a_ib_j} = \sum_{i=0}^k {a_ib_{k-i}} </math>, the coefficient sequence of the product is the convolution of the two coefficient sequences.

By the convolution rule for the FFT ([[fast Fourier transform]]) — which was used in the original version of the algorithm rather than the NTT ([[Number-theoretic transform]])<ref>{{cite web |last1=Becker |first1=Hanno |last2=Hwang |first2=Vincent |last3=Kannwischer |first3=Matthias J. |last4=Panny |first4=Lorenz |date=2022 |title=Efficient Multiplication of Somewhat Small Integers using Number-Theoretic Transforms |url=https://eprint.iacr.org/2022/439.pdf}}</ref> — we get
:<math> \hat{f}(a * b) = \hat{f}(a) \bullet \hat{f}(b), </math>
where <math>*</math> denotes convolution and <math>\bullet</math> the pointwise (element-wise) product. That is, <math> \hat{c}_k = \hat{a}_k \bullet \hat{b}_k </math>, where <math> \hat{c}_k </math> is the corresponding coefficient in Fourier space. This can also be written as <math>\text{fft}(a * b) = \text{fft}(a) \bullet \text{fft}(b)</math>.

We get the same coefficients due to linearity under the Fourier transform, and because these polynomials only consist of one unique term per coefficient:
:<math> \hat{f}(x^n) = \left(\frac{i}{2\pi}\right)^n \delta^{(n)} </math> and
:<math> \hat{f}(a\, X(\xi) + b\, Y(\xi)) = a\, \hat{X}(\xi) + b\, \hat{Y}(\xi)</math>
Convolution rule: <math> \hat{f}(X * Y) = \ \hat{f}(X) \bullet \hat{f}(Y) </math>

Through the FFT, the convolution problem has thus been reduced to a pointwise multiplication problem. By applying the inverse FFT (polynomial interpolation) to the pointwise products <math>\hat{c}_k</math>, one recovers the desired coefficients <math>c_k</math>. The algorithm applies this [[divide-and-conquer method]] recursively, dividing the problem into subproblems.

=== Convolution under mod ''N'' ===
When the product is computed modulo <math> N(n) = 2^n + 1 </math>, index wrap-around in the convolution picks up a factor of <math>-1</math>, and the quantity needed is the negacyclic convolution
:<math>C_k =\sum_{(i,j):i+j = k} a_ib_j - \sum_{(i,j):i+j=k+n} a_ib_j. </math>
By letting
:<math> a_i' = \theta^i a_i </math> and <math> b_j' = \theta^j b_j,</math>
where <math> \theta^n = -1 </math>, i.e. <math>\theta</math> is a primitive <math>2n</math>th root of unity, one sees that<ref>{{cite web |last1=Lüders |year=2014|page=26|first1=Christoph |title=Fast Multiplication of Large Integers: Implementation and Analysis of the DKSS Algorithm |url=https://www.researchgate.net/publication/273701188}}</ref>
:<math> \begin{align} \theta^{-k} \sum_{(i,j):i+j \equiv k \pmod n} a_i'b_j' & = \theta^{-k} \left(\sum_{(i,j):i+j=k} a_i'b_j' + \sum_{(i,j):i+j=k+n} a_i'b_j' \right) \\[6pt] & = \theta^{-k}\left(\sum_{(i,j):i+j=k} a_ib_j\theta^k + \sum_{(i,j):i+j=k+n} a_ib_j\theta^{n+k} \right) \\[6pt] & = \sum_{(i,j):i+j=k} a_ib_j + \theta^n \sum_{(i,j):i+j=k+n} a_ib_j = C_k. \end{align} </math>
This means one can compute the negacyclic convolution by applying the weights <math> \theta^i </math> to the inputs, taking an ordinary cyclic convolution, and multiplying by <math> \theta^{-k} </math> afterwards.
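This weighting identity can be checked numerically. The sketch below is only an illustration: the modulus <math>q = 257 = 2^8+1</math>, the length <math>n = 4</math>, the root <math>\theta = 4</math> (which satisfies <math>\theta^4 \equiv -1 \pmod q</math>) and the function names are chosen for the example, and plain double loops stand in for the transforms.

<syntaxhighlight lang="python">
# Negacyclic convolution of two length-n sequences modulo q, computed two
# ways: directly from its definition, and as theta^{-k} times the cyclic
# convolution of the theta-weighted sequences.
q = 257                  # 2^8 + 1, so 2^8 = -1 (mod q)
n = 4
theta = 4                # theta^n = 4^4 = 256 = -1 (mod q)

def negacyclic_direct(a, b):
    c = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                c[i + j] = (c[i + j] + a[i] * b[j]) % q
            else:
                c[i + j - n] = (c[i + j - n] - a[i] * b[j]) % q
    return c

def negacyclic_via_weights(a, b):
    aw = [a[i] * pow(theta, i, q) % q for i in range(n)]
    bw = [b[j] * pow(theta, j, q) % q for j in range(n)]
    # plain cyclic convolution of the weighted sequences
    c = [0] * n
    for i in range(n):
        for j in range(n):
            c[(i + j) % n] = (c[(i + j) % n] + aw[i] * bw[j]) % q
    # undo the weights with theta^{-k}
    return [c[k] * pow(theta, -k, q) % q for k in range(n)]

a, b = [3, 1, 4, 1], [2, 7, 1, 8]
assert negacyclic_direct(a, b) == negacyclic_via_weights(a, b)
</syntaxhighlight>

Replacing the double loop in <code>negacyclic_via_weights</code> by a forward transform, a pointwise product and an inverse transform gives exactly the structure the algorithm uses.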
In the first step of the recursion (when <math> n = N </math>), instead of using the weights one can calculate directly:
:<math> C_k = \sum_{(i,j):i+j=k} a_ib_j - \sum_{(i,j):i+j=k+n} a_ib_j </math>

In a normal FFT, which operates over the complex numbers, one would use the roots of unity
<math display="block"> \exp \left(\frac{2k\pi i}{n}\right) =\cos\frac{2k\pi}{n} + i \sin \frac{2k\pi} n, \qquad k=0,1,\dots, n-1, </math>
so that
: <math> \begin{align} C_k & = \theta^{-k}\left(\sum_{(i,j):i+j=k} a_ib_j\theta^k + \sum_{(i,j):i+j=k+n} a_ib_j \theta^{n+k} \right) \\[6pt] & = e^{-i2\pi k/n} \left(\sum_{(i,j):i+j=k} a_ib_je^{i2\pi k /n} +\sum_{(i,j):i+j=k+n} a_ib_je^{i2\pi (n+k)/n} \right) \end{align} </math>

However, the FFT can also be used as an NTT ([[Number-theoretic transform|number theoretic transform]]) in Schönhage–Strassen. This means that we have to use {{mvar|θ}} to generate numbers in a finite ring, for example <math>\mathrm{GF}( 2^n + 1 )</math> (which is a field only when <math>2^n+1</math> is prime, but the construction works in the ring <math>\mathbb Z/(2^n+1)\mathbb Z</math> in general).

A root of unity in a finite field {{math|GF(''r'')}} is an element {{mvar|θ}} such that <math> \theta^{r-1} \equiv 1 </math> or, equivalently, <math> \theta^r \equiv \theta </math>. For example, in {{math|GF(''p'')}} with {{mvar|p}} a [[prime number]], every element of <math>\{1,2,\ldots,p-1\}</math> is a root of unity. Notice that <math> 2^n \equiv -1</math> in <math>\operatorname{GF}(2^n + 1)</math>, so <math>2</math> is a root of unity of order <math>2n</math> there, and <math> \sqrt{2} \equiv 2^{3n/4} - 2^{n/4} </math> (for <math>4 \mid n</math>) is a root of unity of order <math>4n</math> (see the [[#Square root of 2 trick|square root of 2 trick]] below). For these candidates, a suitable power of {{mvar|θ}} is congruent to <math>-1</math>, so they act the way we want. The same FFT algorithms can still be used, as long as {{mvar|θ}} is a [[root of unity]] of the finite ring.

To compute the FFT/NTT transform, we do the following:
: <math> \begin{align} C_k' & = \hat{f}(k) = \hat{f}\left(\theta^{-k}\left(\sum_{(i,j):i+j=k} a_ib_j\theta^k +\sum_{(i,j):i+j=k+n} a_ib_j\theta^{n+k} \right) \right) \\[6pt] C_{k+k}' & = \hat{f}(k+k) = \hat{f} \left(\sum_{(i,j):i+j=2k} a_i b_j \theta^k +\sum_{(i,j):i+j=n+2k} a_i b_j \theta^{n+k} \right) \\[6pt] & = \hat{f}\left(\sum_{(i,j):i+j=2k} a_i b_j \theta^k + \sum_{(i,j):i+j=2k+n} a_i b_j \theta^{n+k} \right) \\[6pt] & = \hat{f}\left(A_{k \leftarrow k}\right) \bullet \hat{f}(B_{k \leftarrow k}) + \hat{f}(A_{k \leftarrow k+n}) \bullet \hat{f}(B_{k \leftarrow k+n}) \end{align} </math>
The first product contributes to <math> c_k </math> for each {{mvar|k}}; the second contributes to <math> c_k </math> because of the wrap-around of the indices <math>i+j</math>. To do the inverse:
:<math> C_k = 2^{-m}\hat{f^{-1}} (\theta^{-k} C_{k+k}') </math> or <math> C_k = \hat{f^{-1}}(\theta^{-k} C_{k+k}') </math>
depending on whether the data needs to be normalized.

One multiplies by <math> 2^{-m} </math> to normalize the FFT data into a specific range, where <math> \frac{1}{n} \equiv 2^{-m} \bmod N(n) </math>; {{mvar|m}} is found using the [[modular multiplicative inverse]].
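These facts about roots of unity and about the normalization factor can be verified directly for small parameters; the values <math>n = 16</math> and transform length <math>D = 8</math> below are illustrative only.

<syntaxhighlight lang="python">
# Checks for the ring Z/(2^n + 1): 2 is a root of unity of order 2n,
# sqrt(2) is a root of unity of order 4n, and the normalisation factor
# 1/D is itself a power of two (so it can be applied with a shift).
n = 16
N = (1 << n) + 1                      # 2^16 + 1

assert pow(2, n, N) == N - 1          # 2^n = -1, hence 2 has order 2n
assert pow(2, 2 * n, N) == 1

sqrt2 = (pow(2, 3 * n // 4, N) - pow(2, n // 4, N)) % N
assert pow(sqrt2, 2, N) == 2          # (2^{3n/4} - 2^{n/4})^2 = 2, so sqrt2 has order 4n

D = 8                                 # a power-of-two transform length
assert pow(D, -1, N) == pow(2, 2 * n - 3, N)   # 1/D = 2^{-3} = 2^{2n-3}, since 2^{2n} = 1
</syntaxhighlight>

Because these inverses are themselves powers of two, both the twiddle factors and the final normalization can be implemented as cyclic shifts, as the pseudocode below notes.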
== Implementation details ==

=== Why ''N'' = 2{{sup|''M''}} + 1 ===
In the Schönhage–Strassen algorithm, <math> N = 2^M + 1 </math>. This should be thought of as a binary tree, where one has values in <math> 0 \le \text{index} \le 2^{M}=2^{i+j} </math>. By letting <math> K \in [0,M] </math>, for each {{mvar|K}} one can find all <math> i+j = K </math> and group all <math>(i,j)</math> pairs into {{mvar|M}} different groups. Using <math> i+j = k </math> to group <math>(i,j)</math> pairs through convolution is a classical problem in algorithms.<ref>{{cite book |last1=Kleinberg |first1=Jon |last2=Tardos |first2=Eva |title=Algorithm Design |date=2005 |publisher=Pearson |isbn=0-321-29535-8 |page=237 |edition=1|url=https://archive.org/details/AlgorithmDesign1stEditionByJonKleinbergAndEvaTardos2005PDF/page/n259/mode/2up}}</ref>

Having this in mind, <math> N = 2^M + 1 </math> helps us to group <math> (i,j) </math> into <math> \frac{M}{2^k} </math> groups for each group of subtasks at depth {{mvar|k}} in a tree with <math> N = 2^{\frac{M}{2^k}} + 1 </math>.

Notice that <math> N = 2^M + 1 = 2^{2^L} + 1 </math> for some {{mvar|L}}, since {{mvar|M}} is a power of two. This makes {{mvar|N}} a [[Fermat number]]. When working modulo <math> N = 2^M + 1 = 2^{2^L} + 1 </math>, we have a Fermat ring. Because some Fermat numbers are Fermat primes, some calculations can be avoided in those cases.

There are other choices of ''N'' that could have been used, with the same prime-number advantages. Letting <math> N = 2^k - 1 </math> gives the maximal number representable with <math> k </math> binary bits. Such an <math> N = 2^k - 1 </math> is a Mersenne number, which in some cases is a Mersenne prime. It is a natural alternative to the Fermat number <math> N = 2^{2^L} + 1 </math>.

==== In search of another ''N'' ====
Doing several modular calculations against different moduli {{mvar|N}} can be helpful for computing the integer product. By using the [[Chinese remainder theorem]], after splitting {{mvar|M}} into smaller moduli of different types {{mvar|N}}, one can reconstruct the product {{mvar|xy}}.<ref>{{cite web |last1=Gaudry |first1=Pierrick |last2=Kruppa |first2=Alexander |last3=Zimmermann |first3=Paul |title=A GMP-based implementation of Schönhage-Strassen's large integer multiplication algorithm |page=6 | year=2007|url=https://inria.hal.science/inria-00126462/file/fft.final.pdf}}</ref>

Fermat numbers and Mersenne numbers are two instances of the generalized Fermat–Mersenne numbers, given by the formula:<ref>{{cite web |last1=Dimitrov |first1=Vassil S. |last2=Cooklev |first2=Todor V. |last3=Donevsky |first3=Borislav D. |page=2 |year=1994 |title=Generalized Fermat-Mersenne Number Theoretic Transform |url=https://www.researchgate.net/publication/3324542}}</ref>
:<math> G_{q,p,n} = \sum_{i=1}^p q^{(p-i)n} = \frac{q^{pn}-1}{q^n-1} </math>
:<math> M_{p,n} = G_{2,p,n} </math>
In this formula, <math> M_{2,2^k} </math> is a Fermat number and <math> M_{p,1} </math> is a Mersenne number.

This formula can be used to generate sets of equations that can be used in the CRT (Chinese remainder theorem):<ref>{{cite web |last1=Dimitrov |first1=Vassil S. |last2=Cooklev |first2=Todor V. |last3=Donevsky |first3=Borislav D. |page=3 |year=1994 |title=Generalized Fermat-Mersenne Number Theoretic Transform |url=https://www.researchgate.net/publication/3324542}}</ref>
:<math>g^{\frac{(M_{p,n}-1)}{2}} \equiv -1 \pmod {M_{p,n}}</math>, where {{mvar|g}} is a number such that there exists an {{mvar|x}} with <math> x^2 \equiv g \pmod {M_{p,n}}</math>, assuming <math> N = 2^n </math>.

Furthermore, <math>g^{2^{(p-1)n}-1} \equiv a^{2^n -1} \pmod {M_{p,n}}</math>, where {{mvar|a}} is an element that generates the elements <math> \{1,2,4,\ldots,2^{n-1},2^n\} </math> in a cyclic manner.

If <math> N=2^t </math>, where <math> 1 \le t \le n </math>, then <math> g_t = a^{(2^n-1)2^{n-t}} </math>.
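The CRT reconstruction can be illustrated with two small moduli; the Fermat number <math>2^{32}+1</math> and the Mersenne prime <math>2^{31}-1</math> below are chosen only for the example, and any pairwise coprime moduli whose product exceeds <math>xy</math> would do.

<syntaxhighlight lang="python">
from math import gcd

# Recover the product x*y from its residues modulo a Fermat number and a
# Mersenne number (illustrative moduli, not the ones a real implementation
# would pick).
m1 = (1 << 32) + 1              # Fermat number 2^32 + 1
m2 = (1 << 31) - 1              # Mersenne prime 2^31 - 1
assert gcd(m1, m2) == 1

x, y = 123456789, 987654321
r1 = (x % m1) * (y % m1) % m1   # product modulo m1
r2 = (x % m2) * (y % m2) % m2   # product modulo m2

# Chinese remainder theorem for two moduli.
t = (r2 - r1) * pow(m1, -1, m2) % m2
recovered = r1 + m1 * t
assert recovered == x * y       # valid because x*y < m1*m2
</syntaxhighlight>

In the same way, residues modulo several pairwise coprime generalized Fermat–Mersenne moduli determine the full product once their combined product exceeds it.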
=== How to choose ''K'' for a specific ''N'' ===
The following formula is helpful for finding a proper {{mvar|K}} (the number of groups to divide the {{mvar|N}} bits into) for a given bit size {{mvar|N}}, by calculating the efficiency:<ref>{{cite web |last1=Gaudry |first1=Pierrick |last2=Kruppa |first2=Alexander |last3=Zimmermann |first3=Paul |title=A GMP-based Implementation of Schönhage-Strassen's Large Integer Multiplication Algorithm | page = 2 | date=2007|url=https://inria.hal.science/inria-00126462/file/fft.final.pdf}}</ref>
:<math> E = \frac{\frac{2N}{K}+k}{n} </math>
{{mvar|N}} is the bit size (the one used in <math> 2^N + 1 </math>) at the outermost level. {{mvar|K}} gives groups of <math> \frac{N}{K} </math> bits, where <math> K = 2^k </math>. {{mvar|n}} is found from {{mvar|N}}, {{mvar|K}} and {{mvar|k}} by finding the smallest {{mvar|x}} such that <math> 2N/K +k \le n = K2^x </math>.

If one assumes an efficiency above 50%, so that <math> \frac{n}{2} \le \frac{2N}{K} </math> and <math> K \le n </math>, and {{mvar|k}} is very small compared to the rest of the formula, one gets
:<math> K \le 2\sqrt{N} </math>
This means that when the transform is efficient, {{mvar|K}} is bounded above by <math> 2\sqrt{N} </math>, i.e. asymptotically bounded above by <math> \sqrt{N} </math>. A worked sketch of this parameter search is given after the pseudocode below.

===Pseudocode===
The following overview of the standard modular Schönhage–Strassen multiplication algorithm (with some optimizations) follows Lüders:<ref>{{cite web | year=2014 | page=28 |last1=Lüders |first1=Christoph |title=Fast Multiplication of Large Integers: Implementation and Analysis of the DKSS Algorithm |url=https://www.researchgate.net/publication/273701188}}</ref>
{{olist
|1= Split both input numbers {{mvar|a}} and {{mvar|b}} into {{mvar|n}} coefficients of {{mvar|s}} bits each. Use at least {{tmath|K + 1}} bits to store them, to allow encoding of the value {{tmath|2^{K}.}}
|2= Weight both coefficient vectors according to equation (2.24) of the source with powers of {{mvar|θ}} by performing cyclic shifts on them.
|3= Shuffle the coefficients {{tmath|a_i}} and {{tmath|b_j}}.
|4= Evaluate {{tmath|a_i}} and {{tmath|b_j}}. Multiplications by powers of {{mvar|ω}} are cyclic shifts.
|5= Do {{mvar|n}} pointwise multiplications {{tmath|1=c_k := a_kb_k}} in {{tmath|Z/(2^K + 1)Z}}. If SMUL is used recursively, provide {{mvar|K}} as parameter. Otherwise, use some other multiplication function like T3MUL and reduce modulo {{tmath|2^{K} + 1}} afterwards.
|6= Shuffle the product coefficients {{tmath|c_k}}.
|7= Evaluate the product coefficients {{tmath|c_k}}.
|8= Apply the counterweights to the {{tmath|c_k}} according to equation (2.25) of the source. Since {{tmath|\theta^{2n} \equiv 1}} it follows that {{tmath|\theta^{-k} \equiv \theta^{2n-k} }}.
|9= Normalize the {{tmath|c_k}} with {{tmath|1/n \equiv 2^{-m} }} (again a cyclic shift).
|10= Add up the {{tmath|c_k}} and propagate the carries. Make sure to properly handle negative coefficients.
|11= Do a reduction modulo {{tmath|2^{N} + 1}}.
}}
* T3MUL = Toom–Cook multiplication
* SMUL = Schönhage–Strassen multiplication
* Evaluate = FFT/IFFT
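Returning to the choice of {{mvar|K}} discussed above, the efficiency formula can be turned into a small parameter search. The following sketch is illustrative only (the function name, the loop bounds and the printed table are not from the sources); it lists, for a given outer bit size {{mvar|N}}, each candidate <math>K = 2^k</math> together with the resulting inner size {{mvar|n}} and the efficiency {{mvar|E}}.

<syntaxhighlight lang="python">
def candidate_parameters(N):
    """For an outer modulus 2^N + 1, list K = 2^k with the smallest
    n = K * 2^x satisfying 2N/K + k <= n, and the efficiency E."""
    rows = []
    k = 1
    while (1 << k) <= 2 * int(N ** 0.5):      # heuristic cut-off K <= 2*sqrt(N)
        K = 1 << k
        need = 2 * N // K + k                 # bits needed per coefficient
        n = K
        while n < need:                       # n must be of the form K * 2^x
            n *= 2
        rows.append((K, n, need / n))
        k += 1
    return rows

for K, n, E in candidate_parameters(1 << 20):  # outer size N = 2^20 bits
    print(f"K = {K:6d}   n = {n:8d}   E = {E:.2f}")
</syntaxhighlight>

Real implementations combine such estimates with measured thresholds; GMP, for instance, tunes its FFT thresholds per architecture, as noted in the lead.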
=== Further study ===
For implementation details, one can read the book ''Prime Numbers: A Computational Perspective''.<ref name="crandall">R. Crandall & C. Pomerance. ''Prime Numbers – A Computational Perspective''. Second Edition, Springer, 2005. Section 9.5.6: Schönhage method, p. 502. {{ISBN|0-387-94777-9}}</ref> This variant differs somewhat from Schönhage's original method in that it exploits the [[discrete weighted transform]] to perform [[negacyclic convolution]]s more efficiently. Another source for detailed information is [[Donald Knuth|Knuth]]'s ''[[The Art of Computer Programming]]''.<ref>{{cite book |last=Knuth |first=Donald E. |title=[[The Art of Computer Programming]] |year=1997 |volume=2: Seminumerical Algorithms |edition=3rd |publisher=Addison-Wesley |isbn=0-201-89684-2 |chapter=§ 4.3.3.C: Discrete Fourier transforms |pages=305–311 |chapter-url=https://archive.org/details/artofcomputerpro0002knut_u2o0/page/305/ }}</ref>

==Optimizations==
This section explains a number of important practical optimizations used when implementing Schönhage–Strassen.

=== Use of other multiplication algorithms inside the algorithm ===
Below a certain cutoff point, it is more efficient to use other multiplication algorithms, such as [[Toom–Cook multiplication]].<ref>{{cite web |last1=Gaudry |first1=Pierrick |last2=Kruppa |first2=Alexander |last3=Zimmermann |first3=Paul |page=7|year=2007|title=A GMP-based implementation of Schönhage-Strassen's large integer multiplication algorithm |url=https://inria.hal.science/inria-00126462/file/fft.final.pdf}}</ref>

=== Square root of 2 trick ===
The idea is to use <math> \sqrt{2} </math> as a [[root of unity]] of order <math> 4n </math> in the ring <math>\mathbb Z/(2^{n} +1)\mathbb Z</math> when weighting the values in the NTT (number-theoretic transform) approach: it is a solution of <math> \theta^{4n} \equiv 1 \pmod {2^{n} + 1}</math>, given explicitly by <math> \sqrt{2} \equiv 2^{3n/4} - 2^{n/4} </math> when <math> 4 \mid n </math>. This allows transforms of twice the length and has been shown to save 10% in integer multiplication time.<ref>{{cite web |last1=Gaudry |first1=Pierrick |last2=Kruppa |first2=Alexander |page=6 | date=2007 | last3=Zimmermann |first3=Paul |title=A GMP-based implementation of Schönhage-Strassen's large integer multiplication algorithm |url=https://inria.hal.science/inria-00126462/file/fft.final.pdf}}</ref>

=== Granlund's trick ===
By letting <math> m = N + h </math>, one can compute <math>uv \bmod {(2^N +1)}</math> and <math>(u \bmod {2^h})(v \bmod 2^h)</math>. Since <math>(2^N+1)\,2^h > 2^m</math>, knowing {{mvar|uv}} modulo <math>2^N+1</math> and modulo <math>2^h</math> determines any product <math>uv < 2^m</math>, so these residues can be combined using the CRT (Chinese remainder theorem) to find the exact value of the product {{mvar|uv}}.<ref>{{cite web |last1=Gaudry |first1=Pierrick |last2=Kruppa |page=6|date=2007|first2=Alexander |last3=Zimmermann |first3=Paul |title=A GMP-based implementation of Schönhage-Strassen's large integer multiplication algorithm |url=https://inria.hal.science/inria-00126462/file/fft.final.pdf}}</ref>

==References==
{{reflist}}

{{Number-theoretic algorithms}}

{{DEFAULTSORT:Schonhage-Strassen Algorithm}}
[[Category:Computer arithmetic algorithms]]
[[Category:Multiplication]]