Montgomery modular multiplication
Template:Short description In modular arithmetic computation, Montgomery modular multiplication, more commonly referred to as Montgomery multiplication, is a method for performing fast modular multiplication. It was introduced in 1985 by the American mathematician Peter L. Montgomery.<ref name="montg1985">Template:Cite journal</ref><ref name="kochanski">Martin Kochanski, "Montgomery Multiplication" Template:Webarchive a colloquial explanation.</ref>
Montgomery modular multiplication relies on a special representation of numbers called Montgomery form. The algorithm uses the Montgomery forms of Template:Mvar and Template:Mvar to efficiently compute the Montgomery form of Template:Math. The efficiency comes from avoiding expensive division operations. Classical modular multiplication reduces the double-width product Template:Math using division by Template:Mvar and keeping only the remainder. This division requires quotient digit estimation and correction. The Montgomery form, in contrast, depends on a constant Template:Mvar which is coprime to Template:Mvar, and the only division necessary in Montgomery multiplication is division by Template:Mvar. The constant Template:Mvar can be chosen so that division by Template:Mvar is easy, significantly improving the speed of the algorithm. In practice, Template:Mvar is always a power of two, since division by powers of two can be implemented by bit shifting.
The need to convert Template:Mvar and Template:Mvar into Montgomery form and their product out of Montgomery form means that computing a single product by Montgomery multiplication is slower than the conventional or Barrett reduction algorithms. However, when performing many multiplications in a row, as in modular exponentiation, intermediate results can be left in Montgomery form. Then the initial and final conversions become a negligible fraction of the overall computation. Many important cryptosystems such as RSA and Diffie–Hellman key exchange are based on arithmetic operations modulo a large odd number, and for these cryptosystems, computations using Montgomery multiplication with Template:Mvar a power of two are faster than the available alternatives.<ref>Alfred J. Menezes, Paul C. van Oorschot, and Scott A. Vanstone. Handbook of Applied Cryptography. CRC Press, 1996. Template:Isbn, chapter 14.</ref>
Modular arithmeticEdit
Let Template:Mvar denote a positive integer modulus. The quotient ring Template:Math consists of residue classes modulo Template:Mvar, that is, its elements are sets of the form
- <math>\{ a + kN \colon k \in \mathbf{Z} \},</math>
where Template:Mvar ranges across the integers. Each residue class is a set of integers such that the difference of any two integers in the set is divisible by Template:Mvar (and the residue class is maximal with respect to that property; integers aren't left out of the residue class unless they would violate the divisibility condition). The residue class corresponding to Template:Mvar is denoted Template:Math. Equality of residue classes is called congruence and is denoted
- <math>\bar a \equiv \bar b \pmod{N}.</math>
Storing an entire residue class on a computer is impossible because the residue class has infinitely many elements. Instead, residue classes are stored as representatives. Conventionally, these representatives are the integers Template:Mvar for which Template:Math. If Template:Mvar is an integer, then the representative of Template:Math is written Template:Math. When writing congruences, it is common to identify an integer with the residue class it represents. With this convention, the above equality is written Template:Math.
Arithmetic on residue classes is done by first performing integer arithmetic on their representatives. The output of the integer operation determines a residue class, and the output of the modular operation is determined by computing the residue class's representative. For example, if Template:Math, then the sum of the residue classes Template:Math and Template:Math is computed by finding the integer sum Template:Math, then determining Template:Math, the integer between 0 and 16 whose difference with 22 is a multiple of 17. In this case, that integer is 5, so Template:Math.
Montgomery formEdit
If Template:Mvar and Template:Mvar are integers in the range Template:Math, then their sum is in the range Template:Math and their difference is in the range Template:Math, so determining the representative in Template:Math requires at most one subtraction or addition (respectively) of Template:Mvar. However, the product Template:Math is in the range Template:Math. Storing the intermediate integer product Template:Math requires twice as many bits as either Template:Mvar or Template:Mvar, and efficiently determining the representative in Template:Math requires division. Mathematically, the integer between 0 and Template:Math that is congruent to Template:Math can be expressed by applying the Euclidean division theorem:
- <math>ab = qN + r,</math>
where Template:Mvar is the quotient <math>\lfloor ab / N \rfloor</math> and Template:Mvar, the remainder, is in the interval Template:Math. The remainder Template:Mvar is Template:Math. Determining Template:Mvar can be done by computing Template:Mvar, then subtracting Template:Math from Template:Math. For example, again with <math>N=17</math>, the product Template:Math is determined by computing <math>7 \cdot 15 = 105</math>, dividing <math>\lfloor 105 / 17 \rfloor = 6</math>, and subtracting <math>105 - 6 \cdot 17 = 105 - 102 = 3</math>.
Because the computation of Template:Mvar requires division, it is undesirably expensive on most computer hardware. Montgomery form is a different way of expressing the elements of the ring in which modular products can be computed without expensive divisions. While divisions are still necessary, they can be done with respect to a different divisor Template:Mvar. This divisor can be chosen to be a power of two, for which division can be replaced by shifting, or a whole number of machine words, for which division can be replaced by omitting words. These divisions are fast, so most of the cost of computing modular products using Montgomery form is the cost of computing ordinary products.
The auxiliary modulus Template:Mvar must be a positive integer such that Template:Math. For computational purposes it is also necessary that division and reduction modulo Template:Mvar are inexpensive, and the modulus is not useful for modular multiplication unless Template:Math. The Montgomery form of the residue class Template:Math with respect to Template:Mvar is Template:Math, that is, it is the representative of the residue class Template:Math. For example, suppose that Template:Math and that Template:Math. The Montgomery forms of 3, 5, 7, and 15 are Template:Math, Template:Math, Template:Math, and Template:Math.
Addition and subtraction in Montgomery form are the same as ordinary modular addition and subtraction because of the distributive law:
- <math>aR + bR = (a + b)R,</math>
- <math>aR - bR = (a - b)R.</math>
Note that doing the operation in Montgomery form does not lose information compared to doing it in the quotient ring Template:Math. This is a consequence of the fact that, because Template:Math, multiplication by Template:Mvar is an isomorphism on the additive group Template:Math. For example, Template:Math, which in Montgomery form becomes Template:Math.
Multiplication in Montgomery form, however, is seemingly more complicated. The usual product of Template:Math and Template:Math does not represent the product of Template:Mvar and Template:Mvar because it has an extra factor of Template:Mvar:
- <math>(aR \bmod N)(bR \bmod N) \bmod N = (abR)R \bmod N.</math>
Computing products in Montgomery form requires removing the extra factor of Template:Mvar. While division by Template:Mvar is cheap, the intermediate product Template:Math is not divisible by Template:Mvar because the modulo operation has destroyed that property. So for instance, the product of the Montgomery forms of 7 and 15 modulo 17, with Template:Math, is the product of 3 and 4, which is 12. Since 12 is not divisible by 100, additional effort is required to remove the extra factor of Template:Mvar.
Removing the extra factor of Template:Mvar can be done by multiplying by an integer Template:Math such that Template:Math, that is, by an Template:Math whose residue class is the modular inverse of Template:Mvar mod Template:Mvar. Then, working modulo Template:Mvar,
- <math>(aR \bmod N)(bR \bmod N)R' \equiv (aR)(bR)R^{-1} \equiv (ab)R \pmod{N}.</math>
The integer Template:Math exists because of the assumption that Template:Mvar and Template:Mvar are coprime. It can be constructed using the extended Euclidean algorithm. The extended Euclidean algorithm efficiently determines integers Template:Math and Template:Math that satisfy Bézout's identity: Template:Math, Template:Math, and:
- <math>RR' - NN' = 1.</math>
This shows that it is possible to do multiplication in Montgomery form. A straightforward algorithm to multiply numbers in Montgomery form is therefore to multiply Template:Math, Template:Math, and Template:Math as integers and reduce modulo Template:Mvar.
For example, to multiply 7 and 15 modulo 17 in Montgomery form, again with Template:Math, compute the product of 3 and 4 to get 12 as above. The extended Euclidean algorithm implies that Template:Math, so Template:Math. Multiply 12 by 8 to get 96 and reduce modulo 17 to get 11. This is the Montgomery form of 3, as expected.
The REDC algorithmEdit
While the above algorithm is correct, it is slower than multiplication in the standard representation because of the need to multiply by Template:Math and divide by Template:Mvar. Montgomery reduction, also known as REDC, is an algorithm that simultaneously computes the product by Template:Math and reduces modulo Template:Mvar more quickly than the naïve method. Unlike conventional modular reduction, which focuses on making the number smaller than Template:Mvar, Montgomery reduction focuses on making the number more divisible by Template:Mvar. It does this by adding a small multiple of Template:Mvar which is sophisticatedly chosen to cancel the residue modulo Template:Mvar. Dividing the result by Template:Mvar yields a much smaller number. This number is so much smaller that it is nearly the reduction modulo Template:Mvar, and computing the reduction modulo Template:Mvar requires only a final conditional subtraction. Because all computations are done using only reduction and divisions with respect to Template:Mvar, not Template:Mvar, the algorithm runs faster than a straightforward modular reduction by division.
function REDC is input: Integers R and N with Template:Nowrap, Integer N′ in Template:Nowrap such that Template:Nowrap, Integer T in the range Template:Nowrap. output: Integer S in the range Template:Nowrap such that Template:Nowrap m ← ((T mod R)N′) mod R t ← (T + mN) / R if t ≥ N then return Template:Nowrap else return t end if end function
To see that this algorithm is correct, first observe that Template:Mvar is chosen precisely so that Template:Math is divisible by Template:Mvar. A number is divisible by Template:Mvar if and only if it is congruent to zero mod Template:Mvar, and we have:
- <math>T + mN \equiv T + (((T \bmod R)N') \bmod R)N \equiv T + T N' N \equiv T - T \equiv 0 \pmod{R}.</math>
Therefore, Template:Mvar is an integer. Second, the output is either Template:Mvar or Template:Math, both of which are congruent to Template:Math, so to prove that the output is congruent to Template:Math, it suffices to prove that Template:Mvar is Template:Math, Template:Mvar satisfies:
- <math>t \equiv (T + mN)R^{-1} \equiv TR^{-1} + (mR^{-1})N \equiv TR^{-1} \pmod{N}.</math>
Therefore, the output has the correct residue class. Third, Template:Mvar is in Template:Math, and therefore Template:Math is between 0 and Template:Math. Hence Template:Mvar is less than Template:Math, and because it's an integer, this puts Template:Mvar in the range Template:Math. Therefore, reducing Template:Mvar into the desired range requires at most a single subtraction, so the algorithm's output lies in the correct range.
To use REDC to compute the product of 7 and 15 modulo 17, first convert to Montgomery form and multiply as integers to get 12 as above. Then apply REDC with Template:Math, Template:Math, Template:Math, and Template:Math. The first step sets Template:Mvar to Template:Math. The second step sets Template:Mvar to Template:Math. Notice that Template:Math is 1100, a multiple of 100 as expected. Template:Mvar is set to 11, which is less than 17, so the final result is 11, which agrees with the computation of the previous section.
As another example, consider the product Template:Math but with Template:Math. Using the extended Euclidean algorithm, compute Template:Math, so Template:Math will be Template:Math. The Montgomery forms of 7 and 15 are Template:Math and Template:Math, respectively. Their product 28 is the input Template:Mvar to REDC, and since Template:Math, the assumptions of REDC are satisfied. To run REDC, set Template:Mvar to Template:Math. Then Template:Math, so Template:Math. Because Template:Math, this is the Montgomery form of Template:Math.
Interpretation via the Chinese Remainder Theorem <ref>Template:Cite arXiv</ref>Edit
Given the modulus Template:Mvar and the Montgomery radix Template:Mvar used in a Montgomery reduction, consider the residue ring
<math>\mathbb{Z}/(NR)\mathbb{Z}\;\cong\;\mathbb{Z}/N\mathbb{Z}\;\times\;\mathbb{Z}/R\mathbb{Z},</math>
an isomorphism that follows from the Chinese Remainder Theorem (CRT).
CRT reconstruction for an intermediate productEdit
For an integer <math>T</math> with <math>0 \le T < NR</math> (as is typical when <math>T</math> arises from multiplying two residues), take its reductions
<math>T_N = T \bmod N, \qquad
T_R = T \bmod R .
</math>
The CRT gives the explicit reconstruction formula
<math>T \equiv
T_N\bigl(R^{-1} \bmod N\bigr)\,R
\;+
T_R\bigl(N^{-1} \bmod R\bigr)\,N
\pmod{NR}.
</math>
Because the right-hand side is already taken modulo <math>NR</math>, this may also be written as
<math>T \equiv
\bigl(T_N R^{-1} \bmod N\bigr) R
\;+
\bigl(T_R N^{-1} \bmod R\bigr) N
\pmod{NR}.
</math>
Both summands lie in the half‑open interval <math>[0,NR)</math>:
<math>0 \le \bigl(T_N R^{-1}\bmod N\bigr) R < NR,\qquad
0 \le \bigl(T_R N^{-1}\bmod R\bigr) N < NR .
</math>
Hence, as integer equations (not merely congruences) we have
<math>T =
\bigl(T_N R^{-1} \bmod N\bigr) R
\;+
\bigl(T_R N^{-1} \bmod R\bigr) N ,
</math>
or,
<math>T + NR =
\bigl(T_N R^{-1} \bmod N\bigr) R
\;+
\bigl(T_R N^{-1} \bmod R\bigr) N .
</math>
Isolating the term containing T mod NEdit
To solve for <math>T_N</math>, isolate the first summand:
<math>\bigl(T_N R^{-1} \bmod N\bigr) R
=
\begin{cases}
T - \bigl(T_R N^{-1} \bmod R\bigr) N, \\
T + NR - \bigl(T_R N^{-1} \bmod R\bigr) N .
\end{cases}
</math>
Every quantity above is an integer, and the left‑hand side is a multiple of <math>R</math>; therefore each right‑hand side is divisible by <math>R</math>. Dividing by <math>R</math> yields
<math>T_N R^{-1}\bmod N =\begin{cases}
\dfrac{T - \bigl(T_R N^{-1} \bmod R\bigr) N}{R},\\[8pt]
\dfrac{T + NR - \bigl(T_R N^{-1} \bmod R\bigr) N}{R}
\,=\,\dfrac{T - \bigl(T_R N^{-1} \bmod R\bigr) N}{R} + N .
\end{cases}
</math>
Resulting relationsEdit
Consequently,
<math>\dfrac{T - \bigl(T_R N^{-1} \bmod R\bigr) N}{R}
=
\begin{cases}
T_N R^{-1}\bmod N,\\
T_N R^{-1}\bmod N \; + \; N .
\end{cases}
</math>
This gives two key facts:
- Congruence
<math> \frac{T - \bigl(T_R N^{-1}\bmod R\bigr) N}{R} \;\equiv\; T_N R^{-1} \pmod{N}.</math>
- Numeric bound
<math>
0 \;\le\; \frac{T - \bigl(T_R N^{-1}\bmod R\bigr) N}{R} \;<\; 2N.
</math>
Therefore, by reducing
<math> \frac{T - \bigl(T_R N^{-1}\bmod R\bigr) N}{R} </math>
once more modulo Template:Mvar, one obtains the non‑negative residue representing <math>T_{N} R^{-1}\bmod N</math>.
Arithmetic in Montgomery formEdit
Many operations of interest modulo Template:Mvar can be expressed equally well in Montgomery form. Addition, subtraction, negation, comparison for equality, multiplication by an integer not in Montgomery form, and greatest common divisors with Template:Mvar may all be done with the standard algorithms. The Jacobi symbol can be calculated as <math>\big(\tfrac{a}{N}\big) = \big(\tfrac{aR}{N}\big) / \big(\tfrac{R}{N}\big)</math> as long as <math>\big(\tfrac{R}{N}\big)</math> is stored.
When Template:Math, most other arithmetic operations can be expressed in terms of REDC. This assumption implies that the product of two representatives mod Template:Mvar is less than Template:Mvar, the exact hypothesis necessary for REDC to generate correct output. In particular, the product of Template:Math and Template:Math is Template:Math. The combined operation of multiplication and REDC is often called Montgomery multiplication.
Conversion into Montgomery form is done by computing Template:Math. Conversion out of Montgomery form is done by computing Template:Math. The modular inverse of Template:Math is Template:Math. Modular exponentiation can be done using exponentiation by squaring by initializing the initial product to the Montgomery representation of 1, that is, to Template:Math, and by replacing the multiply and square steps by Montgomery multiplies.
Performing these operations requires knowing at least Template:Math and Template:Math. When Template:Mvar is a power of a small positive integer Template:Mvar, Template:Math can be computed by Hensel's lemma: The inverse of Template:Mvar modulo Template:Mvar is computed by a naïve algorithm (for instance, if Template:Math then the inverse is 1), and Hensel's lemma is used repeatedly to find the inverse modulo higher and higher powers of Template:Mvar, stopping when the inverse modulo Template:Mvar is known; Template:Math is the negation of this inverse. The constants Template:Math and Template:Math can be generated as Template:Math and as Template:Math. The fundamental operation is to compute REDC of a product. When standalone REDC is needed, it can be computed as REDC of a product with Template:Math. The only place where a direct reduction modulo Template:Mvar is necessary is in the precomputation of Template:Math.
Montgomery arithmetic on multiprecision integersEdit
Most cryptographic applications require numbers that are hundreds or even thousands of bits long. Such numbers are too large to be stored in a single machine word. Typically, the hardware performs multiplication mod some base Template:Mvar, so performing larger multiplications requires combining several small multiplications. The base Template:Mvar is typically 2 for microelectronic applications, 28 for 8-bit firmware,<ref name="kizhvatov" /> or 232 or 264 for software applications.
The REDC algorithm requires products modulo Template:Mvar, and typically Template:Math so that REDC can be used to compute products. However, when Template:Mvar is a power of Template:Mvar, there is a variant of REDC which requires products only of machine word sized integers. Suppose that positive multi-precision integers are stored little endian, that is, Template:Mvar is stored as an array Template:Math such that Template:Math for all Template:Mvar and Template:Math. The algorithm begins with a multiprecision integer Template:Mvar and reduces it one word at a time. First an appropriate multiple of Template:Mvar is added to make Template:Mvar divisible by Template:Mvar. Then a multiple of Template:Mvar is added to make Template:Mvar divisible by Template:Math, and so on. Eventually Template:Mvar is divisible by Template:Mvar, and after division by Template:Mvar the algorithm is in the same place as REDC was after the computation of Template:Mvar.
function MultiPrecisionREDC is Input: Integer N with Template:Nowrap, stored as an array of p words, Integer Template:Nowrap, --thus, r = logB R Integer N′ in Template:Nowrap such that Template:Nowrap, Integer T in the range Template:Nowrap, stored as an array of Template:Nowrap words. Output: Integer S in Template:Nowrap such that Template:Nowrap, stored as an array of p words. Set Template:Nowrap (extra carry word) for Template:Nowrap do --loop1- Make T divisible by Template:Nowrap c ← 0 m ← Template:Nowrap for Template:Nowrap do --loop2- Add the Template:Nowrap and the carry from earlier, and find the new carry x ← Template:Nowrap T[i + j] ← Template:Nowrap c ← Template:Nowrap end for for Template:Nowrap do --loop3- Continue carrying x ← Template:Nowrap T[i + j] ← Template:Nowrap c ← Template:Nowrap end for end for for Template:Nowrap do S[i] ← T[i + r] end for if Template:Nowrap then return Template:Nowrap else return Template:Var end if end function
The final comparison and subtraction is done by the standard algorithms.
The above algorithm is correct for essentially the same reasons that REDC is correct. Each time through the Template:Mvar loop, Template:Mvar is chosen so that Template:Math is divisible by Template:Mvar. Then Template:Mvar is added to Template:Mvar. Because this quantity is zero mod Template:Mvar, adding it does not affect the value of Template:Math. If Template:Mvar denotes the value of Template:Mvar computed in the Template:Mvarth iteration of the loop, then the algorithm sets Template:Mvar to Template:Math. Because MultiPrecisionREDC and REDC produce the same output, this sum is the same as the choice of Template:Mvar that the REDC algorithm would make.
The last word of Template:Mvar, Template:Math (and consequently Template:Math), is used only to hold a carry, as the initial reduction result is bound to a result in the range of Template:Math. It follows that this extra carry word can be avoided completely if it is known in advance that Template:Math. On a typical binary implementation, this is equivalent to saying that this carry word can be avoided if the number of bits of Template:Mvar is smaller than the number of bits of Template:Mvar. Otherwise, the carry will be either zero or one. Depending upon the processor, it may be possible to store this word as a carry flag instead of a full-sized word.
It is possible to combine multiprecision multiplication and REDC into a single algorithm. This combined algorithm is usually called Montgomery multiplication. Several different implementations are described by Koç, Acar, and Kaliski.<ref>Template:Cite journal</ref> The algorithm may use as little as Template:Math words of storage (plus a carry bit).
As an example, let Template:Math, Template:Math, and Template:Math. Suppose that Template:Math and Template:Math. The Montgomery representations of Template:Mvar and Template:Mvar are Template:Math and Template:Math. Compute Template:Math. The initial input Template:Mvar to MultiPrecisionREDC will be [6, 4, 8, 5, 6, 7]. The number Template:Mvar will be represented as [7, 9, 9]. The extended Euclidean algorithm says that Template:Math, so Template:Math will be 7.
i ← 0 m ← Template:Nowrap j T c - ------- - 0 0485670 2 (After first iteration of first loop) 1 0485670 2 2 0485670 2 3 0487670 0 (After first iteration of second loop) 4 0487670 0 5 0487670 0 6 0487670 0 i ← 1 m ← Template:Nowrap j T c - ------- - 0 0087670 6 (After first iteration of first loop) 1 0067670 8 2 0067670 8 3 0067470 1 (After first iteration of second loop) 4 0067480 0 5 0067480 0 i ← 2 m ← Template:Nowrap j T c - ------- - 0 0007480 2 (After first iteration of first loop) 1 0007480 2 2 0007480 2 3 0007400 1 (After first iteration of second loop) 4 0007401 0
Therefore, before the final comparison and subtraction, Template:Math. The final subtraction yields the number 50. Since the Montgomery representation of Template:Math is Template:Math, this is the expected result.
When working in base 2, determining the correct Template:Mvar at each stage is particularly easy: If the current working bit is even, then Template:Mvar is zero and if it's odd, then Template:Mvar is one. Furthermore, because each step of MultiPrecisionREDC requires knowing only the lowest bit, Montgomery multiplication can be easily combined with a carry-save adder.
Side-channel attacksEdit
Because Montgomery reduction avoids the correction steps required in conventional division when quotient digit estimates are inaccurate, it is mostly free of the conditional branches which are the primary targets of timing and power side-channel attacks; the sequence of instructions executed is independent of the input operand values. The only exception is the final conditional subtraction of the modulus, but it is easily modified (to always subtract something, either the modulus or zero) to make it resistant.<ref name="kizhvatov">Template:Cite conference (Presentation slides.)</ref> It is of course necessary to ensure that the exponentiation algorithm built around the multiplication primitive is also resistant.Template:R<ref>Marc Joye and Sung-Ming Yen. "The Montgomery Powering Ladder". 2002.</ref>