Dixon's factorization method

In number theory, Dixon's factorization method (also Dixon's random squares method<ref name="Kleinjung10">Template:Cite book</ref> or Dixon's algorithm) is a general-purpose integer factorization algorithm; it is the prototypical factor base method. Unlike for other factor base methods, its run-time bound comes with a rigorous proof that does not rely on conjectures about the smoothness properties of the values taken by a polynomial.

The algorithm was designed by John D. Dixon, a mathematician at Carleton University, and was published in 1981.<ref name="Dixon81">Template:Cite journal</ref>

Basic ideaEdit

Dixon's method is based on finding a congruence of squares modulo the integer N which is intended to factor. Fermat's factorization method finds such a congruence by selecting random or pseudo-random x values and hoping that the integer x² mod N is a perfect square (in the integers):

<math>x^2\equiv y^2\quad(\hbox{mod }N),\qquad x\not\equiv\pm y\quad(\hbox{mod }N).</math>

For example, if Template:Nowrap, (by starting at 292, the first number greater than Template:Radic and counting up) the Template:Nowrap is 256, the square of 16. So Template:Nowrap. Computing the greatest common divisor of Template:Nowrap and N using Euclid's algorithm gives 163, which is a factor of N.

In practice, selecting random x values will take an impractically long time to find a congruence of squares, since there are only Template:Radic squares less than N.

Dixon's method replaces the condition "is the square of an integer" with the much weaker one "has only small prime factors"; for example, there are 292 squares smaller than 84923; 662 numbers smaller than 84923 whose prime factors are only 2,3,5 or 7; and 4767 whose prime factors are all less than 30. (Such numbers are called B-smooth with respect to some bound B.)

If there are many numbers <math>a_1 \ldots a_n</math> whose squares can be factorized as <math>a_i^2 \mod N = \prod_{j=1}^m b_j^{e_{ij}}</math> for a fixed set <math>b_1 \ldots b_m</math> of small primes, linear algebra modulo 2 on the matrix <math>e_{ij}</math> will give a subset of the <math>a_i</math> whose squares combine to a product of small primes to an even power — that is, a subset of the <math>a_i</math> whose squares multiply to the square of a (hopefully different) number mod N.

MethodEdit

Suppose the composite number N is being factored. Bound B is chosen, and the factor base is identified (which is called P), the set of all primes less than or equal to B. Next, positive integers z are sought such that z² mod N is B-smooth. Therefore we can write, for suitable exponents a_i,

When enough of these relations have been generated (it is generally sufficient that the number of relations be a few more than the size of P), the methods of linear algebra, such as Gaussian elimination, can be used to multiply together these various relations in such a way that the exponents of the primes on the right-hand side are all even:

<math>{z_1^2 z_2^2 \cdots z_k^2 \equiv \prod_{p_i\in P} p_i^{a_{i,1}+a_{i,2}+\cdots+a_{i,k}}\ \pmod{N}\quad (\text{where } a_{i,1}+a_{i,2}+\cdots+a_{i,k} \equiv 0\pmod{2}) }</math>

This yields a congruence of squares of the form Template:Nowrap which can be turned into a factorization of N, Template:Nowrap This factorization might turn out to be trivial (i.e. Template:Nowrap), which can only happen if Template:Nowrap in which case another try must be made with a different combination of relations; but if a nontrivial pair of factors of N is reached, the algorithm terminates.

PseudocodeEdit

This section is taken directly from Dixon (1981).

Dixon's algorithm

Initialization. Let L be a list of integers in the range [1, n], and let P = {p₁, ..., p_h} be the list of the h primes ≤ v. Let B and Z be initially empty lists (Z will be indexed by B).

Step 1. If L is empty, exit (algorithm unsuccessful). Otherwise, take the first term z from L, remove it from L, and proceed to Step 2.

Step 2. Compute w as the least positive remainder of z² mod n. Factor w as:

where Template:Prime has no factor in P. If Template:Prime = 1, proceed to Step 3; otherwise, return to Step 1.

Step 3. Let a ← (a₁, ..., a_h). Add a to B and z to Z. If B has at most h elements, return to Step 1; otherwise, proceed to Step 4.

Step 4. Find the first vector c in B that is linearly dependent (mod 2) on earlier vectors in B. Remove c from B and <math>z_c</math> from Z. Compute coefficients <math>f_b</math> such that:

<math>\mathbf{c} \equiv \sum_{b \in B} f_b \mathbf{b} \pmod{2}</math>

Define:

<math>\mathbf{d} = (d_1, \dots, d_n) \gets \frac{1}{2} \left(\mathbf{c} + \sum f_b \mathbf{b} \right)</math>

Proceed to Step 5.

Step 5. Compute:

so that:

<math>x^2 \equiv \prod_i p_i^{2d_i} = y^2 \mod n.</math>

If <math>x \equiv y</math> or <math>x \equiv -y \pmod{n}</math>, return to Step 1. Otherwise, return:

which provides a nontrivial factor of n, and terminate successfully.

Step-by-step example : factorizing (n = 84923) using Dixon's algorithmEdit

This example is lifted directly from the LeetArxiv substack.<ref>Kibicho, Murage (2025). Hand-Written Paper Implementation: Asymptotically Fast Factorization of Integers.</ref> Credit is given to the original author.

Initialization:
- Define a list of numbers L, ranging from 1 to 84923:

Define a value v, which is the smoothness factor:

Define a list P containing all the prime numbers less than or equal to v:

Define B and Z, two empty lists. B is a list of powers, while Z is a list of accepted integers:

Step 1: Iterating <math>z</math> values
- Write a for loop that indexes the list <math>L</math>. Each element in <math>L</math> is indexed as <math>z</math>. The for loop exits at the end of the list.

int n = 84923; for (int i = 1; i <= n; i++) {

   int z = i;

} </syntaxhighlight>

Step 2: Computing <math>z^2 \mod n</math> and v-smooth Prime Factorization
- To proceed, compute <math>z^2 \mod 84923</math> for each z, then express the result as a prime factorization.

<math>

1^2 \mod 84923 \equiv 1 \mod 84923 = 2^0 \cdot 3^0 \cdot 5^0 \cdot 7^0 \mod 84923 </math>

<math>

\vdots </math>

<math>

513^2 \mod 84923 = 8400 \mod 84923 = 2^4 \cdot 3^1 \cdot 5^2 \cdot 7^1 \mod 84923 </math>

<math>

\vdots </math>

<math>

537^2 \mod 84923 = 33600 \mod 84923 = 2^6 \cdot 3^1 \cdot 5^2 \cdot 7^1 \mod 84923 </math>

<math>

538^2 \mod 84923 = 34675 \mod 84923 = 5^2 \cdot 19^1 \cdot 73^1 \mod 84923 </math>

This step continues for all values of z in the range.

Step 3: If <math>z^2 \mod 84923</math> is 7-smooth, then append its powers to list <math>B</math> and append <math>z</math> to list <math>Z</math>.

Step 4: This step is split into two parts.

Part 1: Find <math>B</math> modulo 2.

<math>

B = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 4 & 1 & 2 & 1 \\ 6 & 1 & 2 & 1 \end{pmatrix} \mod 2 \equiv B = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 1 \end{pmatrix} </math>

Part 2: Check if any row combinations of <math>B</math> sum to even numbers

For example, summing Row <math>2</math> and Row <math>3</math> gives us a vector of even numbers.

<math>

R_2 = \{0, 1, 0, 1\}</math> and <math>R_3 = \{0, 1, 0, 1\}</math>

then

<math>R_2 + R_3 = \{0, 2, 0, 2\}</math>.

Step 5 : This step is split into four parts.
- Part 1. (Finding x): Multiply the corresponding <math>z</math> values for the rows found in Step 4. Then find the square root. This gives us <math>x</math>.
  - For Row 2, we had <math>2^4 * 3^1 * 5^2 * 7^1</math>.
  - For Row 3, we had <math>2^6 * 3^1 * 5^2 * 7^1</math>.
  - Thus, we find <math>x</math> :

<math>

\begin{array}{ll} (513 \cdot 537) ^ 2 \mod 84923 = y ^ 2 \\ \\ \text{where} \quad x^2 \mod 84923 = (513 \cdot 537) ^ 2 \mod 84923 \\ \\ \text{thus} \quad x = (513 \cdot 537) \mod 84923 \\ \\ \text{so} \quad x = 275481 \mod 84923\\ \\ \text{Finally} \quad x = 20712 \mod 84923\\ \end{array} </math>

Part 2. (Finding y): Multiply the corresponding smooth factorizations for the rows found in Step 4. Then find the square root. This gives us <math>y</math>.

<math>

\begin{array}{ll} y^2 = (2^4 \cdot 3^1 \cdot 5^2 \cdot 7^1) \times (2^6 \cdot 3^1 \cdot 5^2 \cdot 7^1) \\ \\ \text{By the multiplication law of exponents,} \\ y^2 = 2^{(4+6)} \cdot 3^{(1+1)} \cdot 5^{(2+2)} \cdot 7^{(1+1)} \\ \\ \text{Thus,} \\ y^2 = 2^{10} \cdot 3^2 \cdot 5^4 \cdot 7^2 \\ \\ \text{Taking square roots on both sides gives} \\ y = 2^5 \cdot 3^1 \cdot 5^2 \cdot 7^1 \\ \\ \text{Therefore,} \\ y = 32 \times 3 \times 25 \times 7 \\ \\ \text{Finally,} \\ y = 16800 \end{array} </math>

Part 3. (Find x + y and x - y) where x = 20712 and y = 16800.

Part 4. Compute GCD(x+y, n) and GCD(x-y, n), where n = 84923, x+y = 292281 and x-y = 258681

<math>

\begin{array}{ll} \gcd(37512, 84923) = 521 \\ \gcd(3912, 84923) = 163 \end{array} </math>

Quick check shows <math>84923 = 521 \times 163</math>.

OptimizationsEdit

The quadratic sieve is an optimization of Dixon's method. It selects values of x close to the square root of Template:Var such that x² modulo N is small, thereby largely increasing the chance of obtaining a smooth number.

Other ways to optimize Dixon's method include using a better algorithm to solve the matrix equation, taking advantage of the sparsity of the matrix: a number z cannot have more than <math>\log_2 z</math> factors, so each row of the matrix is almost all zeros. In practice, the block Lanczos algorithm is often used. Also, the size of the factor base must be chosen carefully: if it is too small, it will be difficult to find numbers that factorize completely over it, and if it is too large, more relations will have to be collected.

A more sophisticated analysis, using the approximation that a number has all its prime factors less than <math>N^{1/a}</math> with probability about <math>a^{-a}</math> (an approximation to the Dickman–de Bruijn function), indicates that choosing too small a factor base is much worse than too large, and that the ideal factor base size is some power of <math>\exp\left(\sqrt{\log N \log \log N}\right)</math>.

The optimal complexity of Dixon's method is

<math>O\left(\exp\left(2 \sqrt 2 \sqrt{\log n \log \log n}\right)\right)</math>

in big-O notation, or

in L-notation.

ReferencesEdit

Template:Number theoretic algorithms