Schönhage–Strassen algorithm

The Schönhage–Strassen algorithm is an asymptotically fast multiplication algorithm for large integers, published by Arnold Schönhage and Volker Strassen in 1971.[1] It works by recursively applying fast Fourier transform (FFT) over the integers modulo 2n+1. The run-time bit complexity to multiply two n-digit numbers using the algorithm is in big O notation.

The Schönhage–Strassen algorithm is based on the fast Fourier transform (FFT) method of integer multiplication. This figure demonstrates multiplying 1234 × 5678 = 7006652 using the simple FFT method. Base 10 is used in place of base 2w for illustrative purposes.
Schönhage (on the right) and Strassen (on the left) playing chess in Oberwolfach, 1979

The Schönhage–Strassen algorithm was the asymptotically fastest multiplication method known from 1971 until 2007. It is asymptotically faster than older methods such as Karatsuba and Toom–Cook multiplication, and starts to outperform them in practice for numbers beyond about 10,000 to 100,000 decimal digits.[2] In 2007, Martin Fürer published an algorithm with faster asymptotic complexity.[3] In 2019, David Harvey and Joris van der Hoeven demonstrated that multi-digit multiplication has theoretical complexity; however, their algorithm has constant factors which make it impossibly slow for any conceivable practical problem (see galactic algorithm).[4]

Applications of the Schönhage–Strassen algorithm include large computations done for their own sake such as the Great Internet Mersenne Prime Search and approximations of π, as well as practical applications such as Lenstra elliptic curve factorization via Kronecker substitution, which reduces polynomial multiplication to integer multiplication.[5][6]

Description edit

This section has a simplified version of the algorithm, showing how to compute the product   of two natural numbers  , modulo a number of the form  , where   is some fixed number. The integers   are to be divided into   blocks of   bits, so in practical implementations, it is important to strike the right balance between the parameters  . In any case, this algorithm will provide a way to multiply two positive integers, provided   is chosen so that  .

Let   be the number of bits in the signals   and  , where   is a power of two. Divide the signals   and   into   blocks of   bits each, storing the resulting blocks as arrays   (whose entries we shall consider for simplicity as arbitrary precision integers).

We now select a modulus for the Fourier transform, as follows. Let   be such that  . Also put  , and regard the elements of the arrays   as (arbitrary precision) integers modulo  . Observe that since  , the modulus is large enough to accommodate any carries that can result from multiplying   and  . Thus, the product   (modulo  ) can be calculated by evaluating the convolution of  . Also, with  , we have  , and so   is a primitive  th root of unity modulo  .

We now take the discrete Fourier transform of the arrays   in the ring  , using the root of unity   for the Fourier basis, giving the transformed arrays  . Because   is a power of two, this can be achieved in logarithmic time using a fast Fourier transform.

Let   (pointwise product), and compute the inverse transform   of the array  , again using the root of unity  . The array   is now the convolution of the arrays  . Finally, the product   is given by evaluating

 

This basic algorithm can be improved in several ways. Firstly, it is not necessary to store the digits of   to arbitrary precision, but rather only up to   bits, which gives a more efficient machine representation of the arrays  . Secondly, it is clear that the multiplications in the forward transforms are simple bit shifts. With some care, it is also possible to compute the inverse transform using only shifts. Taking care, it is thus possible to eliminate any true multiplications from the algorithm except for where the pointwise product   is evaluated. It is therefore advantageous to select the parameters   so that this pointwise product can be performed efficiently, either because it is a single machine word or using some optimized algorithm for multiplying integers of a (ideally small) number of words. Selecting the parameters   is thus an important area for further optimization of the method.

Details edit

Every number in base B, can be written as a polynomial:

 

Furthermore, multiplication of two numbers could be thought of as a product of two polynomials:

 

Because, for  :  , we have a convolution.

By using FFT (fast Fourier transform), used in original version rather than NTT,[7] with convolution rule; we get

 

That is;  , where   is the corresponding coefficient in fourier space. This can also be written as: fft(a * b) = fft(a) ● fft(b).

We have the same coefficients due to linearity under Fourier transform, and because these polynomials only consist of one unique term per coefficient:

  and
 

Convolution rule:  

We have reduced our convolution problem to product problem, through FFT.

By finding ifft (polynomial interpolation), for each  , one get the desired coefficients.

Algorithm uses divide and conquer strategy, to divide problem to subproblems.

Convolution under mod N edit

 , where   and   in Schönhage–Strassen algorithm.

By letting:

  and  

where   is the n-th root

One sees that:[8]

 

This mean, one can use weight  , and then multiply with   after.

Instead of using weight; one can due to  , in first step of recursion (when  ), calculate :

 

In normal FFT, that operates over complex numbers, one would use:

 
 

However, FFT can also be used as a NTT (number theoretic transformation) in Schönhage–Strassen. This means that we have to use θ that generate numbers in a finite field (for example  ).

A root of unity under a finite field GF(r), is an element a such that   or  . For example GF(p), where p is a prime, gives  .

Notice that   in   and   in  . For these candidates,   under its finite field, and therefore act the way we want .

Same FFT algorithms can still be used, though, as long as θ is root of unity of a finite field.

To find FFT/NTT transform, we do the following:

 

First product gives contribution to  , for each k. Second gives contribution to  , due to   mod  .

To do the inverse:

  or  

depending on whether fft one use normalize data or not.

One multiply by  , to normailize fft data to a specific range, where  , where m is found using modular multiplicative inverse.

Implementation details edit

Why N = 2M + 1 in mod N edit

In Schönhage–Strassen algorithm,  . One should think of this as a binary tree, where one have values in  . By letting  , one can for each K find all  : One can group all   pairs into M different groups. Using   to group   pairs through convolution, is a classical problem in algorithms.[9] For example: Let k be total income and i be mans income and j womans income; by using convolution, one can group   into K groups based on desired total income.

Having this in mind,   help us to group   into   groups, for each group of subtasks in depth k; in tree with  

Notice that  , for some L. This is Fermat number. When doing mod  , we have something called Fermat ring.

Because some Fermat numbers are Fermat primes, one can in some cases avoid calculations.

There are other N that could have been used, of course, with same prime number advantages. By letting  , one have the maximal number in a binary number with   bits.   is a Mersenne number, that in some cases is a Mersenne prime. It is a natural candidate against Fermat number  

In search of another N edit

Doing several mod calculations against different N, can be helpful when it comes to solving integer product. By using the Chinese remainder theorem, after splitting M into smaller different types of N, one can find the answer of multiplication xy [10]

Fermat numbers and Mersenne numbers are just two types of numbers, in something called generalized Fermat Mersenne number (GSM); with formula:[11]

 
 

In this formula;   is a Fermat number, and   is a Mersenne number.

This formula can be used to generate sets of equations, that can be used in CRT (Chinese remainder theorem):[12]

 , where g is a number such that there exists a x where  , assuming  

Furthermore;  , where a is an element that generates elements in   in a cyclic manner.

If  , where  , then  .

How to choose K for a specific N edit

Following formula helps one out, finding a proper K (number of groups to divide N bits into) given bit size N by calculating efficiency :[13]

  N is bit size (the one used in  ) at outermost level. K gives   groups of bits, where  .

n is found through N, K and k by finding the smallest x, such that  

If one assume efficiency above 50%,   and k is very small compared to rest of formula; one get

 

This means: When something is very effective; K is bound above by   or asymptotically bound above by  

Pseudocode edit

Following alogithm, the standard Modular Schönhage-Strassen Multiplication algorithm (with some optimizations), is found in overview through [14]

  1. Split both input numbers a and b into n coefficients of s bits each.

    Use at least   bits to store them,

    to allow encoding of the value  
  2. Weight both coefficient vectors according to (2.24) with powers of θ by performing cyclic shifts on them.
  3. Shuffle the coefficients   and   .
  4. Evaluate   and   . Multiplications by powers of ω are cyclic shifts.
  5. Do n pointwise multiplications   in  . If SMUL is used recursively, provide K as parameter. Otherwise, use some other multiplication function like T3MUL and reduce modulo   afterwards.
  6. Shuffle the product coefficients  .
  7. Evaluate the product coefficients  .
  8. Apply the counterweights to the   according to (2.25). Since   it follows that  
  9. Normalize the   with   (again a cyclic shift).
  10. Add up the   and propagate the carries. Make sure to properly handle negative coefficients.
  11. Do a reduction modulo  .
  • T3MUL = Toom–Cook multiplication
  • SMUL = Schönhage–Strassen multiplication
  • Evaluate = FFT/IFFT

Further study edit

For implemantion details, one can read the book Prime Numbers: A Computational Perspective.[15] This variant differs somewhat from Schönhage's original method in that it exploits the discrete weighted transform to perform negacyclic convolutions more efficiently. Another source for detailed information is Knuth's The Art of Computer Programming.[16]

Optimizations edit

This section explains a number of important practical optimizations, when implementing Schönhage–Strassen.

Use of other multiplications algorithm, inside algorithm edit

Below a certain cutoff point, it's more efficient to use other multiplication algorithms, such as Toom–Cook multiplication.[17]

Square root of 2 trick edit

The idea is to use   as root of unity of order   in finite field   ( it is a solution to equation  ), when weighting values in NTT (number theoretic transformation) approach. It has been shown to save 10% in integer multiplication time.[18]

Granlund's trick edit

By letting  ; one can compute   and  . In combination with CRT (Chinese Remainder Theorem), to find exact values of multiplication uv[19]

References edit

  1. ^ Schönhage, Arnold; Strassen, Volker (1971). "Schnelle Multiplikation großer Zahlen" [Fast multiplication of large numbers]. Computing (in German). 7 (3–4): 281–292. doi:10.1007/BF02242355. S2CID 9738629.
  2. ^ Karatsuba multiplication has asymptotic complexity of about   and Toom–Cook multiplication has asymptotic complexity of about  

    Van Meter, Rodney; Itoh, Kohei M. (2005). "Fast Quantum Modular Exponentiation". Physical Review. 71 (5): 052320. arXiv:quant-ph/0408006. Bibcode:2005PhRvA..71e2320V. doi:10.1103/PhysRevA.71.052320. S2CID 14983569.

    A discussion of practical crossover points between various algorithms can be found in: Overview of Magma V2.9 Features, arithmetic section Archived 2006-08-20 at the Wayback Machine

    Luis Carlos Coronado García, "Can Schönhage multiplication speed up the RSA encryption or decryption? Archived", University of Technology, Darmstadt (2005)

    The GNU Multi-Precision Library uses it for values of at least 1728 to 7808 64-bit words (33,000 to 150,000 decimal digits), depending on architecture. See:

    "FFT Multiplication (GNU MP 6.2.1)". gmplib.org. Retrieved 2021-07-20.

    "MUL_FFT_THRESHOLD". GMP developers' corner. Archived from the original on 24 November 2010. Retrieved 3 November 2011.

    "MUL_FFT_THRESHOLD". gmplib.org. Retrieved 2021-07-20.

  3. ^ Fürer's algorithm has asymptotic complexity  
    Fürer, Martin (2007). "Faster Integer Multiplication" (PDF). Proc. STOC '07. Symposium on Theory of Computing, San Diego, Jun 2007. pp. 57–66. Archived from the original (PDF) on 2007-03-05.
    Fürer, Martin (2009). "Faster Integer Multiplication". SIAM Journal on Computing. 39 (3): 979–1005. doi:10.1137/070711761. ISSN 0097-5397.

    Fürer's algorithm is used in the Basic Polynomial Algebra Subprograms (BPAS) open source library. See: Covanov, Svyatoslav; Mohajerani, Davood; Moreno Maza, Marc; Wang, Linxiao (2019-07-08). "Big Prime Field FFT on Multi-core Processors". Proceedings of the 2019 on International Symposium on Symbolic and Algebraic Computation (PDF). Beijing China: ACM. pp. 106–113. doi:10.1145/3326229.3326273. ISBN 978-1-4503-6084-5. S2CID 195848601.

  4. ^ Harvey, David; van der Hoeven, Joris (2021). "Integer multiplication in time  " (PDF). Annals of Mathematics. Second Series. 193 (2): 563–617. doi:10.4007/annals.2021.193.2.4. MR 4224716. S2CID 109934776.
  5. ^ This method is used in INRIA's ECM library.
  6. ^ "ECMNET". members.loria.fr. Retrieved 2023-04-09.
  7. ^ Becker, Hanno; Hwang, Vincent; J. Kannwischer, Matthias; Panny, Lorenz (2022). "Efficient Multiplication of Somewhat Small Integers using Number-Theoretic Transforms" (PDF).
  8. ^ Lüders, Christoph (2014). "Fast Multiplication of Large Integers: Implementation and Analysis of the DKSS Algorithm". p. 26.
  9. ^ Kleinberg, Jon; Tardos, Eva (2005). Algorithm Design (1 ed.). Pearson. p. 237. ISBN 0-321-29535-8.
  10. ^ Gaudry, Pierrick; Alexander, Kruppa; Paul, Zimmermann (2007). "A GMP-based implementation of Schönhage-Strassen's large integer multiplication algorithm" (PDF). p. 6.
  11. ^ S. Dimitrov, Vassil; V. Cooklev, Todor; D. Donevsky, Borislav (1994). "Generalized Fermat-Mersenne Number Theoretic Transform". p. 2.
  12. ^ S. Dimitrov, Vassil; V. Cooklev, Todor; D. Donevsky, Borislav (1994). "Generalized Fermat-Mersenne Number Theoretic Transform". p. 3.
  13. ^ Gaudry, Pierrick; Kruppa, Alexander; Zimmermann, Paul (2007). "A GMP-based Implementation of Schönhage-Strassen's Large Integer Multiplication Algorithm" (PDF). p. 2.
  14. ^ Lüders, Christoph (2014). "Fast Multiplication of Large Integers: Implementation and Analysis of the DKSS Algorithm". p. 28.
  15. ^ R. Crandall & C. Pomerance. Prime Numbers – A Computational Perspective. Second Edition, Springer, 2005. Section 9.5.6: Schönhage method, p. 502. ISBN 0-387-94777-9
  16. ^ Knuth, Donald E. (1997). "§ 4.3.3.C: Discrete Fourier transforms". The Art of Computer Programming. Vol. 2: Seminumerical Algorithms (3rd ed.). Addison-Wesley. pp. 305–311. ISBN 0-201-89684-2.
  17. ^ Gaudry, Pierrick; Kruppa, Alexander; Zimmermann, Paul (2007). "A GMP-based implementation of Schönhage-Strassen's large integer multiplication algorithm" (PDF). p. 7.
  18. ^ Gaudry, Pierrick; Kruppa, Alexander; Zimmermann, Paul (2007). "A GMP-based implementation of Schönhage-Strassen's large integer multiplication algorithm" (PDF). p. 6.
  19. ^ Gaudry, Pierrick; Kruppa, Alexander; Zimmermann, Paul (2007). "A GMP-based implementation of Schönhage-Strassen's large integer multiplication algorithm" (PDF). p. 6.