Home >Backend Development >C++ >How Can We Optimize NTT Transform and Modular Arithmetic for Faster Bignum Squaring?

How Can We Optimize NTT Transform and Modular Arithmetic for Faster Bignum Squaring?

DDD
DDDOriginal
2024-12-19 03:14:13478browse

How Can We Optimize NTT Transform and Modular Arithmetic for Faster Bignum Squaring?

Modular arithmetics and NTT (finite field DFT) optimizations

Problem:

In a project involving squaring using fast bignum, the implementation with NTT is slow, even for large numbers (12000 bits or more).

Question:

  1. Is there a way to optimize the NTT transform?
  2. Is there a way to speed up modular arithmetics?

Solution:

Optimized NTT Transform

Separating the NTT_fast function into two functions, one with WW[] and the other with iWW[], leads to one less parameter in recursion calls. This provides a small performance improvement.

Optimized Modular Arithmetics

The original code used a series of branching and safety checks, which can be eliminated by utilizing bit tricks and rearranging the main loop. The assembly code for modmul was also modified for efficiency.

New Code

class fourier_NTT
{
    // ...
    // Modular arithmetics (optimized, but it works only for p >= 0x80000000!!!)
    inline uint32 modadd(uint32 a, uint32 b)
    {
        uint32 d;
        //d = a + b;
        //!if (d < a)
        //    d -= p;
        //!if (d >= p)
        //    d -= p;
         __asm {
            mov eax, a;
            add eax, b
        }
        return d;
    }

    inline uint32 modsub(uint32 a, uint32 b)
    {
        uint32 d;
        //d = a - b;
        //!if (a < b)
        // a - b < 0 when a < b, so no need to check
        //!    d += p;
        __asm {
            mov eax, a;
            sub eax, b
        }
        return d;
    }

    inline uint32 modmul(uint32 a, uint32 b)
    {
        uint32 _a = a;
        uint32 _b = b;   

        __asm
        {
        mov eax, a;
        mul b;
        mov ebx, eax;
        mov eax, 2863311530;
        mov ecx, edx;
        mul edx;
        shld edx, eax, 1;
        mov eax, 3221225473;

        mul edx;
        sub ebx, eax;
        mov eax, 3221225473;
        sbb ecx, edx;
        jc addback;

            neg ecx;
            and ecx, eax;
            sub ebx, ecx;

        sub ebx, eax;
        sbb edx, edx;
        and eax, edx;
            addback:
        add eax, ebx;          
        };  
    }
    // ...
};

Conclusion

These optimizations result in a 1.35x speedup of the NTT transform and modular arithmetics. The code is provided in C and inline assembly.

The above is the detailed content of How Can We Optimize NTT Transform and Modular Arithmetic for Faster Bignum Squaring?. For more information, please follow other related articles on the PHP Chinese website!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn