How can vectorization techniques be used to accelerate the conversion of an IPv4 address from a string to an integer?

By DDD, 2024-11-15 16:49:03

Fastest Way to Obtain IPv4 Address from String

Original Code In Question:

UINT32 GetIP(const char *p)
{
    UINT32 dwIP=0,dwIP_Part=0;
    while(true)
    {
        if(p[0] == 0)
        {
            dwIP = (dwIP << 8) | dwIP_Part;
            break;
        }
        if(p[0]=='.')
        {
            dwIP = (dwIP << 8) | dwIP_Part;
            dwIP_Part = 0;
            p++;
        }
        dwIP_Part = (dwIP_Part*10)+(p[0]-'0');
        p++;
    }
    return dwIP;
}

Faster Vectorized Solution:

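The version below uses x86 SIMD intrinsics to convert all four octets at once. It needs SSE3 (_mm_lddqu_si128), SSSE3 (_mm_shuffle_epi8, _mm_maddubs_epi16, _mm_hadd_epi16) and SSE4.1 (_mm_extract_epi32). It always loads 16 bytes starting at str, so that many bytes must be readable, and it depends on shuffleTable, a lookup table indexed by a 16-bit mask that has to be filled in once before the first call: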

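#include <immintrin.h>   // SSE3, SSSE3 and SSE4.1 intrinsics

// One shuffle control per possible 16-bit separator mask; filled in by MyInit().
__m128i shuffleTable[65536];

UINT32 MyGetIP(const char *str) {
    // Load 16 bytes (all 16 must be readable) and subtract '0':
    // digits become 0..9, while '.' and the terminating 0 become negative bytes
    __m128i input = _mm_lddqu_si128((const __m128i*)str);
    input = _mm_sub_epi8(input, _mm_set1_epi8('0'));

    // The sign bits mark the separators; use them as the table index, then
    // move every digit to its slot (4 bytes per octet, units digit first)
    __m128i cmp = input;
    UINT32 mask = _mm_movemask_epi8(cmp);
    __m128i shuf = shuffleTable[mask];
    __m128i arr = _mm_shuffle_epi8(input, shuf);

    // Decimal weights for each 4-byte group: 1, 10, 100, 0
    __m128i coeffs = _mm_set_epi8(0, 100, 10, 1, 0, 100, 10, 1, 0, 100, 10, 1, 0, 100, 10, 1);

    // Multiply and add adjacent bytes, then add adjacent pairs:
    // the low four 16-bit lanes now hold one octet value each
    __m128i prod = _mm_maddubs_epi16(coeffs, arr);
    prod = _mm_hadd_epi16(prod, prod);

    // Pack the low byte of each of those lanes into bytes 0..3
    __m128i imm = _mm_set_epi8(-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 6, 4, 2, 0);
    prod = _mm_shuffle_epi8(prod, imm);

    // Extract the low 32 bits as the result
    return _mm_extract_epi32(prod, 0);
}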
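
To make the individual SIMD steps easier to follow, here is a scalar C++ sketch of two of them. It is an illustration only, not part of the original solution: separatorMask models the subtract-and-movemask step, and combineDigits models the multiply-accumulate and packing steps on an array that has already been shuffled. All three function names are made up for this example, and the sample values assume the input "192.167.1.3" stored in a zero-padded 16-byte buffer.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of _mm_sub_epi8(input, '0') followed by _mm_movemask_epi8:
// bit i is set when (byte i - '0'), read as a signed 8-bit value, is negative.
// That is the case for '.', for the terminating 0, and for any other byte
// below '0' or at or above 0xB0.
std::uint32_t separatorMask(const char* buf16) {
    assert(buf16 != nullptr);  // the caller must supply 16 readable bytes
    std::uint32_t mask = 0;
    for (int i = 0; i < 16; ++i) {
        const auto shifted = static_cast<std::uint8_t>(
            static_cast<std::uint8_t>(buf16[i]) - '0');
        if (shifted & 0x80u) mask |= 1u << i;
    }
    return mask;
}

// Scalar model of the maddubs / hadd / final shuffle / extract sequence.
// arr holds four 4-byte groups, units digit first; group 0 is the last octet.
// Assumes every byte is a digit value 0..9 or zero padding.
std::uint32_t combineDigits(const std::uint8_t* arr) {
    static const std::uint32_t weight[4] = {1, 10, 100, 0};
    std::uint32_t result = 0;
    for (int group = 0; group < 4; ++group) {
        std::uint32_t octet = 0;
        for (int k = 0; k < 4; ++k)
            octet += arr[group * 4 + k] * weight[k];
        // Only the low byte of each sum survives the final shuffle.
        result |= (octet & 0xFFu) << (8 * group);
    }
    return result;
}

// Worked sample: the digits of "192.167.1.3" after the table-driven shuffle.
std::uint32_t sampleCombined() {
    const std::uint8_t arr[16] = {3, 0, 0, 0, 1, 0, 0, 0, 7, 6, 1, 0, 2, 9, 1, 0};
    return combineDigits(arr);
}
```

For that input, separatorMask returns 64136 (bits 3, 7, 9 and 11 to 15 set) and sampleCombined returns 0xC0A70103, so the first octet ends up in the most significant byte, just as in the original loop.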

Precalculation of shuffleTable (call MyInit once before the first call to MyGetIP):

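void MyInit() {
    int len[4];  // number of digits in each octet, 1..3
    for (len[0] = 1; len[0] <= 3; len[0]++)
        for (len[1] = 1; len[1] <= 3; len[1]++)
            for (len[2] = 1; len[2] <= 3; len[2]++)
                for (len[3] = 1; len[3] <= 3; len[3]++) {
                    // Digits plus three dots plus the terminator
                    int slen = len[0] + len[1] + len[2] + len[3] + 4;
                    // Bytes after the terminator are arbitrary, so store the same
                    // entry for every combination of their sign bits
                    int rem = 16 - slen;
                    for (int rmask = 0; rmask < 1<<rem; rmask++) {
                        int mask = 0;
                        // -1 makes _mm_shuffle_epi8 write a zero byte
                        char shuf[16] = {-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1};
                        int pos = 0;
                        for (int i = 0; i < 4; i++) {
                            for (int j = 0; j < len[i]; j++) {
                                // Octet i fills the group starting at byte (3-i)*4, units digit first
                                shuf[(3-i) * 4 + (len[i]-1-j)] = pos;
                                pos++;
                            }
                            mask ^= (1<<pos);  // the '.' or terminator after octet i
                            pos++;
                        }
                        mask ^= (rmask<<slen);
                        _mm_store_si128(&shuffleTable[mask], _mm_loadu_si128((__m128i*)shuf));
                    }
                }
}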
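
As a check on the index arithmetic in MyInit, the following scalar sketch builds the shuffle control and the base table index for a single combination of octet lengths. The names Shuffle, buildShuffle and baseMask are hypothetical and do not appear in the original code. For "192.167.1.3" the octet lengths are 3, 3, 1 and 1.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Shuffle = std::array<std::int8_t, 16>;

// Shuffle control for octets with l0, l1, l2 and l3 digits (each 1..3),
// using the same index arithmetic as MyInit. -1 means "write a zero byte".
Shuffle buildShuffle(int l0, int l1, int l2, int l3) {
    const int len[4] = {l0, l1, l2, l3};
    Shuffle shuf;
    shuf.fill(-1);
    int pos = 0;  // current position in the input string
    for (int i = 0; i < 4; ++i) {
        assert(len[i] >= 1 && len[i] <= 3);
        for (int j = 0; j < len[i]; ++j) {
            shuf[(3 - i) * 4 + (len[i] - 1 - j)] = static_cast<std::int8_t>(pos);
            ++pos;
        }
        ++pos;  // skip the '.' or the terminator
    }
    return shuf;
}

// Table index when no byte after the terminator has its sign bit set
// (the rmask == 0 case in MyInit).
std::uint32_t baseMask(int l0, int l1, int l2, int l3) {
    const int len[4] = {l0, l1, l2, l3};
    std::uint32_t mask = 0;
    int pos = 0;
    for (int i = 0; i < 4; ++i) {
        pos += len[i];
        mask |= 1u << pos;  // separator that follows octet i
        ++pos;
    }
    return mask;
}
```

buildShuffle(3, 3, 1, 1) yields 10 -1 -1 -1 | 8 -1 -1 -1 | 6 5 4 -1 | 2 1 0 -1 and baseMask(3, 3, 1, 1) yields 2696. MyInit stores that same control at index 2696 and at every index that differs from it only in bits 12 to 15.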

Evaluation:

The vectorized version is about 7.8 times faster than the original code. It processes approximately 336 million IP addresses per second on a single core of a 3.4 GHz processor, which is roughly 10 clock cycles per address. Note that it performs no validation, so it is only suitable for well-formed input.
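
Throughput figures like these depend on the hardware, the compiler and the input set. Below is a minimal sketch of how such a measurement could be set up. It is not the harness that produced the numbers above, and scalarGetIP and addressesPerSecond are names invented for this illustration. A portable scalar parser stands in for the function under test.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Portable scalar parser; for well-formed input it returns the same value
// as the original GetIP (first octet in the most significant byte).
std::uint32_t scalarGetIP(const char* p) {
    std::uint32_t ip = 0, part = 0;
    for (; *p != '\0'; ++p) {
        if (*p == '.') {
            ip = (ip << 8) | part;
            part = 0;
        } else {
            part = part * 10 + static_cast<std::uint32_t>(*p - '0');
        }
    }
    return (ip << 8) | part;
}

// Hypothetical harness: parses every input `rounds` times and returns the
// number of addresses parsed per second. The running sum is handed back to
// the caller so the parsing work is not trivially dead code; an optimizing
// compiler may still simplify the loops, so treat the result as a rough guide.
template <typename Parser>
double addressesPerSecond(Parser parse, const char* const* inputs, int count,
                          int rounds, std::uint32_t* checksum) {
    assert(count > 0 && rounds > 0 && checksum != nullptr);
    const auto start = std::chrono::steady_clock::now();
    std::uint32_t sum = 0;
    for (int r = 0; r < rounds; ++r)
        for (int i = 0; i < count; ++i)
            sum += parse(inputs[i]);
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    *checksum = sum;
    return static_cast<double>(count) * rounds / elapsed.count();
}
```

Passing a different parser with the same signature as the first argument allows a like-for-like comparison on the same input set.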
