Home >Web Front-end >PS Tutorial >PhotoShop algorithm principle analysis series - stylization - finding edges.

PhotoShop algorithm principle analysis series - stylization - finding edges.

高洛峰
高洛峰Original
2017-02-21 09:14:272197browse

The reason why I don’t write titles like Series 1 and Series 2 is because I don’t know how long I can persist. I know that I am not very gifted in terms of the ability to express things and the richness of language. And a piece of code requires me to carefully describe the process from basic principles - "preliminary implementation -" optimization speed and other processes in words, which is probably not an easy task.

I cannot say that some of the algorithms in Photoshop that I have mastered are 100% correct, but judging from the execution effect, there is definitely no problem in the general direction.

At present, I may know nearly 100 PS algorithms from other people’s articles, open source codes and my own thinking. If time allows and my own patience allows, I will slowly sort out these things. Although in the eyes of many people, these algorithms have no research value. After all, they have been commercialized. It makes sense, let me just use it as a way of self-appreciation and self-satisfaction.

Today, we talk about the edge search algorithm. Maybe after I explained the principle, many people wouldn't read it, but there are a few people who have studied it carefully.

Post a rendering first:

PhotoShop算法原理解析系列 -  风格化---》查找边缘。

## Principle: The results of common Sobel edge operators can be inverted.

In order to attract you to continue reading, I will first give the execution speed of my code: For digital pictures of 3000*4000*3, processing time300ms.

What is Sobel? I copied a few pictures from Baidu and modified the address:

PhotoShop算法原理解析系列 -  风格化---》查找边缘。 I won’t explain too much about the above two formulas. You only need to know that A is the input image and G is the output image of A. The last step is to do: G=255-G , which is the edge search algorithm.

There is a problem with edge-finding algorithms. How to deal with the pixels at the physical edge

of the image. In daily processing codes, many people just ignore the four Edge pixels, as a professional image processing software, violate the most basic principles. Separate code processing for edges will bring redundant and cumbersome problems to the encoding. The simplest and most efficient way to solve the problem is to use sentinel boundaries.

Those who have written many special effects algorithms should know that except for the algorithm that processes a single pixel, there is no need to make a backup of the original image (not necessarily a global backup), those Algorithms that require domain information modify a pixel in the previous step of the algorithm, and the current step of the algorithm requires an unmodified pixel value. Therefore, this algorithm generally clones the original image before starting. During calculation, it needs Domain information is read from the cloned data. If the cloning process is not a complete clone, but extends the appropriate boundary and then clones it, it is possible to solve the above boundary processing problem.

For example, for the picture below, which is 19×14 pixels in size, our backup picture is expanded by one pixel in the top, bottom, left and right, and is filled with edge values ​​to become 21*16 Size:

In this way, when calculating the 3*3 area pixels of the original image, sampling from the corresponding points of the expanded clone image will not cause the problem of not being within the image range, and there will be a lot less judgment in the encoding. Readability is also enhanced.

In terms of calculation speed, it is noted that there is a square root operation in the above calculation formula G. This is a time-consuming process. Due to the particularity of the image data, it must be an integer. You can optimize the speed by using a lookup table, which requires considering the establishment of the table.

For the specific issues of this article, we will discuss it in two steps. First: Create a lookup table for all possible situations under the root number. Take a look at the calculation formulas of GX and GY, and consider what the maximum value of the sum of squares of the two is. You may want to think about it for a while. Second: Just create a lookup table within the range of 0^2 to 255^2, and then ensure that the number under the root sign is not greater than 255^2. Why this can be done is because the maximum value of image data is 255. If the number under the root sign is greater than 255^2, it still needs to be adjusted to 255 after finding the square root value. Therefore, the latter should be chosen in this algorithm.

private void CmdFindEdgesArray_Click(object sender, EventArgs e)
{    int X, Y;    int Width, Height, Stride, StrideC, HeightC;    int Speed, SpeedOne, SpeedTwo, SpeedThree;    int BlueOne, BlueTwo, GreenOne, GreenTwo, RedOne, RedTwo;    int PowerRed, PowerGreen, PowerBlue;
    Bitmap Bmp = (Bitmap)Pic.Image;    if (Bmp.PixelFormat != PixelFormat.Format24bppRgb) throw new Exception("不支持的图像格式.");    byte[] SqrValue = new byte[65026];    for (Y = 0; Y < 65026; Y++) SqrValue[Y] = (byte)(255 - (int)Math.Sqrt(Y));      // 计算查找表,注意已经砸查找表里进行了反色
    Width = Bmp.Width; Height = Bmp.Height; Stride = (int)((Bmp.Width * 3 + 3) & 0XFFFFFFFC);
    StrideC = (Width + 2) * 3; HeightC = Height + 2;                                 // 宽度和高度都扩展2个像素

    byte[] ImageData = new byte[Stride * Height];                                    // 用于保存图像数据,(处理前后的都为他)
    byte[] ImageDataC = new byte[StrideC * HeightC];                                // 用于保存扩展后的图像数据

    fixed (byte* Scan0 = &ImageData[0])
    {
        BitmapData BmpData = new BitmapData();
        BmpData.Scan0 = (IntPtr)Scan0;                                              //  设置为字节数组的的第一个元素在内存中的地址
        BmpData.Stride = Stride;
        Bmp.LockBits(new Rectangle(0, 0, Bmp.Width, Bmp.Height), ImageLockMode.ReadWrite | ImageLockMode.UserInputBuffer, PixelFormat.Format24bppRgb, BmpData);

        Stopwatch Sw = new Stopwatch();                                             //  只获取计算用时        Sw.Start();        for (Y = 0; Y < Height; Y++)
        {
            System.Buffer.BlockCopy(ImageData, Stride * Y, ImageDataC, StrideC * (Y + 1), 3);        // 填充扩展图的左侧第一列像素(不包括第一个和最后一个点)
            System.Buffer.BlockCopy(ImageData, Stride * Y + (Width - 1) * 3, ImageDataC, StrideC * (Y + 1) + (Width + 1) * 3, 3);  // 填充最右侧那一列的数据
            System.Buffer.BlockCopy(ImageData, Stride * Y, ImageDataC, StrideC * (Y + 1) + 3, Width * 3);
        }
        System.Buffer.BlockCopy(ImageDataC, StrideC, ImageDataC, 0, StrideC);              //  第一行
        System.Buffer.BlockCopy(ImageDataC, (HeightC - 2) * StrideC, ImageDataC, (HeightC - 1) * StrideC, StrideC);    //  最后一行               

        for (Y = 0; Y < Height; Y++)
        {
            Speed = Y * Stride;
            SpeedOne = StrideC * Y;            for (X = 0; X < Width; X++)
            {
                SpeedTwo = SpeedOne + StrideC;          //  尽量减少计算
                SpeedThree = SpeedTwo + StrideC;        //  下面的就是严格的按照Sobel算字进行计算,代码中的*2一般会优化为移位或者两个Add指令的,如果你不放心,当然可以直接改成移位
                BlueOne = ImageDataC[SpeedOne] + 2 * ImageDataC[SpeedTwo] + ImageDataC[SpeedThree] - ImageDataC[SpeedOne + 6] - 2 * ImageDataC[SpeedTwo + 6] - ImageDataC[SpeedThree + 6];
                GreenOne = ImageDataC[SpeedOne + 1] + 2 * ImageDataC[SpeedTwo + 1] + ImageDataC[SpeedThree + 1] - ImageDataC[SpeedOne + 7] - 2 * ImageDataC[SpeedTwo + 7] - ImageDataC[SpeedThree + 7];
                RedOne = ImageDataC[SpeedOne + 2] + 2 * ImageDataC[SpeedTwo + 2] + ImageDataC[SpeedThree + 2] - ImageDataC[SpeedOne + 8] - 2 * ImageDataC[SpeedTwo + 8] - ImageDataC[SpeedThree + 8];
                BlueTwo = ImageDataC[SpeedOne] + 2 * ImageDataC[SpeedOne + 3] + ImageDataC[SpeedOne + 6] - ImageDataC[SpeedThree] - 2 * ImageDataC[SpeedThree + 3] - ImageDataC[SpeedThree + 6];
                GreenTwo = ImageDataC[SpeedOne + 1] + 2 * ImageDataC[SpeedOne + 4] + ImageDataC[SpeedOne + 7] - ImageDataC[SpeedThree + 1] - 2 * ImageDataC[SpeedThree + 4] - ImageDataC[SpeedThree + 7];
                RedTwo = ImageDataC[SpeedOne + 2] + 2 * ImageDataC[SpeedOne + 5] + ImageDataC[SpeedOne + 8] - ImageDataC[SpeedThree + 2] - 2 * ImageDataC[SpeedThree + 5] - ImageDataC[SpeedThree + 8];

                PowerBlue = BlueOne * BlueOne + BlueTwo * BlueTwo;
                PowerGreen = GreenOne * GreenOne + GreenTwo * GreenTwo;
                PowerRed = RedOne * RedOne + RedTwo * RedTwo;                if (PowerBlue > 65025) PowerBlue = 65025;           //  处理掉溢出值
                if (PowerGreen > 65025) PowerGreen = 65025;                if (PowerRed > 65025) PowerRed = 65025;
                ImageData[Speed] = SqrValue[PowerBlue];             //  查表
                ImageData[Speed + 1] = SqrValue[PowerGreen];
                ImageData[Speed + 2] = SqrValue[PowerRed];

                Speed += 3;                                  // 跳往下一个像素
                SpeedOne += 3;
            }
        }
        Sw.Stop();        this.Text = "计算用时: " + Sw.ElapsedMilliseconds.ToString() + " ms";

        Bmp.UnlockBits(BmpData);                         //  必须先解锁,否则Invalidate失败     }
    Pic.Invalidate();
}

For the sake of simplicity, this is first implemented using a one-dimensional array in C#, and the timing part does not consider the image data Obtain and update, because the image data must have been obtained during the real image processing process.

For the above code, after compiling to Release mode, execute the compiled EXE. For a 3000*4000*3 color image, it takes about 480ms. If you are in IDE mode Run it first, and remember to uncheck the Cancel JIT optimization (hosting only) column when the module is loaded in Options--"Debugging--"General.

PhotoShop算法原理解析系列 -  风格化---》查找边缘。

When filling in the cloned image data in the above code, a new image is not created, and then the image data is filled in , but directly fills an array. Isn't the image actually just a piece of continuous memory plus a little bit of header information? The header information is already there, so just one piece of memory is enough.

The filling of clone data uses the system Buffer.BlockCopy function, which is similar to the CopyMemory we used to use and is very fast.

In order to further increase the execution speed, we first take a look at the code of the key time-consuming part of the algorithm, that is, the code inside for (X = 0; X < Width; X++) , let’s take a look at the decompiled code of a line of code:

<span style="font-size: 13px;"> BlueOne = ImageDataC[SpeedOne] + <span style="color: #800080;">2</span> * ImageDataC[SpeedTwo] + ImageDataC[SpeedThree] - ImageDataC[SpeedOne + <span style="color: #800080;">6</span>] - <span style="color: #800080;">2</span> * ImageDataC[SpeedTwo + <span style="color: #800080;">6</span>] - ImageDataC[SpeedThree + <span style="color: #800080;">6</span><span style="color: #000000;">];<br/><br/></span><span style="color: #800080;">00000302</span><span style="color: #000000;">  cmp         ebx,edi 
</span><span style="color: #800080;">00000304</span><span style="color: #000000;">  jae         0000073C             //   数组是否越界?
0000030a  movzx       eax,</span><span style="color: #0000ff;">byte</span> ptr [esi+ebx+<span style="color: #800080;">8</span><span style="color: #000000;">]    //  将ImageDataC[SpeedOne]中的数据传送的eax寄存器
0000030f  mov         dword ptr [ebp</span>-<span style="color: #000000;">80h],eax 
</span><span style="color: #800080;">00000312</span>  mov         edx,dword ptr [ebp-<span style="color: #000000;">2Ch] 
</span><span style="color: #800080;">00000315</span><span style="color: #000000;">  cmp         edx,edi 
</span><span style="color: #800080;">00000317</span><span style="color: #000000;">  jae         0000073C            //     数组是否越界?           
0000031d  movzx       edx,</span><span style="color: #0000ff;">byte</span> ptr [esi+edx+<span style="color: #800080;">8</span><span style="color: #000000;">]   //   将ImageDataC[SpeedTwo]中的数据传送到edx寄存器</span><span style="color: #800080;">00000322</span><span style="color: #000000;">  add         edx,edx             //    计算2*ImageDataC[SpeedTwo]    </span><span style="color: #800080;">00000324</span><span style="color: #000000;">  add         eax,edx             //    计算ImageDataC[SpeedOne]+2*ImageDataC[SpeedTwo],并保存在eax寄存器中           </span><span style="color: #800080;">00000326</span><span style="color: #000000;">  cmp         ecx,edi 
</span><span style="color: #800080;">00000328</span><span style="color: #000000;">  jae         0000073C 
0000032e  movzx       edx,</span><span style="color: #0000ff;">byte</span> ptr [esi+ecx+<span style="color: #800080;">8</span><span style="color: #000000;">]   //    将ImageDataC[SpeedThree]中的数据传送到edx寄存器</span><span style="color: #800080;">00000333</span>  mov         dword ptr [ebp+<span style="color: #000000;">FFFFFF78h],edx 
</span><span style="color: #800080;">00000339</span><span style="color: #000000;">  add         eax,edx 
0000033b  lea         edx,[ebx</span>+<span style="color: #800080;">6</span><span style="color: #000000;">] 
0000033e  cmp         edx,edi 
</span><span style="color: #800080;">00000340</span><span style="color: #000000;">  jae         0000073C 
</span><span style="color: #800080;">00000346</span>  movzx       edx,<span style="color: #0000ff;">byte</span> ptr [esi+edx+<span style="color: #800080;">8</span><span style="color: #000000;">] 
0000034b  mov         dword ptr [ebp</span>+<span style="color: #000000;">FFFFFF7Ch],edx 
</span><span style="color: #800080;">00000351</span><span style="color: #000000;">  sub         eax,edx 
</span><span style="color: #800080;">00000353</span>  mov         edx,dword ptr [ebp-<span style="color: #000000;">2Ch] 
</span><span style="color: #800080;">00000356</span>  add         edx,<span style="color: #800080;">6</span> <span style="color: #800080;">00000359</span><span style="color: #000000;">  cmp         edx,edi 
0000035b  jae         0000073C 
</span><span style="color: #800080;">00000361</span>  movzx       edx,<span style="color: #0000ff;">byte</span> ptr [esi+edx+<span style="color: #800080;">8</span><span style="color: #000000;">] 
</span><span style="color: #800080;">00000366</span><span style="color: #000000;">  add         edx,edx 
</span><span style="color: #800080;">00000368</span><span style="color: #000000;">  sub         eax,edx 
0000036a  lea         edx,[ecx</span>+<span style="color: #800080;">6</span><span style="color: #000000;">] 
0000036d  cmp         edx,edi 
0000036f  jae         0000073C 
</span><span style="color: #800080;">00000375</span>  movzx       edx,<span style="color: #0000ff;">byte</span> ptr [esi+edx+<span style="color: #800080;">8</span><span style="color: #000000;">] 
0000037a  mov         dword ptr [ebp</span>+<span style="color: #000000;">FFFFFF74h],edx 
</span><span style="color: #800080;">00000380</span><span style="color: #000000;">  sub         eax,edx 
</span><span style="color: #800080;">00000382</span>  mov         dword ptr [ebp-30h],eax </span>

I only commented a little on the above assembly code, the most important of which is 0000073c Label, we followed the cash back to call another function:

# This 0000073c call 685172a4

## 标 We see that we are getting each array element Before, a cmp and jae instruction must be executed. From the analysis, I think this is to do something similar to determine whether the subscript of the array is out of bounds. If we can ensure that our algorithm will not cross the boundary, this part of the code will be very useful, won't it delay me from doing my business?

To this end, I think it is necessary to directly use pointers in C# to implement the algorithm. C# has unsafe mode and pointers, so it is very convenient, and the expression of pointers can be expressed with *, You can also use [], for example, *(P+4) and P[4] have the same meaning. Then the above code can be modified into a pointer version with very few modifications.

private void CmdFindEdgesPointer_Click(object sender, EventArgs e)
    {        int X, Y;        int Width, Height, Stride, StrideC, HeightC;        int Speed, SpeedOne, SpeedTwo, SpeedThree;        int BlueOne, BlueTwo, GreenOne, GreenTwo, RedOne, RedTwo;        int PowerRed, PowerGreen, PowerBlue;
        Bitmap Bmp = (Bitmap)Pic.Image;        if (Bmp.PixelFormat != PixelFormat.Format24bppRgb) throw new Exception("不支持的图像格式.");        byte[] SqrValue = new byte[65026];        for (Y = 0; Y < 65026; Y++) SqrValue[Y] = (byte)(255 - (int)Math.Sqrt(Y));      // 计算查找表,注意已经砸查找表里进行了反色
        Width = Bmp.Width; Height = Bmp.Height; Stride = (int)((Bmp.Width * 3 + 3) & 0XFFFFFFFC);
        StrideC = (Width + 2) * 3; HeightC = Height + 2;                                 // 宽度和高度都扩展2个像素

        byte[] ImageData = new byte[Stride * Height];                                    // 用于保存图像数据,(处理前后的都为他)
        byte[] ImageDataC = new byte[StrideC * HeightC];                                 // 用于保存扩展后的图像数据

        fixed (byte* P = &ImageData[0], CP = &ImageDataC[0], LP = &SqrValue[0])
        {            byte* DataP = P, DataCP = CP, LutP = LP;
            BitmapData BmpData = new BitmapData();
            BmpData.Scan0 = (IntPtr)DataP;                                              //  设置为字节数组的的第一个元素在内存中的地址
            BmpData.Stride = Stride;
            Bmp.LockBits(new Rectangle(0, 0, Bmp.Width, Bmp.Height), ImageLockMode.ReadWrite | ImageLockMode.UserInputBuffer, PixelFormat.Format24bppRgb, BmpData);

            Stopwatch Sw = new Stopwatch();                                             //  只获取计算用时            Sw.Start();            for (Y = 0; Y < Height; Y++)
            {
                System.Buffer.BlockCopy(ImageData, Stride * Y, ImageDataC, StrideC * (Y + 1), 3);        // 填充扩展图的左侧第一列像素(不包括第一个和最后一个点)
                System.Buffer.BlockCopy(ImageData, Stride * Y + (Width - 1) * 3, ImageDataC, StrideC * (Y + 1) + (Width + 1) * 3, 3);  // 填充最右侧那一列的数据
                System.Buffer.BlockCopy(ImageData, Stride * Y, ImageDataC, StrideC * (Y + 1) + 3, Width * 3);
            }
            System.Buffer.BlockCopy(ImageDataC, StrideC, ImageDataC, 0, StrideC);              //  第一行
            System.Buffer.BlockCopy(ImageDataC, (HeightC - 2) * StrideC, ImageDataC, (HeightC - 1) * StrideC, StrideC);    //  最后一行               

            for (Y = 0; Y < Height; Y++)
            {
                Speed = Y * Stride;
                SpeedOne = StrideC * Y;                for (X = 0; X < Width; X++)
                {
                    SpeedTwo = SpeedOne + StrideC;          //  尽量减少计算
                    SpeedThree = SpeedTwo + StrideC;        //  下面的就是严格的按照Sobel算字进行计算,代码中的*2一般会优化为移位或者两个Add指令的,如果你不放心,当然可以直接改成移位
                    BlueOne = DataCP[SpeedOne] + 2 * DataCP[SpeedTwo] + DataCP[SpeedThree] - DataCP[SpeedOne + 6] - 2 * DataCP[SpeedTwo + 6] - DataCP[SpeedThree + 6];
                    GreenOne = DataCP[SpeedOne + 1] + 2 * DataCP[SpeedTwo + 1] + DataCP[SpeedThree + 1] - DataCP[SpeedOne + 7] - 2 * DataCP[SpeedTwo + 7] - DataCP[SpeedThree + 7];
                    RedOne = DataCP[SpeedOne + 2] + 2 * DataCP[SpeedTwo + 2] + DataCP[SpeedThree + 2] - DataCP[SpeedOne + 8] - 2 * DataCP[SpeedTwo + 8] - DataCP[SpeedThree + 8];
                    BlueTwo = DataCP[SpeedOne] + 2 * DataCP[SpeedOne + 3] + DataCP[SpeedOne + 6] - DataCP[SpeedThree] - 2 * DataCP[SpeedThree + 3] - DataCP[SpeedThree + 6];
                    GreenTwo = DataCP[SpeedOne + 1] + 2 * DataCP[SpeedOne + 4] + DataCP[SpeedOne + 7] - DataCP[SpeedThree + 1] - 2 * DataCP[SpeedThree + 4] - DataCP[SpeedThree + 7];
                    RedTwo = DataCP[SpeedOne + 2] + 2 * DataCP[SpeedOne + 5] + DataCP[SpeedOne + 8] - DataCP[SpeedThree + 2] - 2 * DataCP[SpeedThree + 5] - DataCP[SpeedThree + 8];

                    PowerBlue = BlueOne * BlueOne + BlueTwo * BlueTwo;
                    PowerGreen = GreenOne * GreenOne + GreenTwo * GreenTwo;
                    PowerRed = RedOne * RedOne + RedTwo * RedTwo;                    if (PowerBlue > 65025) PowerBlue = 65025;           //  处理掉溢出值
                    if (PowerGreen > 65025) PowerGreen = 65025;                    if (PowerRed > 65025) PowerRed = 65025;

                    DataP[Speed] = LutP[PowerBlue];                     //  查表
                    DataP[Speed + 1] = LutP[PowerGreen];
                    DataP[Speed + 2] = LutP[PowerRed];

                    Speed += 3;                                         //  跳往下一个像素
                    SpeedOne += 3;
                }
            }
            Sw.Stop();            this.Text = "计算用时: " + Sw.ElapsedMilliseconds.ToString() + " ms";

            Bmp.UnlockBits(BmpData);                         //  必须先解锁,否则Invalidate失败         }
        Pic.Invalidate();
    }<p class="cnblogs_code"></p>
<p>## The same effect, the same image, the calculation takes 330ms. </p>
<p><span style="font-size: 13px;"></span> Let’s take a look at the assembly code of the same code: </p>
<p><span style="font-size: 13px;"></span></p>
<pre class="brush:php;toolbar:false"><span style="font-size: 13px;">BlueOne = DataCP[SpeedOne] + <span style="color: #800080;">2</span> * DataCP[SpeedTwo] + DataCP[SpeedThree] - DataCP[SpeedOne + <span style="color: #800080;">6</span>] - <span style="color: #800080;">2</span> * DataCP[SpeedTwo + <span style="color: #800080;">6</span>] - DataCP[SpeedThree + <span style="color: #800080;">6</span><span style="color: #000000;">];<br><br></span><span style="color: #800080;">00000318</span>  movzx       eax,<span style="color: #0000ff;">byte</span> ptr [esi+<span style="color: #000000;">edi] 
0000031c  mov         dword ptr [ebp</span>-<span style="color: #000000;">74h],eax 
0000031f  movzx       edx,</span><span style="color: #0000ff;">byte</span> ptr [esi+<span style="color: #000000;">ebx] 
</span><span style="color: #800080;">00000323</span><span style="color: #000000;">  add         edx,edx 
</span><span style="color: #800080;">00000325</span><span style="color: #000000;">  add         eax,edx 
</span><span style="color: #800080;">00000327</span>  movzx       edx,<span style="color: #0000ff;">byte</span> ptr [esi+<span style="color: #000000;">ecx] 
0000032b  mov         dword ptr [ebp</span>-<span style="color: #000000;">7Ch],edx 
0000032e  add         eax,edx 
</span><span style="color: #800080;">00000330</span>  movzx       edx,<span style="color: #0000ff;">byte</span> ptr [esi+edi+<span style="color: #800080;">6</span><span style="color: #000000;">] 
</span><span style="color: #800080;">00000335</span>  mov         dword ptr [ebp-<span style="color: #000000;">78h],edx 
</span><span style="color: #800080;">00000338</span><span style="color: #000000;">  sub         eax,edx 
0000033a  movzx       edx,</span><span style="color: #0000ff;">byte</span> ptr [esi+ebx+<span style="color: #800080;">6</span><span style="color: #000000;">] 
0000033f  add         edx,edx 
</span><span style="color: #800080;">00000341</span><span style="color: #000000;">  sub         eax,edx 
</span><span style="color: #800080;">00000343</span>  movzx       edx,<span style="color: #0000ff;">byte</span> ptr [esi+ecx+<span style="color: #800080;">6</span><span style="color: #000000;">] 
</span><span style="color: #800080;">00000348</span>  mov         dword ptr [ebp-<span style="color: #000000;">80h],edx 
0000034b  sub         eax,edx 
0000034d  mov         dword ptr [ebp</span>-30h],eax </span>

Produced The assembly code is concise, has clear meaning, and lacks many instructions in comparison. Of course it will be much faster.

Pay attention to this code:

  fixed (byte* P = &ImageData[0], CP = &ImageDataC[0], LP = &SqrValue[0])
        {            byte* DataP = P, DataCP = CP, LutP = LP;

If you replace it with:

  fixed (byte* DataP = &ImageData[0], DataCP = &ImageDataC[0], LutP = &SqrValue[0])
{

The speed of the code is actually slower than the pure array version. As for why, practice is king. I haven’t analyzed it. Anyway, I know this result. You can refer to an article by Tie Ge:

Chat.Net type of public is not public, fixed cannot be fixed

Of course this You can also further optimize small actions, such as movzx eax,byte ptr [esi+edi] In this sentence, esi is actually the base address of the array. If you write DataCP[SpeedOne] like this, you will have this base address + offset every time. In terms of calculation, if a pointer variable can be directly and dynamically controlled in real time so that it points directly to the requested position, one less addition will be needed. Although the optimization is not obvious, it can basically reach the 300ms time mentioned before in the question. The specific code can be found in the attachment.

Many people may not be interested in these things of mine, saying that throwing these things to the GPU is better than what you have now... I hope these friends will not attack them too much. Everyone has their own hobbies, I only like CPU.

More PhotoShop algorithm principle analysis series - Stylization - Find edges. For related articles, please pay attention to the PHP Chinese website!


Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn