How to use PHP to crack website verification codes
Verification codes are generally used to prevent malicious registration, brute-force cracking, and batch posting by programs. A verification code is an image generated from a random string of digits or symbols, with some interference pixels added to hinder OCR. The user reads the code with the naked eye, types it into the form, and submits it to the website; only after verification succeeds can the feature be used. Learning how verification codes are cracked and recognized will not only help you understand how they work, but also show you how to keep your own from being cracked.
The most common verification codes are as follows:
1. A random four-digit numeric string. This is the most primitive verification code, and its protective effect is almost zero.
2. Random digits rendered as an image. The characters in the image are quite regular; some versions add random interference, and some use random character colors, so the protective effect is better than the previous type. People without basic knowledge of image processing cannot break it!
3. Images in various formats with random digits, random uppercase English letters, random interference pixels, and random positions.
4. Chinese characters, the newest kind of verification code used for registration. They are randomly generated and harder to type, which hurts the user experience, so they are used less often.
For simplicity, the cracking walkthrough mainly targets the second type. Let's first look at some images of this kind of verification code that are commonly seen on the Internet:
Verification code recognition is generally divided into the following steps:
1. Extract the font templates. Recognizing a verification code is, after all, not professional OCR, and since every website's verification codes are different, the most common approach is to build a feature-code library for that particular verification code. When extracting the templates, download several images so that together they contain every character. The verification codes in our example contain only digits, so we only need to collect images covering 0-9.
2. Binarize. Binarization means representing every pixel that belongs to a digit in the image as 1 and every other pixel as 0. In this way the pattern of each digit can be computed, recorded, and used as a key.
3. Calculate features. Binarize the image to be recognized to obtain its feature code.
4. Compare with the samples. Compare the feature code from step 3 with the font templates of the verification code to get the digits shown in the image.
With this method, recognition of this kind of verification code can reach essentially 100%.
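The "use the binarized digit as a key" idea from steps 2 to 4 can be sketched roughly as follows. This is only an illustration under assumptions: the function names `glyphKey` and `lookupGlyph` are hypothetical, and the 3x3 glyphs are made-up toy bitmaps rather than real font templates.

```php
<?php
// Hypothetical helpers for illustration; real templates would be much larger.

/** Flatten a 0/1 matrix into a string such as "010110010" that can serve as a key. */
function glyphKey(array $bits): string
{
    $key = '';
    foreach ($bits as $row) {
        $key .= implode('', $row);
    }
    return $key;
}

/** Look a glyph up in the template library; returns null when there is no exact match. */
function lookupGlyph(array $bits, array $library): ?string
{
    return $library[glyphKey($bits)] ?? null;
}

// Toy template library: feature code => character.
$library = [
    glyphKey([[0, 1, 0], [1, 1, 0], [0, 1, 0]]) => '1',
    glyphKey([[1, 1, 1], [1, 0, 1], [1, 1, 1]]) => '0',
];

var_dump(lookupGlyph([[0, 1, 0], [1, 1, 0], [0, 1, 0]], $library));
```

An exact lookup like this only works when every digit is always drawn identically and at a fixed position, which is the case the steps above assume.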
After the steps above, you may object that nothing has been said about removing the interference. In fact, removing it is very simple. An important property of the interference is that it must not affect how the verification code is displayed, so when the interference is generated its RGB values are kept below or above a certain value. In the example image I gave, the RGB values of the interference never exceed 125, so it is easy to remove.
The simple verification codes above consist only of digits and letters, with a uniform format and a fixed position every time. Let's now study verification code recognition in more depth. The targets this time have these properties: the code is made up of letters and digits, the characters are rotated (possibly to the left or to the right), the position is not fixed, adjacent characters touch each other, and the interference is stronger.
Let’s take the following picture as an example to explain.
Step 1: Binarization. Pixels belonging to the verification code are represented by 1 and background pixels by 0. The method is very simple: print out the RGB values of the whole verification code image and look for a pattern. From the RGB values it is easy to see that, for the image above, the R value of the code pixels is greater than 120 while the G and B values are less than 80, so this rule lets us binarize the image easily.
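A minimal sketch of this step, assuming the GD extension is available for reading pixels. The function names (`readPixels`, `binarizeRedText`, `renderBits`) are hypothetical; the thresholds 120 and 80 are the ones quoted above and apply only to that example image.

```php
<?php
// Assumption: $image is a GD image, e.g. from imagecreatefrompng() on a saved sample.

/** Read every pixel into $pixels[$y][$x] = ['red' => .., 'green' => .., 'blue' => .., 'alpha' => ..]. */
function readPixels($image): array
{
    $pixels = [];
    $width  = imagesx($image);
    $height = imagesy($image);
    for ($y = 0; $y < $height; $y++) {
        for ($x = 0; $x < $width; $x++) {
            $pixels[$y][$x] = imagecolorsforindex($image, imagecolorat($image, $x, $y));
        }
    }
    return $pixels;
}

/** Mark a pixel as 1 when R > 120 and G, B < 80, otherwise 0. */
function binarizeRedText(array $pixels): array
{
    $bits = [];
    foreach ($pixels as $y => $row) {
        foreach ($row as $x => $rgb) {
            $isText = $rgb['red'] > 120 && $rgb['green'] < 80 && $rgb['blue'] < 80;
            $bits[$y][$x] = $isText ? 1 : 0;
        }
    }
    return $bits;
}

/** Render the 0/1 matrix as text, one row per line, so the pattern can be inspected. */
function renderBits(array $bits): string
{
    $lines = [];
    foreach ($bits as $row) {
        $lines[] = implode('', $row);
    }
    return implode("\n", $lines);
}
```

Printing `renderBits(binarizeRedText(readPixels($image)))` gives the grid of 0s and 1s that the later steps operate on.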
Now let's look at the third verification code image above.
At first glance it seems very complicated. The background color of the image is different every time and is not a single color, and the color of each digit also changes every time. It looks hard to binarize, but once we print out the RGB values the pattern is easy to find. No matter how the color of a digit changes, at least one of its RGB values is always less than 125, so we use the following test: $rgbarray['red'] < 125 || $rgbarray['green'] < 125 || $rgbarray['blue'] < 125. With it we can easily tell which pixels belong to digits and which belong to the background.
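The test above can be wrapped in a small function and applied to every pixel. In this sketch `isDigitPixel` and `binarizeBy` are hypothetical names, and `$pixels[$y][$x]` is assumed to be an array with `red`, `green`, and `blue` keys, the shape GD's `imagecolorsforindex()` returns.

```php
<?php
// Hypothetical helpers; the 125 threshold is specific to the example image.

/** The rule from the text: a digit pixel has at least one channel below 125. */
function isDigitPixel(array $rgbarray): bool
{
    return $rgbarray['red'] < 125 || $rgbarray['green'] < 125 || $rgbarray['blue'] < 125;
}

/** Binarize with any rule: 1 where $isForeground returns true, 0 elsewhere. */
function binarizeBy(array $pixels, callable $isForeground): array
{
    $bits = [];
    foreach ($pixels as $y => $row) {
        foreach ($row as $x => $rgbarray) {
            $bits[$y][$x] = $isForeground($rgbarray) ? 1 : 0;
        }
    }
    return $bits;
}
```

Passing the rule as a callable, for example `binarizeBy($pixels, 'isDigitPixel')`, makes it easy to swap in a different threshold rule for a different site's images.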
We can find these rules because, when the interference is generated, its RGB values and the RGB values of the digits must stay separate from each other so that the interference does not affect how the digits are displayed. Once we understand this, binarization is easy.
The 120, 80, 125 and other thresholds we found are only approximate and may not match the actual RGB values exactly. Therefore, after binarization a stray 1 sometimes appears where it should not. For verification codes whose digits are displayed at fixed positions, this interference does not matter much. But for images where the position of the code varies, it is likely to disturb character cutting. Therefore, denoising is required after binarization.
Step 2: Denoising. The principle is very simple: remove isolated 1s. If there is a lot of noise and high efficiency is required, there is a lot of work to do. Fortunately, we do not need that much depth here and can use the simplest method: if a point is 1, check the 8 points around it. If none of them is 1, the point is considered a noise point and is set to 0.
As shown in the picture above, this method easily finds that the 1 in the red box is a noise point, so we set it to 0 directly. We use a trick when judging. Sometimes the noise may be two consecutive 1s, so we add up the values of the 8 neighbors of the point and check whether the sum is less than a specific threshold.
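The neighbor-sum rule can be sketched like this. `removeNoise` and its `$minNeighbours` parameter are hypothetical names; the function reads from the original matrix and writes to a copy so the result does not depend on scan order.

```php
<?php
/**
 * Clear every 1 whose 8 neighbours sum to less than $minNeighbours.
 * With 1, only fully isolated points are removed; with 2, pairs of
 * adjacent noise points are removed as well.
 */
function removeNoise(array $bits, int $minNeighbours = 1): array
{
    $clean = $bits;
    foreach ($bits as $y => $row) {
        foreach ($row as $x => $value) {
            if ($value !== 1) {
                continue;
            }
            $sum = 0;
            for ($dy = -1; $dy <= 1; $dy++) {
                for ($dx = -1; $dx <= 1; $dx++) {
                    if ($dy === 0 && $dx === 0) {
                        continue; // skip the point itself
                    }
                    // Points outside the image count as 0.
                    $sum += $bits[$y + $dy][$x + $dx] ?? 0;
                }
            }
            if ($sum < $minNeighbours) {
                $clean[$y][$x] = 0;
            }
        }
    }
    return $clean;
}
```

A higher threshold removes more noise but also erodes the ends of thin strokes, since a stroke tip has only one neighbor, so the value has to be tuned per image style.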
Step 3: Cut the characters. There are many ways to cut characters; here is the simplest one. First cut the image vertically into characters, then remove the surplus rows of 0s in the horizontal direction, as shown below.
The first step cuts along the red lines and the second step cuts along the blue lines, which gives independent characters. But consider a situation like the following.
The method above would cut the characters "dw" out as a single character, which is wrong, so we also have to deal with cutting characters that touch each other.
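The basic cut from step 3 (vertical cut first, then trimming empty rows) can be sketched as below. The function names are hypothetical, and the input is assumed to be a non-empty rectangular 0/1 matrix. Characters that touch each other still come out as one block here.

```php
<?php
/** Find [from, to] column ranges that contain at least one 1. */
function columnRanges(array $bits): array
{
    $ranges = [];
    $start  = null;
    $width  = count($bits[0]);
    for ($x = 0; $x < $width; $x++) {
        $hasInk = array_sum(array_column($bits, $x)) > 0;
        if ($hasInk && $start === null) {
            $start = $x;                    // a character begins
        } elseif (!$hasInk && $start !== null) {
            $ranges[] = [$start, $x - 1];   // a character ended at the previous column
            $start = null;
        }
    }
    if ($start !== null) {
        $ranges[] = [$start, $width - 1];
    }
    return $ranges;
}

/** Cut columns $from..$to out of the matrix and drop empty rows above and below. */
function cutBlock(array $bits, int $from, int $to): array
{
    $rows = [];
    foreach ($bits as $row) {
        $rows[] = array_slice($row, $from, $to - $from + 1);
    }
    while ($rows && array_sum($rows[0]) === 0) {
        array_shift($rows);
    }
    while ($rows && array_sum(end($rows)) === 0) {
        array_pop($rows);
    }
    return $rows;
}

/** Split a binarized image into one block per run of non-empty columns. */
function cutCharacters(array $bits): array
{
    $blocks = [];
    foreach (columnRanges($bits) as [$from, $to]) {
        $blocks[] = cutBlock($bits, $from, $to);
    }
    return $blocks;
}
```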
Step 4: Cutting touching characters. When regular characters touch each other they are easy to separate; if the characters themselves are scaled or deformed, it becomes hard to handle. On analysis, the touching in the example above is of the very simple kind, just regular characters stuck together, so we handle it in a very simple way too. After the cutting operation we cannot immediately assume that each block is a single character; we need to verify it. The key check is whether the width of the block is greater than a threshold. The threshold is chosen so that, no matter how a single character is rotated or deformed, its width never exceeds it. So if a block is wider than the threshold, it can be considered two touching characters; if it is wider than twice the threshold, it is considered three touching characters, and so on. Once this rule is known, cutting touching characters is very simple: when a block is found to contain touching characters, divide it evenly into two or more new blocks. Of course, to restore the characters better, I usually adjust the block boundaries by +1 and -1 to pad the character blocks appropriately.
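A rough sketch of the width check and even split described in step 4. `splitGlued` and `$maxCharWidth` are hypothetical names, the block is assumed to be a non-empty rectangular 0/1 matrix, and the +1/-1 boundary padding mentioned above is left out to keep the example short.

```php
<?php
/**
 * Split a block into as many pieces as its width suggests.
 * A block no wider than $maxCharWidth is returned as a single piece;
 * wider than 1x the threshold gives 2 pieces, wider than 2x gives 3, and so on.
 */
function splitGlued(array $block, int $maxCharWidth): array
{
    $width  = count($block[0]);
    $parts  = max(1, (int) ceil($width / $maxCharWidth));
    $pieces = [];
    for ($i = 0; $i < $parts; $i++) {
        $from = intdiv($i * $width, $parts);        // first column of this piece
        $to   = intdiv(($i + 1) * $width, $parts);  // one past its last column
        $piece = [];
        foreach ($block as $row) {
            $piece[] = array_slice($row, $from, $to - $from);
        }
        $pieces[] = $piece;
    }
    return $pieces;
}
```

An even split is only a guess at where one character ends and the next begins, which is why it works for regular characters of similar width but not for scaled or deformed ones.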
Step 5: Match characters. There are many ways to build feature codes for rotated characters, and we will not study them in depth here. The simplest way, which I use here, is to build a matching library that covers every variant of every character, so I added a study operation to the code I provided. Its purpose is to let you first identify the verification code in an image by hand and then use the study method to write it into the feature-code library. The more image data is written this way, the more accurate the recognition becomes.
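The author's own study code is not reproduced here, so the following is only a guess at the general shape of such a library, not the original implementation. All names are hypothetical, the library is a plain in-memory array, and PHP's built-in `similar_text()` is swapped in as one possible similarity measure.

```php
<?php
/** Turn a 0/1 block into a signature string, one row per "|"-separated group. */
function signature(array $bits): string
{
    $rows = [];
    foreach ($bits as $row) {
        $rows[] = implode('', $row);
    }
    return implode('|', $rows);
}

/** "Study": record a manually identified block under its character. */
function study(array $library, array $bits, string $char): array
{
    $library[] = ['signature' => signature($bits), 'char' => $char];
    return $library;
}

/** Return the character of the most similar stored sample, or null if none reaches $minPercent. */
function recognise(array $library, array $bits, float $minPercent = 80.0): ?string
{
    $needle      = signature($bits);
    $bestChar    = null;
    $bestPercent = $minPercent;
    foreach ($library as $entry) {
        similar_text($needle, $entry['signature'], $percent);
        if ($percent >= $bestPercent) {
            $bestPercent = $percent;
            $bestChar    = $entry['char'];
        }
    }
    return $bestChar;
}

// Toy usage with made-up 3x3 glyphs.
$library = study([], [[1, 1, 1], [1, 0, 1], [1, 1, 1]], '0');
$library = study($library, [[0, 1, 0], [0, 1, 0], [0, 1, 0]], '1');
var_dump(recognise($library, [[1, 1, 1], [1, 0, 1], [1, 1, 0]]));
```

Because matching is approximate, each character needs samples for its different rotations; in a real setup the library would also be saved to a file or database between runs.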
With the steps above, we can recognize most of the verification codes found on the Internet today, using only the simplest methods and no OCR knowledge.
Additional suggestions for creating verification codes:
For a recognition program, the hardest parts are cutting the characters apart and building the feature codes. Many domestic programmers, when making verification codes, like to pile on interference pixels and interference lines. Quite apart from the impact on users, this still does not give very good protection. So if you want your verification code to be hard to recognize, just do the following two things:
1. Make the characters touch, preferably so that every character overlaps its neighbor;
2. Do not use regular characters. Apply a different scale or rotation to each part of the verification code.
As long as these two points, or some variation of them, are achieved, it will be difficult for a recognition program to read the code.
That is the entire content of this article on using PHP to crack website verification codes. I hope it is helpful for your learning.