
How to recognize verification codes in PHP: first binarize the image and store the values in a two-dimensional array; then loop to find the position of each digit; then read each digit's values from the array and concatenate them into a string; finally, compare that string with the string of each glyph to identify the digit.

How to implement verification code recognition in php


The introduction in the original text is fairly brief and does not describe how the algorithm is actually implemented. The detailed process is reproduced from:

http://www.poboke.com/study/php-verification-code-identification-primary.html

This article therefore uses a practical example to demonstrate how PHP can recognize a verification code and submit it to the server for verification.

Part One: Identification of Verification Codes

I recently studied how verification codes can be defeated and am writing down what I learned. This is partly a summary of the past few days to help my own understanding, partly something I hope will help others who are studying the topic, and partly an attempt to draw the attention of website administrators so that they take more factors into account when providing verification codes. I have only just started on this topic, so my understanding is fairly basic and mistakes are inevitable. Comments are welcome.

The purpose of a verification code is to prevent an attacker from using a program to brute-force the password of a specific registered user through continuous login attempts. In practice, modern verification codes are mostly used to stop machines from registering accounts and posting replies in bulk. Many websites currently use verification codes to prevent bots from automatically registering, logging in, and posting spam.

A verification code is an image generated from a random string of digits or symbols. Some noise pixels are added to the image (to hinder OCR). The user reads the code by eye, enters it into the form, and submits it to the website; a given function can be used only after successful verification.

The most common verification codes:
1. Four digits, a random numeric string. This is the most primitive verification code, and its protective effect is almost zero.
2. Random digit image verification code. The characters in the image are quite regular; some add random noise and some use random character colors, so the protection is better than the previous type. People without basic knowledge of graphics and image processing cannot break it.
3. Images in various formats with random digits, random uppercase English letters, random noise pixels, and random positions.
4. Chinese characters, the newest type of verification code used for registration. They are randomly generated, harder to type, and hurt the user experience, so they are used less often.

For the sake of simplicity, this article focuses on the first type. Let's first look at several common verification code images found on the Internet.
These four styles basically represent the verification codes described in type 2. At first glance, the first picture looks the easiest to crack, the second comes next, the third is harder, and the fourth is the hardest.
What is the real situation? In fact, these images are all about equally difficult to crack.

The first picture is the easiest. The background uses one color and the digits use another, the characters are regular, and the characters are always in the same position. This article uses this type of verification code as an example. Readers can try the other pictures themselves.
The second picture looks harder, but if you study it carefully you will find its pattern. No matter how the background color and noise change, the characters are regular and share the same color, so the noise is very easy to eliminate: simply discard every pixel that is not the character color.
The third picture looks more complicated. In addition to the changing background color and noise mentioned above, the color of the characters also changes, and each character has a different color.
In the fourth picture, in addition to the features of the third picture, two straight interference lines are drawn across the text. It looks difficult but the lines are actually easy to remove.
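The observation about the second picture can be sketched as follows: when every character shares one known color, a pixel is kept only if it is close to that color. This is a minimal sketch; the function name `isCharacterColor`, the sample character color, and the tolerance are assumptions made for illustration and are not taken from any real site.

```php
<?php
// Hypothetical helper: true when a pixel is close to the known character color.
// $pixel and $charColor are [red, green, blue] arrays with values 0-255.
function isCharacterColor(array $pixel, array $charColor, int $tolerance = 10): bool
{
    for ($c = 0; $c < 3; $c++) {
        if (abs($pixel[$c] - $charColor[$c]) > $tolerance) {
            return false; // one channel differs too much: background or noise
        }
    }
    return true;
}

// Assumed character color for this example (dark blue).
$charColor = [0, 0, 128];

var_dump(isCharacterColor([2, 5, 130], $charColor));    // close to the character color
var_dump(isCharacterColor([120, 200, 40], $charColor)); // a noise pixel of another color
```

A small tolerance is used instead of an exact match because JPEG compression usually shifts colors slightly.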

The following uses Wanwang's "General URL Query" to illustrate the verification code recognition process.
Open Wanwang (http://www.net.cn); there is a "General URL Query" box in the sidebar on the right side of the site:

It can be seen that this is the first kind of verification code. So that the human eye can recognize the digits, the digit color and the background color of the image differ considerably, so their RGB values also differ considerably. Digits and background can be told apart by checking the RGB value of each pixel.

Verification code identification is generally divided into the following steps:

1. Extract the glyphs
I am not building a professional OCR system, and since every website's verification code is different, the most common approach is to build a feature library for the specific verification code. When extracting the glyphs, download enough images that together they contain every character. The images here contain only digits, so we only need images that cover 0-9.

1. Refresh the verification code several times and save the images until every digit from 0-9 has been collected.

2. Open an image in an image editor. I use Fireworks. Press Ctrl+8 to enlarge the view 8 times, so that you can clearly see every pixel.

Each digit turns out to be 6px wide and 10px high, the spacing between digits is 4px, the first digit is offset 2px from the left edge, and the top offset is 0px. These values are used later.
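As a rough preview of how these measurements are used, the sketch below computes the pixel rectangle occupied by each digit. The constant names and the `digitBox` helper are hypothetical, and the loop assumes the image holds four digits.

```php
<?php
// Measurements read from the enlarged image.
const DIGIT_WIDTH  = 6;  // px
const DIGIT_HEIGHT = 10; // px
const DIGIT_GAP    = 4;  // px between two digits
const LEFT_OFFSET  = 2;  // px before the first digit
const TOP_OFFSET   = 0;  // px above the digits

// Hypothetical helper: bounding box of the i-th digit (0-based), inclusive.
function digitBox(int $i): array
{
    $left = LEFT_OFFSET + $i * (DIGIT_WIDTH + DIGIT_GAP);
    return [
        'left'   => $left,
        'top'    => TOP_OFFSET,
        'right'  => $left + DIGIT_WIDTH - 1,
        'bottom' => TOP_OFFSET + DIGIT_HEIGHT - 1,
    ];
}

// Assumes four digits per image.
for ($i = 0; $i < 4; $i++) {
    $box = digitBox($i);
    echo "digit $i: columns {$box['left']}-{$box['right']}, rows {$box['top']}-{$box['bottom']}\n";
}
```

With these values the digits occupy columns 2-7, 12-17, 22-27, and 32-37, all on rows 0-9.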

3. Cut out each digit and save it as a separate image of size 6*10.

2. Binarization of the image
Binarization means representing every pixel that belongs to a digit in the image with 1 and every other pixel with 0. Binarize the image to be recognized and store the data in a two-dimensional array to obtain the image feature array.

1. First, separate the digits from the background color and the noise color. Use a screen color picker to observe the color pattern.

We can draw a conclusion: the R, G, and B values of the background color are all greater than 200, while at least one of the R, G, and B values of the digit color is less than 200. The two are therefore easy to tell apart.
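The rule can be written as a small predicate. The sketch below assumes a truecolor image, for which `imagecolorat()` returns the color packed into one integer; `unpackRgb` and `isDigitPixel` are hypothetical helper names, and the threshold of 200 is the value observed for this particular verification code.

```php
<?php
// Split a packed truecolor value (as returned by imagecolorat() on a
// truecolor image) into its red, green, and blue components.
function unpackRgb(int $rgb): array
{
    return [($rgb >> 16) & 0xFF, ($rgb >> 8) & 0xFF, $rgb & 0xFF];
}

// Hypothetical helper for the rule above: a pixel belongs to a digit
// when at least one channel is below the threshold.
function isDigitPixel(int $rgb, int $threshold = 200): bool
{
    [$r, $g, $b] = unpackRgb($rgb);
    return $r < $threshold || $g < $threshold || $b < $threshold;
}

var_dump(isDigitPixel(0xF0F0F0)); // light background pixel
var_dump(isDigitPixel(0x3366CC)); // darker digit pixel
```

Shifting and masking gives the same components that `imagecolorsforindex()` returns for a truecolor image, without building an array for every pixel.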

2. The following PHP code only demonstrates the two-dimensional array. To make the digits easier to see, 1 and 0 are printed as 0 and -:



<?php
// Print the binarized image: "0" for digit pixels, "-" for background.
echo '<pre>'; // keeps the rows aligned when viewed in a browser

getHec("v1.jpg");

function getHec($imagePath) {
    $res = imagecreatefromjpeg($imagePath);
    $size = getimagesize($imagePath); // [0] = width, [1] = height

    for ($i = 0; $i < $size[1]; $i++) {        // rows
        for ($j = 0; $j < $size[0]; $j++) {    // columns
            $rgb = imagecolorat($res, $j, $i);
            $rgbarray = imagecolorsforindex($res, $rgb);
            // Digit pixel: at least one channel is below 200.
            if ($rgbarray['red'] < 200 || $rgbarray['green'] < 200 || $rgbarray['blue'] < 200) {
                echo "0";
            } else {
                echo "-";
            }
        }
        echo "\n";
    }
}

The results are shown in the figure below:

If the background of the image is more complex, the approach is the same. There is always a critical value that separates digits from background; you have to find it by observing the image yourself.

3. Binarization of the digit glyphs
Compute the binary data of each digit glyph, record the data, and use it as the key for that digit.

1. Binarize the glyph images for 0-9: read the color of each pixel one by one, obtain its R, G, and B values, and apply the same test. The code is as follows:



<?php
// Print one "'digit'=>'key'," line for each glyph image 0.jpg ... 9.jpg.
for ($i = 0; $i < 10; $i++) {
    echo "'$i'=>'" . getHec("$i.jpg") . "',\n";
}

// Return the binarized pixels of an image as a string of 1s and 0s, row by row.
function getHec($imagePath) {
    $res = imagecreatefromjpeg($imagePath);
    $size = getimagesize($imagePath); // [0] = width, [1] = height
    $bits = '';

    for ($i = 0; $i < $size[1]; $i++) {
        for ($j = 0; $j < $size[0]; $j++) {
            $rgb = imagecolorat($res, $j, $i);
            $rgbarray = imagecolorsforindex($res, $rgb);
            if ($rgbarray['red'] < 200 || $rgbarray['green'] < 200 || $rgbarray['blue'] < 200) {
                $bits .= "1";
            } else {
                $bits .= "0";
            }
        }
    }
    return $bits;
}

Output result:



'0'=>'011110100001100001100001100001100001100001100001100001011110',
'1'=>'001000111000001000001000001000001000001000001000001000111110',
'2'=>'011110100001100001000001000010000100001000010000110011111111',
'3'=>'011110100001100001000010001100000010000001100001100001011110',
'4'=>'000100000100001100010100100100100100111111000100001100001111',
'5'=>'111111100000100000101110110001000001000001100001100001011110',
'6'=>'001110010001100000100000101110110001100001100001100001011110',
'7'=>'111111100010100010000100000100001000001000001000001000001000',
'8'=>'011110100001100001100001011110010010100001100001100001011110',
'9'=>'011100100010100001100001100011011101000001000001100010011100',
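A quick way to sanity-check a key is to print it back as a 6x10 grid. The sketch below does this for the key of the digit 7; `renderGlyph` is a hypothetical helper, and `#` and `-` are used here only to make the shape easier to see.

```php
<?php
// Glyph key for the digit 7, copied from the output above.
$key = '111111100010100010000100000100001000001000001000001000001000';

// Hypothetical helper: split a key into rows of $width characters and
// replace 1/0 with more readable symbols.
function renderGlyph(string $key, int $width = 6): string
{
    $rows = str_split($key, $width);
    return strtr(implode("\n", $rows), ['0' => '-', '1' => '#']);
}

echo renderGlyph($key), "\n";
```

If the printed shape does not look like the expected digit, the glyph image was probably cropped at the wrong offset.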

4. Comparison with the samples
Compare the image features from step 2 with the glyph keys of the verification code from step 3 to obtain the digits in the verification image.

Algorithm process (see attachment for code):

1. Save the binarized value of the image into a two-dimensional array.
2. Loop to find the position of each digit, using the digit width, height, spacing, left offset, and top offset measured earlier.
For example: left offset of the i-th digit = (digit width + spacing) * i + left offset.
3. Compute the digit's position in the two-dimensional array and concatenate its w * h values into a string similar to a glyph key.
4. Compare the string with the key of each glyph to compute the similarity. Take the digit with the highest similarity, or accept a digit once the similarity exceeds 95%.
5. The recognition results are as follows:


With the current method, recognition of this verification code is essentially 100% accurate.
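The algorithm steps can be sketched as below. This is not the code from the attachment: the function names are hypothetical, similarity is measured simply as the fraction of identical positions, and the demo paints a synthetic binarized image from the glyph keys instead of loading a real verification code.

```php
<?php
// Glyph keys from step 3 (6x10 pixels per digit, row by row).
$font = [
    '0' => '011110100001100001100001100001100001100001100001100001011110',
    '1' => '001000111000001000001000001000001000001000001000001000111110',
    '2' => '011110100001100001000001000010000100001000010000110011111111',
    '3' => '011110100001100001000010001100000010000001100001100001011110',
    '4' => '000100000100001100010100100100100100111111000100001100001111',
    '5' => '111111100000100000101110110001000001000001100001100001011110',
    '6' => '001110010001100000100000101110110001100001100001100001011110',
    '7' => '111111100010100010000100000100001000001000001000001000001000',
    '8' => '011110100001100001100001011110010010100001100001100001011110',
    '9' => '011100100010100001100001100011011101000001000001100010011100',
];

// Concatenate the block of the i-th digit into one string.
// The geometry defaults are the measurements from step 1.
function extractDigit(array $bits, int $i, int $w = 6, int $h = 10, int $gap = 4, int $left = 2, int $top = 0): string
{
    $x0 = $left + $i * ($w + $gap);
    $s = '';
    for ($y = $top; $y < $top + $h; $y++) {
        for ($x = $x0; $x < $x0 + $w; $x++) {
            $s .= $bits[$y][$x] ?? 0; // pixels outside the image count as background
        }
    }
    return $s;
}

// Return the digit whose key shares the most positions with $sample,
// together with the fraction of matching positions.
function matchDigit(string $sample, array $font): array
{
    $best = ['digit' => '', 'similarity' => -1.0];
    foreach ($font as $digit => $key) {
        $len = strlen($key);
        $same = 0;
        for ($k = 0; $k < $len; $k++) {
            if (($sample[$k] ?? '') === $key[$k]) {
                $same++;
            }
        }
        if ($same / $len > $best['similarity']) {
            $best = ['digit' => (string) $digit, 'similarity' => $same / $len];
        }
    }
    return $best;
}

// Recognize $count digits in a binarized image.
function recognize(array $bits, array $font, int $count = 4): string
{
    $result = '';
    for ($i = 0; $i < $count; $i++) {
        $result .= matchDigit(extractDigit($bits, $i), $font)['digit'];
    }
    return $result;
}

// Demo helper: paint a synthetic binarized image from the glyph keys.
function paint(string $digits, array $font): array
{
    $bits = array_fill(0, 10, array_fill(0, 2 + strlen($digits) * 10, 0));
    foreach (str_split($digits) as $i => $d) {
        foreach (str_split($font[$d]) as $k => $bit) {
            $bits[intdiv($k, 6)][2 + $i * 10 + $k % 6] = (int) $bit;
        }
    }
    return $bits;
}

$bits = paint('2718', $font);
$bits[0][2] = 1 - $bits[0][2]; // flip one pixel of the first digit to imitate noise
echo recognize($bits, $font), "\n"; // prints 2718
```

The `similarity` value returned by `matchDigit` can be compared against 0.95 to apply the 95% rule instead of always taking the best match.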

After the steps above, you may object that nothing has been said about removing noise. In fact, removing noise is very simple. An important property of noise is that it must not impair the readability of the verification code, so when noise is generated its RGB values are kept below or above a certain value. In the example image, the RGB values of the noise are never less than 200, so the noise is easily removed.


Source code download: http://yunpan.cn/cmJCkEnyGij3t Access password d2ba
