
Entry-level verification code recognition with Python

迷茫 (Original)
2017-03-25 17:07:04

Background: The blogger did the work described in this article during last summer vacation, but never settled down enough to write it up. This holiday there is more free time, so the blogger can finally write as much as needed, and here is the article.

Verification codes? Can I crack them too?

Verification codes need little introduction; they turn up all the time in everyday life. As a student of Northeastern University (Dongda), the blogger runs into the verification code of the academic affairs system more than any other.
Students have long complained about Dongda's verification code because it is so hard to enter. It is case-sensitive, and sometimes an error message appears even when the code was entered correctly. At that point the pop-up telling you that left-click copying is prohibited may well appear too.
(The Academic Affairs Office did, however, change the verification code in the 2016-17 academic year to make it easier for humans to enter.)

As can be seen, the verification code of the Academic Affairs Office is very regular: the size, position, and shape of every letter and digit are fixed. That makes it a good first target for beginners with no background in verification code recognition.

Identification method

Simulating a login involves many steps. Here we set the other operations aside and handle only one task: given a verification code image as input, return the answer string.

Verification codes are made colorful in order to create interference, so the first task is to remove that interference. This step takes repeated experimentation; enhancing the color of the picture, increasing the contrast, and similar adjustments can help.
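The kind of experiment described here can be sketched with PIL roughly as follows. This is only a minimal sketch, not the blogger's actual code: the function name `remove_interference`, the enhancement factors, and the threshold of 128 are all assumptions that would need tuning against real images.

```python
from PIL import Image, ImageEnhance


def remove_interference(img, color=2.0, contrast=2.0, threshold=128):
    """Return a black-and-white copy of a verification code image.

    The factors and the threshold are illustrative guesses, not values
    taken from the original project.
    """
    img = img.convert("RGB")
    # Exaggerate the colors, then the contrast, so that the characters
    # stand out more clearly from the background.
    img = ImageEnhance.Color(img).enhance(color)
    img = ImageEnhance.Contrast(img).enhance(contrast)
    # Drop to grayscale and force every pixel to pure black or pure white.
    gray = img.convert("L")
    return gray.point(lambda p: 0 if p < threshold else 255)
```

A call such as `remove_interference(Image.open("code.png"))` (the file name is hypothetical) returns a grayscale image whose pixels are all 0 or 255.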

After many experiments on the pictures, I eventually found a fairly good way to remove the interference. In the best case, what remains is a clean black-and-white picture of the characters. Each picture contains four characters, and they cannot all be recognized at once, so the picture has to be cropped into small pictures with one character each, and each small picture is then recognized separately.
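Because every character sits at a fixed position, the cropping can be done with fixed boxes. The sketch below assumes four boxes of 13 by 20 pixels laid side by side, chosen to match the 13-by-20 matrix shown for the number six; the function name `split_characters` and the `left` and `top` offsets are hypothetical and would have to be measured on the real image.

```python
from PIL import Image


def split_characters(img, count=4, left=0, top=0, width=13, height=20):
    """Cut `count` equally sized boxes out of the image, left to right.

    The offsets and box size are assumptions for illustration only.
    """
    pieces = []
    for i in range(count):
        x0 = left + i * width
        # crop() takes a (left, upper, right, lower) box in pixels.
        pieces.append(img.crop((x0, top, x0 + width, top + height)))
    return pieces
```

Each element of the returned list is a small image holding a single character.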

The next step is to recognize the characters. Each small picture is first converted into a matrix of 0s and 1s, one matrix per character.
For example, the matrix for the number six:

num_6=[
0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,1,1,0,0,0,0,0,0,
0,0,0,0,1,1,1,0,0,0,0,0,0,
0,0,0,1,1,1,0,0,0,0,0,0,0,
0,0,0,1,1,0,0,0,0,0,0,0,0,
0,0,1,1,0,0,0,0,0,0,0,0,0,
0,0,1,1,0,0,0,0,0,0,0,0,0,
0,1,1,1,1,1,1,1,0,0,0,0,0,
0,1,1,1,1,1,1,1,1,0,0,0,0,
0,1,1,0,0,0,0,1,1,1,0,0,0,
0,1,1,0,0,0,0,0,1,1,0,0,0,
0,1,1,0,0,0,0,0,1,1,0,0,0,
0,1,1,1,0,0,0,1,1,1,0,0,0,
0,0,1,1,1,1,1,1,1,0,0,0,0,
0,0,0,1,1,1,1,1,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,
]

Seen from a distance, the digit can still be made out if you squint.
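A matrix like this can be produced from a small character picture along the following lines. This is a sketch under assumptions: the helper names `to_matrix` and `render` and the threshold of 128 are not from the original project, and dark pixels are taken to be part of the character.

```python
from PIL import Image


def to_matrix(img, threshold=128):
    """Flatten a character image into a row-major list of 0/1 values.

    Dark pixels (assumed to belong to the character) become 1.
    """
    gray = img.convert("L")
    width, height = gray.size
    return [
        1 if gray.getpixel((x, y)) < threshold else 0
        for y in range(height)
        for x in range(width)
    ]


def render(matrix, width):
    """Draw a flat 0/1 matrix as text, one image row per line."""
    rows = [matrix[i:i + width] for i in range(0, len(matrix), width)]
    return "\n".join("".join("#" if v else "." for v in row) for row in rows)
```

Printing `render(matrix, 13)` for a 13-column matrix makes the shape of the character much easier to see than the raw list of numbers.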
Because the verification code of Dongda's Academic Affairs Office is very regular and every character sits at a fixed position, no machine learning algorithm is needed; simple matrix comparison is enough. Just find, among all the stored matrices, the one most similar to the input. There are various ways to do the comparison; any of them will do as long as it is simple and identifies the characters correctly.
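One simple way to do the comparison is to count the positions at which two matrices agree. The sketch below is one possible method, not necessarily the one the blogger used; the names `similarity` and `recognize` are hypothetical, and the tiny 3-by-3 templates only stand in for the real 13-by-20 matrices.

```python
def similarity(a, b):
    """Fraction of positions at which two equally sized 0/1 matrices agree."""
    if len(a) != len(b):
        raise ValueError("matrices must have the same size")
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def recognize(matrix, templates):
    """Return the label of the stored template most similar to `matrix`."""
    return max(templates, key=lambda label: similarity(matrix, templates[label]))


# Tiny made-up templates, used here only to keep the example short.
templates = {
    "1": [0, 1, 0,
          0, 1, 0,
          0, 1, 0],
    "7": [1, 1, 1,
          0, 0, 1,
          0, 0, 1],
}

# A "7" with one wrong pixel is still closest to the "7" template.
noisy_seven = [1, 1, 1,
               0, 0, 1,
               0, 1, 1]
print(recognize(noisy_seven, templates))
```

Calling `recognize` on each of the four character matrices and joining the labels gives the answer string.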

At this point, the verification code recognition is done.

Summary

This verification code recognition relies mainly on Python's PIL for image manipulation. For the complete code that simulates the login and fills in the verification code automatically, see

xfangfang's GitHub
