How to identify python crawler-Python Tutorial-php.cn

silencement

May 22, 2019 pm 06:03 PM

How to identify python crawler

1. Text-input verification code

This kind of verification code asks the user to type the letters, digits, Chinese characters, or other text shown in a picture.

Solution: This is the simplest kind. Recognize the characters in the picture, then fill them into the input box. The recognition technique is called OCR (optical character recognition), and we recommend the third-party Python library tesserocr for it. A verification code with a clean background, such as the one in Figure 2, can be recognized directly with this library. For a verification code with a noisy background, however, the direct recognition rate is very low. In that case, preprocess the image first: convert it to grayscale, then binarize it, and only then run recognition. This greatly improves the recognition rate.
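The grayscale-then-binarize preprocessing can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the image is represented as rows of (r, g, b) tuples, the helper names `to_grayscale` and `binarize` are hypothetical, and the threshold of 128 is an arbitrary starting point that would need tuning per site.

```python
def to_grayscale(rgb_pixels):
    """Convert rows of (r, g, b) tuples to rows of 0-255 luminance values."""
    # Integer form of the common luminance weights 0.299 / 0.587 / 0.114.
    return [
        [(299 * r + 587 * g + 114 * b) // 1000 for (r, g, b) in row]
        for row in rgb_pixels
    ]


def binarize(gray_pixels, threshold=128):
    """Map every pixel to pure white (255) or pure black (0)."""
    # The threshold is an assumption; noisy backgrounds may need another value.
    return [[255 if v >= threshold else 0 for v in row] for row in gray_pixels]


# Hypothetical 2x3 image: dark glyph strokes on a light, slightly noisy background.
sample = [
    [(250, 250, 250), (30, 30, 30), (200, 180, 190)],
    [(20, 25, 30), (240, 235, 230), (90, 100, 110)],
]
binary = binarize(to_grayscale(sample))
print(binary)  # [[255, 0, 255], [0, 255, 0]]
```

With Pillow installed, the same two steps are usually written as `image.convert("L")` followed by `image.point(lambda v: 255 if v >= threshold else 0)`, and the resulting image can then be passed to `tesserocr.image_to_text(image)`.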

2. Sliding verification code

Solution: This kind of verification code is somewhat more complicated, but it can still be handled. The obvious approach is to imitate what a person does: click the button, see where the gap is, and then drag the puzzle piece to the gap to complete the verification.

Step 1: Click the button. The gap and the puzzle piece do not appear until the button is clicked. This difference between the picture before and after the click gives us a way to locate the gap.

Step 2: Drag the piece to the gap. We know the puzzle piece must be dragged to the gap, but how do we express that distance as a number? The observation from step 1 gives the answer: compare the pixels of the two pictures (before and after the click) against a threshold. Wherever the difference at a position exceeds the threshold, the two pictures differ there. Start scanning from the right edge of the puzzle piece and move from left to right, stopping at the first position that differs. That position should be the left edge of the gap, so we can use selenium to drag the piece there.

Another question is how to save these two pictures automatically. First find the image element, then get its location and size, and compute top, bottom, left, right = location['y'], location['y'] + size['height'], location['x'], location['x'] + size['width']. Then take a screenshot and crop it to these four coordinates. See the selenium documentation for the specific calls. Crop one picture before clicking the button and another after clicking it.

Finally, the drag itself must imitate human behavior: accelerate first, then decelerate. This kind of verification code checks behavioral characteristics, and a person cannot hold a constant speed, so a constant-speed drag is judged to be a machine and fails verification.
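The three pieces of step 2 (the crop box, the pixel comparison, and the accelerate-then-decelerate drag) can be sketched with plain Python functions. Everything here is an assumption made for illustration: the function names are hypothetical, the pictures are represented as rows of grayscale values, the threshold of 60 is arbitrary, and a smoothstep curve is swapped in as one simple way to speed up and then slow down. The article does not specify which curve to use.

```python
def crop_box(location, size):
    """Turn an element's location and size into (left, top, right, bottom)."""
    top = location["y"]
    bottom = location["y"] + size["height"]
    left = location["x"]
    right = location["x"] + size["width"]
    return (left, top, right, bottom)


def find_gap_left(before, after, start_x=0, threshold=60):
    """Scan columns left to right from start_x and return the first x where
    any pixel differs by more than threshold, or None if nothing differs."""
    height, width = len(before), len(before[0])
    for x in range(start_x, width):
        for y in range(height):
            if abs(before[y][x] - after[y][x]) > threshold:
                return x
    return None


def build_track(distance, steps=10):
    """Split distance into integer moves that grow, then shrink."""
    # Position follows 3t^2 - 2t^3: slow start, fast middle, slow end.
    positions = [
        round(distance * (3 * (i / steps) ** 2 - 2 * (i / steps) ** 3))
        for i in range(1, steps + 1)
    ]
    return [positions[0]] + [b - a for a, b in zip(positions, positions[1:])]


# Hypothetical element geometry, in the dict shape selenium reports.
box = crop_box({"x": 10, "y": 20}, {"width": 260, "height": 116})

# Hypothetical 3x6 grayscale pictures taken before and after the click.
before = [[200] * 6 for _ in range(3)]
after = [row[:] for row in before]
after[1][0] = 90   # the puzzle piece itself, left of start_x
after[1][3] = 190  # small noise, below the threshold
after[1][4] = 80   # shadow of the gap
gap_x = find_gap_left(before, after, start_x=2)

track = build_track(100)
print(box, gap_x, sum(track))
```

In a real crawler, `location` and `size` would come from a selenium element's `location` and `size` attributes, the box would be passed to Pillow's `Image.crop`, and each move in `track` would be fed to `ActionChains.move_by_offset(move, 0)` between `click_and_hold` and `release`. The scan starts at `start_x` so that the puzzle piece itself is not mistaken for the gap.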


