
Using Python to crawl video information for the whole of Station B

Feb 19, 2024 11:45 PM

I think everyone is familiar with Station B (Bilibili). There are already plenty of articles online about crawling it, but knowledge gained from reading is ultimately shallow; to really understand something you have to do it yourself. So I tried it. In the end, I crawled a total of 7.6 million records.

Preparation

First, open Station B, pick a video on the homepage, and click on it. As usual, open the browser's developer tools. This time the goal is to obtain video information by calling the API that Station B provides, rather than parsing the web page: parsing pages is too slow, and your IP address is easily blocked.

Select the JS filter and press F5 to refresh the page.


Among the requests, you can find the API address.


Copy it and strip the unnecessary parameters, and you get:

https://api.bilibili.com/x/web-interface/archive/stat?aid=15906633

Opening it in a browser returns the video's statistics as JSON data.

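A minimal sketch of requesting this endpoint for a single video ID using only the standard library. The endpoint and its aid parameter come from the article; the response shape ({"code": 0, "data": {...}}) and the field names in FIELDS are assumptions, so check them against a real response before relying on them. The browser-like User-Agent header is also an added precaution, not something the article specifies.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import urlencode

# Endpoint from the article; the only query parameter is the video's aid.
STAT_API = "https://api.bilibili.com/x/web-interface/archive/stat"

# Assumed subset of fields inside the "data" object; verify against a real response.
FIELDS = ("aid", "view", "danmaku", "reply", "favorite", "coin", "share", "like")


def build_url(aid: int) -> str:
    """Return the stat API URL for one video ID."""
    return f"{STAT_API}?{urlencode({'aid': aid})}"


def parse_stat(payload: dict) -> Optional[dict]:
    """Pick the assumed fields out of a decoded response, or None on an API error."""
    if payload.get("code") != 0 or not payload.get("data"):
        return None  # e.g. a deleted video or an aid that was never used
    data = payload["data"]
    return {field: data.get(field) for field in FIELDS}


def fetch_stat(aid: int, timeout: float = 10.0) -> Optional[dict]:
    """Request one aid and parse the JSON body (this performs a network call)."""
    # Assumption: a browser-like User-Agent, added as a precaution.
    request = urllib.request.Request(build_url(aid), headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_stat(json.load(response))
```

Keeping the parsing separate from the network call makes it easy to test offline and to swap in a different HTTP client later.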

Hands-on coding

Now we can start writing code. The data is obtained by sending requests over and over, iterating through video IDs. To make the crawler more efficient, multi-threading can be used.
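The multi-threading idea can be sketched with a standard-library thread pool. This is not the author's code: `crawl_range` and the `fetch` callable it accepts are hypothetical names, and `fetch` stands for any function that requests one aid and returns a dict, or None when there is nothing to keep. Threads fit this job because each worker spends most of its time waiting on the network.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional


def crawl_range(fetch: Callable[[int], Optional[dict]], aids: Iterable[int],
                workers: int = 8) -> List[dict]:
    """Run fetch(aid) for each aid on a thread pool and keep the non-empty results."""

    def safe_fetch(aid: int) -> Optional[dict]:
        try:
            return fetch(aid)
        except Exception:
            return None  # e.g. a timeout or HTTP error: skip this aid instead of aborting

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the same order as the input aids.
        return [record for record in pool.map(safe_fetch, aids) if record is not None]
```

For example, `crawl_range(fetch, range(1, 1001), workers=16)` would try the first thousand aids with sixteen threads. How many workers the API tolerates before blocking an IP is something to find out carefully, not assume.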

[Figure: core code]

[Figure: iterative crawling]

The core of the whole project is only about 20 lines of code, which keeps it quite concise.

While running, the program prints how many links have been crawled so far. In practice, the information for the entire site can be crawled in one or two days.

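When several threads report progress at once, a plain shared integer can lose updates, so a running count like the one described above needs a lock. The sketch below is one way to do that; `ProgressCounter` and `counted` are hypothetical helpers, not taken from the article, and printing every 1000 links is an arbitrary choice.

```python
import threading
from typing import Callable, Optional


class ProgressCounter:
    """Thread-safe count of requested links, printed every `every` links."""

    def __init__(self, every: int = 1000) -> None:
        self.every = every
        self.count = 0
        self._lock = threading.Lock()

    def increment(self) -> int:
        # Without the lock, concurrent "+= 1" updates from worker threads could be lost.
        with self._lock:
            self.count += 1
            current = self.count
        if current % self.every == 0:
            print(current, flush=True)  # progress line: 1000, 2000, ...
        return current


def counted(fetch: Callable[[int], Optional[dict]],
            counter: ProgressCounter) -> Callable[[int], Optional[dict]]:
    """Wrap a per-aid fetch function so every call, successful or not, is counted."""

    def wrapper(aid: int) -> Optional[dict]:
        try:
            return fetch(aid)
        finally:
            counter.increment()

    return wrapper
```

Wrapping the fetch function rather than editing it keeps the progress reporting independent of how each request is made.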

How you process the data after crawling is up to you. I first saved it as CSV files, then merged them and inserted the result into a database.
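A minimal sketch of that CSV-then-database flow, using the standard `csv` and `sqlite3` modules. The article does not name its database, so SQLite here is a stand-in, and the table name `video` and the column list are hypothetical. Each crawl batch goes to its own CSV file, and the files are merged into one table at the end.

```python
import csv
import sqlite3
from typing import Dict, Iterable, List, Optional

# Hypothetical column set; keep whichever fields you actually crawl.
COLUMNS = ["aid", "view", "danmaku", "reply", "favorite", "coin", "share", "like"]


def save_csv(records: Iterable[Dict], path: str) -> None:
    """Write one batch of crawled records to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        # Missing fields are written as empty cells; unknown keys are ignored.
        writer = csv.DictWriter(f, fieldnames=COLUMNS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(records)


def _to_int(value: str) -> Optional[int]:
    """CSV cells are strings; empty cells become NULL in the database."""
    return int(value) if value != "" else None


def load_csv_into_db(conn: sqlite3.Connection, paths: List[str]) -> int:
    """Merge several CSV files into one `video` table and return its row count."""
    # Column names are quoted because "like" is an SQL keyword.
    columns = ", ".join(f'"{c}" INTEGER' for c in COLUMNS[1:])
    conn.execute(f"CREATE TABLE IF NOT EXISTS video (aid INTEGER PRIMARY KEY, {columns})")
    placeholders = ", ".join("?" for _ in COLUMNS)
    for path in paths:
        with open(path, newline="", encoding="utf-8") as f:
            rows = [tuple(_to_int(row[c]) for c in COLUMNS) for row in csv.DictReader(f)]
        # INSERT OR REPLACE keeps one row per aid if it appears in several files.
        conn.executemany(f"INSERT OR REPLACE INTO video VALUES ({placeholders})", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM video").fetchone()[0]
```

Writing CSV first means a crash during the database step does not cost any crawled data; the load can simply be rerun.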

[Figure: database table]

Since I crawled this data a few months ago, it is somewhat out of date.

[Figure: total amount of data]

[Figure: query for the top ten videos]

[Figure: the top ten videos with the most replies]
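The three summaries above (total count, top ten videos, top ten by replies) can be reproduced with plain SQL. This sketch assumes a SQLite table named `video` with integer columns `aid`, `view` and `reply`; those names are hypothetical, and reading "top ten videos" as "top ten by view count" is also an assumption, since the original screenshots are not preserved.

```python
import sqlite3
from typing import List, Tuple

# Assumes a table `video(aid, view, reply, ...)`; the names are hypothetical.
ALLOWED_COLUMNS = {"view", "reply"}


def total_videos(conn: sqlite3.Connection) -> int:
    """Total number of crawled rows."""
    return conn.execute("SELECT COUNT(*) FROM video").fetchone()[0]


def top_by(conn: sqlite3.Connection, column: str, limit: int = 10) -> List[Tuple[int, int]]:
    """Return (aid, value) pairs for the `limit` videos with the largest `column`."""
    # Column names cannot be bound as SQL parameters, so whitelist them instead.
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unsupported column: {column}")
    sql = f'SELECT aid, "{column}" FROM video ORDER BY "{column}" DESC LIMIT ?'
    return conn.execute(sql, (limit,)).fetchall()
```

With millions of rows, an index on each sorted column (for example `CREATE INDEX idx_video_reply ON video(reply)`) keeps these queries from scanning the whole table.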


Statement
This article is reproduced from Linux就该这么学.