search
HomeBackend DevelopmentPython TutorialCrawler|Python crawls pictures of girls from station B, motivation to learn!


In this issue, I will introduce to you how toUse Python to crawl the pictures of the ladies at Station B, hope it helps.


##1. Web page analysis

##Directly open station B (bilibili) and search for' Miss Sister':

Crawler|Python crawls pictures of girls from station B, motivation to learn!
There are a total of 5 pages of content, Take page 2 as an example, F12 to open the web page source code:

Crawler|Python crawls pictures of girls from station B, motivation to learn!

##Search for the first one title, we can find the corresponding XHR request. After careful analysis, we find that all the data exists in a json format data set, and our goal is resultList.

Check the Headers as follows:

Crawler|Python crawls pictures of girls from station B, motivation to learn!


##This is a get request. The number of entries in the request includes page and keywordTwo entries, corresponding to the requested page number and keyword respectively.

Check a few more pages to find the pattern:

# 第一页
'https://api.bilibili.com/x/web-interface/search/all/v2?context=&page=1&order=totalrank&keyword=%E5%B0%8F%E5%A7%90%E5%A7%90&duration=0&tids_2=&from_source=&from_spmid=333.337&platform=pc&__refresh__=true&_extra=&tids=0&highlight=1&single_column=0'
# 第二页
'https://api.bilibili.com/x/web-interface/search/type?context=&page=2&order=totalrank&keyword=%E5%B0%8F%E5%A7%90%E5%A7%90&duration=0&tids_2=&from_source=&from_spmid=333.337&platform=pc&__refresh__=true&_extra=&search_type=video&tids=0&highlight=1&single_column=0'
# 第三页
'https://api.bilibili.com/x/web-interface/search/type?context=&page=3&order=totalrank&keyword=%E5%B0%8F%E5%A7%90%E5%A7%90&duration=0&tids_2=&from_source=&from_spmid=333.337&platform=pc&__refresh__=true&_extra=&search_type=video&tids=0&highlight=1&single_column=0'

You can see that except for the first page, the other page URLsonly have different page parameters, then let’s try it Also use the URLs of other pages to request the first page, and you will find that the desired results can also be obtained (try it yourself).

in conclusion: All page URLs differ only in the page parameter, and the rest are the same.


2. 数据爬取

2.1 导入模块
# 导包
import re
import time
import json
import random
import requests
from fake_useragent import UserAgent

2.2 获取页面信息

根据分析的url请求数据:
# 获取页面信息
def get_datas(url,headers):
    r = requests.get(url, headers=headers)
    r.raise_for_status()
    r.encoding = chardet.detect(r.content)['encoding'] 
    datas = json.loads(r.text)
    return datas
2.3 获取具体图片信息
# 获取图片链接信息
def get_hrefs(datas):
    titles,hrefs = [],[]
    for data in datas['data']['result']:
        # 标题
        title = data['title']
        # 时长
        duration = data['duration']
        # 播放量
        video_review =data['video_review']
        # 发布时间
        date_rls = data['pubdate']
        pubdate = time.strftime('%Y-%m-%d %H:%M', time.localtime(date_rls))
        # 作者
        author = data['author']
        # 图片链接
        link_pic = data['pic']
        href_pic = 'https:' + link_pic
        
        titles.append(title)
        hrefs.append(href_pic)
        
        return titles, hrefs

代码解析了视频标题,时长,播放量,发布时间,作者,图片链接等参数,这里我们只取标题图片链接,其他参数可根据需要自行增,删。

2.4 保存图片
# 保存图片
def download_jpg(titles, hrefs):
    path = "D:/B站小姐姐/"
    if not os.path.exists(path):
        os.mkdir(path)
    for i in range(len(hrefs)):
        title_t = titles[i].replace('/','').replace(',','').replace('?','')
     title_t = title_t.replace(' ','').replace('|','').replace('。','')
        filename = '{}{}.jpg'.format(path,title_t)
        with open(filename, 'wb') as f:
            req = requests.get(url=hrefs[i], headers=headers)
            f.write(req.content)
            time.sleep(random.uniform(1.5,3.4))
这里我们用标题作为图片名称进行存储,需要注意文件名称不能包含特殊符号,这里过滤了” / ,。|“等4种(每天视频有增删,可能有出入,需要自己调整,也可以不使用标题做名称)。


3. Result

##Part of the picture:

Crawler|Python crawls pictures of girls from station B, motivation to learn!


##

The above is the detailed content of Crawler|Python crawls pictures of girls from station B, motivation to learn!. For more information, please follow other related articles on the PHP Chinese website!

Statement
This article is reproduced at:Python当打之年. If there is any infringement, please contact admin@php.cn delete
Python and Time: Making the Most of Your Study TimePython and Time: Making the Most of Your Study TimeApr 14, 2025 am 12:02 AM

To maximize the efficiency of learning Python in a limited time, you can use Python's datetime, time, and schedule modules. 1. The datetime module is used to record and plan learning time. 2. The time module helps to set study and rest time. 3. The schedule module automatically arranges weekly learning tasks.

Python: Games, GUIs, and MorePython: Games, GUIs, and MoreApr 13, 2025 am 12:14 AM

Python excels in gaming and GUI development. 1) Game development uses Pygame, providing drawing, audio and other functions, which are suitable for creating 2D games. 2) GUI development can choose Tkinter or PyQt. Tkinter is simple and easy to use, PyQt has rich functions and is suitable for professional development.

Python vs. C  : Applications and Use Cases ComparedPython vs. C : Applications and Use Cases ComparedApr 12, 2025 am 12:01 AM

Python is suitable for data science, web development and automation tasks, while C is suitable for system programming, game development and embedded systems. Python is known for its simplicity and powerful ecosystem, while C is known for its high performance and underlying control capabilities.

The 2-Hour Python Plan: A Realistic ApproachThe 2-Hour Python Plan: A Realistic ApproachApr 11, 2025 am 12:04 AM

You can learn basic programming concepts and skills of Python within 2 hours. 1. Learn variables and data types, 2. Master control flow (conditional statements and loops), 3. Understand the definition and use of functions, 4. Quickly get started with Python programming through simple examples and code snippets.

Python: Exploring Its Primary ApplicationsPython: Exploring Its Primary ApplicationsApr 10, 2025 am 09:41 AM

Python is widely used in the fields of web development, data science, machine learning, automation and scripting. 1) In web development, Django and Flask frameworks simplify the development process. 2) In the fields of data science and machine learning, NumPy, Pandas, Scikit-learn and TensorFlow libraries provide strong support. 3) In terms of automation and scripting, Python is suitable for tasks such as automated testing and system management.

How Much Python Can You Learn in 2 Hours?How Much Python Can You Learn in 2 Hours?Apr 09, 2025 pm 04:33 PM

You can learn the basics of Python within two hours. 1. Learn variables and data types, 2. Master control structures such as if statements and loops, 3. Understand the definition and use of functions. These will help you start writing simple Python programs.

How to teach computer novice programming basics in project and problem-driven methods within 10 hours?How to teach computer novice programming basics in project and problem-driven methods within 10 hours?Apr 02, 2025 am 07:18 AM

How to teach computer novice programming basics within 10 hours? If you only have 10 hours to teach computer novice some programming knowledge, what would you choose to teach...

How to avoid being detected by the browser when using Fiddler Everywhere for man-in-the-middle reading?How to avoid being detected by the browser when using Fiddler Everywhere for man-in-the-middle reading?Apr 02, 2025 am 07:15 AM

How to avoid being detected when using FiddlerEverywhere for man-in-the-middle readings When you use FiddlerEverywhere...

See all articles

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

AI Hentai Generator

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)
3 weeks agoBy尊渡假赌尊渡假赌尊渡假赌
R.E.P.O. Best Graphic Settings
3 weeks agoBy尊渡假赌尊渡假赌尊渡假赌
R.E.P.O. How to Fix Audio if You Can't Hear Anyone
3 weeks agoBy尊渡假赌尊渡假赌尊渡假赌
WWE 2K25: How To Unlock Everything In MyRise
4 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

Hot Tools

PhpStorm Mac version

PhpStorm Mac version

The latest (2018.2.1) professional PHP integrated development tool

MantisBT

MantisBT

Mantis is an easy-to-deploy web-based defect tracking tool designed to aid in product defect tracking. It requires PHP, MySQL and a web server. Check out our demo and hosting services.

WebStorm Mac version

WebStorm Mac version

Useful JavaScript development tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

MinGW - Minimalist GNU for Windows

MinGW - Minimalist GNU for Windows

This project is in the process of being migrated to osdn.net/projects/mingw, you can continue to follow us there. MinGW: A native Windows port of the GNU Compiler Collection (GCC), freely distributable import libraries and header files for building native Windows applications; includes extensions to the MSVC runtime to support C99 functionality. All MinGW software can run on 64-bit Windows platforms.