
Crawler | Batch download of HD wallpapers (source code + tools included)

Python当打之年 · 2023-08-10


Unsplash is a free, high-quality photo site. All of its photos are real photographs at very large resolutions, which makes them excellent material for designers and illustration copywriters, and they work well as wallpapers too. The crawler code has been packaged into an exe tool; how to get the code and the tool is explained at the end of the article.


1. Analyze the web page

1.1 Manual download


Let's look at the manual download process first. Note: do not right-click the image and choose "Save as". An image saved that way is compressed at a certain ratio, and its clarity drops a lot. Take Nature as an example: click Download free and choose a download path; the image is 1.43 MB.

1.2 Page analysis

Next, analyze the page itself. We notice a page-number selector at the bottom of the page, and when we drag the scrollbar down we find that the pictures are loaded dynamically: as we scroll, later pictures appear one after another.

After scrolling a few times, we see that the page issues the following requests as it loads. Clicking one of them reveals the total number of pictures (10000) and the total number of pages (500).

Let’s take a look at a few URLs:


These links differ only in the page parameter, which increases sequentially. That is quite friendly: we just traverse the pages in order when making requests.
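The pattern above can be sketched as a small helper. The endpoint and query parameters come from the requests observed in the browser's network tab; the sample JSON below is a trimmed illustration using the figures from the article (10000 pictures at 20 per page gives 500 pages):

```python
def build_page_url(page):
    # Search endpoint observed in the network tab;
    # only the page parameter changes between requests.
    return ("https://unsplash.com/napi/search/photos"
            "?query=nature&per_page=20&page={}"
            "&xp=feedback-loop-v2%3Aexperiment").format(page)

# Trimmed sample of the JSON the endpoint returns (values from the article).
sample = {"total": 10000, "total_pages": 500, "results": []}
```

Traversing the pages is then just `build_page_url(1)`, `build_page_url(2)`, and so on.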

The page number problem has been solved. Next, analyze the link of each picture:


We see that the results list has exactly 20 entries, matching the per_page value in the request, so there is no doubt that the links to the images we want are here.

Analyzing web pages is often the time-consuming part, but overall this went smoothly. Now we can crawl the images for real.


2. Crawl images

2.1 Import module
import time
import random
import json
import requests
from fake_useragent import UserAgent
  • time: delays between requests
  • random: random numbers (for randomized delays)
  • json: parse JSON data
  • requests: HTTP requests
  • fake_useragent: random User-Agent strings

2.2 Get image links

Spoof a browser User-Agent so the server does not flag the request as a bot and refuse to respond:
ua = UserAgent(verify_ssl=False)
headers = {'User-Agent': ua.random}
Parse the response and collect all the image links:
def getpicurls(i,headers):
    picurls = []
    url = 'https://unsplash.com/napi/search/photos?query=nature&per_page=20&page={}&xp=feedback-loop-v2%3Aexperiment'.format(i)
    r = requests.get(url, headers=headers, timeout=5)
    time.sleep(random.uniform(3.1, 4.5))
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    allinfo = json.loads(r.text)
    results = allinfo['results']
    for result in results:
        href = result['urls']['full']
        picurls.append(href)
    return picurls
2.3 Save images

Save each image file:
import os

def getpic(count, url):
    # headers comes from section 2.2; reuse the same random User-Agent
    r = requests.get(url, headers=headers, timeout=5)
    os.makedirs('pictures', exist_ok=True)  # make sure the folder exists
    with open('pictures/{}.jpg'.format(count), 'wb') as f:
        f.write(r.content)
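Putting the two functions together, a minimal driver loop might look like the sketch below. The `crawl` function and its `delay` parameter are my own addition, not from the article; `getpicurls` and `getpic` are the functions defined above, passed in as arguments so the loop is easy to test:

```python
import time
import random

def crawl(pages, headers, getpicurls, getpic, delay=(3.1, 4.5)):
    # Walk pages 1..pages, fetch each page's image links,
    # and save every image under an increasing counter.
    count = 0
    for i in range(1, pages + 1):
        for url in getpicurls(i, headers):
            count += 1
            getpic(count, url)
            time.sleep(random.uniform(*delay))  # polite, randomized pause
    return count  # total number of images saved
```

Calling `crawl(500, headers, getpicurls, getpic)` would walk all 500 pages; start with a small page count to check everything works.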
Result:



3. Crawl with the exe tool

Result of running the exe tool:


Note:
  • Avoid crawling too frequently, so as not to disrupt normal network traffic!
  • The images are high-definition files hosted abroad, so crawl speed depends on your network and is usually not very fast.
  • You can build a proxy pool to crawl faster.
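A proxy pool can be as simple as a list of proxy addresses from which each request picks at random. The addresses below are placeholders of my own, not working proxies; substitute your own:

```python
import random

# Placeholder proxy addresses -- substitute real, working proxies.
PROXY_POOL = [
    "http://127.0.0.1:8001",
    "http://127.0.0.1:8002",
]

def pick_proxy(pool):
    # Build the proxies dict that requests.get(..., proxies=...) expects.
    addr = random.choice(pool)
    return {"http": addr, "https": addr}
```

Pass the result to each request, e.g. `requests.get(url, headers=headers, proxies=pick_proxy(PROXY_POOL), timeout=5)`. Rotating the proxy per request spreads the load and reduces the chance of being rate-limited.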


Statement:
This article is reproduced from Python当打之年. If there is any infringement, please contact admin@php.cn for deletion.