Practical sharing: Use nodejs to crawl and download more than 10,000 images
This article shares a hands-on Node.js experience: how the author used nodejs to crawl and download more than 10,000 wallpaper images. I hope it will be helpful to everyone!
Hello everyone, I am Xiaoma. Why did I need to download so many pictures? A few days ago I used uni-app and uniCloud to deploy a wallpaper mini program for free, and then I needed some resources to fill it with content.
First, initialize the project and install axios and cheerio:
npm init -y && npm i axios cheerio
axios is used to fetch the web page content. cheerio is a server-side implementation of the jQuery API, and we use it to extract the image addresses from the DOM.
const axios = require('axios')
const cheerio = require('cheerio')

// Fetch a page and collect the src of the first <img> inside each matching container
async function getImageUrl(target_url, containerElement) {
  const res = await axios.get(target_url)
  const html = res.data
  const $ = cheerio.load(html)
  const result_list = []
  // cheerio passes (index, element) to the each() callback
  $(containerElement).each((index, element) => {
    result_list.push($(element).find('img').attr('src'))
  })
  return result_list
}
This function gives us the image URLs on the page. Next, we need to download the images from those URLs.
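The src values collected from a page are not always absolute URLs, and some img tags may have no src at all. As a minimal sketch, assuming the list returned by getImageUrl is passed in together with the page URL, the values could be cleaned up with the built-in URL class before downloading. The helper name normalizeImageUrls and the example.com addresses are made up for illustration and are not part of the original article.

```javascript
// Hypothetical helper, not from the original article.
// Resolves each src against the page URL and drops empty or duplicate entries.
function normalizeImageUrls(srcList, pageUrl) {
  const seen = new Set()
  const result = []
  for (const src of srcList) {
    if (!src) continue // attr('src') is undefined when the attribute is missing
    let absolute
    try {
      absolute = new URL(src, pageUrl).href
    } catch (err) {
      continue // skip values that cannot be parsed as a URL
    }
    if (!seen.has(absolute)) {
      seen.add(absolute)
      result.push(absolute)
    }
  }
  return result
}

// Example with made-up values
const urls = normalizeImageUrls(
  ['/img/a.jpg', 'https://cdn.example.com/b.png', undefined, '/img/a.jpg'],
  'https://example.com/gallery/page1.html'
)
console.log(urls)
```

Resolving against the page URL also handles protocol-relative addresses such as //cdn.example.com/c.jpg, which would otherwise fail when passed to a downloader.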
Downloading files in nodejs can be done with built-in modules or with third-party libraries.
Method 1: Use the built-in modules 'https' and 'fs'
https.get() sends a GET request over HTTPS to fetch the file to download. fs.createWriteStream() creates a writable stream; here it receives a single argument, the path where the file is saved. pipe() reads data from a readable stream and writes it to a writable stream.
const fs = require('fs')
const https = require('https')

// URL of the image (placeholder; https.get needs a full https:// URL)
const url = 'GFG.jpeg'

https.get(url, (res) => {
  // Image will be stored at this path (the files directory must already exist)
  const path = `${__dirname}/files/img.jpeg`
  const filePath = fs.createWriteStream(path)
  // Stream the response body into the file
  res.pipe(filePath)
  filePath.on('finish', () => {
    filePath.close()
    console.log('Download Completed')
  })
})
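The Method 1 example writes every download to the fixed name img.jpeg, so downloading several images that way would overwrite the same file. One possible approach, sketched here under the assumption that the last segment of the URL path is a usable file name, is to derive the name from the URL. The helper fileNameFromUrl, its fallback value, and the example.com addresses are hypothetical.

```javascript
const path = require('path')

// Hypothetical helper: derive a local file name from a download URL.
function fileNameFromUrl(fileUrl, fallback = 'download.bin') {
  // pathname excludes the query string and the fragment
  const { pathname } = new URL(fileUrl)
  // URL paths always use '/', so use the posix variant of basename
  const name = path.posix.basename(pathname)
  return name || fallback
}

console.log(fileNameFromUrl('https://example.com/images/cat.jpeg?size=large'))
console.log(fileNameFromUrl('https://example.com/'))
```

The result could then be joined onto the target directory, for example `${__dirname}/files/${fileNameFromUrl(url)}`, before calling fs.createWriteStream.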
Method 2: DownloaderHelper
npm install node-downloader-helper
The following code downloads an image from a website. An object dl is created from the class DownloaderHelper, which receives two parameters:
The file variable contains the URL of the image to download, and the filePath variable contains the directory where the file will be saved.
const { DownloaderHelper } = require('node-downloader-helper')

// URL of the image
const file = 'GFG.jpeg'
// Path at which image will be downloaded
const filePath = `${__dirname}/files`

const dl = new DownloaderHelper(file, filePath)

dl.on('end', () => console.log('Download Completed'))
dl.start()
Method 3: Use download
download is written by the prolific npm author sindresorhus and is very easy to use.
npm install download
The following code downloads an image from a website. The download function receives the URL of the file and the directory to save it in.
const download = require('download')

// Url of the image
const file = 'GFG.jpeg'
// Path at which image will get downloaded
const filePath = `${__dirname}/files`

download(file, filePath).then(() => {
  console.log('Download Completed')
})
I originally wanted to crawl Baidu wallpapers, but the resolution was not high enough and the images had watermarks. Later, a friend in the group found an API, which I guess belongs to a high-definition wallpaper mobile app. It returns the download URLs directly, so I simply used it.
The following is the complete code:
const download = require('download')
const axios = require('axios')

let headers = {
  'User-Agent':
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36',
}

function sleep(time) {
  return new Promise((resolve) => setTimeout(resolve, time))
}

async function load(skip = 0) {
  const data = await axios
    .get(
      'http://service.picasso.adesk.com/v1/vertical/category/4e4d610cdf714d2966000000/vertical',
      {
        headers,
        params: {
          limit: 30, // each page always returns 30 items
          skip: skip,
          first: 0,
          order: 'hot',
        },
      }
    )
    .then((res) => {
      return res.data.res.vertical
    })
    .catch((err) => {
      console.log(err)
      return [] // treat a failed request as an empty page so downloadFile does not crash
    })
  await downloadFile(data)
  await sleep(3000)
  if (skip < 1000) {
    load(skip + 30)
  } else {
    console.log('Download complete')
  }
}

async function downloadFile(data) {
  for (let index = 0; index < data.length; index++) {
    const item = data[index]
    // Path at which image will get downloaded (folder name kept from the original)
    const filePath = `${__dirname}/美女`
    await download(item.wp, filePath, {
      filename: item.id + '.jpeg',
      headers,
    }).then(() => {
      console.log(`Download ${item.id} Completed`)
      return
    })
  }
}

load()
In the above code, you must first set the User-Agent header and add a 3-second delay between pages. This helps prevent the server from blocking the crawler and returning 403 directly.
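The load function pages through the API by calling itself with skip + 30 until skip reaches 1000. As a rough sketch, the same sequence of skip values can be written as a plain generator, which makes it easy to see how many requests one run makes. The name pageOffsets and its parameters are hypothetical and not part of the original code.

```javascript
// Hypothetical helper: yields the same skip values that load() walks through.
function* pageOffsets(limit = 30, maxSkip = 1000) {
  let skip = 0
  while (true) {
    yield skip
    // load() only requests another page while skip < 1000
    if (skip >= maxSkip) return
    skip += limit
  }
}

const offsets = [...pageOffsets()]
console.log(offsets.length) // 35 requests
console.log(offsets[offsets.length - 1]) // 1020 is the last skip value
console.log(offsets.length * 30) // at most 1050 images per run
```

With the values used in the article, one run makes 35 requests and, at 30 items per page, fetches at most 1,050 images from this category. Inside an async function, the recursion could be replaced by a for...of loop over pageOffsets() that awaits the page download and then sleep(3000) on each iteration.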
Run node index.js and the images will be downloaded automatically.
To try the result, search for "水瓜图" in WeChat mini programs.