Practical sharing: Use nodejs to crawl and download more than 10,000 images

青灯夜游
2022-03-24 19:49:28

This article shares a hands-on Node.js experience: how the author used Node.js to crawl and download more than 10,000 wallpaper images. I hope it will be helpful to everyone!

Hello, everyone, I am Xiaoma. Why do I need to download so many pictures? A few days ago, I used uni-app and uniCloud to deploy a wallpaper mini program for free, and I needed some resources to fill it with content.

Crawling pictures

First, initialize the project and install axios and cheerio:

npm init -y && npm i axios cheerio

axios is used to fetch the web page content. cheerio provides a jQuery-like API on the server side; we use it to extract the image URLs from the DOM.

const axios = require('axios')
const cheerio = require('cheerio')

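// Fetch a page and collect the src of the first <img> inside each element
// matched by containerElement (a CSS selector).
async function getImageUrl(target_url, containerElement) {
  const res = await axios.get(target_url)
  const html = res.data
  const $ = cheerio.load(html)
  const result_list = []
  // cheerio's each() passes (index, element), like jQuery
  $(containerElement).each((index, element) => {
    result_list.push($(element).find('img').attr('src'))
  })
  return result_list
}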

This gives us the image URLs found in the page. Next, we need to download each image from its URL.
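The src values collected from a page are often relative, protocol-relative, duplicated, or missing altogether. Below is a minimal sketch of cleaning them up before downloading, using only the built-in URL class. The helper name normalizeImageUrls and the example.com addresses are illustrative assumptions, not part of the original article.

```javascript
// Hypothetical helper (not from the original article): resolve each src
// against the page URL, drop missing or unparsable values, and de-duplicate.
function normalizeImageUrls(srcList, pageUrl) {
  const seen = new Set()
  for (const src of srcList) {
    if (!src) continue // attr('src') returns undefined when there is no <img>
    try {
      // new URL(relative, base) resolves relative and protocol-relative paths
      seen.add(new URL(src, pageUrl).href)
    } catch (err) {
      // Skip values that cannot be parsed as a URL
    }
  }
  return [...seen]
}

// Example input with a relative path, a missing value, a protocol-relative
// URL, and a duplicate (all made up for illustration)
const urls = normalizeImageUrls(
  ['/img/a.jpg', undefined, '//cdn.example.com/b.png', '/img/a.jpg'],
  'https://example.com/gallery/index.html'
)
console.log(urls)
```

The result of getImageUrl could be passed through such a helper together with target_url, so that every entry is an absolute URL that a download function accepts.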

How to use nodejs to download files

Downloading files with Node.js can be done using built-in modules or third-party libraries.

Method 1: Use the built-in modules 'https' and 'fs'

https.get() sends a GET request over HTTPS to fetch the file. fs.createWriteStream() creates a writable stream; its first argument is the path where the file will be saved, and an optional second argument carries options. pipe() reads data from a readable stream and writes it to a writable stream.

const fs = require('fs')
const https = require('https')

// URL of the image (placeholder: https.get() needs the full https:// URL)
const url = 'GFG.jpeg'

https.get(url, (res) => {
  // Image will be stored at this path
  const path = `${__dirname}/files/img.jpeg`
  const filePath = fs.createWriteStream(path)
  res.pipe(filePath)
  filePath.on('finish', () => {
    filePath.close()
    console.log('Download Completed')
  })
})
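The snippet above does not handle failed requests or write errors. As a sketch under assumptions, the same idea can be wrapped in a promise with stream.pipeline(), which pipes like pipe() but also forwards errors. The helper name downloadToFile is hypothetical, and choosing between http and https by URL scheme is an addition that the original code does not have.

```javascript
const fs = require('fs')
const http = require('http')
const https = require('https')
const { pipeline } = require('stream')

// Hypothetical helper: download `url` to `destPath` and resolve once the file
// is fully written. Rejects on network errors and on non-200 responses.
function downloadToFile(url, destPath) {
  // Pick the client that matches the URL scheme
  const client = url.startsWith('https:') ? https : http
  return new Promise((resolve, reject) => {
    client
      .get(url, (res) => {
        if (res.statusCode !== 200) {
          res.resume() // discard the body so the socket is released
          reject(new Error(`Request failed with status ${res.statusCode}`))
          return
        }
        // pipeline() connects the streams and reports errors from either side
        pipeline(res, fs.createWriteStream(destPath), (err) => {
          if (err) reject(err)
          else resolve(destPath)
        })
      })
      .on('error', reject)
  })
}

// Usage (the URL is a placeholder, and the target directory must already exist):
// downloadToFile('https://example.com/img.jpeg', `${__dirname}/files/img.jpeg`)
//   .then((savedPath) => console.log('Download Completed:', savedPath))
//   .catch((err) => console.error('Download failed:', err.message))
```

Returning a promise also makes it straightforward to await one download before starting the next.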

Method 2: Use node-downloader-helper

npm install node-downloader-helper

The following code downloads an image from a website. An object dl is created from the class DownloaderHelper, which receives two parameters:

  1. The URL of the image to be downloaded.
  2. The directory where the image is saved after downloading.

The file variable contains the URL of the image to download, and the filePath variable contains the directory in which the file will be saved.

const { DownloaderHelper } = require('node-downloader-helper')

// URL of the image (placeholder: use the full URL of the file)
const file = 'GFG.jpeg'
// Path at which image will be downloaded
const filePath = `${__dirname}/files`

const dl = new DownloaderHelper(file, filePath)

dl.on('end', () => console.log('Download Completed'))
dl.start()

Method 3: Use download

The download package is written by sindresorhus, a prolific npm package author, and is very easy to use.

npm install download

The following code downloads an image from a website. The download function receives the file URL and the destination directory, and returns a promise.

const download = require('download')

// URL of the image (placeholder: use the full URL of the file)
const file = 'GFG.jpeg'
// Path at which image will get downloaded
const filePath = `${__dirname}/files`

download(file, filePath).then(() => {
  console.log('Download Completed')
})

Final code

I originally wanted to crawl Baidu wallpapers, but the resolution was too low and the images had watermarks. Later, a friend in a group chat found an API which, I guess, belongs to a mobile app for high-definition wallpapers. It returns the download URLs directly, so I used it as is.

The following is the complete code

const download = require('download')
const axios = require('axios')

let headers = {
  'User-Agent':
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36',
}

function sleep(time) {
  return new Promise((resolve) => setTimeout(resolve, time))
}

async function load(skip = 0) {
  const data = await axios
    .get(
      'http://service.picasso.adesk.com/v1/vertical/category/4e4d610cdf714d2966000000/vertical',
      {
        headers,
        params: {
          limit: 30, // each page returns a fixed 30 items
          skip: skip,
          first: 0,
          order: 'hot',
        },
      }
    )
    .then((res) => {
      return res.data.res.vertical
    })
    .catch((err) => {
      console.log(err)
      // Fall back to an empty page so downloadFile() does not crash on undefined
      return []
    })
  await downloadFile(data)
  await sleep(3000)
  if (skip < 1000) {
    await load(skip + 30)
  } else {
    console.log('Download complete')
  }
}

async function downloadFile(data) {
  for (let index = 0; index < data.length; index++) {
    const item = data[index]

    // Path at which image will get downloaded
    const filePath = `${__dirname}/美女`

    await download(item.wp, filePath, {
      filename: item.id + '.jpeg',
      headers,
    }).then(() => {
      console.log(`Download ${item.id} Completed`)
      return
    })
  }
}

load()

In the code above, the User-Agent header is set and a 3-second delay is added after each page. This helps keep the server from blocking the crawler and returning 403 directly.
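Even with a User-Agent and a delay, a single request can still fail now and then. One possible addition, which the original code does not include, is to retry a failed step after a growing pause. The sketch below assumes a hypothetical helper named withRetry; the retry count and the doubling delay are illustrative choices, not values from the article.

```javascript
// Resolve after `time` milliseconds (same idea as sleep() in the article)
function sleep(time) {
  return new Promise((resolve) => setTimeout(resolve, time))
}

// Hypothetical helper: call `task` until it succeeds or `retries` extra
// attempts are used up, doubling the wait after every failure.
async function withRetry(task, retries = 3, delay = 3000) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task()
    } catch (err) {
      if (attempt >= retries) throw err // give up and surface the last error
      console.log(`Attempt ${attempt + 1} failed, retrying in ${delay}ms`)
      await sleep(delay)
      delay *= 2
    }
  }
}

// Usage inside downloadFile(), assuming the variables from the article's code:
// await withRetry(() =>
//   download(item.wp, filePath, { filename: item.id + '.jpeg', headers })
// )
```

With this wrapper, one failed image no longer aborts the whole page, while the growing delay keeps the request rate low.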

Run node index.js and the images will be downloaded automatically.

Try it out

Search for "水瓜图" in WeChat mini programs to try it out.

https://p6-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/c5301b8b97094e92bfae240d7eb1ec5e~tplv-k3u1fbpfcp-zoom-1.awebp?

Statement:
This article is reproduced from juejin.cn. If there is any infringement, please contact admin@php.cn to have it removed.