search
HomeWeb Front-endJS TutorialHow to operate node to achieve crawler effect

This time I will show you how to operate the node to achieve the crawler effect. What are the precautions for operating the node to achieve the crawler effect? ​​The following is a practical case, let's take a look.

Node is a server-side language, so you can crawl the website like Python. Let’s use node to crawl the blog park and get all the chapter information.

Step one: Create the crawl file and then npm init.

Second step: Create the crawl.js file. A simple code to crawl the entire page is as follows:

var http = require("http");
var url = "http://www.cnblogs.com";
http.get(url, function (res) {
  var html = "";
  res.on("data", function (data) {
    html += data;
  });
  res.on("end", function () {
    console.log(html);
  });
}).on("error", function () {
  console.log("获取课程结果错误!");
});
Introduce the http module, and then use http The get request of the object, that is, once it is run, it is equivalent to the node server sending a get request to request this page, and then returning it through res, where the on binding data event is used to continuously receive data, and at the end we print it out in the background .

This is just a part of the entire page. We can inspect the elements on this page and find that they are indeed the same.

We only need to crawl the chapter title and the information of each section. .

The third step: Introduce the cheerio module, as follows: (Just install it in gitbash, cmd always has problems)

cnpm install cheerio --save-dev
The introduction of this module is for It is convenient for us to operate dom, just like jQuery.

Step 4: Operate dom and obtain useful information.

var http = require("http");
var cheerio = require("cheerio");
var url = "http://www.cnblogs.com";
function filterData(html) {
  var $ = cheerio.load(html); 
  var items = $(".post_item");
  var result = [];
  items.each(function (item) {
    var tit = $(this).find(".titlelnk").text();
    var aut = $(this).find(".lightblue").text();
    var one = {
      title: tit,
      author: aut
    };
    result.push(one);
  });
  return result;
}
function printInfos(allInfos) {
  allInfos.forEach(function (item) {
    console.log("文章题目 " + item["title"] + '\n' + "文章作者 " + item["author"] + '\n'+ '\n');
  });
}
http.get(url, function (res) {
  var html = "";
  res.on("data", function (data) {
    html += data;
  });
  res.on("end", function (data) {
    var allInfos = filterData(html);
    printInfos(allInfos);
  });
}).on("error", function () {
  console.log("爬取博客园首页失败")
});
That is, the above process is crawling the title and author of the blog.

The final background output is as follows:

This is consistent with the content of the blog homepage:

I believe you have mastered the method after reading the case in this article. For more exciting information, please pay attention to other related articles on the php Chinese website!

Recommended reading:

How to use Koa in Node.js to implement JWT user authentication

react-navigation use case analysis

The above is the detailed content of How to operate node to achieve crawler effect. For more information, please follow other related articles on the PHP Chinese website!

Statement
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
node、nvm与npm有什么区别node、nvm与npm有什么区别Jul 04, 2022 pm 04:24 PM

node、nvm与npm的区别:1、nodejs是项目开发时所需要的代码库,nvm是nodejs版本管理工具,npm是nodejs包管理工具;2、nodejs能够使得javascript能够脱离浏览器运行,nvm能够管理nodejs和npm的版本,npm能够管理nodejs的第三方插件。

Vercel是什么?怎么部署Node服务?Vercel是什么?怎么部署Node服务?May 07, 2022 pm 09:34 PM

Vercel是什么?本篇文章带大家了解一下Vercel,并介绍一下在Vercel中部署 Node 服务的方法,希望对大家有所帮助!

node爬取数据实例:聊聊怎么抓取小说章节node爬取数据实例:聊聊怎么抓取小说章节May 02, 2022 am 10:00 AM

node怎么爬取数据?下面本篇文章给大家分享一个node爬虫实例,聊聊利用node抓取小说章节的方法,希望对大家有所帮助!

node导出模块有哪两种方式node导出模块有哪两种方式Apr 22, 2022 pm 02:57 PM

node导出模块的两种方式:1、利用exports,该方法可以通过添加属性的方式导出,并且可以导出多个成员;2、利用“module.exports”,该方法可以直接通过为“module.exports”赋值的方式导出模块,只能导出单个成员。

安装node时会自动安装npm吗安装node时会自动安装npm吗Apr 27, 2022 pm 03:51 PM

安装node时会自动安装npm;npm是nodejs平台默认的包管理工具,新版本的nodejs已经集成了npm,所以npm会随同nodejs一起安装,安装完成后可以利用“npm -v”命令查看是否安装成功。

聊聊V8的内存管理与垃圾回收算法聊聊V8的内存管理与垃圾回收算法Apr 27, 2022 pm 08:44 PM

本篇文章带大家了解一下V8引擎的内存管理与垃圾回收算法,希望对大家有所帮助!

node中是否包含dom和bomnode中是否包含dom和bomJul 06, 2022 am 10:19 AM

node中没有包含dom和bom;bom是指浏览器对象模型,bom是指文档对象模型,而node中采用ecmascript进行编码,并且没有浏览器也没有文档,是JavaScript运行在后端的环境平台,因此node中没有包含dom和bom。

聊聊Node.js path模块中的常用工具函数聊聊Node.js path模块中的常用工具函数Jun 08, 2022 pm 05:37 PM

本篇文章带大家聊聊Node.js中的path模块,介绍一下path的常见使用场景、执行机制,以及常用工具函数,希望对大家有所帮助!

See all articles

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

AI Hentai Generator

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)
2 weeks agoBy尊渡假赌尊渡假赌尊渡假赌
Repo: How To Revive Teammates
4 weeks agoBy尊渡假赌尊渡假赌尊渡假赌
Hello Kitty Island Adventure: How To Get Giant Seeds
4 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Safe Exam Browser

Safe Exam Browser

Safe Exam Browser is a secure browser environment for taking online exams securely. This software turns any computer into a secure workstation. It controls access to any utility and prevents students from using unauthorized resources.

ZendStudio 13.5.1 Mac

ZendStudio 13.5.1 Mac

Powerful PHP integrated development environment

SublimeText3 English version

SublimeText3 English version

Recommended: Win version, supports code prompts!

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools