Continuing from the previous chapter, we need to modify the program to continuously capture the content of 40 pages. That is to say, we need to output the title, link, first comment, commenting user and forum points of each article.
As shown in the figure, the value obtained by $('.reply_author').eq(0).text().trim();
is the correct user who made the first comment.
{}
After eventproxy obtains the comments and username content, we need to jump to the user interface through the username to continue grabbing the user’s points
var $ = cheerio.load(topicHtml);
//This URL is the target URL for the next step
var userHref = 'https://cnodejs.org' $('.reply_author').eq(0).attr('href');
userHref = url.resolve(tUrl, userHref);
var title = $('.topic_full_title').text().trim().replace(/n/g,"");;
var href = topicUrl;
var comment1 = $('.reply_content').eq(0).text().trim();
var author1 = $('.reply_author').eq(0).text().trim();
//Pass parameters to the next concurrent crawl
ep.emit('user_html', [userHref, title, href, comment1, author1]);
In eventproxy this time, we need to find where the score is placed (class="big").
{}
It’s easy to find the classname. Let’s try to output the result first
var outcome = superagent.get(userUrl)
.end(function (err, res) {
If (err) {
return console.error(err);
}
var $ = cheerio.load(res.text);
var score = $('.big').text().trim();
console.log(user[1]);
console.log(user[2]);
console.log(user[3]);
console.log(user[4]);
console.log($('.big').text().trim());
return ({
title: user[1],
href: user[2],
comment1: user[3],
author1: user[4],
score1: score
});
});
});
Run the program and get the result of this code.
{}
But here comes the problem. We can correctly output the result in the callback function of .end(), but we cannot correctly output the outcome. If you look carefully, the outcome that needs to be output is a Request object. This is a careless mistake. The .end() function does not pass the return value to the Request object, and the result needs to be returned to the previous layer (users).
//find userDetails
ep.after('user_html', topicUrls.length, function(users){
users = users.map(function(user){
var userUrl = user[0];
var score;
superagent.get(userUrl)
.end(function (err, res) {
if (err) {
return console.error(err);
}
//console.log(res.text);
var $ = cheerio.load(res.text);
score = $('.big').text().trim();
});
return ({
title: user[1],
href: user[2],
comment1: user[3],
author1: user[4],
score1: score
});
});
把users好好地输出发现除了score1其他是正确值。仔细调试发现,程序是先进行了console.log(),然后再进行.map()。更准确地说,在.map()函数内,.get()的回调函数并没有执行完赋值score,return 返回值就进行了。这就是回调函数的异步,而外层的同步操作是不会等待回调函数做完操作的。
{}
我的做法就是eventproxy再emit一层消息,伴随着消息把需要的数据一起传递给接收消息操作.after(),只有当消息全部接收完毕,再打印出传递的参数(结果)。
score = $('.big')text().trim();
//新添加
ep.emit('got_score', [user[1], user[2], user[3], user[4], score]);
.....
ep.after('got_score', 10, function(users){
console.log(users);
});
{}
这个问题解决了,但score1的数值好像太大了点吧。再一看,原来class='big'有两个,用户的话题收藏也是属于这个class。我们得通过cheerio的.slice( start, [end] )来切取第一个元素,即将score 修改为 score = $('.big').slice(0).eq(0).text().trim();。正确结果如图。
{}

Vercel是什么?本篇文章带大家了解一下Vercel,并介绍一下在Vercel中部署 Node 服务的方法,希望对大家有所帮助!

gm是基于node.js的图片处理插件,它封装了图片处理工具GraphicsMagick(GM)和ImageMagick(IM),可使用spawn的方式调用。gm插件不是node默认安装的,需执行“npm install gm -S”进行安装才可使用。

今天跟大家介绍一个最新开源的 javaScript 运行时:Bun.js。比 Node.js 快三倍,新 JavaScript 运行时 Bun 火了!

大家都知道 Node.js 是单线程的,却不知它也提供了多进(线)程模块来加速处理一些特殊任务,本文便带领大家了解下 Node.js 的多进(线)程,希望对大家有所帮助!

在nodejs中,lts是长期支持的意思,是“Long Time Support”的缩写;Node有奇数版本和偶数版本两条发布流程线,当一个奇数版本发布后,最近的一个偶数版本会立即进入LTS维护计划,一直持续18个月,在之后会有12个月的延长维护期,lts期间可以支持“bug fix”变更。

node怎么爬取数据?下面本篇文章给大家分享一个node爬虫实例,聊聊利用node抓取小说章节的方法,希望对大家有所帮助!


Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

mPDF
mPDF is a PHP library that can generate PDF files from UTF-8 encoded HTML. The original author, Ian Back, wrote mPDF to output PDF files "on the fly" from his website and handle different languages. It is slower than original scripts like HTML2FPDF and produces larger files when using Unicode fonts, but supports CSS styles etc. and has a lot of enhancements. Supports almost all languages, including RTL (Arabic and Hebrew) and CJK (Chinese, Japanese and Korean). Supports nested block-level elements (such as P, DIV),

Notepad++7.3.1
Easy-to-use and free code editor

MinGW - Minimalist GNU for Windows
This project is in the process of being migrated to osdn.net/projects/mingw, you can continue to follow us there. MinGW: A native Windows port of the GNU Compiler Collection (GCC), freely distributable import libraries and header files for building native Windows applications; includes extensions to the MSVC runtime to support C99 functionality. All MinGW software can run on 64-bit Windows platforms.

Atom editor mac version download
The most popular open source editor

SublimeText3 Linux new version
SublimeText3 Linux latest version
