Detailed explanation of the web request module of Node.js crawler
This article introduces nodegrass, a web request module for Node.js crawlers, and shares it with everyone. The details are as follows:
Note: In the latest nodegrass releases some methods have changed, so the examples in this article may no longer work as written. Please check the examples at the open-source address below for the current API.
1. Why write such a module?
The author wanted to write a crawler in Node.js. The official Node.js API already makes requesting remote resources quite simple; see http://nodejs.org/api/http.html. It provides two methods for HTTP requests: http.get(options, callback) and http.request(options, callback). As the names suggest, get is for GET requests, while request exposes more parameters, such as other HTTP methods and the port of the target host. HTTPS requests work much like HTTP. The simplest example:
```javascript
var https = require('https');
https.get('https://encrypted.google.com/', function(res) {
  console.log("statusCode: ", res.statusCode);
  console.log("headers: ", res.headers);
  res.on('data', function(d) {
    process.stdout.write(d);
  });
}).on('error', function(e) {
  console.error(e);
});
```
In the code above, all we really want from the remote host is the response information: the status, the headers, and the body. The second parameter of get is a callback, so we receive the response asynchronously; inside that callback, the res object listens for 'data' events with yet another callback, which hands us d, a chunk of the response. Processing that chunk will very likely introduce still more callbacks, layer upon layer, until you lose track of where you are. Developers used to writing synchronous code find this disorienting, and there are good synchronization libraries at home and abroad, such as Wind.js, but that is getting off topic. In the end, what we want from get is simply the response; we do not care about the res.on listening machinery, and nobody wants to write res.on('data', func) every single time. That is how the nodegrass module introduced here came about.
2. nodegrass requests resources, like jQuery's $.get(url, func)
The simplest example:
```javascript
var nodegrass = require('nodegrass');
nodegrass.get("http://www.baidu.com", function(data, status, headers) {
  console.log(status);
  console.log(headers);
  console.log(data);
}, 'gbk').on('error', function(e) {
  console.log("Got error: " + e.message);
});
```
At first glance this looks no different from the official get, and it is indeed almost the same. It just removes the layer of res.on('data', func) event callbacks, which already feels much more comfortable. The second parameter is again a callback, whose data argument is the response body, status the response status, and headers the response headers. Once we have the response body, we can extract whatever information interests us; in this example it is simply printed to the console. The third parameter is the character encoding. Node.js does not support gbk natively, so nodegrass uses iconv-lite internally to handle it: if the page you request is gbk-encoded, as Baidu's is, just pass this parameter.
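Once the callback hands you data, extracting the information you care about is ordinary string work. A minimal sketch of the idea, using a hypothetical extractTitle helper (not part of nodegrass) and a hard-coded sample body in place of a real response:

```javascript
// Hypothetical helper: pull the <title> text out of a fetched page body.
function extractTitle(html) {
  var match = /<title[^>]*>([^<]*)<\/title>/i.exec(html);
  return match ? match[1].trim() : null;
}

// In practice `data` would come from the nodegrass.get callback;
// here a sample body stands in for it.
var sampleBody = '<html><head><title>Example Page</title></head><body></body></html>';
console.log(extractTitle(sampleBody)); // "Example Page"
```

For anything beyond trivial scraping, a real HTML parser is more robust than a regex, but this shows the shape of the step that follows the request.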
So what about HTTPS requests? With the official API you have to require the https module separately, but since its get method is so similar to http's, nodegrass integrates the two. Look at the example:
```javascript
var nodegrass = require('nodegrass');
nodegrass.get("https://github.com", function(data, status, headers) {
  console.log(status);
  console.log(headers);
  console.log(data);
}, 'utf8').on('error', function(e) {
  console.log("Got error: " + e.message);
});
```
nodegrass automatically detects whether the request is http or https from the URL, so the URL must include the scheme: you cannot write just www.baidu.com/; you need http://www.baidu.com/.
For POST requests, nodegrass provides the post method. See the example:
```javascript
var ng = require('nodegrass');
ng.post("https://api.weibo.com/oauth2/access_token", function(data, status, headers) {
  var accessToken = JSON.parse(data);
  var err = null;
  if (accessToken.error) {
    err = accessToken;
  }
  callback(err, accessToken);
}, headers, options, 'utf8');
```
The above is a fragment of the Sina Weibo OAuth2.0 authorization flow, using nodegrass's post to call the access_token API; callback, headers, and options come from the surrounding code.
Compared with get, the post method takes two extra parameters: headers, the request headers, and options, the POST data. Both are object literals:
```javascript
var headers = {
  'Content-Type': 'application/x-www-form-urlencoded',
  'Content-Length': data.length
};
var options = {
  client_id: 'id',
  client_secret: 'cs',
  grant_type: 'authorization_code',
  redirect_uri: 'your callback url',
  code: acode
};
```
3. Use nodegrass as a proxy server?
Look at the example:
```javascript
var ng = require('nodegrass'),
    http = require('http'),
    url = require('url');
http.createServer(function(req, res) {
  var pathname = url.parse(req.url).pathname;
  if (pathname === '/') {
    ng.get('http://www.cnblogs.com/', function(data) {
      res.writeHeader(200, {'Content-Type': 'text/html;charset=utf-8'});
      res.write(data + "\n");
      res.end();
    }, 'utf8');
  }
}).listen(8088);
console.log('server listening 8088...');
```
It's that simple. A real proxy server is of course much more complicated, and this is not one, but at least when you visit local port 8088, you should see the cnblogs home page.
The open source address of nodegrass: https://github.com/scottkiss/nodegrass
The above is the detailed content of Detailed explanation of the web request module of Node.js crawler. For more information, please follow other related articles on the PHP Chinese website!
