
How do you use Node.js to crawl data from a web page and write it into an Excel file? This article walks through an example: crawling Pokémon Pokédex data and generating an Excel file from it. I hope it will be helpful to everyone!

Node crawl data example: grab the Pokémon illustrated book and generate an Excel file

I believe Pokémon is a childhood memory for many people born in the 90s. As a programmer, I have wanted to make a Pokémon game more than once, but before doing so I should first sort out how many Pokémon there are, along with their numbers, names, attributes, and other information. In this article, we will use Node.js to crawl Pokémon data from a web page, generate an Excel file from that data, and finally read the Excel file back through an HTTP interface.

Crawling data

Since we are crawling data, let's first find a web page with Pokémon Pokédex data, as shown below:

[Screenshot: the Pokémon Pokédex web page used as the data source]

This website is built with PHP and has no separation between front end and back end, so there is no API we can call to fetch the data. Instead, we use the crawler library to fetch the page content and select elements from it. Let me explain up front: the advantage of the crawler library is that it lets you use jQuery-style selectors to extract elements in a Node environment.

Installation:

yarn add crawler

Implementation:

const Crawler = require("crawler");
const fs = require("fs")
const { resolve } = require("path")

// One crawler instance is enough; jQuery mode makes $ available in callbacks.
let crawler = new Crawler({
    timeout: 10000,
    jQuery: true,
});

function getPokemon() {
    let uri = "" // Pokémon Pokédex page URL
    let data = []
    return new Promise((resolve, reject) => {
        crawler.queue({
            uri,
            callback: (err, res, done) => {
                if (err) {
                    reject(err);
                    return done();
                }
                let $ = res.$;
                try {
                    let $tr = $(".roundy.eplist tr");
                    $tr.each((i, el) => {
                        let $td = $(el).find("td");
                        let _code = $td.eq(1).text().split("\n")[0]
                        let _name = $td.eq(3).text().split("\n")[0]
                        let _attr = $td.eq(4).text().split("\n")[0]
                        let _other = $td.eq(5).text().split("\n")[0]
                        _attr = _other.indexOf("属性") != -1 ? _attr : `${_attr}+${_other}`
                        if (_code) {
                            data.push([_code, _name, _attr])
                        }
                    })
                    done();
                    resolve(data)
                } catch (err) {
                    done()
                    reject(err)
                }

            }
        })
    })
}

When creating the instance, jQuery mode must be enabled so that `$` is available for matching elements in the callback. The middle part of the code above selects the table rows on the page and extracts the data we need from each cell; it works the same way as the jQuery API, so I won't go into detail here.

getPokemon().then(async data => {
    console.log(data)
})

Finally, we can run it and print the returned data to verify that the crawled format is correct and error-free.
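As a quick sanity check, each crawled row should be a three-element array of non-empty strings. The helper below is hypothetical (not part of the original tutorial), just a sketch of what such a check might look like:

```javascript
// Hypothetical validation helper: each crawled row should be
// [code, name, attribute] -- three non-empty strings.
function isValidRow(row) {
    return Array.isArray(row)
        && row.length === 3
        && row.every(cell => typeof cell === "string" && cell.length > 0);
}

// Example usage against rows shaped like the crawler's output:
const sample = [["001", "妙蛙种子", "草+毒"], ["004", "小火龙", "火"]];
console.log(sample.every(isValidRow)); // true
```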

[Screenshot: the crawled Pokémon data printed to the console]

Write to Excel

Now that we have crawled the data, we will use the node-xlsx library to write the data out and generate an Excel file.

First, a brief introduction: node-xlsx is a simple Excel file parser and generator. Built with TypeScript, it relies on the SheetJS xlsx module to parse and build Excel worksheets, so the two share some parameter configurations.

Installation:

yarn add node-xlsx

Implementation:

const xlsx = require("node-xlsx")

getPokemon().then(async data => {
    let title = ["编号", "宝可梦", "属性"] // number, Pokémon, attribute
    let list = [{
        name: "关都", // Kanto region
        data: [
            title,
            ...data
        ]
    }];
    const sheetOptions = { '!cols': [{ wch: 15 }, { wch: 20 }, { wch: 20 }] };
    const buffer = xlsx.build(list, { sheetOptions }) // build is synchronous and returns a Buffer
    try {
        fs.writeFileSync(resolve(__dirname, "data/pokemon.xlsx"), buffer, "utf8")
    } catch (error) {
        console.error(error)
    }
})

Here, `name` is the worksheet name in the Excel file, and `data` must be a two-dimensional array: each inner array becomes one row, with its cells filled into columns A, B, C... in order. The header row `title` comes first, followed by the crawled rows. You can also set column widths through `!cols`: the first object, `{ wch: 15 }`, means the first column is 15 characters wide. There are many more options; refer to the xlsx library to learn about these configuration items.
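If you would rather not hard-code the widths, one option (a sketch, not from the original article) is to derive each column's `wch` from the widest cell in that column:

```javascript
// Sketch: compute a { wch } entry per column from the widest cell,
// padded by a couple of characters, for use as sheetOptions["!cols"].
function columnWidths(rows, padding = 2) {
    const widths = [];
    for (const row of rows) {
        row.forEach((cell, i) => {
            const len = String(cell).length + padding;
            if (!widths[i] || widths[i].wch < len) {
                widths[i] = { wch: len };
            }
        });
    }
    return widths;
}

const rows = [["编号", "宝可梦", "属性"], ["001", "妙蛙种子", "草+毒"]];
console.log(columnWidths(rows)); // [ { wch: 5 }, { wch: 6 }, { wch: 5 } ]
```

Note that `wch` counts characters, so wide CJK glyphs may still need extra padding in practice.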

Finally, we generate buffer data through the `xlsx.build` method and write it out with `fs.writeFileSync`, which creates the Excel file if it does not exist. For easy viewing, I stored it in a folder named `data`. At this point, we find a new file called `pokemon.xlsx` in the `data` folder; open it and the data is all there. This completes the step of writing the data into Excel.

[Screenshot: the generated pokemon.xlsx file opened in Excel]

Reading Excel

Reading Excel is actually very easy; you don't even need to use fs. Just pass the file path to the `xlsx.parse` method to read it directly.

xlsx.parse(resolve(__dirname, "data/pokemon.xlsx"));
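`xlsx.parse` returns an array of worksheet objects shaped like `[{ name, data }]`, where `data` is the same two-dimensional array we wrote. If you want keyed objects instead of raw rows, a small helper (hypothetical, added here for illustration) can rebuild them from the header row:

```javascript
// Sketch: convert one parsed sheet ({ name, data }) into an array of
// objects keyed by the header row. Assumes row 0 holds the column titles.
function sheetToObjects(sheet) {
    const [header, ...rows] = sheet.data;
    return rows.map(row =>
        Object.fromEntries(header.map((key, i) => [key, row[i]]))
    );
}

// Example with a sheet shaped like xlsx.parse output:
const sheet = {
    name: "关都",
    data: [["编号", "宝可梦", "属性"], ["001", "妙蛙种子", "草+毒"]]
};
console.log(sheetToObjects(sheet));
// [ { '编号': '001', '宝可梦': '妙蛙种子', '属性': '草+毒' } ]
```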

Of course, to verify accuracy, let's write an interface and see whether we can access the data. For convenience, I'll use the express framework to accomplish this.

Let’s install it first:

yarn add express

Then create the express service. I use 3000 as the port number here, so a single GET route that sends the data read from the Excel file is all we need.

const express = require("express")
const app = express();
const listenPort = 3000;

app.get("/pokemon", (req, res) => {
    let data = xlsx.parse(resolve(__dirname, "data/pokemon.xlsx"));
    res.send(data)
})

app.listen(listenPort, () => {
    console.log(`Server running at http://localhost:${listenPort}/`)
})

Finally, I use Postman to access the interface, and you can clearly see all the Pokémon data we gathered, from crawling it to storing it in the table.

[Screenshot: the /pokemon response viewed in Postman]

Conclusion

As you can see, using Pokémon as an example, this article covered how to use Node.js to crawl data from web pages, how to write that data into an Excel file, and how to read data back from an Excel file. None of it is hard to implement, but it can be quite practical. If you're worried about forgetting, feel free to save it~


Statement
This article is reproduced from the 掘金 (Juejin) community. If there is any infringement, please contact admin@php.cn for deletion.