How Can I Scrape Dynamic Website Content Using Node.js and PhantomJS?

Mary-Kate Olsen
2024-12-13 07:50:10

Scraping Dynamic Content with Node.js

When scraping websites, you will often run into content that is not part of the HTML the server sends but is added by JavaScript after the page loads. To extract data from such pages, you first need to understand how that content is created.

Example with Cheerio

Consider the following code snippet:

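var request = require('request');
var cheerio = require('cheerio');
var url = "http://www.bdtong.co.kr/index.php?c_category=C02";

request(url, function (err, res, html) {
    if (err) {
        console.error('Request failed:', err.message);
        return;
    }
    // Parse the HTML exactly as the server returned it; no scripts are run.
    var $ = cheerio.load(html);
    // Print the link of every list item under .listMain.
    $('.listMain > li').each(function () {
        console.log($(this).find('a').attr('href'));
    });
});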

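This code attempts to scrape the page with Cheerio, but it prints nothing because the elements you want to extract (the li items under .listMain) are created by JavaScript after the page loads. Cheerio only parses the HTML it is given; it does not execute the page's scripts, so those elements never exist in the document it sees.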
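To make the failure mode concrete, here is a minimal sketch that needs no third-party packages. The HTML string is a made-up stand-in for a server response (it is not the real markup of the site above), and countStaticItems is a hypothetical helper that does a rough string check rather than real HTML parsing. It shows that the list is empty in the raw markup and only gets its items once a browser runs the script.

```javascript
// Hypothetical server response: the list is empty in the markup and is
// filled in by a script only after a browser executes it.
var serverHtml = [
  '<ul class="listMain"></ul>',
  '<script>',
  '  var ul = document.querySelector(".listMain");',
  '  ul.innerHTML = "<li><a href=/shop/1>Shop 1</a></li>";',
  '</script>'
].join('\n');

// Rough string check (not an HTML parser): count <li> tags between the
// first element carrying the given class name and the next </ul>.
function countStaticItems(html, className) {
  var open = html.indexOf('class="' + className + '"');
  if (open === -1) return 0;
  var close = html.indexOf('</ul>', open);
  var inner = html.slice(open, close === -1 ? html.length : close);
  return (inner.match(/<li[\s>]/g) || []).length;
}

console.log(countStaticItems(serverHtml, 'listMain')); // 0
```

If a check like this finds no items in the raw response while the browser shows them, the content is most likely generated client-side and a plain HTML parser will not be enough.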

Solution: Using PhantomJS

To scrape dynamic content, you need a tool that executes the page's JavaScript the way a browser does. This is where PhantomJS comes in. PhantomJS is a headless browser: it loads and renders a page without a visible window, runs its scripts, and lets you run your own JavaScript against the resulting DOM.

Here's how you can rewrite the example with PhantomJS:

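var phantom = require('phantom');

// Note: this uses the callback-style API of older releases of the phantom package.
phantom.create(function (ph) {
  ph.createPage(function (page) {
    var url = "http://www.bdtong.co.kr/index.php?c_category=C02";
    page.open(url, function (status) {
      if (status !== 'success') {
        console.error('Failed to load ' + url);
        return ph.exit();
      }
      // Inject jQuery so the page can be queried with the same selectors.
      page.includeJs("http://ajax.googleapis.com/ajax/libs/jquery/1.6.1/jquery.min.js", function () {
        // This function runs inside the page, so console.log there would not
        // reach the Node.js console. Return the links instead.
        page.evaluate(function () {
          var links = [];
          $('.listMain > li').each(function () {
            links.push($(this).find('a').attr('href'));
          });
          return links;
        }, function (links) {
          // Back in Node.js: print the result, then shut PhantomJS down.
          links.forEach(function (href) {
            console.log(href);
          });
          ph.exit();
        });
      });
    });
  });
});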

With PhantomJS, the page's own JavaScript runs before you query the DOM, so the dynamically created elements exist by the time you extract them.
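The href values pulled from the page may be relative or missing. As a small follow-up sketch, the helper below resolves them against the page URL using Node's built-in URL class. The name toAbsoluteLinks and the sample values are assumptions made for illustration; they do not come from the site above.

```javascript
// Resolve scraped href values against the page URL and drop empty entries.
// toAbsoluteLinks is a hypothetical helper name, not part of any library.
function toAbsoluteLinks(hrefs, pageUrl) {
  return hrefs
    .filter(function (href) { return typeof href === 'string' && href !== ''; })
    .map(function (href) { return new URL(href, pageUrl).href; });
}

var pageUrl = 'http://www.bdtong.co.kr/index.php?c_category=C02';
// Made-up sample values standing in for scraped hrefs.
var sample = ['/shop/1', 'view.php?id=2', undefined, 'http://example.com/a'];
console.log(toAbsoluteLinks(sample, pageUrl));
```

Entries that are not strings (for example, list items whose anchor has no href attribute) are skipped rather than passed to the URL constructor, which would otherwise turn them into bogus links.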
