How Can I Scrape Dynamic Website Content Using Node.js and PhantomJS?
Scraping Dynamic Content with Node.js
When scraping websites, you will often run into dynamic content: elements that JavaScript adds to the page after the initial HTML has arrived, so they are missing from the raw page source. To extract data from these pages effectively, you need to understand how such content is created.
Example with Cheerio
Consider the following code snippet:
var request = require('request');
var cheerio = require('cheerio');

var url = "http://www.bdtong.co.kr/index.php?c_category=C02";

request(url, function (err, res, html) {
  if (err) {
    console.error(err);
    return;
  }
  // Parse the static HTML returned by the server
  var $ = cheerio.load(html);
  // Print the link inside each list item
  $('.listMain > li').each(function () {
    console.log($(this).find('a').attr('href'));
  });
});
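This code attempts to scrape the page with Cheerio, but it returns empty results because the elements you want to extract (the .listMain > li items) are not part of the HTML the server sends; they are created by JavaScript after the page loads. Cheerio only parses markup and does not execute scripts, so it never sees them.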
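One way to confirm that content is dynamic is to look for the target elements in the raw HTML before any script has run. The sketch below uses only plain JavaScript. The markup in staticHtml and the helper countStaticListItems are hypothetical stand-ins, not the real response from the site above, and the regular expression is a rough check rather than a real HTML parser.

```javascript
// Hypothetical server response: the list is empty in the markup,
// and an inline script fills it in once the page runs in a browser.
const staticHtml = `
<ul class="listMain"></ul>
<script>
  document.querySelector('.listMain').innerHTML =
    '<li><a href="/shop/1">Shop 1</a></li>';
</script>`;

// Count the <li> tags that are literally present inside the list markup.
// (Hypothetical helper; a rough regex check, not a full HTML parser.)
function countStaticListItems(html) {
  const list = html.match(/<ul class="listMain">([\s\S]*?)<\/ul>/);
  if (!list) {
    return 0;
  }
  return (list[1].match(/<li[\s>]/g) || []).length;
}

// No list items exist until the script runs, so a static parser finds none.
console.log(countStaticListItems(staticHtml)); // 0
```

If a check like this finds nothing while the browser shows the items, the content is being generated client-side and a static parser alone will not reach it.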
Solution: Using PhantomJS
To scrape dynamic content, you need a solution that can execute JavaScript and simulate a browser. This is where PhantomJS comes in. PhantomJS is a headless browser engine that allows you to execute JavaScript commands and render web pages.
Here's how you can modify your code with PhantomJS:
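var phantom = require('phantom');

// Callback-style API of older releases of the phantom module
phantom.create(function (ph) {
  ph.createPage(function (page) {
    var url = "http://www.bdtong.co.kr/index.php?c_category=C02";
    page.open(url, function () {
      // Inject jQuery so the selector code can run inside the page
      page.includeJs("http://ajax.googleapis.com/ajax/libs/jquery/1.6.1/jquery.min.js", function () {
        page.evaluate(function () {
          // Runs inside the page, so return the links instead of logging them there
          var hrefs = [];
          $('.listMain > li').each(function () {
            hrefs.push($(this).find('a').attr('href'));
          });
          return hrefs;
        }, function (hrefs) {
          // Runs in Node with the returned array
          hrefs.forEach(function (href) {
            console.log(href);
          });
          ph.exit();
        });
      });
    });
  });
});

With PhantomJS, the page's own scripts run before you query the DOM, so the dynamically generated content is there to extract.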
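Anything passed back from page.evaluate has to cross from the page context into Node, and the PhantomJS documentation gives the rule of thumb that the value must be serializable as JSON. The sketch below mimics that boundary with a JSON round trip. crossBoundary is a hypothetical helper used only for illustration, not part of PhantomJS or the phantom module.

```javascript
// Hypothetical stand-in for the page-to-Node boundary: mimic it with a
// JSON round trip to see which values would survive.
function crossBoundary(value) {
  const json = JSON.stringify(value);
  // JSON.stringify returns undefined for values such as bare functions.
  return json === undefined ? undefined : JSON.parse(json);
}

// Plain strings and arrays come through unchanged.
const hrefs = crossBoundary(['/shop/1', '/shop/2']);

// Function properties are silently dropped from objects.
const withFn = crossBoundary({ href: '/shop/1', click: function () {} });

console.log(hrefs);
console.log(withFn);
```

This is why it is safer to return plain data such as an array of href strings from the page, not DOM nodes or jQuery objects.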