Using QueryList in PHP to Easily Scrape Pages Rendered Dynamically by JavaScript
QueryList is a PHP scraping library that uses jQuery-style selectors to extract content, and it has a wide range of plug-ins.
The following demonstrates how to use QueryList's PhantomJS plug-in to capture page content that is created dynamically by JavaScript.
Installation
Use Composer to install:
Install QueryList
composer require jaeger/querylist
GitHub: https://github.com/jae-jae/QueryList
Install the PhantomJS plug-in
composer require jaeger/querylist-phantomjs
GitHub: https://github.com/jae-jae/QueryList-PhantomJS
Download the PhantomJS binary file
Download the PhantomJS binary for your platform from the official PhantomJS website: http://phantomjs.org
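Before handing the binary path to the plug-in, it can help to confirm that the downloaded file exists and is executable. The following is a minimal sketch using only PHP built-ins. The function name findPhantomJsBinary and the candidate paths are hypothetical and are not part of QueryList or PhantomJS.

```php
<?php

/**
 * Return the first candidate path that is an executable regular file,
 * or null if none qualifies.
 * Hypothetical helper for illustration; not part of QueryList.
 *
 * @param string[] $candidates paths to check, in order of preference
 */
function findPhantomJsBinary(array $candidates): ?string
{
    foreach ($candidates as $path) {
        if (is_file($path) && is_executable($path)) {
            return $path;
        }
    }
    return null;
}

// Example candidate locations (assumptions; adjust to where the download was unpacked).
$binary = findPhantomJsBinary([
    '/usr/local/bin/phantomjs',
    __DIR__ . '/bin/phantomjs',
]);

echo $binary === null
    ? "PhantomJS binary not found\n"
    : "Using PhantomJS at {$binary}\n";
```

The returned path is the value that would be supplied as the binary path when the plug-in is registered.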
Plug-in API
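QueryList browser($url, $debug = false, $commandOpt = []): Opens the given URL in the PhantomJS browser and returns a QueryList object. $url is the page URL or, as in Example-2, a callback that configures the request; $debug turns on debug mode; $commandOpt is an array of PhantomJS command-line options.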
Usage
Take the mobile version of Toutiao (今日头条) as an example. The mobile site is built with the React framework, and its content is rendered entirely by JavaScript.
The following demonstrates the usage of QueryList's PhantomJS plug-in:
Install the plug-in
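require 'vendor/autoload.php'; // Composer autoloader

use QL\QueryList;
use QL\Ext\PhantomJs;

$ql = QueryList::getInstance();

// The path to the PhantomJS binary must be set when installing the plug-in
$ql->use(PhantomJs::class, '/usr/local/bin/phantomjs');

// Or register it under a custom function name
$ql->use(PhantomJs::class, '/usr/local/bin/phantomjs', 'browser');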
Example-1
Get the dynamically rendered HTML:
$html = $ql->browser('https://m.toutiao.com')->getHtml();
print_r($html);
Get the text content of all p tags:
$data = $ql->browser('https://m.toutiao.com')->find('p')->texts();
print_r($data->all());
Output:
Array
(
    [0] => 自拍模式开启!国庆假期我和国旗合个影
    [1] => 你旅途已开始 他们仍在自己的岗位上为你的假期保驾护航
    [2] => 喜极而泣,都教授终于回到地球了!
    //....
)
Use an HTTP proxy:
// More options are listed in the documentation: http://phantomjs.org/api/command-line.html
$ql->browser('https://m.toutiao.com', true, [
    // Use an HTTP proxy
    '--proxy' => '192.168.1.42:8080',
    '--proxy-type' => 'http'
]);
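The third argument to browser() is an array of PhantomJS command-line options keyed by flag name, so the proxy settings can be kept in one place with a small helper. This is only a sketch: the function name buildProxyOptions is hypothetical, and the accepted proxy types (http, socks5, none) are taken from the PhantomJS command-line documentation.

```php
<?php

/**
 * Build PhantomJS proxy options in the array shape that the
 * browser() call above passes as its third argument.
 * Hypothetical helper for illustration; not part of QueryList.
 */
function buildProxyOptions(string $host, int $port, string $type = 'http'): array
{
    // Proxy types accepted by PhantomJS's --proxy-type flag.
    $allowed = ['http', 'socks5', 'none'];
    if (!in_array($type, $allowed, true)) {
        throw new InvalidArgumentException("Unsupported proxy type: {$type}");
    }
    if ($port < 1 || $port > 65535) {
        throw new InvalidArgumentException("Port out of range: {$port}");
    }

    return [
        '--proxy'      => "{$host}:{$port}",
        '--proxy-type' => $type,
    ];
}

print_r(buildProxyOptions('192.168.1.42', 8080));
```

The returned array can then be passed straight through, for example $ql->browser('https://m.toutiao.com', false, buildProxyOptions('192.168.1.42', 8080)).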
Example-2
Customize a complex request:
$data = $ql->browser(function (\JonnyW\PhantomJs\Http\RequestInterface $r) {
    $r->setMethod('GET');
    $r->setUrl('https://m.toutiao.com');
    $r->setTimeout(10000); // 10 seconds
    $r->setDelay(3); // 3 seconds
    return $r;
})->find('p')->texts();

print_r($data->all());
Enable debug mode and load cookies from a local file:
$data = $ql->browser(function (\JonnyW\PhantomJs\Http\RequestInterface $r) {
    $r->setMethod('GET');
    $r->setUrl('https://m.toutiao.com');
    $r->setTimeout(10000); // 10 seconds
    $r->setDelay(3); // 3 seconds
    return $r;
}, true, [
    '--cookies-file' => '/path/to/cookies.txt'
])->rules([
    'title' => ['p', 'text'],
    'link' => ['a', 'href']
])->query()->getData();

print_r($data->all());