


Collecting web content is a very common need. For traditional static pages, curl is enough. But some pages load content dynamically, for example article text fetched through Ajax, and some pages post-process their content after loading (replacing image addresses and so on). If you want to collect that processed content, curl, great as it is, cannot help.
People who have had similar needs may say: just use PhantomJS!
Yes, that is one way, and for a long time PhantomJS was one of the few tools that could handle this kind of task.
But the tool I want to introduce today is a latecomer that has overtaken it: puppeteer, which has developed rapidly with the rise of headless Chrome. Importantly, puppeteer is developed and maintained by the official Chrome team, so it can be considered quite reliable.
puppeteer is a JavaScript package. To use it in Laravel, you need another handy tool: spatie/browsershot.
Installation
Install spatie/browsershot
browsershot is a Composer package from the excellent spatie team.
$ composer require spatie/browsershot
Install puppeteer
$ npm i puppeteer --save
You can also install puppeteer globally, but in my experience it is better to install it inside the project, so that different projects are not all affected by a single globally installed copy. Installing it in the project also makes upgrades easier when you deploy with phpdeployer: an upgrade carried out through phpdeployer does not affect the running production site. Keep in mind that installing or upgrading puppeteer is very time-consuming and is not always guaranteed to succeed.
Installing puppeteer also downloads Chromium. Depending on your network environment, the download may well fail, in which case you will have to find your own way around it.
Usage
Take collecting the content of an article from the mobile version of Jinri Toutiao as an example.
use Spatie\Browsershot\Browsershot;

public function getBodyHtml()
{
    $newsUrl = 'https://m.toutiao.com/i6546884151050502660/';

    $html = Browsershot::url($newsUrl)
        ->windowSize(480, 800)
        ->userAgent('Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Mobile Safari/537.36')
        ->mobile()
        ->touch()
        ->bodyHtml();

    \Log::info($html);
}
After running it, you can see the collected HTML in the log (the screenshot shows only part of it).
In addition, you can also save the page as an image or PDF document.
use Spatie\Browsershot\Browsershot;

public function getBodyHtml()
{
    $newsUrl = 'https://m.toutiao.com/i6546884151050502660/';

    Browsershot::url($newsUrl)
        ->windowSize(480, 800)
        ->userAgent('Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Mobile Safari/537.36')
        ->mobile()
        ->touch()
        ->setDelay(1000)
        ->save(public_path('images/toutiao.jpg'));
}
The empty boxes in the picture are caused by the fonts installed on the system. The code calls setDelay() so that the screenshot is taken only after the content has loaded. This is simple and crude, and may not be the best solution.
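The fixed delay above can be swapped for a wait on network activity. The following is only a minimal sketch, assuming the installed Browsershot version provides waitUntilNetworkIdle(); fetchRenderedBodyHtml() is a hypothetical helper name used for illustration and is not part of the package.

```php
<?php

use Spatie\Browsershot\Browsershot;

/**
 * Fetch the rendered body HTML of a page.
 *
 * Hypothetical helper for illustration; not part of Browsershot itself.
 */
function fetchRenderedBodyHtml(string $url): string
{
    return Browsershot::url($url)
        ->windowSize(480, 800)
        ->mobile()
        ->touch()
        // Wait for network activity to settle instead of sleeping
        // for a fixed number of milliseconds.
        ->waitUntilNetworkIdle()
        ->bodyHtml();
}
```

Whether this works better than setDelay() depends on the page: a page that keeps connections open may never look idle, so a fixed delay can still be the more predictable choice.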
Possible problems
The system must be able to run the Chromium browser. Most systems can nowadays; if yours cannot, there is no way around it, and you should fall back to PhantomJS.
After puppeteer is installed in the project, there may be permission problems when it is called. In that case, grant appropriate permissions on the /node_modules/puppeteer directory under the project.
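To narrow down a permission problem like the one just described, a quick check from PHP can help. This is only a sketch: puppeteerPermissionProblems() is a hypothetical helper, and it assumes a Unix-like system where the execute bit on a directory controls whether it can be entered. It reports problems only; which permissions are "appropriate" still depends on your server setup.

```php
<?php

/**
 * Return a list of permission problems for the project's puppeteer directory.
 *
 * Hypothetical helper for illustration; run it as the same user that runs PHP.
 */
function puppeteerPermissionProblems(string $projectRoot): array
{
    $dir = rtrim($projectRoot, '/') . '/node_modules/puppeteer';

    if (!is_dir($dir)) {
        // Nothing else can be checked if the package is not installed here.
        return ["$dir does not exist"];
    }

    $problems = [];
    if (!is_readable($dir)) {
        $problems[] = "$dir is not readable by the current user";
    }
    if (!is_executable($dir)) {
        // On Unix-like systems, the execute bit on a directory allows traversal.
        $problems[] = "$dir cannot be entered by the current user";
    }

    return $problems;
}

// Example: check the current working directory as the project root.
foreach (puppeteerPermissionProblems(getcwd() ?: '.') as $problem) {
    echo $problem, PHP_EOL;
}
```

An empty result only means the directory itself is accessible; files inside it, such as the downloaded Chromium binary, may need checking in the same way.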
Summary
puppeteer is a very powerful tool for testing, content collection, and similar scenarios. It is enough for light collection tasks such as the one in this article, collecting a few small pages from Laravel (PHP). But if you need to collect a large amount of content quickly, something like Python is probably a better fit.
