Using puppeteer in Laravel to collect asynchronously loaded web page content

Collecting web content is a very common need. For traditional static pages, curl is enough. But some pages load content dynamically, for example article text fetched via Ajax, and some pages are further processed by scripts after loading (image URLs replaced, and so on). If you want to collect that processed content, even the mighty curl is helpless, because it only fetches the raw HTML and does not run the page's JavaScript.

Anyone who has had a similar need may say: buddy, just use PhantomJS!

True, that is one way, and for a long time PhantomJS was one of the few tools that could handle this kind of job.

What I want to introduce today, though, is a latecomer that has quickly caught up: puppeteer, which grew rapidly with the rise of headless Chrome. Just as importantly, puppeteer is developed and maintained by the official Chrome team, so it is quite reliable!

puppeteer is a Node.js (JavaScript) package. To use it from Laravel, you need another great tool: spatie/browsershot, which drives puppeteer from PHP.

Installation

Install spatie/browsershot

browsershot is a Composer package from the excellent team at Spatie.

$ composer require spatie/browsershot

Install puppeteer

$ npm i puppeteer --save

You can also install puppeteer globally, but in my experience it is better to install it inside the project: that way, different projects are not all affected by a single global puppeteer. A per-project install also makes it convenient to upgrade with phpdeployer, and the upgrade will not disturb the project running in production. Keep in mind that installing or upgrading puppeteer is slow, and it does not always succeed.

Installing puppeteer also downloads a Chromium browser. Because of network restrictions in mainland China, that download is very likely to fail, so you will have to work around it with your own skills...
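If the Chromium download keeps failing, one option is to install Chromium some other way (for example through the system package manager) and point Browsershot at it. puppeteer can be told to skip its own download, but the environment variable for that has changed between puppeteer versions, so check the docs for yours. The sketch below assumes a Laravel app (base_path() is a Laravel helper); the function name projectBrowsershot and the path /usr/bin/chromium-browser are hypothetical, and the system Chromium must be a version your puppeteer release can drive. setNodeModulePath() and setChromePath() are Browsershot methods.

```php
<?php

use Spatie\Browsershot\Browsershot;

/**
 * Hypothetical helper: build a Browsershot instance that uses the project's
 * own puppeteer install and a separately installed Chromium binary.
 * base_path() is a Laravel helper, so call this inside a Laravel app.
 */
function projectBrowsershot(string $url): Browsershot
{
    return Browsershot::url($url)
        // Resolve puppeteer from <project>/node_modules
        ->setNodeModulePath(base_path('node_modules'))
        // Assumed path: a Chromium installed by the system package manager,
        // used instead of the copy puppeteer failed to download
        ->setChromePath('/usr/bin/chromium-browser');
}
```

Usage would then look like `$html = projectBrowsershot($newsUrl)->bodyHtml();`.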

Usage

As an example, let's collect an article from the mobile version of Toutiao.

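// Put the use statements at the top of the file, outside the class
use Illuminate\Support\Facades\Log;
use Spatie\Browsershot\Browsershot;

// A method of some class in your app (a controller, a job, etc.)
public function getBodyHtml()
{
    $newsUrl = 'https://m.toutiao.com/i6546884151050502660/';

    // Open the page in headless Chrome with a phone-sized viewport,
    // a mobile user agent and touch support, then grab the rendered HTML
    $html = Browsershot::url($newsUrl)
        ->windowSize(480, 800)
        ->userAgent('Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Mobile Safari/537.36')
        ->mobile()
        ->touch()
        ->bodyHtml();

    Log::info($html);

    return $html;
}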

After running it, you can see the rendered page HTML in the log (the screenshot shows only part of it).
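bodyHtml() returns the HTML of the whole page. If you only need the article text, PHP's built-in DOM extension can pull it out of that HTML. Below is a minimal sketch; the function name extractText and the XPath `//div[@class="article-content"]` used in the usage note are hypothetical, since Toutiao's real markup is not shown here and may change at any time.

```php
<?php

/**
 * Hypothetical helper: return the trimmed text of the first element that
 * matches an XPath query, or null if nothing matches.
 */
function extractText(string $html, string $xpathQuery): ?string
{
    $dom = new DOMDocument();

    // Real-world pages are rarely valid HTML; collect parser warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);

    // The XML declaration prefix makes libxml read the input as UTF-8,
    // so Chinese text is not mangled during parsing.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = (new DOMXPath($dom))->query($xpathQuery);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    // textContent concatenates the text of all descendant nodes
    return trim($nodes->item(0)->textContent);
}
```

For example, `extractText($html, '//div[@class="article-content"]')` returns that element's text, or null when the page has no such element.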


Besides grabbing the HTML, you can also save the page as an image or a PDF document.

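// Put the use statement at the top of the file, outside the class
use Spatie\Browsershot\Browsershot;

// A method of some class in your app; Browsershot picks the output
// format from the file extension (.jpg/.png for an image, .pdf for a PDF)
public function saveScreenshot()
{
    $newsUrl = 'https://m.toutiao.com/i6546884151050502660/';

    Browsershot::url($newsUrl)
        ->windowSize(480, 800)
        ->userAgent('Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Mobile Safari/537.36')
        ->mobile()
        ->touch()
        // Wait 1000 ms before taking the screenshot so async content can render
        ->setDelay(1000)
        ->save(public_path('images/toutiao.jpg'));
}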


The boxes in the image appear because the system lacks fonts for the page's Chinese text. The code calls setDelay() to wait before taking the screenshot, giving the content time to load. This is simple and crude, and probably not the best solution.
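One alternative to a fixed setDelay() is Browsershot's waitUntilNetworkIdle(), which in its default strict mode waits for puppeteer's networkidle0 condition: no open network connections for at least 500 ms. A sketch assuming a Laravel app; the function name saveScreenshotWhenIdle is hypothetical. Pages that poll or stream data continuously may never go idle, so this is not a universal fix either.

```php
<?php

use Spatie\Browsershot\Browsershot;

/**
 * Hypothetical helper: take the screenshot once network activity has settled
 * rather than after a fixed 1000 ms delay.
 */
function saveScreenshotWhenIdle(string $url, string $path): void
{
    Browsershot::url($url)
        ->windowSize(480, 800)
        ->mobile()
        ->touch()
        // Strict mode (the default): wait until there have been no open
        // network connections for at least 500 ms, so Ajax-loaded content
        // has most likely arrived
        ->waitUntilNetworkIdle()
        ->save($path);
}
```

It would be called as `saveScreenshotWhenIdle($newsUrl, public_path('images/toutiao.jpg'));`. Passing false (`waitUntilNetworkIdle(false)`) relaxes the condition to at most two open connections, which can help on pages with a long-lived background request.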

Possible problems

The system must be able to run Chromium. Most systems today can; if yours cannot, there is nothing more to be done here, and you will have to fall back to PhantomJS.

With puppeteer installed inside the project, you may run into permission errors when Browsershot calls it, for example when PHP runs as a different user from the one that ran npm install. Fix this by giving that user appropriate permissions on the project's node_modules/puppeteer directory.

Summary

puppeteer is a very powerful tool for testing, scraping and similar scenarios. It is good enough for light collection tasks like the one in this article, grabbing a few small pages from Laravel (PHP). But if you need to collect a large amount of content quickly, Python or something similar is a better fit.


Statement
This article was reproduced from SegmentFault. In case of infringement, please contact admin@php.cn to have it removed.