
Notes on using LaravelS to withstand the Baidu crawler

藏色散人 (forwarded) · 2020-08-22 13:21:44

The following tutorial column records the experience of using LaravelS to withstand the Baidu crawler. I hope it will be helpful to friends in need!


What is LaravelS?

LaravelS is a glue project that quickly integrates Swoole into Laravel or Lumen to give them better performance.

GitHub address: https://github.com/hhxsv5/laravel-s
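For orientation, here is a minimal install-and-start sketch following the steps in the LaravelS README; verify the commands and defaults (such as the listen port 5200) against the version you actually install:

    # add LaravelS to an existing Laravel/Lumen project
    composer require hhxsv5/laravel-s

    # publish config/laravels.php and the bin/laravels startup script
    php artisan laravels publish

    # start the Swoole-based HTTP server (127.0.0.1:5200 by default)
    php bin/laravels start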
Why use LaravelS?

After the Baidu smart mini program was launched, the high QPS (concurrency) of the Baidu crawler pushed the CPU to full load and crashed the server. The server had 4 cores, 8 GB of memory, and 5 Mbps of bandwidth. What can be done in this situation?

  • Tune the php-fpm parameters and set pm to static. Static mode performs better than dynamic mode: setting the number of child processes to 255 or even higher lets the pool absorb more concurrency, but the more children, the more memory is consumed (see the pool sketch after this list). Conclusion: effective to a certain extent, but useless under this level of concurrency.

  • Ask Baidu through its feedback channel to lower the crawl frequency. Conclusion: by the time they respond it is already far too late, but it is still worth submitting the feedback.

  • Load balancing: let other servers share the pressure. This presumes there are enough servers, that the same code is deployed on all of them, and that the business those servers already handle is not affected. Alternatively, temporarily rent N servers from a cloud provider, but since you never know when the crawler will arrive or leave, that is unrealistic.

  • Finally, the topic of this article: use LaravelS to speed up the HTTP response.
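For reference, a minimal sketch of the php-fpm pool change described in the first option; the file location varies by distribution (e.g. /etc/php/7.2/fpm/pool.d/www.conf on Debian-based systems) and the numbers are only examples:

    ; switch the pool from dynamic to static child management
    pm = static
    ; more children handle more concurrent requests but consume more memory
    pm.max_children = 255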

How much of a speedup does LaravelS give?

Because QPS was not being recorded across all periods at the time, there is no precise number to report; we can only compare the machine load before and after the change.

Before deployment, the CPU was fully loaded, the machine went down N times and was effectively paralyzed, and the external bandwidth (5 Mbps) was saturated. After deployment, the CPU immediately dropped to 20%. After temporarily upgrading the bandwidth to 15 Mbps, the CPU rose to 60% and the external bandwidth was still saturated (which only shows how relentless the Baidu crawler is: it takes as much bandwidth as you give it). Conclusion: at least a 5x performance improvement.

Specific deployment

The crawler only hits a subset of the pages, so converting the entire production project to LaravelS is also unrealistic. We only need to split the crawled pages out and deploy them separately on LaravelS.

  • Create a new empty project whose business logic handles only the APIs for the crawled pages, and give it a port number such as 6501 (a minimal route sketch follows this list).
  • Deploy LaravelS, test the API, and run an ab stress test.
  • In the production project, proxy the page paths the crawler requests to the new project at 127.0.0.1:6501, for example:

    location ~ ^/v1/test.* {
        proxy_pass http://127.0.0.1:6501;
        proxy_set_header Host $host;
    }
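As mentioned in the first step above, here is a minimal sketch of the stripped-down project's routing; the /v1/test prefix mirrors the nginx rule, while the controller and parameter names are purely illustrative:

    // routes/web.php of the new crawler-only project (Laravel 5.x syntax).
    // Only the paths Baidu actually crawls are registered here;
    // TestController is a hypothetical name used for illustration.
    Route::get('/v1/test/{id}', 'TestController@show');

The ab stress test from the second step could then be a simple smoke test such as: ab -n 1000 -c 100 http://127.0.0.1:6501/v1/test/1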

A few points to note:

  1. In config/laravels.php, the default number of workers enabled is twice the number of CPU cores (see the sketch after this list).
  2. LaravelS runs inside Swoole, resident in memory, so every time the code changes you need to restart LaravelS for the changes to take effect.
  3. Because of point 2, database connections are never released between requests, so Laravel's disconnect-and-reconnect handling is needed (available in Laravel > 5.1). Add the following to the mysql configuration in config/database.php:

    'options' => [
        // enable persistent connections
        \PDO::ATTR_PERSISTENT => true,
    ],
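Relating to point 1, a sketch of raising the worker count, assuming the 'swoole' section layout of the published LaravelS config file, whose default worker_num is swoole_cpu_num() * 2:

    // config/laravels.php (sketch; only the relevant key is shown)
    'swoole' => [
        // pin the worker count explicitly when sizing for crawler load;
        // more workers absorb more concurrency but use more memory
        'worker_num' => 16,
    ],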

The above is the detailed content of Notes on using LaravelS to withstand the Baidu crawler. For more information, please follow other related articles on the PHP Chinese website!

Statement:
This article is reproduced from learnku.com. If there is any infringement, please contact admin@php.cn to have it deleted.