Memory soars! A record of blocking a crawler with nginx

This article describes how a crawler drove up memory usage on a server and how it was blocked with nginx. I hope it is helpful.

Foreword:

Recently I noticed that the server's memory usage spiked sharply during certain periods. At first I assumed normal business traffic was the cause, but upgrading the server's memory did not solve the problem. (I was lazy here: instead of investigating, I simply assumed that business volume had grown.)

I then checked the nginx logs and found some unusual requests:

[Screenshot: nginx log entries showing the unusual requests]

What was this? Out of curiosity I searched for it right away, and the result was:

[Screenshot: search result]

Wow. It had very nearly taken my server down.

Time to fix it quickly:

nginx-level solution

Although this is a crawler, it does not disguise itself. Every request carries a User-Agent header, and it is always the same one, so it is easy to block. Here is the configuration (I run nginx in Docker):

1. docker-compose

version: '3'
services:
  d_nginx:
    container_name: c_nginx
    env_file:
      - ./env_files/nginx-web.env
    image: nginx:1.20.1-alpine
    ports:
      - '80:80'
      - '81:81'
      - '443:443'
    links:
      - d_php
    volumes:
      - ./nginx/conf:/etc/nginx/conf.d
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf
      - ./nginx/agent-deny.conf:/etc/nginx/agent-deny.conf
      - ./nginx/certs:/etc/nginx/certs
      - ./nginx/logs:/var/log/nginx/
      - ./www:/var/www/html

2. Directory structure

nginx
-----nginx.conf
-----agent-deny.conf
-----conf
----------xxxx01_server.conf
----------xxxx02_server.conf

3. agent-deny.conf

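# Block known crawlers by name (case-insensitive match).
if ($http_user_agent ~* (Scrapy|AhrefsBot)) {
    return 404;
}
# Block the exact AhrefsBot user agent and requests with an empty User-Agent.
# Regex metacharacters in the literal string must be escaped. The rule above
# already matches AhrefsBot, so in practice this one mainly catches empty user agents.
if ($http_user_agent ~ "Mozilla/5\.0 \(compatible; AhrefsBot/7\.0; \+http://ahrefs\.com/robot/\)|^$") {
    return 403;
}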

4. Then include this agent-deny.conf

server {
    include /etc/nginx/agent-deny.conf;
    listen 80;
    server_name localhost;
    root /var/www/html/xxxxx/public;
    index index.php;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header REMOTE-HOST $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

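    # Maximum allowed size of the client request body (file uploads)
    client_max_body_size 300M;

    # Client body buffer size. If it is too small, nginx does not handle the body
    # in memory and writes a temporary file instead, which increases I/O.
    # By default this directive sets an 8k buffer on 32-bit systems and a 16k buffer on 64-bit systems.
    #client_body_buffer_size 5M;
    # After setting this parameter, server memory usage fluctuated considerably. Since
    # client uploads cannot be controlled, I decided not to set it.

    # This directive disables nginx buffering and stores the request body in a temporary file.
    # The file contains plain-text data. It can be used in the http, server, and location blocks.
    # Possible values:
    # off: disables writing to a file
    # clean: the request body is written to a file, which is deleted after the request is processed
    # on: the request body is written to a file, which is not deleted after the request is processed
    client_body_in_file_only clean;

    # Timeout for reading the client request body
    client_body_timeout 600s;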

    location /locales {
       break;
    }

    location / {
        # Block GET requests that try to download the .htaccess file
        if ($request_uri = '/.htaccess') {
            return 404;
        }
        # Block GET requests that try to download the .gitignore file
        if ($request_uri = '/storage/.gitignore') {
            return 404;
        }
        # Block GET requests that try to download the web.config file
        if ($request_uri = '/web.config') {
            return 404;
        }
        try_files $uri $uri/ /index.php?$query_string;
    }

    location /oauth/token {
        # Block GET requests to /oauth/token
        if ($request_method = 'GET') {
            return 404;
        }
        try_files $uri $uri/ /index.php?$query_string;
    }

    location /other/de {
        proxy_pass http://127.0.0.1/oauth/;
        rewrite ^/other/de(.*)$ https://www.baidu.com permanent;
    }

    location ~ \.php$ {
        try_files $uri /index.php =404;
        fastcgi_split_path_info ^(.+\.php)(/.+)$;
        fastcgi_pass d_php:9000;
        fastcgi_index index.php;
        fastcgi_param SCRIPT_FILENAME  $document_root$fastcgi_script_name;
        fastcgi_connect_timeout 300s;
        fastcgi_send_timeout 300s;
        fastcgi_read_timeout 300s;
        include fastcgi_params;
        #add_header 'Access-Control-Allow-Origin' '*';
        #add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS, PUT, DELETE';
        #add_header 'Access-Control-Allow-Headers' 'DNT,X-Mx-ReqToken,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Authorization,token';
    }
}

Add this include to every server block so that every request from AhrefsBot is intercepted.
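The user-agent check can also be written with a map instead of chained if blocks. The following is only a sketch of that alternative, not the configuration used in this article: the variable name $deny_agent is made up for illustration, and for simplicity it returns 403 in every case, whereas agent-deny.conf returns 404 for the crawler names and 403 for an empty User-Agent.

```nginx
# Goes in the http block of nginx.conf (map is only valid at the http level).
# $deny_agent is a hypothetical variable name chosen for this sketch.
map $http_user_agent $deny_agent {
    default                 0;
    ""                      1;   # empty or missing User-Agent
    "~*(Scrapy|AhrefsBot)"  1;   # case-insensitive match on crawler names
}

# Goes in each server block, in place of the agent-deny.conf include.
server {
    listen 80;
    server_name localhost;

    # An empty value or "0" counts as false, so only flagged agents are rejected.
    if ($deny_agent) {
        return 403;
    }
}
```

The user agent patterns then live in one place, and each server block only needs the short if check.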

Alibaba Cloud Security Group Interception

Analyzing the logs also showed that the requests came from only a few IP ranges, so as an extra safeguard I blocked them in the Alibaba Cloud security group as well. (Alibaba Cloud takes effect fastest and works best; you get what you pay for.)

IP ranges:

54.36.0.0
51.222.0.0
195.154.0.0

Apply the rule directly to inbound traffic from the public network:

[Screenshot: Alibaba Cloud security group rule]
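If no cloud security group is available, the same ranges could probably also be rejected in nginx itself with the deny directive. This is a sketch under assumptions: the file name ip-deny.conf is hypothetical, and the /16 prefix lengths are a guess, because the list above gives only the network addresses. Check the real ranges against your own logs before using it.

```nginx
# Hypothetical file, e.g. mounted as /etc/nginx/ip-deny.conf and included
# in each server block next to agent-deny.conf.
# ASSUMPTION: /16 prefixes; the article lists only the network addresses.
deny 54.36.0.0/16;
deny 51.222.0.0/16;
deny 195.154.0.0/16;
```

Requests from a denied address get a 403 response. Unlike a security group, this still lets the connection reach nginx, so blocking at the network edge remains the cheaper option.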

This article is reproduced from learnku.