First of all, I have no objection to others crawling my website's content, and I don't strictly limit crawling in general. But some crawlers have no bottom line at all: they run one or even several scripts concurrently to fetch a server's content, which is no different from a DDoS.
My server is currently in exactly this situation. Non-stop malicious crawling has seriously affected our log analysis and increased the load on the server.
How can I prevent this behavior? I'm using nginx. As far as I know, it can only deny a specific IP, and even then the denied requests still show up in the log (just with a 403 status). Besides, adding deny rules by hand is too passive. Is there a way to automatically detect that requests from some IP have spiked, and then ban that IP?
某草草 2017-05-16 17:32:17
1. ngx_http_limit_conn_module can limit the number of concurrent connections from a single IP (a configuration sketch follows this list)
http://nginx.org/en/docs/http/ngx_htt...
2. ngx_http_limit_req_module can limit the number of requests per second from a single IP
http://nginx.org/en/docs/http/ngx_htt...
3. nginx_limit_speed_module can limit the transfer speed of a single IP
https://github.com/yaoweibin/nginx_li...
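A minimal sketch combining the first two modules (the zone names, sizes, and rate limits here are illustrative assumptions, not recommendations):

    http {
        # one shared-memory zone per mechanism, keyed by client IP
        limit_conn_zone $binary_remote_addr zone=perip_conn:10m;
        limit_req_zone  $binary_remote_addr zone=perip_req:10m rate=10r/s;

        server {
            location / {
                limit_conn perip_conn 10;                    # at most 10 concurrent connections per IP
                limit_req  zone=perip_req burst=20 nodelay;  # 10 req/s, absorbing bursts up to 20
            }
        }
    }

Requests over either limit are rejected with a 503 by default, so they never reach your backend.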
世界只因有你 2017-05-16 17:32:17
I'll offer another solution, based mainly on fail2ban (http://www.fail2ban.org/). fail2ban works asynchronously: it scans the log and decides whether to ban an IP via iptables, so it has little impact on the existing system and requires no nginx reconfiguration. I'm not sure how well it copes when the request volume gets very large, though.
First, add the following to /etc/fail2ban/jail.conf:

    [http-get-dos]
    enabled  = true
    port     = http,https
    filter   = nginx-bansniffer
    logpath  = /usr/local/nginx/logs/segmentfault.log
    maxretry = 120
    findtime = 120
    bantime  = 3600
    action   = iptables[name=HTTP, port=http, protocol=tcp]
Then open /etc/fail2ban/filter.d/nginx-bansniffer.conf and replace its original 404-matching rule so that every request is counted:

    [Definition]
    failregex = <HOST> -.*- .*HTTP/1.* .* .*$
    ignoreregex =
Finally, restart the fail2ban service. With the configuration above, any IP that makes more than 120 requests within a 120-second window is banned for 1 hour.
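To apply the configuration and confirm the jail is active, fail2ban-client (shipped with fail2ban) can be used; the jail name matches the section added above:

    service fail2ban restart
    fail2ban-client status http-get-dos   # lists currently banned IPs and counters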
高洛峰 2017-05-16 17:32:17
1. Block spider/crawler traffic based on User-Agent

    ## Block download agents ##
    if ($http_user_agent ~* "WebZIP|wget") {
        return 403;
    }
2. Create firewall rules in the operating system to limit the number of connections from a single IP
Taking iptables on Linux as an example, the following rules allow the same IP to establish at most 15 new connections per minute; anything beyond that is dropped by iptables and never reaches nginx:

    /sbin/iptables -A INPUT -p tcp --dport 80 -i eth0 -m state --state NEW -m recent --set
    /sbin/iptables -A INPUT -p tcp --dport 80 -i eth0 -m state --state NEW -m recent --update --seconds 60 --hitcount 15 -j DROP
    service iptables save
3. Write a bash script that counts each IP's request frequency in the access log and automatically adds IPs exceeding your threshold to a blacklist
For blacklisted IPs, have the script write them into iptables or nginx.conf and block them for a few minutes, or lower their permitted request rate. A sketch of such a script follows below.
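A minimal sketch of such a script (the log path, blocklist path, window, and threshold are all assumptions; it expects the default combined log format, where the client IP is the first field, and the blocklist file must be include-d inside nginx's http block):

    #!/usr/bin/env bash
    # Sketch: deny IPs that exceed a request threshold in the recent log tail.
    # Paths and threshold are assumptions -- adjust for your setup.
    LOG=/usr/local/nginx/logs/access.log
    BLOCKLIST=/usr/local/nginx/conf/blockips.conf   # include this file in nginx's http{} block
    THRESHOLD=120

    touch "$BLOCKLIST"

    # Count requests per IP over the last 10000 lines and
    # append a deny rule for every IP above the threshold.
    tail -n 10000 "$LOG" | awk '{print $1}' | sort | uniq -c | \
    while read -r count ip; do
        if [ "$count" -gt "$THRESHOLD" ] && ! grep -q "deny $ip;" "$BLOCKLIST"; then
            echo "deny $ip;" >> "$BLOCKLIST"
        fi
    done

    # Reload nginx so the new deny rules take effect.
    /usr/local/nginx/sbin/nginx -s reload

Run it from cron every minute or two; unbanning works the same way, by pruning old entries from the blocklist file and reloading.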
When I was at Yahoo I used an Apache module called YDoD (Yahoo! Department of Defense), which let us define custom rules to stop external abuse of our web services. After I moved to Taobao it was renamed TDoD. I searched around but couldn't find an open-source equivalent; the principle, however, is the same as what I described above.
PHPz 2017-05-16 17:32:17
Try ngx_lua_waf
https://github.com/loveshell/ngx_lua_waf
Features:
Prevents web attacks such as SQL injection, local file inclusion, some overflows, fuzzing, XSS, and SSRF
Prevents leaks of svn/backup and similar files
Blocks attacks from stress-testing tools such as ApacheBench
Blocks common scanning and hacking tools
Blocks abnormal network requests
Blocks PHP execution in image/attachment upload directories
Prevents webshell uploads
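For reference, wiring it into nginx.conf typically looks like the sketch below (paths are assumptions; nginx must be built with the ngx_lua module, and the waf/ directory comes from the repository above):

    http {
        lua_package_path "/usr/local/nginx/conf/waf/?.lua";
        lua_shared_dict limit 10m;                             # used by its CC (request-flood) protection
        init_by_lua_file   /usr/local/nginx/conf/waf/init.lua;
        access_by_lua_file /usr/local/nginx/conf/waf/waf.lua;
    }

Switches and rules (CC protection, URL white/blacklists, and so on) are then edited in the project's config.lua.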