Nginx performance optimization
Nginx is a very popular and mature web server and reverse proxy server, and there is no shortage of performance-tuning tutorials on the Internet. However, business scenarios vary widely, and finding the configuration that suits you takes a lot of testing, practice, and continuous tuning. Recently, after our user call volume passed one million, we ran into some problems. They were not especially complicated, but they took a long time to solve and taught us a lot.
The problem had actually been around for a while. Some customers reported that their calls were timing out, yet our own monitoring showed everything as normal: response times of only a few tens of milliseconds, nowhere near a timeout. At first I suspected the network, but after it happened a few times I had a vague feeling that it was not accidental and that there must be a deeper cause.
Our service is for enterprise customers: each customer may generate a very large call volume, but each one has only a few public IP addresses, so even with thousands of customers in the future Nginx can easily handle the concurrent connections. We therefore started on the network side by tuning Nginx keep-alive connections: the keep-alive timeout was raised from the original 5 seconds to 5 minutes, and the number of requests allowed per connection was raised from the default 100 to 1000.
<code>keepalive_timeout 300; keepalive_requests 1000;</code>
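For context, a minimal sketch of where these directives sit, assuming they are applied globally in the http block (they can equally go in a specific server block):
<code>http {
    # keep idle client connections open for 5 minutes (was 5 seconds)
    keepalive_timeout  300;
    # allow up to 1000 requests per keep-alive connection (the default is 100)
    keepalive_requests 1000;

    # ... the rest of the http configuration (servers, upstreams) is unchanged
}</code>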
After the adjustment, the output of netstat -anp showed that the number of new connections had dropped, indicating that the keep-alive connections were working. But after a while customer call timeouts still occurred, and the Nginx log still showed request times of more than 1s, sometimes as long as about 20s.
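For reference, the change in connection behaviour can be confirmed by counting connections per TCP state in the netstat output; this one-liner is not from the original write-up, just a common way to do it:
<code># count connections by TCP state (ESTABLISHED, TIME_WAIT, ...)
netstat -anp | awk '/^tcp/ {print $6}' | sort | uniq -c | sort -rn</code>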
Zabbix monitoring also revealed a pattern: whenever the number of connections in the writing or active state spiked, request times rose and more timeouts appeared accordingly.
Checking the application log showed that the execution time recorded there was not long at all.
The time recorded in the application only covers the period from the start of business execution to the result being returned; it does not include the time spent in the Tomcat container itself. An external request follows this path:
<code>client --> Nginx --> Tomcat --> App</code>
Could the problem lie in the Tomcat container itself? Pulling up the Tomcat request log showed that executions before and after this point in time were also normal.
Analysing the request path, the problem had to be somewhere in the Nginx-to-Tomcat layer. While troubleshooting, I suddenly saw a large number of timeouts of about 30 seconds, and at the same time Zabbix showed connection writing climbing very high.
There were also a very large number of connections in the TIME_WAIT state. Judging from the symptoms and from packet-capture analysis, some customers had not enabled keep-alive on their side, while on our server keepalive_timeout was now set to 5 minutes, so a large number of finished connections were left waiting to time out; there were nearly 2,000 of them at the time. To reuse these connections, edit the /etc/sysctl.conf file and add the following two parameters:
<code>net.ipv4.tcp_tw_reuse = 1    # allow TIME_WAIT sockets to be reused for new TCP connections (default 0, disabled)
net.ipv4.tcp_tw_recycle = 1  # enable fast recycling of TIME_WAIT sockets (default 0, disabled)</code>
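The new kernel parameters are applied by reloading /etc/sysctl.conf, a standard step not spelled out in the original. One caveat worth noting: net.ipv4.tcp_tw_recycle is known to cause problems for clients behind NAT and was removed entirely in Linux 4.12+, so tcp_tw_reuse is generally the safer of the two settings.
<code># re-read /etc/sysctl.conf and apply the TIME_WAIT tuning
sysctl -p</code>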
The change took effect quickly: the TIME_WAIT count soon dropped below 200, and Zabbix showed both connection writing and connection active falling significantly. But the problem was still not completely solved, so there had to be another cause.
Nginx's request_time covers the time from when the first bytes of the request are read from the client, through the call to the backend upstream server, until the business logic has completed and the full response has been written back to the client. If the time spent on the upstream call could be logged separately, it would be much easier to narrow the problem down. Fortunately, Nginx provides variables that record the backend server's response time and its address; modify the log format in nginx.conf as follows:
<code># $upstream_response_time  response time of the backend application server
# $upstream_addr            IP address and port of the backend server
log_format main '$remote_addr - [$time_local] "$request" '
                '$status $body_bytes_sent '
                '"$request_time" "$upstream_response_time" "$upstream_addr" "$request_body"';</code>
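With this format in place, slow requests can be pulled straight out of the access log. The snippet below is only a rough sketch: it assumes the quoted-field layout above (splitting on double quotes puts $request_time in field 4, $upstream_response_time in field 6 and $upstream_addr in field 8) and a log path of /var/log/nginx/access.log.
<code># list request_time, upstream_response_time and upstream address for requests slower than 1s
awk -F'"' '$4+0 > 1 {print $4, $6, $8}' /var/log/nginx/access.log | sort -rn | head</code>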
Looking at the log again with the new fields, it was very obvious that most of the unusually long calls went to the same backend machine.
On that machine, the Java process was still alive but the application had effectively crashed and no real requests were being processed. After removing it from the load balancer, the problem was immediately alleviated.
The machine was effectively dead, so why didn't Nginx notice? Further digging showed that when Nginx calls an upstream server, the default timeouts are 60s. Our application has strict response-time requirements and anything over 1s is useless, so the defaults were lowered in nginx.conf so that calls to the backend give up after 1s:
<code># timeout settings
proxy_connect_timeout 1s;
proxy_send_timeout 1s;
proxy_read_timeout 1s;</code>
After running like this for a while, the problem was essentially solved. Occasionally a request_time still exceeds 1s, reaching up to 5s, but upstream_response_time no longer times out, which shows the parameters above are working; the remaining latency sits between the client and Nginx (slow clients or the network) rather than in the backend.
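One related mechanism, not part of the original fix but worth a hedged sketch here, is Nginx's passive health marking: with the shortened timeouts above, each failed or timed-out attempt counts against a backend's max_fails, and once the limit is reached the server is skipped for fail_timeout seconds. The upstream name and addresses below are purely illustrative:
<code>upstream tomcat_backend {
    # after 3 failed attempts within 30s, take the server out of rotation for 30s
    server 10.0.0.11:8080 max_fails=3 fail_timeout=30s;
    server 10.0.0.12:8080 max_fails=3 fail_timeout=30s;
}</code>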