Foreword:
I recently started a project and was responsible for its architectural design and implementation. The company had built a lot of APIs for users outside the company, but they were shared simply by handing the interface URL to others: no encryption, no concurrency control, and whatever machine hosted the interface program had its IP given out directly, with no platform to manage any of it. As a result, I knew the value of these interfaces was hard to discover (which interfaces are heavily used by others, and which are barely used).
To meet just this need for monitoring, we introduced Redis as a middle layer. First, we improved the interface registration process: a key is hashed from the user's information and the target address, so each key corresponds to one address, and this (key, address) pair is stored in Redis. Next comes nginx. The nginx flow in our project is roughly as follows:
1. After registering, the user obtains their key and accesses the service through a URL containing that key, which looks nothing like the original URL.
2. nginx captures the user's special key, the program looks up the target address in Redis by that key, and nginx then accesses the real address on the user's behalf and returns the response (sketched below).
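To make the flow concrete, here is a minimal sketch of both halves, registration and per-request lookup, assuming the redis-py client; the SHA-1 scheme and the function names are illustrative assumptions, not the project's actual code.

```python
import hashlib

import redis

r = redis.Redis(host="127.0.0.1", port=6379, decode_responses=True)

def register_interface(user_info: str, real_address: str) -> str:
    """Hash user info plus the target address into a key, store key -> address."""
    key = hashlib.sha1(f"{user_info}:{real_address}".encode()).hexdigest()
    r.set(key, real_address)
    return key  # the user is given a URL containing this key, never the real address

def resolve_address(key: str):
    """Called for every request nginx captures: map the key to the real upstream."""
    return r.get(key)  # None if the key is unknown, so the request can be rejected
```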
(This process has several benefits.)
(1) The real address is hidden. The program can intervene in the user's access before it reaches the upstream server, improving security, and the intervention logic can be arbitrarily complex.
(2) User access information is captured and written back to Redis. A timer program on the upstream server periodically persists the logs in Redis into Oracle and deletes them from Redis, so they can be further analyzed and visualized (see the sketch below).
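A rough sketch of such a timer job, assuming the access logs are pushed onto a Redis list named `access_log` as JSON and that the cx_Oracle driver is used; the DSN, table, and column names here are made-up placeholders.

```python
import json

import cx_Oracle  # assumed Oracle driver; DSN and schema below are placeholders
import redis

r = redis.Redis(host="127.0.0.1", port=6379)
ora = cx_Oracle.connect("loguser/secret@dbhost/ORCL")

def flush_access_logs(batch_size: int = 1000) -> None:
    """Drain up to batch_size entries from Redis into Oracle; LPOP deletes as it reads."""
    rows = []
    for _ in range(batch_size):
        raw = r.lpop("access_log")
        if raw is None:
            break
        entry = json.loads(raw)
        rows.append((entry["key"], entry["client_ip"], entry["ts"]))
    if rows:
        cur = ora.cursor()
        cur.executemany(
            "INSERT INTO api_access_log (api_key, client_ip, ts) VALUES (:1, :2, :3)",
            rows,
        )
        ora.commit()
        cur.close()
```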
Here comes the problem
The project is still in the testing phase. The resources are one Windows Server machine and one CentOS 6.5 machine. During testing there are about 100,000 requests within 10 seconds. Right after deployment there was no problem for a day or two, but then Redis connections began to fail. Inspecting the connections (on Windows Server) showed the following situation:
Many TCP connections stuck in the FIN_WAIT_2 state.
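On Windows Server this shows up in `netstat -ano`; as an aside, here is a quick way to tally connection states from Python, assuming the third-party psutil package is installed:

```python
from collections import Counter

import psutil  # assumed installed; `netstat -ano` gives the same raw view

# Count all TCP connections on the box by state.
states = Counter(conn.status for conn in psutil.net_connections(kind="tcp"))
print(states)  # e.g. Counter({'FIN_WAIT2': 8421, 'ESTABLISHED': 312, ...})
```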
Analysis
1. Redis handles connections on a single thread, which means the situation described below was bound to occur.
2. Clearly this is caused by a large number of unreleased connections between nginx and Redis. Looking up the TCP state FIN_WAIT_2, the explanation is:
In HTTP applications there is a known problem: the SERVER closes the connection for some reason, such as a KEEPALIVE timeout. The SERVER, as the side that actively closes, then enters the FIN_WAIT_2 state. But there is a gap in the TCP/IP protocol: the FIN_WAIT_2 state has no timeout (unlike the TIME_WAIT state), so if the CLIENT never closes its end, the connection stays in FIN_WAIT_2 until the system is restarted, and an ever-growing number of FIN_WAIT_2 connections can crash the kernel.
Okay, I didn't study this well in college; here is a refresher on the TCP connection state transitions.
Client state transitions:
CLOSED -> SYN_SENT -> ESTABLISHED -> FIN_WAIT_1 -> FIN_WAIT_2 -> TIME_WAIT -> CLOSED
Server state transitions:
CLOSED -> LISTEN -> SYN_RCVD -> ESTABLISHED -> CLOSE_WAIT -> LAST_ACK -> CLOSED
Flawed Clients and Persistent Connections
Some clients have problems handling persistent connections (aka keepalives). When the connection goes idle and the server closes it (based on the KeepAliveTimeout directive),
the client is programmed such that it does not send a FIN and ACK back to the server. This means the connection stays in the FIN_WAIT_2 state until one of the following occurs:
The client opens a new connection to the same or a different site, which causes it to fully close the previous connection on that socket.
The user exits the client program, which on some (perhaps most?) clients causes the operating system to completely close the connection.
The FIN_WAIT_2 state times out, on servers whose OS has a timeout setting for the FIN_WAIT_2 state.
If you're lucky, this means the defective client will close the connection completely and free up your server's resources.
However, there are some situations where the socket is never fully closed, such as a dial-up client disconnecting from the ISP before closing the client program.
In addition, some clients may be idle for several days without creating a new connection, and thus keep the socket valid for several days even if it is no longer used. This is a bug in the TCP implementation of the browser or operating system.
In summary, the causes are:
1. On a persistent connection that sits in the IDLE state until the SERVER closes it, a programming defect in the CLIENT means it never sends the FIN and ACK packets back to the SERVER.
2. Apache 1.1 and Apache 1.2 added the linger_close() function, introduced in an earlier post; this function may also trigger the problem (for reasons unknown).
Possible solutions:
1. Add a timeout mechanism to the FIN_WAIT_2 state. This is not part of the protocol itself, but it is implemented in some OSs,
such as Linux, Solaris, FreeBSD, HP-UX, IRIX, etc.
2. Don't compile with linger_close()
3. Use the SO_LINGER socket option instead, which some systems handle well (see the sketch after this list).
4. Increase the mbuf memory used to store network connection state, to prevent the kernel from crashing.
5. Disable KEEPALIVE.
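For reference on item 1: on Linux, the `net.ipv4.tcp_fin_timeout` sysctl bounds how long an orphaned FIN_WAIT_2 socket is kept. And for item 3, here is a minimal Python sketch of what setting SO_LINGER looks like; whether an abortive close is appropriate depends on the application.

```python
import socket
import struct

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# SO_LINGER takes an (onoff, linger_seconds) pair. With onoff=1 and linger=0,
# close() aborts the connection with an RST instead of the normal FIN handshake,
# so the socket never passes through FIN_WAIT_1 / FIN_WAIT_2 at all.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))

sock.connect(("127.0.0.1", 6379))
sock.close()  # immediate abortive close; no lingering half-closed state
```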
In view of this situation, we held several discussions; some of the conclusions were:
1. Configure a connection pool between nginx and Redis and set the keepalive times to 10 seconds and 5 seconds respectively; the result was the same.
2. Drop keepalive, i.e. use no connection pool and close() each connection as soon as it is used. The connection count visibly drops, but with no pool this means opening and closing connections 100,000 times in 10 seconds, which is far too expensive.
3. Redis cluster. Bolting a Redis cluster onto the existing system might make the problem go away, but 100,000 requests in 10 seconds is really not that many; this would just be a workaround, and the root cause would remain unfound.
4. Set Redis's idle connection timeout; the result was the same.
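For reference, item 4 corresponds to Redis's `timeout` directive, which closes client connections after N seconds of idleness; a sketch of setting it at runtime via redis-py (equivalent to `timeout 10` in redis.conf):

```python
import redis

r = redis.Redis(host="127.0.0.1", port=6379, decode_responses=True)

# Close any client connection that has been idle for more than 10 seconds
# (0, the default, disables the idle limit entirely).
r.config_set("timeout", 10)
print(r.config_get("timeout"))  # -> {'timeout': '10'}
```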
Solution:
Strictly speaking it is not a solution, because we abandoned Redis's in-memory mechanism and used nginx's own in-memory caching instead. Most of the Redis optimizations found online did not apply here, and this problem still awaits proper analysis and resolution.