A brief discussion on high concurrency and distributed clustering in node.js

不言 (original), 2018-08-01 15:54:46

This article briefly discusses high concurrency and distributed clusters in Node.js. I hope it serves as a useful reference.

Node features: high concurrency

Before explaining why Node can achieve high concurrency, it helps to understand several of its other features:

Single-threaded

Let's first clarify one concept: Node is single-threaded, just like JavaScript in the browser, and the JavaScript main thread in Node cannot share state with other threads (such as I/O threads).

The advantage of single thread is:

  • Unlike multi-threaded code, there is no need to worry about synchronizing state between threads

  • There is no overhead caused by thread switching

  • There is no deadlock

Of course, single thread also has many disadvantages:

  • Unable to fully utilize multi-core CPU

  • A large number of calculations occupying the CPU will cause application blocking (that is, not suitable for CPU-intensive applications)

  • An uncaught error causes the entire application to exit

Today, however, these disadvantages are either no longer problems or have reasonable solutions:

(1) Create processes or split the machine into instances

For the first problem, the most straightforward solution is to use the child_process core module, or cluster, which combines child_process and net. On a multi-core server we can make full use of every core by creating multiple processes (usually with a fork operation), but we then have to deal with inter-process communication.
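As a rough illustration of the cluster approach described above, here is a minimal sketch that forks one worker per core. The function name startCluster and the port 3000 are assumptions made for this example, not something from the original article.

```javascript
const cluster = require('cluster');
const http = require('http');
const os = require('os');

// Fork one worker per CPU core; each worker runs its own event loop.
// startCluster is a hypothetical helper name used only for this sketch.
function startCluster(port) {
  // isPrimary exists on newer Node versions; isMaster is the older name.
  const isPrimary =
    cluster.isPrimary !== undefined ? cluster.isPrimary : cluster.isMaster;

  if (isPrimary) {
    const workerCount = os.cpus().length;
    for (let i = 0; i < workerCount; i++) {
      cluster.fork(); // re-runs this script in a child process
    }
    return workerCount;
  }

  // Worker process: the workers share the same listening port.
  http
    .createServer((req, res) => {
      res.end(`handled by worker ${process.pid}\n`);
    })
    .listen(port);
  return 0;
}

// Usage (starts long-running processes, so it is left commented out):
// startCluster(3000);
```

Each forked worker executes the same script, so the call to startCluster has to sit at the top level of the entry file for the workers to reach the server branch.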

Another solution is to divide the physical machine into multiple single-core virtual machines and use a tool such as pm2 to manage them as a cluster that runs the required services efficiently. I will not discuss communication (state synchronization) between the machines here; it is covered in the distributed Node architecture section below.

(2) Time slice rotation

Regarding the second point, after discussing it with my friends, I believe we can use time slicing to simulate multi-threading on a single thread and so reduce the feeling that the application is blocked (although this will not actually save time the way real multi-threading does).
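One possible reading of the time-slicing idea is to cut a long computation into small slices and give control back to the event loop between them. The sketch below does this with setImmediate; the function name sumInSlices and the slice size are assumptions chosen for illustration.

```javascript
// Sum the integers 0..n-1 in slices, yielding to the event loop between slices
// so that other queued callbacks get a chance to run.
function sumInSlices(n, sliceSize) {
  return new Promise((resolve) => {
    let total = 0;
    let i = 0;

    function runSlice() {
      const end = Math.min(i + sliceSize, n);
      for (; i < end; i++) {
        total += i;
      }
      if (i < n) {
        setImmediate(runSlice); // schedule the next slice, then return
      } else {
        resolve(total);
      }
    }

    runSlice();
  });
}

sumInSlices(1000000, 10000).then((total) => console.log('total:', total));
```

The total amount of work is unchanged, which matches the caveat above: slicing improves responsiveness, not throughput.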

(3) Load balancing, faulty node monitoring/isolation

As for the third point, my friends and I have discussed this as well. The main pain point is that Node, unlike Java, implements most of its logic asynchronously.

As a result, Node cannot use try/catch to catch and work around errors as conveniently as Java can, because there is no way to know when an asynchronous task will throw its exception. In a single-threaded environment, an error that is not handled means the application exits, and the gap between the crash and the restart interrupts the service, which is something we do not want.
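To make the try/catch limitation concrete, here is a small sketch. An exception thrown inside a callback that runs later is outside the try block that scheduled it, so one common workaround is to catch inside the callback and report the error through an error-first callback. The helper name parseLater is hypothetical.

```javascript
// try/catch only covers the synchronous part of a call. Work that runs later
// in a callback has to catch its own errors and report them, here through an
// error-first callback.
function parseLater(text, callback) {
  setTimeout(() => {
    let value;
    try {
      value = JSON.parse(text);
    } catch (err) {
      callback(err); // report the failure instead of crashing the process
      return;
    }
    callback(null, value);
  }, 0);
}

parseLater('{broken json', (err, value) => {
  if (err) {
    console.log('handled without crashing:', err.name);
    return;
  }
  console.log(value);
});
```

Promises with async/await give a similar effect, since a rejected promise can be caught with try/catch inside an async function.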

Of course, now that server resources are plentiful, we can use tools such as pm2 or nginx to monitor service status dynamically: when a service fails, isolate the faulty server, forward requests to healthy servers, and restart the faulty server so that it can resume serving. This is also part of Node's distributed architecture.
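The isolate-and-forward behaviour can be pictured with a toy round-robin pool. This is only an in-memory sketch under assumed names (BackendPool, markDown, markUp, pick) and made-up addresses; in practice nginx or pm2 would track this state.

```javascript
// Hypothetical in-memory view of a cluster, used only to illustrate
// round-robin forwarding that skips isolated servers.
class BackendPool {
  constructor(addresses) {
    this.backends = addresses.map((address) => ({ address, healthy: true }));
    this.next = 0;
  }

  markDown(address) {
    this.setHealth(address, false); // isolate a faulty server
  }

  markUp(address) {
    this.setHealth(address, true); // put a recovered server back in rotation
  }

  setHealth(address, healthy) {
    const backend = this.backends.find((b) => b.address === address);
    if (backend) backend.healthy = healthy;
  }

  // Round-robin over the backends, skipping isolated ones.
  pick() {
    for (let i = 0; i < this.backends.length; i++) {
      const index = (this.next + i) % this.backends.length;
      if (this.backends[index].healthy) {
        this.next = (index + 1) % this.backends.length;
        return this.backends[index].address;
      }
    }
    return null; // every backend is isolated
  }
}

const pool = new BackendPool(['10.0.0.1', '10.0.0.2', '10.0.0.3']);
pool.markDown('10.0.0.2');
console.log(pool.pick(), pool.pick(), pool.pick());
// prints: 10.0.0.1 10.0.0.3 10.0.0.1
```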

Asynchronous I/O

You may ask, since node is single-threaded and all events are processed on one thread, shouldn't it be inefficient and contrary to high concurrency?

On the contrary, Node performs very well. One reason is asynchronous I/O: whenever an I/O request is made, Node hands it off to an I/O thread (or to the operating system's asynchronous facilities). Node then does not wait on the I/O operation; it continues executing events on the main thread and only handles the request's callback once the result comes back. In other words, Node saves a great deal of time that would otherwise be spent waiting for requests.

This is also one of the main reasons Node supports high concurrency.

In fact, it is not only I/O: most operations in Node are carried out asynchronously. Node is like an organizer. It does not need to do everything itself; it only needs to tell the members how to do their work, accept their feedback, and handle the key steps, so that the whole team runs efficiently.

Event-driven

You may also ask: how does Node know that a request has come back, and when should it handle these callbacks?

The answer is another feature of Node: it is event-driven. The main thread runs the program by having the event loop trigger callbacks.

This is another important reason Node supports high concurrency.

Illustration of the event loop in the Node environment:

[Figure: the phases of the event loop, starting with timers]

poll phase:

When the event loop enters the poll phase and no timers are scheduled, one of the following happens:

(1) If the poll queue is not empty:

  • The event loop synchronously executes the callbacks (new I/O events) in the poll queue until the queue is empty or the limit on the number of executed callbacks is reached.

(2) If the poll queue is empty:

  • If the script has called setImmediate(), the event loop ends the poll phase and enters the check phase to execute the setImmediate() callbacks.

  • If the script has not called setImmediate(), the event loop waits for callbacks (new I/O events) to be added to the queue and then executes them immediately.

When the event loop enters the poll phase and timers are scheduled, the following happens:

  • Once the poll queue is empty, the event loop checks the timers. If one or more timers are due, the event loop goes back to the timers phase and executes their callbacks (that is, it enters the next tick).

Priority:

Next Tick Queue > MicroTask Queue

setTimeout, setInterval > setImmediate (this order is not guaranteed when both are scheduled from the main module; inside an I/O callback, setImmediate always runs first)

Because the timers phase has to take timers out of a red-black tree to check whether their time has arrived, with a time complexity of O(log n), it is best not to use setTimeout(func, 0) when you want to run something asynchronously as soon as possible. Use process.nextTick() instead.

Distributed Node Architecture

The Node cluster architecture I learned is mainly divided into the following modules:

Nginx (load balancing, scheduling) -> Node cluster-> Redis (synchronization status)

I drew a diagram based on my understanding:

[Figure: Nginx -> Node cluster -> Redis]

Of course, this is an idealized architecture. Redis reads and writes are very fast, but that is because it keeps data in memory and operates on it there.

This puts a heavy load on the server's memory, so we usually add MySQL to the architecture as well, as shown below:

[Figure: the same architecture with MySQL added]

To explain this diagram:

When user data arrives, it is first written to MySQL. When Node needs the data, it reads from Redis first. If the data is not there, Node queries MySQL for it and writes the result into Redis, so that the next lookup can be served directly from Redis.
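The read path described above is often called cache-aside. The sketch below imitates it with two Maps standing in for Redis and MySQL; the names redisStandIn, mysqlStandIn, saveUser and getUser are hypothetical, and a real service would call actual Redis and MySQL clients instead.

```javascript
// Stand-ins for the real stores. These Maps are assumptions for illustration;
// real code would use a Redis client and a MySQL client here.
const redisStandIn = new Map();
const mysqlStandIn = new Map();

// New data is written to MySQL first.
async function saveUser(user) {
  mysqlStandIn.set(user.id, user);
}

// Read from Redis first; on a miss, query MySQL and fill the cache.
async function getUser(id) {
  if (redisStandIn.has(id)) {
    return { source: 'redis', user: redisStandIn.get(id) };
  }
  const user = mysqlStandIn.get(id);
  if (user === undefined) {
    return { source: 'none', user: null };
  }
  redisStandIn.set(id, user); // the next read is served from Redis
  return { source: 'mysql', user };
}

async function demo() {
  await saveUser({ id: 1, name: 'alice' });
  const first = await getUser(1);
  const second = await getUser(1);
  console.log(first.source, second.source); // prints: mysql redis
}

demo();
```

A real cache would also need an expiry or invalidation policy so that stale entries do not linger in Redis; that is outside the scope of this sketch.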


Compared with reading and writing only in Redis, adding MySQL has these advantages:

(1) It avoids writing data to Redis that will not be needed in the short term, which saves memory and reduces the load on Redis

(2) When the data later needs specific queries and analysis (for example, analyzing user growth from a marketing campaign), SQL relational queries are a great help

Of course, when handling a burst of heavy write traffic, we can also write data directly to Redis to store it quickly and increase the server's capacity to absorb the traffic. Once the traffic subsides, we can write that data to MySQL separately.
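The burst-handling idea can be sketched as a write buffer that is drained later. Here an array plays the role of the Redis buffer and a Map plays MySQL; pendingWrites, mysqlTable, bufferWrite and flushToMysql are all hypothetical names for illustration.

```javascript
// Hypothetical stand-ins: an array plays the Redis buffer, a Map plays MySQL.
const pendingWrites = [];
const mysqlTable = new Map();

// During a traffic spike, only append to the fast in-memory buffer.
function bufferWrite(record) {
  pendingWrites.push(record);
}

// After the spike, move up to batchSize records into MySQL.
function flushToMysql(batchSize) {
  const batch = pendingWrites.splice(0, batchSize);
  for (const record of batch) {
    mysqlTable.set(record.id, record);
  }
  return batch.length;
}

bufferWrite({ id: 1, action: 'signup' });
bufferWrite({ id: 2, action: 'signup' });
console.log('flushed:', flushToMysql(100)); // prints: flushed: 2
```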

After briefly introducing the general architecture, let’s take a closer look at the details of each part:

Traffic access layer

The traffic access layer processes all incoming traffic and provides the following services:

  • Traffic buffering
  • Traffic splitting and forwarding
  • Timeout detection
    • Timeout establishing a connection with the user
    • Timeout reading the user's request body
    • Timeout connecting to the backend
    • Timeout reading the backend's response headers
    • Timeout writing the response
    • Timeout on long-lived connections with the user
  • Cluster health checks / isolation of faulty servers
    • Isolate the faulty server and try to repair or restart it until it is back to normal
  • Failure retry mechanism
    • When a request forwarded to a machine in a cluster returns a failure, forward it to another machine in that cluster, or to a machine in another cluster, and retry
  • Connection pool / session persistence mechanism
    • Use a connection pool for latency-sensitive users to reduce connection setup time
  • Security protection
  • Data analysis

Once requests have been forwarded to each product line, the load layer takes over: it forwards requests to the various data centers as the situation requires.
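Two of the services listed above, timeout detection and failure retry, can be sketched together. In this example each backend is modelled as a function that returns a promise; withTimeout, forwardWithRetry and the three sample backends are assumptions for illustration, not part of any real gateway.

```javascript
// Reject if the backend does not answer within timeoutMs.
function withTimeout(promise, timeoutMs) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error('timeout')), timeoutMs);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Try each backend in turn; move on when one fails or times out.
async function forwardWithRetry(backends, request, timeoutMs) {
  let lastError = new Error('no backends available');
  for (const backend of backends) {
    try {
      return await withTimeout(backend(request), timeoutMs);
    } catch (err) {
      lastError = err; // remember the failure and try the next machine
    }
  }
  throw lastError;
}

// Hypothetical backends: one that fails, one that is too slow, one healthy.
const failing = async () => { throw new Error('backend down'); };
const slow = () => new Promise((resolve) => setTimeout(() => resolve('slow reply'), 200));
const healthy = async (request) => `ok: ${request}`;

forwardWithRetry([failing, slow, healthy], 'GET /', 50)
  .then((reply) => console.log(reply)); // prints: ok: GET /
```

Retrying is only safe for requests that can be repeated without side effects; a real access layer would need to distinguish those from requests that must not be replayed.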

Of course, this platform does more than forwarding. You can think of it as a large private cloud system that provides the following services:

  • File upload / online service deployment
  • Online configuration changes
  • Scheduled tasks
  • Online system monitoring / log printing
  • Online instance management
  • Image center
  • etc.

Node cluster layer

The main work of this layer is:

(1) Write reliable Node code that provides the back-end services the requirements call for

(2) Write high-performance queries against Redis and MySQL to improve query efficiency

(3) Synchronize the state of each Node service in the cluster through Redis

(4) Use a hardware management platform to manage and monitor the state of the physical machines, manage IP addresses, and so on (this work does not really feel like it belongs in this layer, but I am not sure which layer it should go in...)

(Of course, I have only listed the items briefly here; understanding them in depth takes time and experience.)

Database layer

The main work of this layer is:

(1) Set up MySQL and design the related tables; create the necessary indexes and foreign keys to make queries more convenient

(2) Deploy Redis and provide the corresponding interfaces to the Node layer
