CAP theory is a foundational theory of distributed systems. Although the interviewer will rarely ask you directly about the principles of CAP theory, the distributed system design questions that come up in an interview cannot be answered without your own understanding of CAP.
The interviewer's expectations also differ by position, and so does the depth expected of your answers. So in today's article, I will analyze interview approaches from two perspectives: junior and mid-level R&D engineers, and senior R&D engineers.
I believe that as long as you have studied the relevant knowledge of distributed technology, you basically know what the CAP theory refers to:
C (Consistency) is data consistency, A (Availability) is service availability, and P (Partition tolerance) is partition tolerance.
CP model.
At this point, the client reads data from either node A or node A1 and gets the latest written data, which shows that the data on A and A1 is consistent and that both A and A1 are available.
But because the network is unreliable, the network of nodes A and A1 may be partitioned at any time due to interruptions. The so-called network partition means that nodes A and A1 are isolated in different network subsets due to network failure. At this time, the data of node A cannot be synchronized to node A1 in time.
In distributed systems, network partitions due to network problems are the norm. That is to say, when a network partition occurs, according to the CAP theory, a trade-off needs to be made between A and C, that is, either to ensure system availability or to ensure data consistency.
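The trade-off can be made concrete with a toy model of the two nodes from the example. This is only a minimal sketch: the class names `Replica` and `TwoNodeStore` and the `mode` switch are assumptions made for illustration, not part of any real system.

```python
class Replica:
    """A single node holding one value and a version counter."""

    def __init__(self, name):
        self.name = name
        self.value = None
        self.version = 0


class TwoNodeStore:
    """Toy store with node A and its copy A1 (node names from the text)."""

    def __init__(self, mode):
        assert mode in ("CP", "AP")
        self.mode = mode
        self.a = Replica("A")
        self.a1 = Replica("A1")
        self.partitioned = False  # set to True to simulate a network partition

    def write(self, value):
        """Write to A, then copy to A1 if the link between them is up."""
        if self.partitioned and self.mode == "CP":
            # CP: give up availability rather than let A and A1 diverge.
            raise RuntimeError("unavailable: cannot replicate during partition")
        self.a.value = value
        self.a.version += 1
        if not self.partitioned:
            self.a1.value, self.a1.version = self.a.value, self.a.version

    def read(self, node_name):
        """Read from whichever node the client happens to reach."""
        node = self.a if node_name == "A" else self.a1
        return node.value
```

In `"CP"` mode a write during a partition is rejected, so both nodes keep returning the same value. In `"AP"` mode the write succeeds on A, and a client reading from A1 sees stale data until the partition heals.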
Pay attention here. The example above rests on a major premise: that a network partition has occurred. In practice, most of the time there is no network partition (the network does not fail that often). So do we still have to pick two out of three (CP or AP)?
In fact, different distributed systems need to weigh the three CAP properties against each other based on business scenarios and business needs. CAP theory is a guide to the factors you need to weigh when designing a system, not a demand for an absolute choice.
When the network is not partitioned, CAP theory gives no factor for weighing A against C. But if you have done actual distributed system design, you will find that the latency of data synchronization, that is, the time it takes node A to synchronize data to node A1 in the example, is the most important factor in weighing A against C. In that case there is no absolute AP model or CP model, only a comprehensive judgment based on the actual business scenario.
This is why newer models such as PACELC ("Reference 1") have appeared to refine the original CAP theory: theory guides practice, and practice in turn refines theory.
According to the definition of the PACELC model, if a network partition (P) occurs, the system must strike a balance between A and C; otherwise (Else, the E in PACELC), when the system runs without a network partition, it must strike a balance between L (latency) and C.
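The "else" branch of PACELC can be sketched with a simulated clock. The class `ReplicatedPair`, its fields, and the 50 ms delay used in the usage note are all assumptions chosen for illustration. The point is only that waiting for A1 buys consistency at the cost of latency.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicatedPair:
    """Node A replicating to A1 over a healthy (non-partitioned) link."""

    sync_delay_ms: int   # assumed time for A to copy a write to A1
    synchronous: bool    # True favours C, False favours L
    a_value: object = None
    a1_log: list = field(default_factory=list)  # (visible_at_ms, value)

    def write(self, value, now_ms):
        """Apply a write on A and return the time the client is acknowledged."""
        self.a_value = value
        visible_at = now_ms + self.sync_delay_ms
        self.a1_log.append((visible_at, value))
        # Synchronous replication acknowledges only after A1 has the value.
        return visible_at if self.synchronous else now_ms

    def read_a1(self, now_ms):
        """Return the newest value that has reached A1 by now_ms."""
        visible = [v for t, v in self.a1_log if t <= now_ms]
        return visible[-1] if visible else None
```

With `sync_delay_ms=50`, a synchronous write issued at time 0 is acknowledged at 50 ms and is immediately readable on A1. An asynchronous write is acknowledged at 0 ms, but a read from A1 at that moment still returns the old value.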
PACELC
But understanding it to this level is not enough. You also need to back it up with implementation experience.
You have to realize that distributed design on the Internet is a trade-off between data consistency and system availability, not an either/or choice. This is particularly important. Even if strong consistency cannot be achieved (in simple terms, strong consistency means that the data every user queries is the latest at any time), you can still use methods suited to your own business characteristics to make the system reach eventual consistency.
This is where BASE theory comes in, which is an extension of CAP theory. BASE is an abbreviation of three terms: Basically Available, Soft State, and Eventually Consistent. The idea is to guarantee the availability of the system first and then replace strong consistency with eventual consistency. It is the most instructive summary of experience in current distributed system design. So how do you use BASE theory to guide design practice in actual projects?
Basically available in BASE refers to ensuring the basic availability of core functions. In effect, it is a compromise on "availability", for example:
When access pressure is high, such as during the Double Eleven promotion, e-commerce websites turn off the display of secondary functions such as product rankings to keep the main transaction flow available. This is what we often call service degradation;
To stagger the Double Eleven peak, e-commerce websites delay the payment time for pre-sale goods by ten to twenty minutes, which is traffic peak shaving;
When you rush to buy goods, your request often waits in a queue to be processed, which is the commonly used delay queue.
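The three compromises above can be sketched in one small class. Everything here is hypothetical: `OrderGateway`, its thresholds, and the placeholder ranking list are illustrative names only, and a plain FIFO queue stands in for a real delay queue.

```python
from collections import deque


class OrderGateway:
    """Toy front door for a flash sale (all names here are hypothetical)."""

    def __init__(self, max_queue, degrade_at):
        self.queue = deque()          # orders waiting to be processed
        self.max_queue = max_queue    # hard cap on waiting orders
        self.degrade_at = degrade_at  # queue length that triggers degradation

    @property
    def degraded(self):
        return len(self.queue) >= self.degrade_at

    def product_rankings(self):
        """Secondary feature: switched off while the system is under load."""
        if self.degraded:
            return None  # service degradation: skip non-core work
        return ["item-1", "item-2", "item-3"]

    def submit_order(self, order_id):
        """Core feature: queue the order instead of processing it inline."""
        if len(self.queue) >= self.max_queue:
            return "rejected"
        self.queue.append(order_id)
        return "queued"

    def process(self, batch_size):
        """Worker drains the queue at its own steady pace (peak shaving)."""
        n = min(batch_size, len(self.queue))
        return [self.queue.popleft() for _ in range(n)]
```

The core path (`submit_order`) stays available as long as the queue has room, while the secondary path (`product_rankings`) is the first thing to be given up when load rises.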
Soft state and eventual consistency mean allowing data in the system to exist in an intermediate state. This sacrifices data consistency for a period of time in exchange for system availability, while still guaranteeing that the data becomes consistent in the end.
At present, this way of processing data has almost become the standard design pattern of the Internet. The most classic example is that when the user places an order, the inventory does not have to be deducted immediately; the front end only records the quantity, and a background asynchronous task then processes the deductions in batches.
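The order example can be sketched as follows. This is a minimal illustration under assumed names (`Inventory`, `place_order`, `settle`), and the background task is modelled as an ordinary method call rather than a real asynchronous job.

```python
class Inventory:
    """Stock that is deducted in batches rather than per order."""

    def __init__(self, stock):
        self.stock = stock   # authoritative count, updated only by settle()
        self.pending = []    # accepted orders not yet deducted (soft state)

    def place_order(self, order_id, qty):
        """Accept the order quickly; only record the quantity."""
        reserved = sum(q for _, q in self.pending)
        if reserved + qty > self.stock:
            return False
        self.pending.append((order_id, qty))
        return True

    def settle(self):
        """Background task: apply all pending deductions in one batch."""
        deducted = sum(q for _, q in self.pending)
        self.stock -= deducted
        self.pending.clear()
        return deducted
```

Between `place_order` and `settle`, `stock` does not yet reflect the accepted orders. That window is the intermediate state, and it closes once the batch has run.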
If you are applying for a junior or mid-level R&D engineer position, then by combining the ideas above, from theoretical understanding to implementation, you can already answer CAP theory questions fairly clearly. For the logic of your answer, you can refer to the suggestions I gave:
If you are applying for a senior R&D engineer or architect position, your answer must also demonstrate your knowledge system and technical judgment as much as possible. These are basic qualities for both positions. Because distributed technology is complex and its parts are coupled with each other, you will stand out in the interview if you can expand from the single knowledge point of CAP theory into a clear knowledge system of core distributed technologies.
A distributed system looks like a single computer. A computer follows the von Neumann architecture, which has five major components: the arithmetic unit, the controller, memory, input devices, and output devices.
You can understand it this way: a distributed system also includes these five major components, the most important of which are computing and storage. Computing and storage are made up of a series of network nodes, the communication between nodes is the input and output, and the scheduling and management of nodes is the controller.
Distributed architecture technology composition
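From this point of view, a distributed system is like a networked computer, and its knowledge system includes four perspectives: storage, computing, communication between nodes (input and output), and scheduling management (the controller).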
You can summarize the knowledge system of distributed systems from these four perspectives.
So how do you approach a concrete question? Taking "Can Redis be used as a distributed lock" as an example, let's analyze the distributed theory hidden behind the question and how a senior R&D engineer would work through it.
A lock is generally implemented with the Redis setnx command, with a timeout to control the lock's expiration time. But in an extreme case, when the Redis master node goes down before the lock has been synchronized to the slave node, the sentinel mechanism promotes the slave to master and it continues to provide service. Another thread can then request the lock again, and two threads end up holding the lock at the same time.
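The failure described above can be reproduced without a real Redis server. The sketch below uses in-memory stand-ins: `FakeRedisNode` and `FailoverPair` are hypothetical classes, not a Redis client API, and lock expiry is left out to keep the focus on replication. Against a real server the acquire step would typically be a `SET key value NX PX milliseconds` command.

```python
class FakeRedisNode:
    """In-memory stand-in for one Redis node (not a real client)."""

    def __init__(self):
        self.data = {}

    def setnx(self, key, value):
        """Set key only if it does not exist; report whether it was set."""
        if key in self.data:
            return False
        self.data[key] = value
        return True


class FailoverPair:
    """Master with one asynchronously replicated slave."""

    def __init__(self):
        self.master = FakeRedisNode()
        self.slave = FakeRedisNode()
        self.unreplicated = []  # writes the slave has not received yet

    def acquire(self, key, owner):
        """Try to take the lock on the current master."""
        ok = self.master.setnx(key, owner)
        if ok:
            self.unreplicated.append((key, owner))
        return ok

    def replicate(self):
        """Copy outstanding writes to the slave."""
        for key, owner in self.unreplicated:
            self.slave.data[key] = owner
        self.unreplicated.clear()

    def fail_over(self):
        """Master dies; the slave is promoted and unreplicated writes are lost."""
        self.master = self.slave
        self.slave = FakeRedisNode()
        self.unreplicated.clear()
```

If thread 1 acquires the lock and `fail_over()` happens before `replicate()`, the promoted node has no record of the lock, so thread 2's `acquire` also succeeds. If `replicate()` runs first, the second `acquire` is correctly refused.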
According to the understanding of CAP theory, the design model of Redis is the AP model, and the distributed lock is a CP scenario, then it is obvious that applying the AP model architecture of Redis to the CP scenario is wrong in terms of underlying technology selection.
Redis is a distributed storage system, and you must have a knowledge system for the field of distributed storage in your mind. Think about how its data storage, data distribution, data replication, and data consistency are implemented, what technologies are used to achieve them, and why those technologies or algorithms were chosen. You need to learn to compare and analyze different solutions to the same distributed problem from multiple dimensions and angles, weigh their advantages and disadvantages, and ultimately form your own technical understanding and technical judgment.
For example, starting from Redis, you can think about the current state of development and the technical implementation of distributed cache systems, and about what issues you would consider if you were asked to build a "Redis" yourself. Although repeatedly "reinventing the wheel" is not recommended in actual work, you must show that you are able to "reinvent the wheel" during the interview.
CAP theory seems simple, but in an interview, the depth of your understanding of it reflects your overall understanding and command of distributed systems.
So you need to master not only how to answer the CAP questions in this case, but also the approach behind the answers. When you are tested on similar theoretical knowledge in the future, you can answer at three levels.