
Meituan interview: Let’s talk about CAP, my way of answering is very special

Java后端技术全栈
2023-08-24 15:11:52


Case Background

CAP theory is a foundational theory of distributed systems. Although interviewers rarely ask directly about the principles of CAP theory, the distributed system design questions you meet in an interview cannot be answered well without your own understanding of, and thinking about, CAP.

In addition, interviewers expect different things from candidates for different positions, so the depth of your answer should differ as well. In today's article, I will analyze how to approach the question from two perspectives: junior and intermediate R&D engineers, and senior R&D engineers.

Case Analysis

I believe that as long as you have studied the relevant knowledge of distributed technology, you basically know what the CAP theory refers to:

  • C (Consistency) is data consistency,
  • A (Availability) is service availability,
  • P (Partition tolerance) is partition tolerance.

Only two of C, A, and P can be satisfied at the same time, and since P must be retained in a distributed system, the choice is between C and A. If you want to guarantee service availability, choose the AP model; if you want to guarantee consistency, choose the CP model.

Many candidates, when they meet an interview question that involves "understanding and thinking about CAP" (such as "For data disaster recovery, we keep a master-slave backup of the data, so what impact does data consistency between the master and slave nodes have on the caller?"), subconsciously give an answer like: "CAP theory describes the choice between C and A when a network partition occurs, so from the caller's perspective the effect is that the system is unavailable." If I were the interviewer, I would probably give it only a passing grade: such an answer proves that you are prepared, not that you are capable.

When you meet a theoretical question in an interview, a superficial explanation of the concept is not enough to prove your technical ability. The interviewer will conclude that you are new to distributed systems or do not understand them deeply. If this happens to be your first interview question, it will directly affect the interviewer's first impression of you and even your final rating.

Based on my experience, if you want to answer better, you first need to master the principles, the practical experience, and the technical awareness behind CAP, and then analyze the specific interview question on that basis.

Question Answer

Understanding the Principle

Suppose there is a distributed system with a node A and a copy of it, node A1. Under normal conditions, the client writes data to node A, the data is synchronized from node A to node A1, and then a success status is returned to the client.


At this point, the client can read from either node A or node A1 and get the latest written data, which shows that the data on A and A1 is consistent and that both A and A1 are available.

But because the network is unreliable, the link between nodes A and A1 may be interrupted at any time, producing a network partition. A network partition means that nodes A and A1 are isolated in different network subsets because of a network failure. When that happens, the data on node A cannot be synchronized to node A1 in time.


In distributed systems, network partitions caused by network problems are a normal occurrence that must be planned for. According to CAP theory, when a network partition occurs, a trade-off must be made between A and C: either guarantee system availability or guarantee data consistency.
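The trade-off can be made concrete with a toy model of node A and its copy A1. This is only a sketch under simplifying assumptions: the class `ReplicatedStore`, its `Mode` enum, and the `partitioned` flag are hypothetical names invented for illustration, replication is modeled as a plain in-memory copy, and all writes are assumed to arrive at node A.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of node A and its copy A1; not a real replication protocol.
class ReplicatedStore {
    enum Mode { CP, AP }

    private final Map<String, String> nodeA = new HashMap<>();
    private final Map<String, String> nodeA1 = new HashMap<>();
    private final Mode mode;
    private boolean partitioned = false;

    ReplicatedStore(Mode mode) {
        this.mode = mode;
    }

    void setPartitioned(boolean partitioned) {
        this.partitioned = partitioned;
    }

    // Returns true if the write is acknowledged to the client.
    boolean write(String key, String value) {
        if (!partitioned) {
            nodeA.put(key, value);
            nodeA1.put(key, value); // synchronous copy A -> A1
            return true;
        }
        if (mode == Mode.CP) {
            return false; // refuse the write: data stays consistent, service is unavailable
        }
        nodeA.put(key, value); // AP: accept on A only, so A1 becomes stale
        return true;
    }

    String readFromA(String key) {
        return nodeA.get(key);
    }

    String readFromA1(String key) {
        return nodeA1.get(key);
    }

    public static void main(String[] args) {
        for (Mode m : Mode.values()) {
            ReplicatedStore store = new ReplicatedStore(m);
            store.write("k", "v1");      // healthy network: both nodes get v1
            store.setPartitioned(true);  // A and A1 can no longer talk
            boolean accepted = store.write("k", "v2");
            System.out.println(m + ": write accepted=" + accepted
                    + ", A=" + store.readFromA("k")
                    + ", A1=" + store.readFromA1("k"));
        }
    }
}
```

In CP mode the second write is rejected and both nodes still hold `v1`; in AP mode the write succeeds, but a read from A1 returns the old value `v1` while A holds `v2`.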

Note that the example above rests on a major premise: a network partition has occurred. In practice, most of the time there is no network partition (the network does not fail that often). So do we still need to pick two out of three (CP or AP)?

In fact, different distributed systems need to weigh C, A, and P against each other according to their business scenarios and requirements. CAP theory guides which factors to weigh when designing a system; it does not dictate an absolute choice.

When the network is not partitioned, CAP theory says nothing about how to weigh A against C. But if you have designed a real distributed system, you will have found that the latency of data synchronization, that is, the time node A in the example takes to synchronize data to node A1, is the most important factor in weighing A against C. In this situation there is no absolute AP model or CP model; the decision comes from considering the actual business scenario as a whole.

This is why newer models such as PACELC [Reference 1] refine the original CAP theory: theory guides practice, and practice in turn refines theory.

According to the definition of the PACELC model, if a network partition (P) occurs, the system must strike a balance between A and C. Otherwise (Else, the E in PACELC), when the system runs without a network partition, it must strike a balance between L (latency) and C.
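The "else" branch can be illustrated with back-of-the-envelope arithmetic. The sketch below assumes a deliberately simple model: synchronous replication makes the client wait for the copy to A1, while asynchronous replication returns after the write on A and leaves A1 stale for roughly the replication time. The class name, method names, and the millisecond values are all made up for illustration; they are not measurements.

```java
// Back-of-the-envelope model of the "else" branch of PACELC (no partition).
// All numbers are illustrative assumptions, not measurements.
class LatencyVsConsistency {
    // Synchronous replication: the client waits for A and for the copy to A1.
    static long syncWriteLatencyMs(long localWriteMs, long replicationMs) {
        return localWriteMs + replicationMs;
    }

    // Asynchronous replication: the client waits for the write on A only.
    static long asyncWriteLatencyMs(long localWriteMs) {
        return localWriteMs;
    }

    // With asynchronous replication, a read from A1 may return old data
    // for roughly as long as the copy takes to arrive.
    static long asyncStaleWindowMs(long replicationMs) {
        return replicationMs;
    }

    public static void main(String[] args) {
        long localWriteMs = 2;   // assumed time to write on node A
        long replicationMs = 30; // assumed time to copy A -> A1

        System.out.println("sync:  write latency="
                + syncWriteLatencyMs(localWriteMs, replicationMs)
                + "ms, stale window=0ms");
        System.out.println("async: write latency="
                + asyncWriteLatencyMs(localWriteMs)
                + "ms, stale window="
                + asyncStaleWindowMs(replicationMs) + "ms");
    }
}
```

Under these assumed numbers, choosing consistency costs 32 ms per write, while choosing latency costs 2 ms per write plus a window of about 30 ms in which A1 can serve stale data.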

[Figure: PACELC]

But understanding CAP at this level is not enough; you also need to back it up with implementation experience.

Practical Experience

You have to realize that distributed design on the Internet is a trade-off between data consistency and system availability, not an either/or choice. This is particularly important. Even if strong consistency cannot be achieved (put simply, strong consistency means that every user always reads the latest data), you can still use methods suited to your own business to make the system reach eventual consistency.

This is where BASE theory comes in, as an extension of CAP theory. BASE stands for Basically Available, Soft State, and Eventually Consistent. The idea is to guarantee the availability of the system first, and then replace strong consistency with eventual consistency. It is the most instructive summary of experience in current distributed system design. So how do you use BASE theory to guide design practice in actual projects?

"Basically Available" in BASE means guaranteeing that core functions stay available. It is in fact a compromise on "availability", for example:

  • When access pressure is high, such as during the Double Eleven promotion, e-commerce websites turn off the display of secondary functions such as product rankings to keep the main transaction flow available. This is what we often call service degradation;
  • To stagger the Double Eleven peak, e-commerce websites delay the payment time for pre-sale goods by ten to twenty minutes. This is traffic peak shaving;
  • When you rush to buy goods, you often wait in a queue to be processed. This is the commonly used delay queue.
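The first of these, service degradation, can be sketched as a simple load check. This is a minimal illustration, assuming a hypothetical `ProductPage` class with a made-up QPS threshold; real systems usually drive such switches from a configuration center or a circuit breaker rather than an inline comparison.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical product page that drops a secondary feature under load.
class ProductPage {
    private final int degradeThresholdQps; // assumed tuning knob

    ProductPage(int degradeThresholdQps) {
        this.degradeThresholdQps = degradeThresholdQps;
    }

    // Returns the page sections rendered at the given load.
    List<String> render(int currentQps) {
        List<String> sections = new ArrayList<>();
        sections.add("product-detail"); // core transaction flow: always served
        sections.add("buy-button");
        if (currentQps < degradeThresholdQps) {
            sections.add("product-ranking"); // secondary: served only when load allows
        }
        return sections;
    }

    public static void main(String[] args) {
        ProductPage page = new ProductPage(1000);
        System.out.println("normal load: " + page.render(200));
        System.out.println("peak load:   " + page.render(5000));
    }
}
```

At peak load the ranking section disappears, but the detail page and the buy button are still rendered, which is the sense in which the system remains "basically available".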

Soft state and eventual consistency mean allowing data in the system to sit in an intermediate state. This sacrifices data consistency for a period of time in exchange for system availability, while still guaranteeing that the data becomes consistent in the end.

This way of processing data has almost become a standard design pattern on the Internet. The most classic example is that when a user places an order, the inventory is not actually deducted right away; the order is only counted in the foreground, and the deductions are then processed in batches in the background by asynchronous tasks.

If you are applying for a junior or intermediate R&D engineer position, then with the ideas above, from theoretical understanding to implementation, you can already answer a CAP question quite clearly. For the logic of your answer, you can refer to the suggestions I give here:
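A minimal sketch of this pattern follows, assuming a hypothetical `InventoryService` in which the foreground path only counts the order and a background job later deducts stock in one batch. It is single-threaded for clarity; a production version would need durable queues and concurrency control, none of which is shown here.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical inventory service: count orders first, deduct stock later.
class InventoryService {
    private int stock;        // authoritative stock, changed only by the batch job
    private int reserved = 0; // soft state: ordered but not yet deducted
    private final Queue<Integer> pendingOrders = new ArrayDeque<>();

    InventoryService(int stock) {
        this.stock = stock;
    }

    // Foreground path: count the order without touching the stock.
    boolean placeOrder(int quantity) {
        if (quantity <= 0 || reserved + quantity > stock) {
            return false; // would oversell
        }
        reserved += quantity;
        pendingOrders.add(quantity);
        return true;
    }

    // Background path: deduct every pending order in one batch.
    int flushPending() {
        int processed = 0;
        while (!pendingOrders.isEmpty()) {
            int quantity = pendingOrders.poll();
            stock -= quantity;
            reserved -= quantity;
            processed++;
        }
        return processed;
    }

    int getStock() {
        return stock;
    }

    int getReserved() {
        return reserved;
    }

    public static void main(String[] args) {
        InventoryService inventory = new InventoryService(10);
        inventory.placeOrder(3);
        inventory.placeOrder(4);
        // Intermediate (soft) state: orders accepted, stock not yet deducted.
        System.out.println("before flush: stock=" + inventory.getStock()
                + ", reserved=" + inventory.getReserved());
        inventory.flushPending();
        // Eventually consistent: stock now reflects every accepted order.
        System.out.println("after flush:  stock=" + inventory.getStock()
                + ", reserved=" + inventory.getReserved());
    }
}
```

Between `placeOrder` and `flushPending` the stock figure is temporarily out of date, which is exactly the intermediate state that soft state tolerates.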

  • First, fully understand the theoretical principles instead of staying at the level of concepts;
  • Second, have your own thinking and show how your reasoning differs from a memorized answer;
  • Then combine theory with practice and explain the logic you follow when handling problems in real projects.

Technical Awareness

If you are applying for a senior R&D engineer or architect position, your answer must also demonstrate your knowledge system and technical judgment as far as possible. These are basic qualities for both positions. Because distributed technology is complex and its various technologies are coupled with each other, if you can start from the single knowledge point of CAP theory and unfold a clear knowledge system of core distributed technologies, you will stand out from other candidates.

A distributed system looks like a computer. A computer follows the von Neumann architecture, which has five major components:

  • Controller
  • Arithmetic unit (operator)
  • Memory
  • Input
  • Output

You can understand it this way: a distributed system also includes these five major components, the most important of which are computing and storage. Computing and storage are provided by a series of network nodes; the communication between nodes is the input and output, and the scheduling and management of nodes is the controller.

[Figure: Distributed architecture technology composition]

From this point of view, a distributed system is like a network computer, and its knowledge system includes four perspectives:

  • Memory, that is, distributed storage system, such as NoSQL database storage;
  • Arithmetic unit (operator), that is, distributed computing, such as distributed parallel computing;
  • Input and output, that is, distributed system communication, such as synchronous RPC calls and asynchronous message queues;
  • Controller, that is, scheduling management, such as traffic scheduling, task scheduling, and resource scheduling.

You can summarize the knowledge system of distributed systems from these four perspectives.

So how do you actually work through a question? Taking "Can Redis be used as a distributed lock?" as an example, let's analyze the distributed theory hidden behind the question and how a senior R&D engineer would approach it.

Problem-solving ideas

  • Explanation of actual problems

A lock is generally implemented in Redis with the setnx command, together with a timeout that controls when the lock expires. But in an extreme case, the Redis master node goes down before the lock has been synchronized to the slave node. Under the sentinel mechanism, the slave is promoted to master and continues to provide service. Another thread can then request the same lock and obtain it, so two threads hold the lock at the same time.
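The failure sequence can be reproduced with a small simulation. To stay self-contained, the sketch below does not talk to a real Redis server: two in-memory maps stand in for the master and the slave, `tryLock` only mimics the set-if-absent behavior of setnx, lock expiration is left out, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Simulation only: two in-memory maps stand in for a Redis master and its slave.
class FailoverLockDemo {
    private Map<String, String> master = new HashMap<>();
    private Map<String, String> slave = new HashMap<>();

    // Mimics set-if-absent on the current master: succeeds only if the key is free.
    boolean tryLock(String key, String owner) {
        return master.putIfAbsent(key, owner) == null;
    }

    // Asynchronous replication catching up with the master.
    void replicate() {
        slave.putAll(master);
    }

    // The master goes down; the slave is promoted with whatever data it has.
    void failover() {
        master = slave;
        slave = new HashMap<>();
    }

    public static void main(String[] args) {
        FailoverLockDemo redis = new FailoverLockDemo();

        boolean first = redis.tryLock("order:42", "thread-1");
        redis.failover(); // master crashes before replicate() has run
        boolean second = redis.tryLock("order:42", "thread-2");

        // Both calls succeed, so mutual exclusion is broken.
        System.out.println("thread-1 got lock: " + first
                + ", thread-2 got lock: " + second);
    }
}
```

If `replicate()` is called before `failover()`, the second `tryLock` fails as expected; the double acquisition only appears in the window where the lock exists on the old master but not yet on the slave.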

  • Return to theory for guidance

In CAP terms, Redis is designed as an AP model, while a distributed lock is a CP scenario. Applying the AP architecture of Redis to a CP scenario is therefore clearly the wrong choice of underlying technology.

  • Expand to the knowledge system

Redis is a distributed storage system, so you need a knowledge system for the field of distributed storage in your mind. Think about how data storage, data distribution, data replication, and data consistency are implemented, which technologies are used to achieve them, and why those technologies or algorithms were chosen. You need to learn to compare different approaches to the same distributed problem from multiple dimensions and angles, weigh their advantages and disadvantages, and ultimately form your own technical awareness and technical judgment.

  • Technical judgment

For example, starting from Redis, you can think about the current state of development and the technical implementation of distributed cache systems, and about which issues you would consider if you were asked to build a "Redis" yourself. Although repeatedly "reinventing the wheel" is not recommended in actual work, you must show during the interview that you are able to "build the wheel".

Summary

CAP theory seems simple, but in an interview the depth of your understanding of it reflects your overall understanding of, and command over, distributed systems.

So you need to master not only how to answer the CAP questions in this case, but also the thinking behind the answers. When you meet similar theoretical questions in the future, you can answer from these three levels.


This article is reproduced from Java后端技术全栈.