
How to analyze Redis knowledge points

WBOY
2023-06-03 20:02:11

Data structures, not data types

Many articles say that Redis supports five commonly used data types. This is misleading. Everything stored in Redis is really a byte array (byte[]), and bytes by themselves have no data type. Only after they are decoded in an appropriate format do they become a string, an integer or an object, and only then do they have a data type.

Keep this in mind: anything that can be converted into a byte array (byte[]) can be stored in Redis. Strings, numbers, objects, pictures, sound, video and files can all be stored once they have been converted into bytes.

So String in Redis does not really mean a string. It denotes the simplest data structure, in which one key corresponds to exactly one value. Both the key and the value are byte arrays; the key is usually the byte encoding of a string, while the value is whatever the application needs.

Some operations do place requirements on the value. For example, an increment or decrement operation requires that the value's bytes can be decoded as a number; otherwise an error is returned.
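As a rough illustration of the point that values are just bytes until decoded, here is a minimal Python sketch. The `incr` helper is hypothetical and only mimics the idea of an increment operation; it is not Redis's implementation, and its parsing is more lenient than Redis's.

```python
def incr(value: bytes, delta: int = 1) -> bytes:
    """Decode the stored bytes as an integer, add delta, and re-encode.

    Raises ValueError when the bytes cannot be decoded as a number,
    mirroring the error case described in the text.
    """
    try:
        number = int(value.decode("ascii"))
    except (UnicodeDecodeError, ValueError):
        raise ValueError("value is not an integer") from None
    return str(number + delta).encode("ascii")


stored = "41".encode("utf-8")   # what the store holds is only bytes
print(incr(stored))             # the bytes decode as a number, so this works

try:
    incr("lixinjie".encode("utf-8"))
except ValueError as error:
    print("error:", error)      # these bytes do not decode as a number
```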

List is a data structure in which one key corresponds to multiple values. The values are ordered, and the same value may appear more than once.

Set is a data structure in which one key corresponds to multiple values. The values are unordered, and no value may appear more than once.

Hash is a data structure in which one key corresponds to multiple key-value pairs. The order of these pairs generally does not matter, because the data is accessed by name rather than by position.

Sorted Set is a data structure in which one key corresponds to multiple values. The values are kept in sorted order and cannot repeat. Each value is associated with a floating-point number called its score. Elements are ordered by score first, and elements with equal scores are ordered by value.
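The five structures can be loosely compared to built-in Python containers. This is only an analogy for their shape and semantics, not a description of how Redis stores them internally; the key names and the `members_in_order` helper are made up for the example.

```python
# One key maps to one "shape" of value; all leaf values are bytes.
store = {
    b"name": b"lixinjie",                          # String: one key, one value
    b"queue": [b"a", b"b", b"a"],                  # List: ordered, repeats allowed
    b"tags": {b"redis", b"cache"},                 # Set: unordered, no repeats
    b"user:1": {b"name": b"li", b"age": b"30"},    # Hash: access by field name
}

# Sorted Set: each member carries a floating-point score.
scores = {b"carol": 2.0, b"alice": 1.0, b"bob": 1.0}


def members_in_order(zset):
    """Order members by score first, then by the member bytes themselves."""
    return sorted(zset, key=lambda member: (zset[member], member))


print(members_in_order(scores))
```

Here `alice` and `bob` share a score, so the tie is broken by comparing the member values.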

With these five data structures clear in your mind, their corresponding commands should be easy to pick up.

Problems and solutions brought by clusters

The advantages of a cluster are obvious: increased capacity, more processing power, and the ability to scale out and in dynamically as needed. But a cluster also introduces new problems, at least the following two.

The first is data allocation: deciding which node a piece of data is stored on when it is written, and which node to query when it is read. The second is data movement: when the cluster expands and a node is added, where does that node's data come from; when the cluster shrinks and a node is removed, where does that node's data go.

These two problems have one thing in common: how to describe and store the mapping between data and nodes. Because the location of a piece of data is determined by its key, the problem becomes how to associate every key with the nodes in the cluster.

The nodes in a cluster are few and relatively fixed, even though nodes are occasionally added or removed. The keys stored in a cluster, by contrast, are huge in number, completely random, irregular, unpredictable, and mostly small individually.

This is like the relationship between a university and all its students. If universities and students were directly linked, it would definitely be confusing. The reality is that there are several layers added between them. First there are departments, then there are majors, then there are grades, and finally there are classes. After these four levels of mapping, the relationship becomes much clearer.

There is no problem that cannot be solved by adding a layer, and if there is, add another layer. This is a very important conclusion, and it holds in computing as well.

Redis adds a layer between data and nodes, called a slot. Because slots are closely tied to hashing, they are also called hash slots.

The result is that slots are placed on nodes and data is placed in slots. Slots solve the granularity problem: they make the unit of data coarser, which makes data easier to move. Hashing solves the mapping problem: the hash value of a key determines the slot it belongs to, which makes data easy to distribute.

There are piles of books on your study table, which are extremely messy. It is very difficult to find one of them. You buy some large storage bins, sort the books into different bins according to their title length, and place them on the table.

Now the storage boxes sit on the table and the books sit in the boxes. This makes it easy to move books: just pick up a box and go. It is also easy to find the book you need: measure the length of its title and go to the corresponding box.

Actually, we didn’t do anything. We just bought a few boxes and packed the books into the boxes according to certain rules. Just such a simple move completely changed the situation that was originally a mess. Isn't it a little magical?

A cluster can only have 16384 slots, numbered 0-16383. These slots are allocated to all master nodes in the cluster, and there is no requirement for allocation policy. You can specify which numbered slots are assigned to which master node. The cluster will record the corresponding relationship between nodes and slots.

Next, hash the key and take the result modulo 16384. The remainder is the slot the key falls into: slot = CRC16(key) % 16384.
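The formula can be sketched in a few lines of Python. This assumes the CRC16 variant is the one usually called XMODEM (polynomial 0x1021, initial value 0), which is what the Redis Cluster specification describes; the function names are chosen for this example and hash tags are ignored here.

```python
def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC16 with polynomial 0x1021 and initial value 0 (XMODEM)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8                 # feed the next byte into the top bits
        for _ in range(8):
            if crc & 0x8000:             # top bit set: shift and apply polynomial
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: bytes) -> int:
    """slot = CRC16(key) % 16384."""
    return crc16_xmodem(key) % 16384


print(key_slot(b"name"))  # some slot number in the range 0..16383
```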

Data is moved in units of slots. Because the number of slots is fixed, they are easy to handle, and the data movement problem is solved.

A hash function computes the hash value of the key, which gives its slot. The cluster's stored mapping between slots and nodes then gives the node that holds the slot. Data is thereby mapped to nodes, and the data allocation problem is solved.
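The second half of that lookup, from slot to node, can be sketched as a table of slot ranges. The three addresses and the even split below are assumptions for illustration; a real cluster may assign slots to masters in any way.

```python
import bisect

# Hypothetical assignment of the 16384 slots to three master nodes.
SLOT_RANGES = [
    (0, 5460, "127.0.0.1:6379"),
    (5461, 10922, "127.0.0.1:6380"),
    (10923, 16383, "127.0.0.1:6381"),
]
_RANGE_STARTS = [start for start, _, _ in SLOT_RANGES]


def node_for_slot(slot: int) -> str:
    """Return the node that owns the given slot, using binary search."""
    if not 0 <= slot <= 16383:
        raise ValueError("slot out of range")
    index = bisect.bisect_right(_RANGE_STARTS, slot) - 1
    start, end, node = SLOT_RANGES[index]
    if not start <= slot <= end:
        raise KeyError("slot is not assigned to any node")
    return node


print(node_for_slot(5461))
```

Moving data then amounts to changing which node a range of slots points to, without touching the key-to-slot rule.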

What I want to say is that ordinary people only learn individual technologies. Experts care more about stepping outside the technology to look for an approach or a direction of thinking; following that direction, you can find the answers you want.

Cluster's choice of command operations

Once a client has established a connection with any node in the cluster, it can obtain information about all nodes in the cluster, as well as the mapping between all hash slots and nodes. Because this information is so useful, the client caches it.

The client can send a request to any node, so which node should it choose once it has a key? The answer is simply to apply the cluster's key-to-node mapping on the client side.

The client therefore needs to implement the same hash function as the cluster. It first computes the hash value of the key and takes it modulo 16384, which gives the hash slot of the key. Using its cached mapping between slots and nodes, it can then find the node that owns the key.

Then it simply sends the request to that node. The client can also cache the mapping between key and node, so that the next request for the same key goes straight to the right node without recomputation.

Theory and reality differ, though: the cluster may have changed while the client's cache has not yet been updated. It is then quite possible that the requested key is no longer on the node the client contacted. What should that node do?

It could fetch the data from the node where the key actually resides and return it to the client. Or it could tell the client directly that it no longer holds the key, attach the address of the node that currently does, and let the client send the request again, similar to an HTTP 302 redirect.

This is really a question of choice, a philosophical question. Redis Cluster chose the latter. A node only processes keys it owns. For keys it does not own, it returns a redirection error containing the hash slot of the key and the address of the owning node, for example -MOVED 3999 127.0.0.1:6381, and the client resends the request to that node.
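A client has to recognize this reply and update its cache. The sketch below assumes the redirection line has the form `-MOVED <slot> <host>:<port>`, as given in the Redis Cluster specification; `parse_moved` and `slot_cache` are hypothetical names, and the stale cache entry is invented for the example.

```python
def parse_moved(error_line: str):
    """Return (slot, host, port) for a MOVED redirection, or None otherwise."""
    parts = error_line.split()
    if len(parts) != 3 or parts[0] != "-MOVED":
        return None
    host, _, port = parts[2].rpartition(":")   # split on the last colon
    return int(parts[1]), host, int(port)


# Hypothetical client-side cache: slot number -> node address.
slot_cache = {3999: "127.0.0.1:6379"}          # stale entry

redirect = parse_moved("-MOVED 3999 127.0.0.1:6381")
if redirect is not None:
    slot, host, port = redirect
    slot_cache[slot] = f"{host}:{port}"        # remember the new owner
    # ...then resend the original command to host:port

print(slot_cache)
```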

So choice is a philosophy and a piece of wisdom. More on this later. Let’s look at another situation first, which has some similarities with this problem.

Redis has commands that take several keys at once, such as MGET. I call these multi-key commands. A multi-key command is sent to a single node, which raises a potential problem you may already have thought of: must all the keys in the command be located on the same node?

There are two cases. If the keys are not on the same node, the node could only return a redirection error. But the keys may be spread over several different nodes, and the redirection errors returned would then be very confusing, so Redis Cluster chooses not to support this case.

If the keys are all on the same node, there is no problem in theory. Whether Redis Cluster actually accepts the command depends on the Redis version, so test it on the version you use. Note that being on the same node is not always enough: Redis Cluster generally requires the keys to hash to the same slot and otherwise replies with a CROSSSLOT error.

Along the way we have discovered something very useful: it is well worth mapping a group of related keys to the same node. This improves efficiency and lets a single multi-key command fetch several values at once.

The question then is how to name these keys so that they land on the same node. Do we have to compute a hash value and take the remainder ourselves first? That would be far too troublesome. Fortunately it is not necessary, because Redis has already worked this out for us.

The reasoning is simple. For two keys to be on the same node, their hash values must be the same. For the hash values to be the same, the strings passed to the hash function must be the same. But if we simply use two identical strings as keys, they are the same key, and the later data overwrites the earlier data.

The problem is that the entire key is used to compute the hash value, which couples the key to the string that is hashed. The two need to be decoupled: the key and the string used to compute the hash should be related but distinct.

Redis provides a solution based on this principle, called the key hash tag. Consider the example {user1000}.following and {user1000}.followers. You have probably spotted the trick already: only the string between { and } in the key is used to compute the hash value.

This guarantees that the hash values are the same, so the keys fall on the same node, while the keys themselves are different and do not overwrite each other. Using a hash tag to tie a group of related keys together solves the problem neatly.
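The hash tag rule can be sketched as follows. The extraction follows the rule in the Redis Cluster specification (first `{`, then the first `}` after it, and the tag is used only if it is non-empty). For the slot, the sketch assumes that Python's `binascii.crc_hqx` with an initial value of 0 computes the same CRC16 variant that Redis uses.

```python
import binascii


def hash_tag(key: bytes) -> bytes:
    """Return the part of the key that is hashed: the tag if present, else the key."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:     # a closing brace with content inside
            return key[start + 1:end]
    return key


def key_slot(key: bytes) -> int:
    """Slot of a key, hashing only the tag when one is present."""
    return binascii.crc_hqx(hash_tag(key), 0) % 16384


following = key_slot(b"{user1000}.following")
followers = key_slot(b"{user1000}.followers")
print(following == followers)   # both keys hash the same tag, user1000
```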

Solving problems relies on ingenious ideas rather than on sophisticated technology and algorithms. That is what it means to be small but powerful.

Finally, let's talk about the philosophy of choice. The core purpose of Redis is to store and access key-value data in commonly used data structures, and to operate on those structures, in the shortest possible time. Anything unrelated to that core, or anything that would drag it down, is weakened or left out. This keeps the core simple, fast and stable.

Faced with a choice between breadth and depth, Redis chose depth. That is why a node does not process keys it does not own, and why the cluster does not support multi-key commands whose keys span nodes. On the one hand the client gets a fast response; on the other hand the cluster avoids transferring and merging large amounts of data internally.

Single-threaded model

Each node in a Redis cluster has only one thread that accepts and executes all requests sent by clients. Technically this is done with I/O multiplexing, using the Linux epoll mechanism, so that one thread can manage many socket connections.
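To make the idea of I/O multiplexing concrete, here is a small Python sketch in which a single thread serves several connections through one selector. It uses the standard `selectors` module (which picks epoll on Linux when available) and in-process socket pairs instead of a network server; the upper-casing "request handler" is invented for the example and has nothing to do with Redis itself.

```python
import selectors
import socket


def serve_once(server_socks):
    """Answer one request per socket, all from a single thread."""
    sel = selectors.DefaultSelector()
    for sock in server_socks:
        sock.setblocking(False)
        sel.register(sock, selectors.EVENT_READ)
    pending = len(server_socks)
    while pending:
        events = sel.select(timeout=1.0)
        if not events:                      # nothing became readable: give up
            break
        for key, _ in events:
            conn = key.fileobj
            data = conn.recv(1024)          # ready, so this does not block
            conn.sendall(data.upper())      # toy "command execution"
            sel.unregister(conn)
            pending -= 1
    sel.close()


# Three "clients", each connected to the server through its own socket pair.
pairs = [socket.socketpair() for _ in range(3)]
for number, (client, _) in enumerate(pairs):
    client.sendall(f"ping {number}".encode())

serve_once([server for _, server in pairs])
replies = [client.recv(1024) for client, _ in pairs]
print(replies)

for client, server in pairs:
    client.close()
    server.close()
```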

In addition, there are the following reasons for choosing a single thread:

1. Redis operates on memory and is extremely fast (100,000 QPS)

2. Most of the overall time is spent on network transmission

3. Multiple threads would require thread synchronization, which complicates the implementation

4. The time spent on thread locking can even exceed the time of the memory operations themselves

5. Frequent context switching between threads consumes more CPU time

6. A single thread naturally supports atomic operations, and single-threaded code is simpler to write

Transaction

Everyone knows that a transaction bundles multiple operations together so that either all of them are executed (success) or none of them are (rollback). Redis also supports transactions, but they may not be what you expect. Let's take a look.

A Redis transaction has two steps: defining the transaction and executing it. After starting a transaction, you add the commands to be executed in order; this defines the transaction. You can then execute it with the exec command or abandon it with the discard command.

You may want to make sure that the keys you care about are not modified by others before your transaction executes. In that case, use the watch command to monitor those keys. If any of them is modified by another command before execution starts, the transaction is canceled. You can also use the unwatch command to stop monitoring them.

Redis transactions have the following characteristics:

1. If an error occurs before execution starts, none of the commands is executed

2. Once execution starts, every command is guaranteed to be executed once, in order, without interruption

3. If an error is encountered during execution, execution continues without stopping

4. Errors encountered during execution are not rolled back
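The four characteristics above can be mimicked with a toy model. `ToyStore` and `ToyTransaction` are invented for this sketch and only imitate the queue-then-execute behavior described in the list; they are not how Redis is implemented and they do not talk to a server.

```python
class ToyStore:
    """In-memory stand-in for a node's key space (illustrative only)."""

    COMMANDS = {"set", "incr"}

    def __init__(self):
        self.data = {}

    def set(self, key, value):
        self.data[key] = value
        return "OK"

    def incr(self, key):
        number = int(self.data.get(key, "0"))  # ValueError if not numeric
        self.data[key] = str(number + 1)
        return number + 1


class ToyTransaction:
    def __init__(self, store):
        self.store = store
        self.queue = []
        self.aborted = False

    def add(self, name, *args):
        """Queue a command; an unknown command poisons the whole transaction."""
        if name not in ToyStore.COMMANDS:
            self.aborted = True
            return
        self.queue.append((name, args))

    def execute(self):
        if self.aborted:                       # characteristic 1: nothing runs
            return None
        results = []
        for name, args in self.queue:          # characteristic 2: in order
            try:
                results.append(getattr(self.store, name)(*args))
            except ValueError as error:        # characteristics 3 and 4:
                results.append(error)          # keep going, no rollback
        return results


store = ToyStore()
store.set("name", "lixinjie")

tx = ToyTransaction(store)
tx.add("set", "count", "1")
tx.add("incr", "name")      # fails at execution time: not a number
tx.add("incr", "count")
results = tx.execute()
print(results, store.data)
```

The failed increment appears as an error in the results, while the commands before and after it still take effect.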

Reading the above description makes me question whether this can be called a transaction. Obviously, this is completely different from what we normally understand as transactions, since it's not even guaranteed to be atomic. Redis does not support atomicity because it does not support rollback, and there is a reason why this feature is not supported.

Reasons for not supporting rollback:

1. Redis believes that failures are caused by improper use of commands

2. Redis does this to keep the internal implementation simple and fast

3. Redis also believes that rollback cannot solve all problems

Haha, these are rather one-sided terms, which may be why not many people seem to use Redis transactions.

Pipeline

The interaction between the client and the cluster is serial and blocking: after sending a command, the client must wait for the response before it can send the next one. Each command costs one round trip time. If you have many commands and send them one by one, the whole process becomes very slow.

Redis provides pipelining, which lets the client send multiple commands at once without waiting for the server's response to each. After all commands have been sent, the responses arrive in the same order. This saves a great deal of time and improves efficiency.

If you are sharp, you may have noticed another problem. Multiple commands mean multiple keys, so isn't this the multi-key situation described above? How can you guarantee that all these keys are on the same node? For this reason Redis Cluster again chooses not to support pipelines.

Pipelining can, however, be simulated on the client side: use multiple connections to send commands to multiple nodes at the same time, wait for all nodes to return their responses, put the responses back into the order in which the commands were sent, and return them to the user code. It is rather troublesome.
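The bookkeeping for that client-side simulation can be sketched as two helpers: one groups commands by node while remembering their original positions, and one restores the original order of the replies. The names, the `node_for_key` callback and the fake replies are assumptions for illustration; real network I/O is left out.

```python
from collections import defaultdict


def split_by_node(commands, node_for_key):
    """Group (index, command) pairs by the node that owns each command's key."""
    batches = defaultdict(list)
    for index, command in enumerate(commands):
        key = command[1]                       # assumes the key is the first argument
        batches[node_for_key(key)].append((index, command))
    return batches


def merge_replies(batches, replies_by_node, total):
    """Put per-node replies back into the order the commands were issued."""
    ordered = [None] * total
    for node, batch in batches.items():
        for (index, _), reply in zip(batch, replies_by_node[node]):
            ordered[index] = reply
    return ordered


commands = [("GET", "a"), ("GET", "b"), ("GET", "c")]


def node_for_key(key):
    # Hypothetical placement: keys a and c on one node, b on another.
    return "node-1" if key in ("a", "c") else "node-2"


batches = split_by_node(commands, node_for_key)

# Stand-in for sending each batch over its own connection and reading replies.
replies_by_node = {
    node: [f"value-of-{command[1]}" for _, command in batch]
    for node, batch in batches.items()
}

merged = merge_replies(batches, replies_by_node, len(commands))
print(merged)
```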

Protocol

A brief look at the Redis protocol shows the format in which Redis data is transmitted.

Request protocol:

*<number of arguments> CRLF $<number of bytes of argument 1> CRLF <data of argument 1> CRLF ... $<number of bytes of argument N> CRLF <data of argument N> CRLF

For example, for SET name lixinjie, the data actually sent is:

*3\r\n$3\r\nSET\r\n$4\r\nname\r\n$8\r\nlixinjie\r\n
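The request format is simple enough to encode by hand. The following is a minimal sketch of such an encoder, with `encode_command` being a name chosen for this example; it assumes the arguments are text and encodes them as UTF-8.

```python
def encode_command(*args: str) -> bytes:
    """Encode a command as: *<argc>\r\n then $<len>\r\n<data>\r\n per argument."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))  # length is in bytes
    return b"".join(parts)


print(encode_command("SET", "name", "lixinjie"))
```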

Response protocol:

Single-line reply: the first byte is +

Error message: the first byte is -

Integer: the first byte is :

Bulk reply: the first byte is $

Multi-bulk reply: the first byte is *

For example,

+OK\r\n

-ERR Operation against\r\n

:1000\r\n

$6\r\nfoobar\r\n

*2\r\n$3\r\nfoo\r\n$3\r\nbar\r\n
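Because the first byte determines the reply type (`+`, `-`, `:`, `$` or `*`), a reply parser is a short dispatch on that byte. The sketch below is illustrative only: `parse_reply` and `ReplyError` are hypothetical names, the whole reply is assumed to be in the buffer already, and a length of -1 is treated as a missing value.

```python
class ReplyError(Exception):
    """Represents an error reply (first byte '-')."""


def parse_reply(buf: bytes, pos: int = 0):
    """Parse one reply starting at pos; return (value, position after it)."""
    end = buf.index(b"\r\n", pos)
    prefix, line = buf[pos:pos + 1], buf[pos + 1:end]
    nxt = end + 2
    if prefix == b"+":                         # single-line reply
        return line.decode(), nxt
    if prefix == b"-":                         # error message
        return ReplyError(line.decode()), nxt
    if prefix == b":":                         # integer
        return int(line), nxt
    if prefix == b"$":                         # bulk reply: length, then data
        length = int(line)
        if length == -1:
            return None, nxt
        return buf[nxt:nxt + length], nxt + length + 2
    if prefix == b"*":                         # multi-bulk: count, then replies
        count = int(line)
        if count == -1:
            return None, nxt
        items = []
        for _ in range(count):
            item, nxt = parse_reply(buf, nxt)
            items.append(item)
        return items, nxt
    raise ValueError("unknown reply type")


value, _ = parse_reply(b"*2\r\n$3\r\nfoo\r\n$3\r\nbar\r\n")
print(value)
```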

As you can see, the Redis protocol is designed to be very simple.


This article is reproduced from yisu.com.