What are the various data types and cluster-related knowledge in redis?
Various data types
string: the simplest and most convenient type. It supports space pre-allocation, meaning Redis allocates a little more space than needed on each write, so if the string grows next time, no extra allocation is required as long as the remaining space is sufficient.
list: can implement a simple message queue, but note that messages may be lost, since it does not support an ACK mode.
hash: a hash is a bit like a row in a relational database, but as the hash grows larger, be careful to avoid commands such as hgetall. Requesting a large amount of data in one command blocks Redis, so every request queued behind yours has to wait.
set: useful for statistics. For example, to count the active users on a given day, simply throw the user IDs into a set. Sets also support some fancy operations: sdiff computes the difference between sets, sunion computes the union, and so on. These features are powerful but come at a price: they consume CPU and I/O and can block Redis, so operations between large sets should be used with caution.
zset: arguably the brightest star. Because it keeps members sorted, it has many application scenarios, such as leaderboards (the top N users), delay queues, and so on.
bitmap: its advantage is saving space, especially for statistics such as how many users checked in on a given day, or whether a particular user checked in. Without a bitmap you might think of using a set:
SADD day 1234      // check in: add the user to the set
SISMEMBER day 1234 // has user 1234 checked in?
SCARD day          // how many users have checked in?
A set satisfies the functional requirement, but compared to a bitmap it consumes far more storage. Under the hood, a set is backed by either an intset or a hashtable. The intset encoding is only used when the data set is very small (by default fewer than 512 elements) and every element is an integer. Its data is compact and contiguous in memory; lookups use binary search, with O(logN) time complexity. The hashtable encoding is the same hash table used by the hash data type, except there is no value: each value pointer is null, and since a set stores only keys there is nothing for values to conflict over, though rehash-related issues still apply. Back to the sign-in problem: with many users, the set will certainly use the hashtable encoding, in which case each element is a dictEntry structure:
typedef struct dictEntry {
    void *key;              // key
    union {                 // value
        void *val;
        uint64_t u64;
        int64_t s64;
    } v;
    struct dictEntry *next; // next node in the bucket, forming a linked list
} dictEntry;
What can we see from this structure? Even though the value union is empty (no value) and next is empty (no collision chain), the structure itself plus the key still occupy real space. With a bitmap, a single bit is enough to represent one user, which saves a great deal of space. Let's look at how to set and count with bitmaps:
SETBIT day 1234 1 // check in
GETBIT day 1234   // has user 1234 checked in?
BITCOUNT day      // how many users have checked in?
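To see why a bitmap is so compact, here is a minimal pure-Python sketch (no Redis required) that models the SETBIT/GETBIT/BITCOUNT semantics over a bytearray; the Bitmap class is illustrative, not Redis's actual implementation:

```python
class Bitmap:
    """Toy model of Redis bitmap commands over a bytearray."""

    def __init__(self):
        self.buf = bytearray()

    def setbit(self, offset, value):
        byte, bit = divmod(offset, 8)
        if byte >= len(self.buf):                  # grow on demand, like Redis
            self.buf.extend(b"\x00" * (byte - len(self.buf) + 1))
        if value:
            self.buf[byte] |= 0x80 >> bit          # Redis numbers bits MSB-first
        else:
            self.buf[byte] &= ~(0x80 >> bit) & 0xFF

    def getbit(self, offset):
        byte, bit = divmod(offset, 8)
        if byte >= len(self.buf):
            return 0
        return (self.buf[byte] >> (7 - bit)) & 1

    def bitcount(self):
        return sum(bin(b).count("1") for b in self.buf)


day = Bitmap()
day.setbit(1234, 1)        # SETBIT day 1234 1 -- user 1234 checks in
print(day.getbit(1234))    # GETBIT day 1234 -> 1
print(day.bitcount())      # BITCOUNT day    -> 1
```

Note that storing user 1234 costs only 1234 // 8 + 1 = 155 bytes here, versus a full dictEntry per user in a hashtable-encoded set.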
bf: this is the Bloom filter, supported via the RedisBloom module since Redis 4.0, though the module must be loaded separately. We could also build our own Bloom filter on top of the bitmap described above, but since Redis already supports it, using RedisBloom saves development time. I won't go into what a Bloom filter does here; let's look at how RedisBloom is used:
# Quickly pull the image with docker to try it out
docker run -p 6379:6379 --name redis-redisbloom redislabs/rebloom:latest
docker exec -it redis-redisbloom bash
redis-cli
# Related operations
bf.reserve sign 0.001 10000
bf.add sign 99    // add user 99
bf.exists sign 99 // check whether user 99 exists
Because Bloom filters can produce false positives, bf supports a custom false-positive rate: 0.001 is the false-positive rate and 10000 is the number of elements the filter is sized for. When the number of stored elements actually exceeds that value, the false-positive rate rises.
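The idea behind bf.reserve's two parameters can be sketched in pure Python: from a capacity and target error rate, derive the bit-array size and the number of hash functions. This is a toy built on the bitmap idea above; the sizing formulas are the standard ones, but the double-hashing scheme here is illustrative and not RedisBloom's internals.

```python
import hashlib
import math


class BloomFilter:
    """Toy Bloom filter sized from capacity and target false-positive rate."""

    def __init__(self, capacity, error_rate):
        # Standard sizing: m bits and k hash functions.
        self.m = math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2)
        self.k = max(1, round(self.m / capacity * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive k bit positions from two 64-bit digests.
        digest = hashlib.sha256(str(item).encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def exists(self, item):
        return all(self.bits[pos // 8] >> (pos % 8) & 1
                   for pos in self._positions(item))


bf = BloomFilter(10000, 0.001)   # like: bf.reserve sign 0.001 10000
bf.add(99)                       # like: bf.add sign 99
print(bf.exists(99))             # like: bf.exists sign 99 -> True
```

A "no" answer is always reliable; only "yes" can be wrong, which is why the error rate matters.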
HyperLogLog is also used for statistics. Its advantage is that it occupies very little storage: only 12KB of memory is needed to count up to 2^64 elements. What does it mainly count? Cardinality, such as UV. Functionally, a set or hash could store UV as well, but they consume storage and easily become big keys. A bitmap also saves space, but a 12KB bitmap can only count 12*1024*8 = 98304 elements, whereas HyperLogLog can count 2^64. Such power comes with error, however: HyperLogLog counts probabilistically, with a standard error rate of 0.81%. For counting massive data where perfect accuracy is not required, HyperLogLog is excellent at saving space.
PFADD uv 1 2 3 // 1, 2 and 3 are active users
PFCOUNT uv     // count them
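To make the "tiny memory, small error" claim concrete, here is a compact and deliberately simplified HyperLogLog in pure Python. The register count (1024) and hashing are illustrative choices, not Redis's dense/sparse encodings; with m registers the expected relative error is roughly 1.04/sqrt(m), about 3% here.

```python
import hashlib
import math


class HyperLogLog:
    """Simplified HLL: m registers, each keeping the max leading-zero run."""

    def __init__(self, b=10):
        self.b = b
        self.m = 1 << b                       # 1024 registers
        self.reg = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        idx = h & (self.m - 1)                # low b bits pick a register
        rest = h >> self.b                    # remaining 64-b bits
        rank = 64 - self.b - rest.bit_length() + 1  # leading zeros + 1
        self.reg[idx] = max(self.reg[idx], rank)

    def count(self):
        est = self.alpha * self.m ** 2 / sum(2.0 ** -r for r in self.reg)
        if est <= 2.5 * self.m:               # small-range correction
            zeros = self.reg.count(0)
            if zeros:
                est = self.m * math.log(self.m / zeros)
        return int(est)


hll = HyperLogLog()
for user_id in range(100000):    # like PFADD uv <id> for 100000 distinct users
    hll.add(user_id)
print(hll.count())               # like PFCOUNT uv: close to 100000
```

The whole state is just 1024 small registers, which is why the real thing fits in 12KB regardless of how many elements are added.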
GEO can be applied to location-based services, such as "people nearby" in WeChat, nearby vehicles, and so on. First consider: without the GEO structure, how would you find the people near you? Each user has to report their location, say longitude 116.397128 and latitude 39.916527, which could be stored in a string or hash. But for finding nearby people, string and hash are helpless: you cannot traverse all the data on every query, which is far too slow. Nor can you use the raw latitude/longitude pair as a zset weight. But if we could somehow convert the coordinates into a single number and use that as the weight, then a simple zrangebyscore key v1 v2 would find the people nearby. Do we really need to do this by hand? This is where GEO comes in. GEO converts longitude and latitude into a number by "bisecting the interval and encoding each half". What does that mean? Take longitude: its range is [-180,180]. With a 3-bit encoding we bisect three times; falling in the left half records a 0, the right half a 1. For longitude 121.48941: the first bisection puts it in [0,180], so record 1; the second in [90,180], record 1; the third in [90,135], record 0. Latitude follows the same logic; suppose it encodes to 010. Finally the two codes are interleaved, with each longitude bit in an even position and each latitude bit in an odd position:
1 1 0  // longitude
0 1 0  // latitude
------------
101100 // the combined value
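The bisect-and-interleave scheme above can be written out in a few lines of Python. The 3-bit precision matches the worked example; real geohash encodings use far more bits.

```python
def encode(value, lo, hi, bits):
    """Bisect [lo, hi] `bits` times: left half records 0, right half records 1."""
    out = []
    for _ in range(bits):
        mid = (lo + hi) / 2
        if value >= mid:
            out.append(1)
            lo = mid          # keep the right half
        else:
            out.append(0)
            hi = mid          # keep the left half
    return out


def interleave(lon_bits, lat_bits):
    """Longitude bits take the even positions, latitude bits the odd ones."""
    mixed = []
    for lon_b, lat_b in zip(lon_bits, lat_bits):
        mixed += [lon_b, lat_b]
    return "".join(map(str, mixed))


lon = encode(121.48941, -180, 180, 3)    # -> [1, 1, 0], as in the text
lat = [0, 1, 0]                          # latitude assumed encoded as in the text
print(interleave(lon, lat))              # -> 101100
```

Because nearby points share long code prefixes, the resulting numbers sort neighbors close together, which is what makes the zset range query work.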
That is the principle. Now let's see how Redis uses GEO:
GEOADD location 112.123456 41.112345 99 // report user 99's location
GEORADIUS location 112.123456 41.112345 1 km ASC COUNT 10 // get up to 10 people within 1 km, nearest first
Understanding clusters
Few production environments run a single Redis instance. The risks of a single instance are:

A single point of failure means a service failure, with no backup

One instance bears all the pressure, serving both reads and writes

So the first thing that comes to mind is the classic master-slave mode, usually one master with several slaves, because most applications read far more than they write. The master handles updates while the slaves serve reads; even if the master goes down, we can promote a slave to master and the application keeps serving.
Details of the replication process
When a Redis instance first becomes the slave of some master, the master must send it its data, i.e. an RDB file. For this, the master forks a child process that runs bgsave to dump the current data and prepare it for the new slave. bgsave essentially reads the data currently in memory and saves it to an RDB file, which involves a lot of I/O; doing it in the main process would very likely block normal requests, so using a child process is the wise choice.
What happens if new write requests arrive while the forked child is running bgsave?
Strictly speaking, the data to be saved should be a snapshot of the exact moment the child was forked. Does that mean memory is copied wholesale at that instant? And if not, what about changes made in the meantime? This is where copy-on-write (COW) comes in. Memory may look like one contiguous block, but that is hard to manage, so the operating system divides it into small pages (typically 4K, 8K or 16K), and Redis's data is spread across those pages. For efficiency, the forked child shares the same memory with the parent rather than copying it. If the parent then modifies some data, the quickest way to keep the two views apart is to copy just the affected page: the parent's change is applied to the new page, leaving the original page untouched, so the child still sees the snapshot as it was.
The above looks at changes from the snapshot's perspective. From a consistency standpoint: once the slave has applied the RDB, how are the changes made in the meantime synchronized to it? The answer is a buffer, called the replication buffer. After accepting the sync request, the master saves all subsequent changes in this buffer; after sending the RDB to the slave, it then sends the contents of the replication buffer as well, and master and slave end up consistent.
The replication buffer is not a panacea

Let's look at how long writes to the replication buffer keep happening.
During master-slave sync, the master forks a child to do the work, so from the moment bgsave starts until it finishes, changes must be written to the replication buffer.
Once the RDB is generated, it must be sent to the slave; that network transfer also takes time, during which changes keep being written to the replication buffer.
After receiving the RDB, the slave must load it into memory. During this period the slave is blocked and cannot serve requests, so changes continue to accumulate in the replication buffer.
Since the replication buffer is a buffer, its size is limited. If any of the three steps above takes too long, the buffer grows quickly (assuming normal write traffic continues), and once it exceeds its limit, the master closes the connection to the slave. If the slave reconnects, replication starts over and the same lengthy process repeats. The size of the replication buffer is therefore critical; it generally has to be judged from factors such as the write rate, the volume written per second, and network transfer speed.
What if the slave's network is poor and it gets disconnected from the master?
Normally, once the master-slave connection is established, subsequent changes on the master can be sent directly to the slave for replay. But no network is 100% reliable, so disconnections between slave and master must also be considered.
Before Redis 2.8, whenever a slave was disconnected, even briefly, the master would blindly perform a full resynchronization when it reconnected. From version 2.8 on, incremental replication is supported. Its principle is that a buffer must keep a record of the changes; this buffer is called the repl_backlog_buffer. Logically it is a ring buffer: when full, it wraps around and overwrites from the beginning, so it too has a size limit. When the slave reconnects, it tells the master: "I have replicated up to offset xx." The master then checks whether the data at offset xx is still in the repl_backlog_buffer. If it is, it only needs to send the data after xx to the slave; if it is not, nothing can be done except another full synchronization.
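The partial-resync decision can be sketched with a toy ring buffer. The class name, sizes, and the "command" bytes below are all illustrative; real Redis tracks a global replication offset in much the same spirit.

```python
class ReplBacklog:
    """Toy repl_backlog_buffer: a ring of `size` bytes plus a global offset."""

    def __init__(self, size):
        self.size = size
        self.buf = bytearray(size)
        self.master_offset = 0             # total bytes ever written

    def feed(self, data):
        for byte in data:                  # wrap around and overwrite when full
            self.buf[self.master_offset % self.size] = byte
            self.master_offset += 1

    def psync(self, slave_offset):
        """Return ('CONTINUE', missing bytes) or ('FULLRESYNC', None)."""
        oldest = max(0, self.master_offset - self.size)
        if slave_offset < oldest:          # that history was already overwritten
            return "FULLRESYNC", None
        missing = bytes(self.buf[i % self.size]
                        for i in range(slave_offset, self.master_offset))
        return "CONTINUE", missing


backlog = ReplBacklog(size=16)
backlog.feed(b"set k1 v1;")                  # 10 bytes of "commands"
print(backlog.psync(slave_offset=4))         # ('CONTINUE', b'k1 v1;')
backlog.feed(b"set k2 v2;set k3 v3;")        # 20 more bytes; ring wraps around
print(backlog.psync(slave_offset=4))         # ('FULLRESYNC', None)
```

The second call fails precisely because the ring wrapped and offset 4 was overwritten, which is the scenario that forces a full resync in Redis.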
Need a manager
In master-slave mode, if the master goes down we can promote a slave to master, but that process is manual, and relying on human operation cannot minimize losses. An automatic management and election mechanism is needed: this is Sentinel. Sentinel is itself a service, but it does not handle data reads or writes; it is only responsible for managing all the Redis instances. At regular intervals, each sentinel pings every Redis instance, and an instance that responds within the allotted time is considered healthy. Of course, a sentinel can itself go down or lose network connectivity, so sentinels are generally deployed as a cluster too, preferably with an odd number of members such as 3 or 5; the odd number mainly serves the election (majority rule).
When a sentinel does not receive a timely PONG after its PING, it marks the Redis instance as offline; at this point it is not really offline yet. The other sentinels then also judge whether the instance is truly down, and once a majority of sentinels agree it is offline, it is kicked out of the cluster. If the offline node is a slave, that is the end of it: just kick it out. If it is the master, an election must be triggered. The election is not blind: the most suitable slave must be chosen as the new master, generally according to the following priorities:
Weight: each slave can be assigned a weight, and the slave with the higher weight is preferred

Replication progress: each slave's progress may differ; the one with the smallest data gap to the master is preferred

Service ID: every Redis instance has its own ID; if the above criteria are equal, the instance with the smallest ID is chosen as the new master
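The three criteria above amount to a single lexicographic sort key. Here is a hedged sketch; the field names are made up for illustration, and real Sentinel also filters out unhealthy replicas before ranking.

```python
def pick_new_master(replicas):
    """Choose by: highest weight, then most replicated data, then smallest ID."""
    return min(replicas,
               key=lambda r: (-r["weight"], -r["repl_offset"], r["run_id"]))


replicas = [
    {"run_id": "c3", "weight": 1, "repl_offset": 900},
    {"run_id": "a1", "weight": 1, "repl_offset": 950},
    {"run_id": "b2", "weight": 1, "repl_offset": 950},
]
# Equal weights; a1 and b2 tie on offset, so the smaller ID wins.
print(pick_new_master(replicas)["run_id"])   # -> a1
```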
Stronger horizontal scalability
Master-slave mode solves the single point of failure, and read-write separation gives the application more capacity; sentinel mode supervises the cluster automatically, providing automatic master election and automatic removal of faulty nodes.
Normally, growing read pressure can be relieved by adding slaves. But what if the pressure on the master itself is very high? That brings us to sharding: cut the master's data into several pieces and deploy them on different machines. Sharding in Redis is built on the concept of slots. Redis defines 16384 slots by default, numbered 0~16383, distributed evenly across the shard nodes, which provides load balancing. Which slot does a key belong to? First CRC16 produces a 16-bit number, then that number is taken modulo 16384:
crc16(key)%16384
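The slot computation is easy to reproduce. This is the CRC16 variant (XMODEM, polynomial 0x1021) that Redis Cluster uses, written out in pure Python; note the real computation also honors `{...}` hash tags, which are omitted here.

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XMODEM), the variant used by Redis Cluster."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):                 # process one bit at a time
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    return crc16(key.encode()) % 16384


print(key_slot("foo"))   # -> 12182, matching CLUSTER KEYSLOT foo
```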
The client caches the slot map, so when a key arrives, a simple computation tells it which instance should handle the request. But the cached slot information is not immutable: adding an instance, for example, triggers resharding, and the client's cached map becomes stale. Two common "errors" then occur; strictly speaking they are not errors but information, one called MOVED and one called ASK. MOVED means the data that instance A used to own has been migrated to instance B, and the migration is complete. ASK means the migration is still in progress: part of instance A's data has already moved to instance B while the rest is still waiting to move. Once the migration finishes, ASK becomes MOVED, and when the client receives MOVED it updates its local cache so these responses no longer occur.
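A client's handling of MOVED can be sketched like this. The fake node/cluster classes and the tuple-based reply format are illustrative stand-ins for the real RESP protocol; the point is only the cache-update-and-retry loop.

```python
class FakeCluster:
    """Maps each slot to its current owning node."""

    def __init__(self):
        self.slot_owner = {}

    def owner_of(self, slot):
        return self.slot_owner[slot]


class FakeNode:
    """Answers queries for its own slots and redirects otherwise."""

    def __init__(self, name, cluster):
        self.name = name
        self.cluster = cluster

    def get(self, key, slot):
        owner = self.cluster.owner_of(slot)
        if owner is not self:
            return ("MOVED", slot, owner)   # migration finished: go there
        return ("OK", f"value-of-{key}")


class Client:
    """Caches the slot map and refreshes an entry on every MOVED reply."""

    def __init__(self, slot_cache):
        self.slot_cache = dict(slot_cache)

    def get(self, key, slot):
        reply = self.slot_cache[slot].get(key, slot)
        if reply[0] == "MOVED":
            _, slot, new_owner = reply
            self.slot_cache[slot] = new_owner   # fix the stale cache entry
            reply = new_owner.get(key, slot)    # retry on the right node
        return reply[1]


cluster = FakeCluster()
node_a = FakeNode("A", cluster)
node_b = FakeNode("B", cluster)
cluster.slot_owner[1234] = node_b          # slot 1234 migrated from A to B
client = Client({1234: node_a})            # but the client's cache is stale
print(client.get("foo", 1234))             # -> value-of-foo, via one MOVED
print(client.slot_cache[1234].name)        # -> B (cache now corrected)
```

ASK handling differs in one way: the client retries on the target node for that single request but does not update its cache, since the slot has not finished moving.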
The above is the detailed content of What are the various data types and cluster-related knowledge in redis?. For more information, please follow other related articles on the PHP Chinese website!