How to remove duplicates in Redis? A brief analysis of 4 methods to remove duplicates


青灯夜游

Nov 09, 2021 am 10:03 AM


How do you remove duplicates in Redis? This article introduces 4 methods of Redis deduplication. I hope it will be helpful to you!


This article presents three methods for unique counting that Redis provides out of the box, based on SET, on bit, and on HyperLogLog, followed by a fourth based on Bloom filters, which needs a plug-in.

Unique counting is a very common feature in website systems. For example, a website needs to count the number of unique visitors (UV) it receives every day. Counting problems are common, but they can be complicated to solve. First, the amount to be counted may be very large: a large site is visited by millions of people every day. Second, it is usually desirable to extend the counting to other dimensions. In addition to daily UV, you may also want weekly or monthly UV, which makes the calculation much more complicated.

In a system backed by a relational database, unique counting is done with select count(distinct ...). This is very simple, but if the amount of data is large, the statement executes very slowly. Another problem with relational databases is that insert performance is not high.

Redis solves this kind of counting problem easily. It is faster and consumes fewer resources than a relational database, and it even provides 3 different methods.

1. Based on set

A Redis set stores a collection of unique elements. With it you can quickly determine whether an element exists in the collection, quickly count the number of elements in the set, and merge sets into a new set. The commands involved are as follows:


SISMEMBER key member  # check whether member exists in the set
SADD key member  # add member to the set
SCARD key   # get the number of elements in the set

The set-based method is simple and effective: counting is exact, it applies widely, and it is easy to understand. Its disadvantage is that it consumes a lot of resources (though far fewer than a relational database). If the number of elements is large (such as hundreds of millions), the memory consumption becomes prohibitive.
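As a minimal sketch of the set-based approach, the Python below wraps SADD, SCARD and SUNIONSTORE for daily UV counting and for merging several days. The key pattern `uv:<day>`, the helper names and the `InMemorySets` class are assumptions made for illustration. `InMemorySets` only imitates the redis-py methods `sadd`, `scard` and `sunionstore` so the example runs without a server; with a real server, a redis-py client would be passed in instead.

```python
from collections import defaultdict


class InMemorySets:
    """Hypothetical stand-in mimicking the redis-py set methods used below."""

    def __init__(self):
        self._sets = defaultdict(set)

    def sadd(self, key, *members):
        before = len(self._sets[key])
        self._sets[key].update(members)
        return len(self._sets[key]) - before  # number of newly added members

    def scard(self, key):
        return len(self._sets[key])

    def sunionstore(self, dest, keys):
        merged = set().union(*(self._sets[k] for k in keys))
        self._sets[dest] = merged
        return len(merged)


def record_visit(client, day, user_id):
    """SADD: True only the first time this user is seen on that day."""
    return client.sadd(f"uv:{day}", user_id) == 1


def daily_uv(client, day):
    """SCARD: exact number of unique visitors for one day."""
    return client.scard(f"uv:{day}")


def merged_uv(client, dest, days):
    """SUNIONSTORE: merge several daily sets and return the merged count."""
    return client.sunionstore(dest, [f"uv:{d}" for d in days])


client = InMemorySets()  # with a real server: a redis-py client object
visits = [
    ("2021-11-08", "u1"), ("2021-11-08", "u2"), ("2021-11-08", "u1"),
    ("2021-11-09", "u2"), ("2021-11-09", "u3"),
]
for day, user in visits:
    record_visit(client, day, user)
```

Because every member is stored in full, the memory used grows with the number of distinct visitors, which is the cost described above.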

2. Based on bit

Redis bitmaps allow counting that uses far less memory than a set. A single bit, 1 or 0, records whether an element exists. For example, to count unique visitors to a website, user_id can be used as the bit offset, and setting the bit to 1 indicates a visit. 1 MB of space can hold one day of visit records for more than 8 million users. The commands involved are as follows:

SETBIT key offset value  # set the bit at offset
GETBIT key offset        # get the bit at offset
BITCOUNT key [start end] # count the bits set to 1
BITOP operation destkey key [key ...]  # combine bitmaps

The bit-based method consumes much less space than the set method, but it requires that elements map simply to bit offsets, so its scope of application is much narrower. In addition, the space it consumes depends on the maximum offset, not on the count value. If the maximum offset is large, the memory consumption is also considerable.
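The point that space depends on the maximum offset rather than on the count can be illustrated with a toy model. The class below is a sketch written for this article, not Redis's implementation: it keeps a byte string that grows to cover the highest offset written, which is how SETBIT behaves from the caller's point of view. The class and variable names are hypothetical.

```python
class BitmapModel:
    """Toy model of a Redis string used as a bitmap (illustration only)."""

    def __init__(self):
        self.data = bytearray()

    def setbit(self, offset, value):
        byte_index, bit = offset >> 3, 7 - (offset & 7)  # most significant bit first
        if byte_index >= len(self.data):
            # Grow with zero bytes so that the offset fits.
            self.data.extend(bytes(byte_index + 1 - len(self.data)))
        old = (self.data[byte_index] >> bit) & 1
        if value:
            self.data[byte_index] |= 1 << bit
        else:
            self.data[byte_index] &= ~(1 << bit) & 0xFF
        return old  # previous bit value

    def getbit(self, offset):
        byte_index, bit = offset >> 3, 7 - (offset & 7)
        if byte_index >= len(self.data):
            return 0
        return (self.data[byte_index] >> bit) & 1

    def bitcount(self):
        return sum(bin(b).count("1") for b in self.data)


BITS_PER_MB = 8 * 1024 * 1024  # 8,388,608 users fit in 1 MB

today = BitmapModel()
for user_id in (3, 10, 3):      # user 3 visits twice, counted once
    today.setbit(user_id, 1)

sparse = BitmapModel()
sparse.setbit(8_000_000, 1)     # a single visitor with a large id
```

In this model `today` needs 2 bytes for two visitors, while `sparse` needs about 1 MB for a single visitor, because the size follows the largest user_id.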

3. Based on HyperLogLog

Exact unique counting over extremely large amounts of data is difficult, but if an approximation is enough, computer science offers many efficient algorithms, of which HyperLogLog counting is a very well-known one. It needs only about 12 KB of memory to count hundreds of millions of unique elements, with the error kept at about one percent. The commands involved are as follows:

PFADD key element [element ...]  # add elements
PFCOUNT key [key ...]   # count

This counting method is really remarkable. It draws on uniform distributions, random probability, Bernoulli distributions and other ideas from statistics. I have not completely understood it myself; if you are interested, you can dig into the relevant articles.
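The "about 12 KB" and "about one percent" figures can be reproduced with a little arithmetic. The sketch below assumes what is, to my understanding, the layout of Redis's dense HyperLogLog encoding (16384 registers of 6 bits each) and the usual HyperLogLog error estimate of 1.04 / sqrt(m) for m registers. The function and constant names are hypothetical.

```python
import math

REGISTERS = 2 ** 14      # assumed register count of the dense encoding
BITS_PER_REGISTER = 6    # assumed register width

dense_bytes = REGISTERS * BITS_PER_REGISTER // 8   # 12288 bytes, i.e. 12 KB
standard_error = 1.04 / math.sqrt(REGISTERS)       # 0.008125, roughly 0.81%


def one_sigma_band(true_count):
    """Range within one standard error of the true count."""
    delta = true_count * standard_error
    return true_count - delta, true_count + delta


low, high = one_sigma_band(100_000_000)
```

Under these assumptions, a true count of 100 million would typically be reported within a range of roughly 0.8 million on either side, while the memory used stays fixed at about 12 KB.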

The three unique counting methods provided by Redis each have their own advantages and disadvantages, and between them they cover the counting requirements of different situations.

4. Based on Bloom filter

A Bloom filter stores data in a structure similar to a bitmap or bit set, using a bit array to represent a set compactly, and it can quickly determine whether an element may already exist in the set. A Bloom filter is not 100% accurate, but the error rate can be lowered by adjusting its parameters: the number of hash functions used and the size of the bit array. With suitable parameters the error rate can be brought close to 0, which is good enough for most scenarios.

Given a set S = {x1, x2, … xn}, a Bloom filter uses k independent hash functions to map each element of the set into the range {1,…,m}. For any element, each number it is mapped to is used as an index into the bit array, and that bit is set to 1. For example, if element x1 is mapped to the number 8 by a hash function, the 8th bit of the bit array is set to 1. In the figure below, the set S has only two elements, x and y, each mapped by three hash functions. The mapped positions are (0, 3, 6) and (4, 7, 10) respectively, and the corresponding bits are set to 1:

[Figure: a bit array in which positions 0, 3, 6 (for x) and 4, 7, 10 (for y) are set to 1]

To determine whether another element is in this set, map it with the same three hash functions and check whether any of the corresponding positions holds a 0. If one does, the element is definitely not in the set; otherwise it may be in the set.
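The add and lookup steps described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the implementation used by any Redis plug-in: instead of k truly independent hash functions it derives the k positions from one SHA-256 digest by double hashing, positions run from 0 to m - 1, and the class and function names are hypothetical. The `false_positive_rate` helper uses the standard approximation (1 - e^(-kn/m))^k for n inserted elements.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, m, k):
        self.m = m                          # size of the bit array
        self.k = k                          # number of positions per element
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, item):
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step between positions
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def might_contain(self, item):
        # Any 0 bit means "definitely absent"; all 1 bits means "possibly present".
        return all(
            self.bits[pos >> 3] & (1 << (pos & 7))
            for pos in self._positions(item)
        )


def false_positive_rate(m, k, n):
    """Approximate false positive probability after n insertions."""
    return (1 - math.exp(-k * n / m)) ** k


seen = BloomFilter(m=1024, k=3)
for url in ("https://example.com/a", "https://example.com/b"):
    seen.add(url)
```

For deduplication, an element is processed only when `might_contain` returns False. A True result can occasionally be a false positive, so a larger bit array lowers the chance of wrongly skipping a new element.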

Redis needs a plug-in installed before Bloom filters can be used: https://blog.csdn.net/u013030276/article/details/88350641.


Statement

This article is reproduced from 掘金社区. If there is any infringement, please contact admin@php.cn to have it removed.
