
A brief analysis of HyperLogLog for Redis data type learning

Jan 21, 2022 am 10:00 AM

This article introduces HyperLogLog, the Redis data type usually used to count the number of unique elements in a collection. I hope you find it helpful!


It is Friday and you are happily slacking off when the product manager emails you a requirements document. In short, the company needs to count the website's unique visitor IPs per day, and the statistics must be kept long term, anywhere from a few months to a few years.

After reading the requirements, you think this is easy: the Redis set type can do it. Generate one set key per day, store each visitor's IP with SADD, and get the number of unique visitor IPs for the day with SCARD.

You quickly write the code, it passes testing, and the feature goes live. After it has been running for a while, the server hosting Redis starts raising alarms because some keys use too much memory. You take a look and find that they are all the set keys that store visitor IPs. Only then do you slap your forehead and realize you have dug yourself a big hole.

Assume that storing an IPv4 address takes up to 15 bytes and that the website has up to 1 million visitors per day. These set keys would then use about 0.45 GB of memory per month and 5.4 GB per year. This estimate covers only IPv4; IPv6 addresses would take more memory. Although SADD (per element) and SCARD run in O(1) time, their memory consumption is unacceptable.
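The estimate can be reproduced with a few lines of arithmetic. This is only a back-of-the-envelope sketch in Python: the 30-day month, the 12-month year and the decimal units are assumptions, and it ignores the per-element overhead that Redis adds on top of the raw bytes, so real usage would be higher.

```python
# Raw bytes needed to keep every visitor IP in one set per day.
# Assumptions (illustration only): 30-day months, 1 GB = 10**9 bytes,
# and no per-element overhead from Redis itself.
BYTES_PER_IPV4 = 15            # "255.255.255.255" is 15 characters
VISITORS_PER_DAY = 1_000_000
DAYS_PER_MONTH = 30
MONTHS_PER_YEAR = 12

per_day = BYTES_PER_IPV4 * VISITORS_PER_DAY
per_month = per_day * DAYS_PER_MONTH
per_year = per_month * MONTHS_PER_YEAR

print(f"per day:   {per_day / 10**6:.0f} MB")    # 15 MB
print(f"per month: {per_month / 10**9:.2f} GB")  # 0.45 GB
print(f"per year:  {per_year / 10**9:.1f} GB")   # 5.4 GB
```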

Browsing the official Redis website, you find that Redis also provides the HyperLogLog data type, which meets the product requirement while using far less memory.

HyperLogLog Algorithm

HyperLogLog is a probabilistic algorithm designed specifically for estimating the cardinality of a set: given a set, it calculates an approximate cardinality.

The approximate cardinality is not the actual cardinality of the set. It may be slightly smaller or larger, but the error between the estimate and the actual value stays within a reasonable range. Statistics that do not need to be exact can therefore be implemented with the HyperLogLog algorithm.

The advantage of HyperLogLog is that the memory needed to calculate the approximate cardinality does not depend on the size of the set. No matter how many elements the set contains, the memory HyperLogLog needs is fixed and very small.
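To make the fixed-memory property concrete, here is a simplified HyperLogLog in Python. It is a teaching toy under stated assumptions, not Redis's implementation: the class name ToyHyperLogLog, the SHA-256 hash and the 1,024-register size are all chosen for illustration, and the estimator uses only the basic small-range correction.

```python
import hashlib
import math


class ToyHyperLogLog:
    """Simplified HyperLogLog for illustration; not Redis's implementation."""

    HASH_BITS = 64

    def __init__(self, index_bits=10):
        self.p = index_bits
        self.m = 1 << index_bits        # number of registers, fixed at creation
        self.registers = [0] * self.m

    def add(self, item):
        """Record an item; return 1 if a register changed, else 0."""
        digest = hashlib.sha256(item.encode()).digest()
        h = int.from_bytes(digest[:8], "big")            # 64-bit hash
        tail_bits = self.HASH_BITS - self.p
        idx = h >> tail_bits                             # first p bits pick a register
        tail = h & ((1 << tail_bits) - 1)                # remaining bits
        rank = tail_bits - tail.bit_length() + 1         # position of the leftmost 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank
            return 1
        return 0

    def count(self):
        """Return the estimated number of distinct items added."""
        alpha = 0.7213 / (1 + 1.079 / self.m)            # bias constant for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small sets: fall back to linear counting on the empty registers.
            return round(self.m * math.log(self.m / zeros))
        return round(raw)


hll = ToyHyperLogLog()
for i in range(50_000):
    hll.add(f"visitor-{i}")

estimate = hll.count()
print("registers:", len(hll.registers))   # still 1024, however many items were added
print("estimate: ", estimate)             # close to 50000, but not exact
```

Adding more items only raises values inside the existing registers; the register array never grows, which is why the memory footprint is independent of the set size.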

Each Redis HyperLogLog needs at most 12 KB of memory to count close to 2^64 elements, and the standard error of the algorithm is only 0.81%.

If you implement the feature above with HyperLogLog and one key per day, a month of data takes only 360 KB of memory, even with 1 million visitors per day.
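These figures can be checked with a short calculation. The register count (16,384) and register width (6 bits) used here come from Redis's documented dense HyperLogLog encoding rather than from this article, the 1.04/sqrt(m) error formula is the standard HyperLogLog result, and the 30-day month is an assumption. The small fixed header Redis adds to each key is ignored.

```python
import math

REGISTERS = 16384          # 2**14 registers in Redis's dense encoding
BITS_PER_REGISTER = 6
DAYS_PER_MONTH = 30        # assumption, one HyperLogLog key per day

bytes_per_key = REGISTERS * BITS_PER_REGISTER // 8
kb_per_key = bytes_per_key / 1024
kb_per_month = kb_per_key * DAYS_PER_MONTH
standard_error = 1.04 / math.sqrt(REGISTERS)

# Raw bytes for the set-based approach: 15 bytes per IP, 1 million IPs per day.
set_bytes_per_month = 15 * 1_000_000 * DAYS_PER_MONTH
ratio = set_bytes_per_month / (bytes_per_key * DAYS_PER_MONTH)

print(f"per key:        {kb_per_key:.0f} KB")     # 12 KB
print(f"per month:      {kb_per_month:.0f} KB")   # 360 KB
print(f"standard error: {standard_error:.2%}")    # 0.81%
print(f"sets need roughly {ratio:.0f} times more raw memory")
```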

PFADD

The PFADD command adds one or more elements to the HyperLogLog stored at the given key so that they are counted.

PFADD key element [element...]

Depending on whether the given element has been counted, the PFADD command may return 0 or 1:

  • If all the given elements have been counted, the PFADD command will return 0, indicating that the approximate cardinality calculated by HyperLogLog has not changed.
  • If at least one of the given elements has not been counted before and the approximate cardinality calculated by HyperLogLog changes as a result, the PFADD command will return 1.

For example:

redis> PFADD letters a b c -- first add
(integer) 1
redis> PFADD letters a     -- second add
(integer) 0

You can also call this command with only a key and no elements. If the key already exists, nothing is done and 0 is returned. If it does not exist, an empty HyperLogLog is created and 1 is returned.
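The return-value rules can be modelled with a small stand-in. The sketch below uses an exact Python set in place of HyperLogLog registers, so it mirrors only the return codes described above, not the memory behaviour or the approximation; FakeHLLStore and its method are hypothetical names. In real Redis the result depends on whether an internal register changed, so a genuinely new element can occasionally still produce 0.

```python
class FakeHLLStore:
    """Stand-in that mimics PFADD return codes with exact sets (illustration only)."""

    def __init__(self):
        self.keys = {}

    def pfadd(self, key, *elements):
        created = key not in self.keys
        seen = self.keys.setdefault(key, set())
        before = len(seen)
        seen.update(elements)
        # 1 if the key was just created or the tracked cardinality changed, else 0.
        return int(created or len(seen) > before)


store = FakeHLLStore()
print(store.pfadd("letters", "a", "b", "c"))  # 1: new elements were counted
print(store.pfadd("letters", "a"))            # 0: "a" was already counted
print(store.pfadd("empty"))                   # 1: key created without elements
print(store.pfadd("empty"))                   # 0: key exists, nothing to do
```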

PFCOUNT

The PFCOUNT command can be used to obtain the approximate cardinality calculated by HyperLogLog for the collection. If the given key does not exist, 0 will be returned.

PFCOUNT key [key...]

For example:

redis> PFCOUNT letters
(integer) 3

When multiple HyperLogLogs are passed to PFCOUNT, the command first computes the union of all the given HyperLogLogs and then returns the approximate cardinality of that union.

redis> PFADD letters1 a b c
(integer) 1
redis> PFADD letters2 c d e
(integer) 1
redis> PFCOUNT letters1 letters2
(integer) 5
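The shared element c is counted only once because a union of HyperLogLogs is taken register by register, keeping the larger value in each position. Here is a minimal sketch of that idea, assuming a toy 16-register layout and a SHA-256 hash; registers_for and merge are hypothetical helpers, not the names Redis uses internally.

```python
import hashlib

INDEX_BITS = 4       # toy size: 16 registers
HASH_BITS = 64


def registers_for(items):
    """Build toy HyperLogLog registers for an iterable of strings."""
    tail_bits = HASH_BITS - INDEX_BITS
    regs = [0] * (1 << INDEX_BITS)
    for item in items:
        h = int.from_bytes(hashlib.sha256(item.encode()).digest()[:8], "big")
        idx = h >> tail_bits                        # leading bits pick a register
        tail = h & ((1 << tail_bits) - 1)
        rank = tail_bits - tail.bit_length() + 1    # position of the leftmost 1-bit
        regs[idx] = max(regs[idx], rank)
    return regs


def merge(*register_sets):
    """Union of HyperLogLogs: keep the largest value seen in each register."""
    return [max(values) for values in zip(*register_sets)]


letters1 = registers_for(["a", "b", "c"])
letters2 = registers_for(["c", "d", "e"])
union = merge(letters1, letters2)

# Merging yields exactly the registers built from the combined input,
# so the shared element "c" has no extra effect on the estimate.
print(union == registers_for(["a", "b", "c", "d", "e"]))  # True
```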

PFMERGE

The PFMERGE command can perform a union calculation on multiple HyperLogLogs, and then save the calculated union HyperLogLog to the specified key.

PFMERGE destKey sourceKey [sourceKey...]

If the destination key already exists, PFMERGE treats it as one of the source HyperLogLogs: its contents are included in the union, and the key is then overwritten with the merged result.

redis> PFADD letters1 a b c
(integer) 1
redis> PFADD letters2 c d e
(integer) 1
redis> PFMERGE res letters1 letters2
OK
redis> PFCOUNT res
(integer) 5

You can see that the PFMERGE and PFCOUNT commands are very similar. In fact, the PFCOUNT command performs the following operations when calculating the approximate cardinality of multiple HyperLogLogs:

  • Internally call the PFMERGE command to calculate the union of all the given HyperLogLogs and store it in a temporary HyperLogLog.

  • Execute the PFCOUNT command on the temporary HyperLogLog to get its approximate cardinality.

  • Delete the temporary HyperLogLog.

  • Return the resulting approximate cardinality.

When a program needs to call PFCOUNT on the same group of HyperLogLogs repeatedly, consider replacing those calls with a single PFMERGE call: by storing the union in a specified HyperLogLog and counting that key instead of recalculating the union every time, the program avoids unnecessary union calculations.
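One way to apply this advice is to merge once and then read the merged key. The sketch below assumes a client object that exposes pfadd(name, *values), pfmerge(dest, *sources) and pfcount(*sources), as the redis-py client does. The helper functions, the visitors:... key names and the InMemoryClient stand-in are hypothetical; the stand-in uses exact sets only so that the example runs without a Redis server. Nothing here refreshes the merged key when the source keys change.

```python
class InMemoryClient:
    """Exact-set stand-in for a Redis client (illustration only)."""

    def __init__(self):
        self.data = {}

    def pfadd(self, key, *elements):
        created = key not in self.data
        seen = self.data.setdefault(key, set())
        before = len(seen)
        seen.update(elements)
        return int(created or len(seen) > before)

    def pfcount(self, *keys):
        union = set()
        for key in keys:
            union |= self.data.get(key, set())
        return len(union)

    def pfmerge(self, dest, *sources):
        # An existing destination takes part in the union, as in Redis.
        merged = set(self.data.get(dest, set()))
        for key in sources:
            merged |= self.data.get(key, set())
        self.data[dest] = merged
        return True


def count_union_each_time(client, source_keys):
    """Recompute the union on every call."""
    return client.pfcount(*source_keys)


def build_merged_key(client, dest_key, source_keys):
    """Compute the union once and store it under dest_key."""
    client.pfmerge(dest_key, *source_keys)


def count_merged(client, dest_key):
    """Read the stored union; no merge work is repeated."""
    return client.pfcount(dest_key)


client = InMemoryClient()
client.pfadd("visitors:2022-01-20", "1.1.1.1", "2.2.2.2")
client.pfadd("visitors:2022-01-21", "2.2.2.2", "3.3.3.3")
days = ["visitors:2022-01-20", "visitors:2022-01-21"]

build_merged_key(client, "visitors:merged", days)
print(count_merged(client, "visitors:merged"))  # 3
print(count_union_each_time(client, days))      # 3
```

The trade-off is freshness: the merged key reflects the sources only as of the last PFMERGE, so it suits periods that are already closed, such as a finished month.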

Business Scenario

HyperLogLog's characteristics make it well suited to counting (for example monthly or annual statistics), deduplication (for example spam SMS detection), and similar scenarios.


Statement
This article is reproduced from the Juejin community (掘金社区). If there is any infringement, please contact admin@php.cn to have it removed.