How to implement Redis using HyperLogLog

1. Overview

Redis added the HyperLogLog data structure in version 2.8.9. It is used for cardinality estimation. Its advantage is that even when the number of input elements is very large, the space needed to estimate the cardinality stays small and essentially constant.

In Redis, each HyperLogLog key needs only 12 KB of memory to estimate the cardinality of nearly 2^64 distinct elements. This is in sharp contrast to exact counting with a set, where a set with more elements consumes more memory. However, because HyperLogLog only estimates the cardinality from the input elements and does not store the elements themselves, it cannot return individual elements the way a set can.
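The 12 KB figure can be checked with a little arithmetic. The sketch below assumes the register layout used by the Redis implementation (16384 registers of 6 bits each); those two constants are not stated in this article and are included only to illustrate where the number comes from.

```python
# Assumed constants from the Redis HyperLogLog implementation
# (not given in the article itself).
REGISTERS = 2 ** 14        # 16384 registers
BITS_PER_REGISTER = 6      # each register holds a small counter

# Size of the dense register array, in bytes and in KB.
dense_bytes = REGISTERS * BITS_PER_REGISTER // 8
print(dense_bytes)         # 12288
print(dense_bytes / 1024)  # 12.0
```

The size depends only on the number of registers, not on how many elements have been added, which is why the memory use stays constant.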

2. What is the cardinality?

For example, if the data set is {1, 3, 5, 7, 5, 7, 8}, then the set of distinct elements is {1, 3, 5, 7, 8} and the cardinality (the number of distinct elements) is 5. Cardinality estimation means quickly computing an approximate cardinality within an acceptable error range.
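For comparison, an exact cardinality can be computed by storing every distinct element in a set. This minimal Python sketch uses the data set from the example above; it is the memory-hungry approach that HyperLogLog avoids.

```python
# Exact cardinality: keep every distinct element in a set.
data = [1, 3, 5, 7, 5, 7, 8]

distinct = set(data)         # duplicates (5 and 7) collapse
cardinality = len(distinct)

print(sorted(distinct))      # [1, 3, 5, 7, 8]
print(cardinality)           # 5
```

The set grows with the number of distinct elements, whereas a HyperLogLog trades a small error for a fixed size.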

3. Commands

HyperLogLog currently provides three commands for everyday use: PFADD, PFCOUNT and PFMERGE. They are introduced one by one below.

3.1 PFADD

Earliest available version: 2.8.9. Time complexity: O(1).

The PFADD command adds one or more elements to the HyperLogLog stored at the key given as the first argument. It returns 1 if the cardinality estimate changed as a result of the command (that is, at least one internal register was updated), and 0 otherwise. If the key does not exist, an empty HyperLogLog is created first (internally, a Redis String with a specific length and encoding). The command can also be called with only the key and no elements: if the key already exists, nothing happens and 0 is returned; if it does not exist, a new, empty HyperLogLog is created and 1 is returned.

(1) Syntax format:

PFADD key element [element ...]

(2) Return value:

Integer: 1 if at least one internal HyperLogLog register was altered, otherwise 0.

(3) Example:

127.0.0.1:6379> PFADD hll a b c d e f g
(integer) 1
127.0.0.1:6379> PFCOUNT hll
(integer) 7

3.2 PFCOUNT

Earliest available version: 2.8.9. Time complexity: O(1) for a single key; O(N) for multiple keys, where N is the number of keys, with a much larger constant factor.

The PFCOUNT command returns the estimated cardinality (the approximate number of distinct elements) of a HyperLogLog. It returns 0 if the key does not exist. When called with multiple keys, it returns the estimated cardinality of the union of the given HyperLogLogs, computed by merging them into a temporary HyperLogLog. HyperLogLog counts the unique elements of a set using a small, constant amount of memory: each HyperLogLog uses only 12 KB plus a few bytes for the key itself.

(1) Syntax format:

PFCOUNT key [key ...]

(2) Return value:

Integer: the cardinality estimate of the specified HyperLogLog. If multiple HyperLogLogs are given, the cardinality estimate of their union is returned.

(3) Example:

127.0.0.1:6379> PFADD hll foo bar zap
(integer) 1
127.0.0.1:6379> PFADD hll zap zap zap
(integer) 0
127.0.0.1:6379> PFADD hll foo bar
(integer) 0
127.0.0.1:6379> PFCOUNT hll
(integer) 3
127.0.0.1:6379> PFADD some-other-hll 1 2 3
(integer) 1
127.0.0.1:6379> PFCOUNT some-other-hll
(integer) 3
127.0.0.1:6379> PFCOUNT hll some-other-hll
(integer) 6
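The multi-key result is the size of the union, not the sum of the individual counts. In the session above the two keys share no elements, so both happen to be 6. The Python sketch below models the same data with exact sets and adds a hypothetical overlapping key (`overlapping`, not part of the original example) to show the difference.

```python
# Exact-set model of the keys used in the redis-cli session above.
hll = {"foo", "bar", "zap"}
some_other_hll = {"1", "2", "3"}

# PFCOUNT hll some-other-hll estimates the size of the union.
union_size = len(hll | some_other_hll)
print(union_size)            # 6

# Hypothetical third key that shares "zap" with hll.
overlapping = {"zap", "1"}
overlap_union = len(hll | overlapping)
print(overlap_union)         # 4, not 3 + 2 = 5
```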

(4) Limitation:

The results returned by HyperLogLog are approximate, with a standard error of about 0.81%.
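The 0.81% figure matches the usual HyperLogLog standard-error formula, 1.04 / sqrt(m), where m is the number of registers. The sketch below assumes m = 16384, the register count used by the Redis implementation; that constant and the sample count of one million are assumptions for illustration, not values taken from this article.

```python
import math

# Assumed register count of the Redis implementation (not stated in the article).
REGISTERS = 2 ** 14

# Standard error of a HyperLogLog with m registers: 1.04 / sqrt(m).
standard_error = 1.04 / math.sqrt(REGISTERS)
print(f"{standard_error:.4%}")   # 0.8125%

# Hypothetical true count, to show what the error means in practice.
true_count = 1_000_000
typical_deviation = true_count * standard_error
print(round(typical_deviation))  # 8125
```

In other words, for a true cardinality of one million, estimates typically land within roughly eight thousand of the real value.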

Calling this command may modify the HyperLogLog, because 8 bytes of it are used to cache the most recently computed cardinality. So, technically speaking, PFCOUNT is a write command.

(5) Performance issues

Even though processing a dense HyperLogLog takes a relatively long time in theory, PFCOUNT performs well when called with a single key. This is because PFCOUNT caches the last computed cardinality, and that value rarely changes, since most PFADD calls do not update any register. As a result, hundreds of operations per second are possible.

When PFCOUNT is called with multiple keys, the HyperLogLogs are merged on the fly, which is slow. In addition, the cardinality of the union cannot be cached, so each multi-key call can take some time (usually on the order of milliseconds).

Note that the single-key and multi-key forms of this command have different semantics and different performance. Overusing the multi-key form is not recommended.

3.3 PFMERGE

Earliest available version: 2.8.9. Time complexity: O(N), where N is the number of HyperLogLogs to be merged.

The PFMERGE command merges multiple HyperLogLogs into one. The cardinality estimate of the merged HyperLogLog approximates the cardinality of the union of the sets observed by the given HyperLogLogs. The result is saved to the destination key (destkey).

(1) Syntax format:

PFMERGE destkey sourcekey [sourcekey ...]

(2) Return value:

Returns OK.

(3) Example:

127.0.0.1:6379> PFADD hll1 foo bar zap a
(integer) 1
127.0.0.1:6379> PFADD hll2 a b c foo
(integer) 1
127.0.0.1:6379> PFMERGE hll3 hll1 hll2
OK
127.0.0.1:6379> PFCOUNT hll3
(integer) 6
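To make the behaviour of the three commands more concrete, here is a simplified HyperLogLog written in plain Python. It is only a sketch of the general technique, not the Redis implementation: the class name `ToyHLL`, the use of SHA-1 as the hash function, and the classic raw estimator with a linear-counting correction are all assumptions chosen for illustration. Redis uses its own hash function, sparse and dense encodings, and a more refined estimator.

```python
import hashlib
import math


class ToyHLL:
    """Simplified HyperLogLog; illustrative only, not Redis's implementation."""

    def __init__(self, p=14):
        self.p = p                      # number of bits used as register index
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def add(self, *elements):
        """Mimics PFADD: return 1 if any register changed, else 0."""
        changed = 0
        for element in elements:
            digest = hashlib.sha1(str(element).encode()).digest()
            h = int.from_bytes(digest[:8], "big")        # 64-bit hash value
            index = h >> (64 - self.p)                   # top p bits pick a register
            rest = h & ((1 << (64 - self.p)) - 1)        # remaining 64 - p bits
            # Rank: position of the leftmost 1-bit in the remaining bits.
            rank = (64 - self.p) - rest.bit_length() + 1
            if rank > self.registers[index]:
                self.registers[index] = rank
                changed = 1
        return changed

    def count(self):
        """Mimics single-key PFCOUNT using the classic raw estimator."""
        alpha = 0.7213 / (1 + 1.079 / self.m)
        estimate = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            estimate = self.m * math.log(self.m / zeros)
        return round(estimate)

    @classmethod
    def merge(cls, *hlls):
        """Mimics PFMERGE: keep the register-wise maximum."""
        merged = cls(hlls[0].p)
        merged.registers = [max(rs) for rs in zip(*(h.registers for h in hlls))]
        return merged


# Same data as the PFMERGE session above.
hll1, hll2 = ToyHLL(), ToyHLL()
first_add = hll1.add("foo", "bar", "zap", "a")   # registers change
repeat_add = hll1.add("foo")                     # nothing changes
hll2.add("a", "b", "c", "foo")
hll3 = ToyHLL.merge(hll1, hll2)
print(first_add, repeat_add, hll3.count())
```

Merging only needs the register-wise maximum, which is why a merged HyperLogLog estimates the union without ever seeing the original elements. The same merge step is what a multi-key PFCOUNT has to perform on every call.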

Statement
This article is reproduced from 亿速云. If there is any infringement, please contact admin@php.cn to have it removed.