How to use HyperLogLog in Redis

WBOY

May 26, 2023 pm 05:41 PM

1. Overview

Redis added the HyperLogLog data structure in version 2.8.9 for cardinality estimation. Its advantage is that even when the number of input elements is very large, the space needed to compute the cardinality stays small and roughly constant.

In Redis, each HyperLogLog key uses at most 12 KB of memory and can estimate the cardinality of nearly 2^64 distinct elements. This is in sharp contrast to a set, which consumes more memory the more elements it holds. However, because HyperLogLog only computes the cardinality from the input elements and does not store the elements themselves, it cannot return individual elements the way a set can.
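
The 12 KB figure follows from the dense layout Redis uses: 16,384 registers of 6 bits each (excluding a small header). The Python sketch below simply redoes that arithmetic and sets it against a rough lower bound for an exact set. The 8 bytes per element is an illustrative assumption, not a measurement of how Redis encodes sets.

```python
# Arithmetic behind the "12 KB per key" figure for a dense HyperLogLog.
REGISTERS = 2 ** 14        # 16,384 registers in the Redis implementation
BITS_PER_REGISTER = 6      # each register holds a small counter (0..63)

dense_bytes = REGISTERS * BITS_PER_REGISTER // 8
print(dense_bytes)         # 12288 bytes, i.e. 12 KB

# Assumption for illustration only: an exact set needs at least 8 bytes per
# stored element, ignoring every kind of per-entry overhead.
ASSUMED_BYTES_PER_ELEMENT = 8

def exact_set_bytes(n_elements: int) -> int:
    """Rough lower bound on the memory an exact set needs for n elements."""
    return n_elements * ASSUMED_BYTES_PER_ELEMENT

for n in (1_000, 1_000_000, 100_000_000):
    # element count, exact-set lower bound in bytes, HyperLogLog bytes
    print(n, exact_set_bytes(n), dense_bytes)
```

Under this assumption the fixed 12 KB only pays off once the set grows beyond a couple of thousand elements; past that point the exact set keeps growing while the HyperLogLog does not.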

2. What is cardinality?

For example, given the data set {1, 3, 5, 7, 5, 7, 8}, the set of distinct elements is {1, 3, 5, 7, 8}, so the cardinality (the number of distinct elements) is 5. Cardinality estimation means computing the cardinality quickly, within an acceptable margin of error.
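
For comparison, the exact cardinality of the example can be computed by storing every distinct element once, which is precisely the memory cost HyperLogLog avoids. A minimal Python sketch (the function name is only illustrative):

```python
def exact_cardinality(items):
    """Return the number of distinct elements by storing each one once."""
    return len(set(items))

data = [1, 3, 5, 7, 5, 7, 8]
print(sorted(set(data)))        # [1, 3, 5, 7, 8]
print(exact_cardinality(data))  # 5
```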

3. Commands

HyperLogLog currently supports only three commands: PFADD, PFCOUNT, and PFMERGE. Each is introduced in turn.

3.1 PFADD

Earliest available version: 2.8.9. Time complexity: O(1).

The PFADD command adds one or more elements to the HyperLogLog stored at the key given as the first argument. It returns 1 if the approximated cardinality changed as a result (that is, at least one internal register was altered) and 0 otherwise, so the return value tells you whether the estimate changed. If the specified key does not exist, an empty HyperLogLog is created first (internally, a Redis String with a specific length and encoding). The command can also be called with only the key and no elements: if the key already exists, nothing happens and 0 is returned; if it does not exist, a new, empty HyperLogLog is created and 1 is returned.

(1) Syntax format:

PFADD key element [element ...]

(2) Return value:

Integer: 1 if at least one internal HyperLogLog register was altered, otherwise 0.

(3) Example:

127.0.0.1:6379> PFADD hll a b c d e f g
(integer) 1
127.0.0.1:6379> PFCOUNT hll
(integer) 7
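
To see why PFADD reports 1 or 0, it helps to model the registers directly. The following is a toy Python sketch, not Redis's implementation: the name `toy_pfadd` is hypothetical, and SHA-1 stands in for the hash function Redis actually uses. Each element selects one register and proposes a rank; the return value is 1 only if some register grew.

```python
import hashlib

P = 14                 # index bits: 2**14 = 16,384 registers, as in Redis
M = 1 << P

def _hash64(element: str) -> int:
    """64-bit hash of the element (SHA-1 here, an assumption of this toy)."""
    return int.from_bytes(hashlib.sha1(element.encode()).digest()[:8], "big")

def toy_pfadd(registers: list, *elements: str) -> int:
    """Update the registers; return 1 if any register changed, else 0."""
    changed = 0
    for element in elements:
        h = _hash64(element)
        index = h & (M - 1)          # low P bits select the register
        rest = h >> P                # remaining 50 bits
        # rank = 1-based position of the first 1 bit in the remaining bits
        rank = 1
        while rank <= 64 - P and not (rest & 1):
            rest >>= 1
            rank += 1
        if rank > registers[index]:
            registers[index] = rank
            changed = 1
    return changed

registers = [0] * M
print(toy_pfadd(registers, "a", "b", "c", "d", "e", "f", "g"))  # 1
print(toy_pfadd(registers, "a", "b", "c"))                      # 0
```

Re-adding elements that were already seen proposes the same ranks for the same registers, so nothing changes and the result is 0.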

3.2 PFCOUNT

Earliest available version: 2.8.9. Time complexity: O(1) with a very small average constant time when called with a single key; O(N), where N is the number of keys, with much larger constant times when called with multiple keys.

The PFCOUNT command returns the estimated cardinality (the approximate number of distinct elements) of a HyperLogLog. It returns 0 if the key does not exist. When several keys are given, it returns the estimated cardinality of the union of those HyperLogLogs, computed by merging them into a temporary HyperLogLog. HyperLogLog counts the unique elements of a set using a small, constant amount of memory: each HyperLogLog uses only 12 KB plus a few bytes for the key itself.

(1) Syntax format:

PFCOUNT key [key ...]

(2) Return value:

Integer: the cardinality estimate of the specified HyperLogLog. If multiple HyperLogLogs are given, the cardinality estimate of their union is returned.

(3) Example:

127.0.0.1:6379> PFADD hll foo bar zap
(integer) 1
127.0.0.1:6379> PFADD hll zap zap zap
(integer) 0
127.0.0.1:6379> PFADD hll foo bar
(integer) 0
127.0.0.1:6379> PFCOUNT hll
(integer) 3
127.0.0.1:6379> PFADD some-other-hll 1 2 3
(integer) 1
127.0.0.1:6379> PFCOUNT some-other-hll
(integer) 3
127.0.0.1:6379> PFCOUNT hll some-other-hll
(integer) 6
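
The single-key and multi-key behaviour can be imitated with a toy estimator. The Python sketch below rests on stated assumptions: it uses the classic estimator from the original HyperLogLog algorithm with its small-range correction, and SHA-1 as the hash, neither of which is claimed to match what Redis ships. The names `new_hll`, `add` and `count` are hypothetical.

```python
import hashlib
import math

P = 14                               # index bits: 2**14 = 16,384 registers
M = 1 << P
ALPHA = 0.7213 / (1 + 1.079 / M)     # bias-correction constant for large M

def new_hll():
    """Create an empty register array."""
    return [0] * M

def add(registers, element):
    """Record one element by updating at most one register."""
    h = int.from_bytes(hashlib.sha1(str(element).encode()).digest()[:8], "big")
    index = h & (M - 1)              # low P bits select the register
    rest = h >> P
    rank = 1                         # position of the first 1 bit in the rest
    while rank <= 64 - P and not (rest & 1):
        rest >>= 1
        rank += 1
    registers[index] = max(registers[index], rank)

def count(*hlls):
    """Estimate the cardinality of the union of one or more toy HLLs."""
    if not hlls:
        return 0
    # Multi-key case: merge into a temporary array (element-wise maximum)
    merged = [max(column) for column in zip(*hlls)]
    raw = ALPHA * M * M / sum(2.0 ** -r for r in merged)
    zeros = merged.count(0)
    if raw <= 2.5 * M and zeros:
        return round(M * math.log(M / zeros))   # small-range correction
    return round(raw)

visitors_a = new_hll()
visitors_b = new_hll()
for i in range(10_000):
    add(visitors_a, f"user:{i}")
for i in range(5_000, 15_000):
    add(visitors_b, f"user:{i}")

print(count(visitors_a))               # close to 10,000
print(count(visitors_a, visitors_b))   # close to 15,000 (the union), not 20,000
```

The two inputs share 5,000 elements, so the multi-key result approximates the size of the union rather than the sum of the two counts, and neither source array is modified by the temporary merge.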

(4) Limitation:

The results returned by HyperLogLog are approximate, with a standard error of about 0.81%.

Calling this command may modify the HyperLogLog as a side effect, because 8 bytes of it are used to cache the most recently computed cardinality. So, technically speaking, PFCOUNT is a write command.
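
The 0.81% figure quoted above is consistent with the usual HyperLogLog standard-error approximation of 1.04 / sqrt(m), with m = 16,384 registers. A small Python sketch of that arithmetic (the helper `typical_range` is hypothetical and shows a one-standard-error band, not a guaranteed bound):

```python
import math

REGISTERS = 2 ** 14   # register count used by Redis's HyperLogLog

# Standard error of the HyperLogLog estimator: about 1.04 / sqrt(m)
standard_error = 1.04 / math.sqrt(REGISTERS)
print(f"{standard_error:.4%}")   # 0.8125%

def typical_range(true_cardinality: int):
    """One-standard-error band around the true cardinality."""
    delta = true_cardinality * standard_error
    return true_cardinality - delta, true_cardinality + delta

print(typical_range(1_000_000))
```

For one million distinct elements, an estimate that is off by several thousand in either direction is therefore entirely normal.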

(5) Performance issues

Even though processing a dense HyperLogLog theoretically takes a long time, PFCOUNT is still fast when only one key is specified. This is because PFCOUNT caches the last computed cardinality, and that value rarely changes, since in most cases PFADD does not update any register. As a result, hundreds of operations per second are possible.

When PFCOUNT is called with multiple keys, the HyperLogLogs must be merged on the fly, which is slow. Moreover, the cardinality of the union cannot be cached. With multiple keys, PFCOUNT can therefore take on the order of milliseconds to run.

Note that the single-key and multi-key variants of this command have different semantics and different performance. Avoid overusing the multi-key variant.
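
The caching behaviour described in this section can be pictured with a small model. This is a sketch under stated assumptions, not Redis's code: the class `CachedEstimate` and the stand-in estimator `nonzero_registers` are hypothetical, and the estimator only marks where the costly computation would run.

```python
class CachedEstimate:
    """Toy model of single-key PFCOUNT caching (all names are hypothetical)."""

    def __init__(self, n_registers, estimator):
        self.registers = [0] * n_registers
        self.estimator = estimator   # expensive function over all registers
        self.cached = None           # last computed cardinality, if still valid
        self.recomputations = 0      # how often the expensive step actually ran

    def update(self, index, rank):
        """Apply one PFADD-style register update; invalidate only on change."""
        if rank > self.registers[index]:
            self.registers[index] = rank
            self.cached = None       # the cached value is stale now
            return 1
        return 0

    def count(self):
        """Return the cached value unless a register changed since last time."""
        if self.cached is None:
            self.cached = self.estimator(self.registers)
            self.recomputations += 1
        return self.cached


def nonzero_registers(registers):
    # Stand-in for the real (costly) HyperLogLog estimation step
    return sum(1 for r in registers if r)


hll = CachedEstimate(16, nonzero_registers)
hll.update(3, 2)
hll.count()
hll.count()                  # served from the cache
hll.update(3, 1)             # no register changes, so the cache stays valid
hll.count()
print(hll.recomputations)    # 1
hll.update(5, 4)             # a register changes, so the cache is invalidated
hll.count()
print(hll.recomputations)    # 2
```

A multi-key count has no single structure in which to keep such a cached value, which is why every call pays for the merge again.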

3.3 PFMERGE

Earliest available version: 2.8.9. Time complexity: O(N), where N is the number of HyperLogLogs to merge, with high constant times.

The PFMERGE command merges multiple HyperLogLogs into one. The merged HyperLogLog approximates the cardinality of the union of the sets observed by the source HyperLogLogs, and the result is stored in the destination key. If the destination key already exists, it is treated as one of the sources.

(1) Syntax format:

PFMERGE destkey sourcekey [sourcekey ...]

(2) Return value:

Returns OK.

(3) Example:

127.0.0.1:6379> PFADD hll1 foo bar zap a
(integer) 1
127.0.0.1:6379> PFADD hll2 a b c foo
(integer) 1
127.0.0.1:6379> PFMERGE hll3 hll1 hll2
OK
127.0.0.1:6379> PFCOUNT hll3
(integer) 6
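
Conceptually, merging HyperLogLogs means keeping, for each register position, the largest value seen in any source. The Python sketch below shows that idea on tiny made-up register arrays (a real Redis HyperLogLog has 16,384 registers; `merge_registers` is a hypothetical helper) and then checks the union from the example above with exact sets.

```python
def merge_registers(*sources):
    """Merge HyperLogLog register arrays by taking the element-wise maximum."""
    return [max(column) for column in zip(*sources)]

# Tiny hypothetical register arrays, for illustration only
hll1 = [2, 0, 1, 4]
hll2 = [1, 3, 1, 0]
print(merge_registers(hll1, hll2))   # [2, 3, 1, 4]

# The same union as in the example, computed exactly with sets
set1 = {"foo", "bar", "zap", "a"}
set2 = {"a", "b", "c", "foo"}
print(len(set1 | set2))              # 6
```

Because the maximum is taken per register, elements present in both sources ("foo" and "a" here) are not counted twice.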

This article is reproduced from 亿速云.
