Redis HyperLogLog

Redis added the HyperLogLog structure in version 2.8.9.

Redis HyperLogLog is a probabilistic data structure used to estimate cardinality. Its advantage is that even when the number or volume of input elements is very large, the space required to compute the cardinality is fixed and very small.

In Redis, each HyperLogLog key requires only 12 KB of memory to estimate the cardinality of nearly 2^64 distinct elements. This is in sharp contrast to a set, which consumes more memory the more elements it holds.
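The 12 KB figure can be checked with a little arithmetic. The sketch below assumes the commonly documented layout of 16384 registers of 6 bits each; the one-million-element comparison and the 8-byte identifier size are hypothetical numbers chosen only for illustration.

```python
import math

# Assumption: 16384 registers of 6 bits each (not stated in the text above).
REGISTERS = 2 ** 14
BITS_PER_REGISTER = 6

hll_bytes = REGISTERS * BITS_PER_REGISTER // 8
print(hll_bytes)          # 12288 bytes
print(hll_bytes / 1024)   # 12.0 KB

# Standard error of the HyperLogLog estimate for this register count.
std_error = 1.04 / math.sqrt(REGISTERS)
print(std_error)          # 0.008125, i.e. roughly 0.81%

# Hypothetical comparison: storing one million 8-byte identifiers exactly
# needs at least this many bytes, before any per-element overhead.
raw_bytes = 1_000_000 * 8
print(raw_bytes)          # 8000000 bytes
```

The HyperLogLog size stays at 12288 bytes whether one thousand or one billion elements are added, while the exact approach grows with every new element.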

However, because HyperLogLog computes the cardinality only from the input elements and does not store the elements themselves, it cannot return individual input elements the way a set can.
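To make this concrete, here is a minimal HyperLogLog sketch in Python. It is an illustration only, not the Redis implementation: the class name ToyHyperLogLog, the use of a truncated SHA-1 as the 64-bit hash, and the simplified small-range correction are all assumptions made for this example. Note that the only state it keeps is a fixed-size list of registers, so there is nothing from which an added element could be recovered.

```python
import hashlib
import math


class ToyHyperLogLog:
    """Illustrative HyperLogLog; NOT how Redis implements it internally."""

    def __init__(self, p=14):
        self.p = p                      # number of hash bits used as register index
        self.m = 1 << p                 # number of registers (16384 when p = 14)
        self.registers = [0] * self.m   # fixed-size state; elements are never kept

    def add(self, item):
        # 64-bit hash of the item (truncated SHA-1, an arbitrary choice here).
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        index = h >> (64 - self.p)                  # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting on the empty registers).
            return round(self.m * math.log(self.m / zeros))
        return round(raw)


hll = ToyHyperLogLog()
for i in range(100_000):
    hll.add(f"user:{i}")
    hll.add(f"user:{i}")    # adding a duplicate leaves the registers unchanged

estimate = hll.count()
print(estimate)             # an estimate near 100000, not an exact count
print(len(hll.registers))   # 16384, no matter how many items were added
```

The estimate is approximate by design; with 16384 registers the expected error is on the order of one percent.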

What is cardinality?

For example, given the data set {1, 3, 5, 7, 5, 7, 8}, the set of distinct elements is {1, 3, 5, 7, 8}, so the cardinality (the number of distinct elements) is 5. Cardinality estimation means quickly computing the cardinality within an acceptable margin of error.
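The same example can be reproduced exactly with an ordinary Python set. This is the exact, memory-hungry approach that HyperLogLog approximates; the variable names are chosen only for this illustration.

```python
data = [1, 3, 5, 7, 5, 7, 8]

distinct = set(data)          # duplicates collapse into one entry each
cardinality = len(distinct)

print(sorted(distinct))       # [1, 3, 5, 7, 8]
print(cardinality)            # 5
```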

Example

The following example demonstrates how HyperLogLog is used:

redis 127.0.0.1:6379> PFADD w3ckey "redis"

(integer) 1

redis 127.0.0.1:6379> PFADD w3ckey "mongodb"

(integer) 1

redis 127.0.0.1:6379> PFADD w3ckey "mysql"

(integer) 1

redis 127.0.0.1:6379> PFCOUNT w3ckey

(integer) 3