How to use the HyperLogLog data type in Redis

May 29, 2023, 09:29 AM

1. Principle of HyperLogLog

Redis HyperLogLog uses a probabilistic algorithm, the HyperLogLog algorithm, to estimate cardinality. Using a hash function and an array of m small registers (buckets), HyperLogLog can estimate the number of unique elements in a set without storing the elements themselves.

In the HyperLogLog algorithm, each element is hashed and the hash value is treated as a binary string. The element is then scored by the position of the first 1 in that string, that is, the number of leading 0s plus one. For example, if the hash value of an element is 01110100011, there is one leading 0, so the score of this element is 2. In the full algorithm, the first few bits of the hash select one of the m registers, and the score is computed on the remaining bits.
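A minimal Java sketch of the scoring step, assuming the common convention that the score is the number of leading 0s plus one. The class and method names (`RankDemo`, `rank`) and the 11-bit window are illustrative only and are not part of Redis.

```java
public class RankDemo {
    // Score of a bit pattern inside a window of `width` bits:
    // the position of the leftmost 1 bit, counted from 1 (leading zeros + 1).
    static int rank(long bits, int width) {
        if (bits == 0) {
            return width + 1; // no 1 bit anywhere in the window
        }
        // numberOfLeadingZeros counts over all 64 bits, so subtract the unused high bits.
        int leadingZeros = Long.numberOfLeadingZeros(bits) - (64 - width);
        return leadingZeros + 1;
    }

    public static void main(String[] args) {
        // The 11-bit example from the text: one leading 0, so the score is 2.
        System.out.println(rank(Long.parseLong("01110100011", 2), 11)); // prints 2
        // Three leading 0s give a score of 4.
        System.out.println(rank(Long.parseLong("00010100011", 2), 11)); // prints 4
    }
}
```

A long run of leading 0s is rare, so seeing a high score suggests that many distinct hashes have been observed.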

Each of the m registers (the slots of the array) keeps the largest score n it has seen. To produce the estimate, compute 1 / 2^n for every register, add these reciprocals, take the reciprocal of the sum, and scale the result by m^2 and a bias-correction constant. This normalized harmonic mean is the cardinality estimate of the HyperLogLog algorithm.
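Written out, the standard form of the estimator looks like the following. The symbols are the usual notation from the HyperLogLog literature rather than anything defined in this article: m is the number of registers, M_j is the largest score stored in register j, and alpha_m is the bias-correction constant.

```latex
E = \alpha_m \, m^2 \left( \sum_{j=1}^{m} 2^{-M_j} \right)^{-1},
\qquad
\alpha_m \approx \frac{0.7213}{1 + 1.079/m} \quad (m \ge 128)
```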

The number of registers m controls the trade-off between memory and accuracy: the standard error is about 1.04 / sqrt(m), so more registers give a smaller error at the cost of more memory. Redis uses 16384 registers, which gives a standard error of about 0.81% while using at most 12 KB per key.
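The trade-off can be checked with a little arithmetic. The sketch below assumes 6 bits per register (the dense layout Redis uses) and counts only the register payload, not any header bytes. The class and method names are illustrative.

```java
public class HllErrorDemo {
    // Standard error of the HyperLogLog estimate for m registers: about 1.04 / sqrt(m).
    static double standardError(int m) {
        return 1.04 / Math.sqrt(m);
    }

    // Register payload in bytes, assuming 6 bits per register.
    static int denseBytes(int m) {
        return m * 6 / 8;
    }

    public static void main(String[] args) {
        // Print error and memory for m = 2^10, 2^12, 2^14, 2^16.
        for (int p = 10; p <= 16; p += 2) {
            int m = 1 << p;
            System.out.printf("m=%d error=%.2f%% memory=%d bytes%n",
                    m, standardError(m) * 100, denseBytes(m));
        }
    }
}
```

For m = 16384 this gives an error of about 0.81% and 12288 bytes, which matches the figures quoted for Redis.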

In short, the HyperLogLog algorithm is built on hashing and bit operations. By treating each hash value as a bit stream and counting its leading 0s, it can quickly estimate the number of unique values in a large data set. For example, it can estimate how many distinct web pages appear in a very large dataset, although it cannot tell you which pages are the duplicates.

2. Usage steps:

Redis HyperLogLog is a data structure for estimating the number of distinct elements in a collection. It can track very large numbers of elements with a small, fixed amount of memory, and adding and counting remain fast even on large data volumes. The count is approximate, with a standard error of about 0.81%.

As a simple example, we can use HyperLogLog to count the number of unique IPs visiting a website. Specifically, follow these steps:

  • Create the HyperLogLog structure by adding the first element (PFADD creates the key if it does not exist): PFADD hll:unique_ips 127.0.0.1

  • Add the IP of each visit to the same key: PFADD hll:unique_ips 192.168.1.1

  • Get the approximate number of distinct elements in the collection: PFCOUNT hll:unique_ips

  • If you keep several HyperLogLog structures (for example one per day or per hour), pass them together to PFCOUNT or combine them with PFMERGE to get the distinct count across all of them. This does not make the count more accurate; it counts the union.
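Combining several structures answers the question "how many distinct elements are there across all of them", which is generally not the sum of the individual counts. The sketch below illustrates that semantics with exact `HashSet` counts rather than HyperLogLog itself. The IP addresses and names are made up for the example.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UnionCountDemo {
    // Exact distinct count across several groups: the value that
    // PFCOUNT over several keys (or PFMERGE followed by PFCOUNT) approximates.
    static int distinctAcross(List<Set<String>> groups) {
        Set<String> union = new HashSet<>();
        for (Set<String> group : groups) {
            union.addAll(group);
        }
        return union.size();
    }

    public static void main(String[] args) {
        Set<String> monday = Set.of("10.0.0.1", "10.0.0.2", "10.0.0.3");
        Set<String> tuesday = Set.of("10.0.0.2", "10.0.0.3", "10.0.0.4");

        // Summing per-day counts double counts returning visitors.
        System.out.println(monday.size() + tuesday.size());           // prints 6
        // The union counts each visitor once.
        System.out.println(distinctAcross(List.of(monday, tuesday))); // prints 4
    }
}
```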

Note that although HyperLogLog saves a lot of memory, it is an estimation algorithm, so the result is approximate rather than exact. Keep this in mind when deciding whether it fits your use case.

3. Example of using page views to implement request IP deduplication
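The idea can be tried without a Redis server. The following is a simplified, in-memory Java sketch of a HyperLogLog that deduplicates request IPs from simulated page views and compares the estimate with an exact count. It is not how Redis implements the type internally: the MD5-based hash, the class name `MiniHll`, the simulated traffic, and the single small-range correction are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;

public class MiniHll {
    private static final int P = 14;      // index bits, so m = 2^14 = 16384 registers
    private static final int M = 1 << P;
    private final byte[] registers = new byte[M];

    // 64-bit hash taken from the first 8 bytes of an MD5 digest (illustrative choice).
    static long hash64(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Analogue of PFADD: keep the largest score seen in the selected register.
    void add(String element) {
        long h = hash64(element);
        int index = (int) (h >>> (64 - P)); // top P bits choose the register
        long rest = h << P;                 // remaining bits, left-aligned
        int rank = (rest == 0) ? (64 - P + 1) : Long.numberOfLeadingZeros(rest) + 1;
        if (rank > registers[index]) {
            registers[index] = (byte) rank;
        }
    }

    // Analogue of PFCOUNT: harmonic-mean estimate with a small-range correction.
    long count() {
        double sum = 0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.pow(2, -r);
            if (r == 0) {
                zeros++;
            }
        }
        double alpha = 0.7213 / (1 + 1.079 / M);
        double estimate = alpha * M * M / sum;
        if (estimate <= 2.5 * M && zeros > 0) {
            estimate = M * Math.log((double) M / zeros); // linear counting for small sets
        }
        return Math.round(estimate);
    }

    public static void main(String[] args) {
        MiniHll hll = new MiniHll();
        Set<String> exact = new HashSet<>();
        // Simulated page views: 300,000 requests coming from 100,000 distinct IPs.
        for (int i = 0; i < 300_000; i++) {
            int n = i % 100_000;
            String ip = "10." + (n >> 16) + "." + ((n >> 8) & 0xFF) + "." + (n & 0xFF);
            hll.add(ip);
            exact.add(ip);
        }
        System.out.println("exact=" + exact.size() + " estimate=" + hll.count());
    }
}
```

The exact set grows with every new IP, while the sketch always holds 16384 one-byte registers. Repeated requests from the same IP never change the estimate, because the same hash always produces the same register and score.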

4. Using the Jedis client

1. Add the Jedis dependency:

<dependency>
    <groupId>redis.clients</groupId>
    <artifactId>jedis</artifactId>
    <version>3.6.0</version>
</dependency>

2. Create a Jedis object:

Jedis jedis = new Jedis("localhost");

3. Add elements to the HyperLogLog data structure:

jedis.pfadd("hll:unique_ips", "127.0.0.1");

4. Get the approximate number of elements in the collection:

Long count = jedis.pfcount("hll:unique_ips");
System.out.println(count);

5. To count unique elements across several HyperLogLog structures, merge them. In Jedis, use pfmerge, which issues the PFMERGE command. The destination key receives the union of the sources:

jedis.pfmerge("hll:unique_ips", "hll:unique_ips1", "hll:unique_ips2", "hll:unique_ips3");
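Conceptually, merging two HyperLogLog structures keeps the larger value of each pair of registers, which is why the merged structure represents the union of the inputs. The sketch below shows that step on tiny hand-written register arrays. The arrays and names are illustrative and are not read from Redis.

```java
import java.util.Arrays;

public class RegisterMergeDemo {
    // Merge two register arrays of equal length by taking the larger value at each index.
    static byte[] merge(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("register counts differ");
        }
        byte[] out = new byte[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = (byte) Math.max(a[i], b[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] day1 = {0, 3, 1, 5};
        byte[] day2 = {2, 1, 1, 7};
        System.out.println(Arrays.toString(merge(day1, day2))); // prints [2, 3, 1, 7]
    }
}
```

Because taking the maximum is idempotent, merging the same source twice leaves the result unchanged.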

5. Using the Redisson client

 1. Create a RedissonClient object

Config config = new Config();
config.useSingleServer().setAddress("redis://localhost:6379");
RedissonClient redisson = Redisson.create(config);

 2. Create an RHyperLogLog object

RHyperLogLog<String> uniqueIps = redisson.getHyperLogLog("hll:unique_ips");

 3. Add elements

uniqueIps.add("127.0.0.1");

 4. Get the approximate count

long approximateCount = uniqueIps.count();
System.out.println(approximateCount);

 5. Merge multiple HyperLogLog objects

RHyperLogLog<String> uniqueIps1 = redisson.getHyperLogLog("hll:unique_ips1");
RHyperLogLog<String> uniqueIps2 = redisson.getHyperLogLog("hll:unique_ips2");
uniqueIps.mergeWith(uniqueIps1, uniqueIps2);

6. What features and methods does HyperLogLog provide?

Features:

  • Counts are approximate (standard error of about 0.81%), but memory usage is very small (at most 12 KB per key).

  • Supports inserting new elements without double counting: adding an element that was already seen does not change the estimate.

  • Is driven by a small set of commands: PFADD, PFCOUNT, and PFMERGE.

  • Estimates the number of distinct elements in a data set, that is, the cardinality of the set.

  • Supports merging multiple HyperLogLog objects to obtain an approximation of the cardinality of the union of these collections.

Commonly used methods in HyperLogLog:

  • PFADD key element [element ...]: Add one or more elements to the HyperLogLog structure.

  • PFCOUNT key [key ...]: Get the cardinality estimate of one or more HyperLogLog structures.

  • PFMERGE destkey sourcekey [sourcekey ...]: Merge one or more HyperLogLog structures into a target structure.

  • PFSELFTEST: An internal command that self-tests the HyperLogLog implementation. It takes no arguments and is not intended for application use.

Note that although HyperLogLog saves a lot of memory, it is still an estimation algorithm: the result is approximate rather than exact, and each operation has some computational cost. Depending on the application, consider whether HyperLogLog or another data structure (for example a Set, when exact counts are required) is the better fit.

7. Summary of usage scenarios:

The main use of HyperLogLog in Redis is deduplicated counting over large data streams (for example page views, IPs, or cities).

Specifically, the following are some scenarios where Redis HyperLogLog is used for deduplication and counting:

  • Count page views - In web applications, HyperLogLog can be used to count how many unique visitors each page has. Keeping one HyperLogLog per page and time period (for example per day) lets you report unique visitors for each period and for any combination of periods.

  • Count users - HyperLogLog is well suited to counting distinct users in big data sets. A probabilistic data structure is particularly effective for data such as unique user IDs: after hashing, HyperLogLog keeps only a fixed-size array of small registers rather than the IDs themselves, and derives the size of the data set from them.

  • Count advertising clicks - For advertising analysis on a website or application, HyperLogLog can be used to capture the number of effective clicks, that is, the number of distinct or unique clicks.
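For per-period counting, one common approach is to put the date in the key name and then pass a range of keys to PFCOUNT or PFMERGE. The sketch below only builds the key names. The `hll:uv:<page>:<yyyyMMdd>` naming scheme and the class `DailyKeys` are hypothetical, not something Redis or this article prescribes.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

public class DailyKeys {
    // Formats a date as yyyyMMdd, e.g. 20230529.
    private static final DateTimeFormatter DAY = DateTimeFormatter.BASIC_ISO_DATE;

    // Key for one page on one day (hypothetical naming scheme).
    static String keyFor(String page, LocalDate day) {
        return "hll:uv:" + page + ":" + day.format(DAY);
    }

    // Keys for an inclusive date range, usable as PFCOUNT arguments or PFMERGE sources.
    static List<String> keysFor(String page, LocalDate from, LocalDate to) {
        List<String> keys = new ArrayList<>();
        for (LocalDate d = from; !d.isAfter(to); d = d.plusDays(1)) {
            keys.add(keyFor(page, d));
        }
        return keys;
    }

    public static void main(String[] args) {
        // Prints [hll:uv:home:20230527, hll:uv:home:20230528, hll:uv:home:20230529]
        System.out.println(keysFor("home", LocalDate.of(2023, 5, 27), LocalDate.of(2023, 5, 29)));
    }
}
```

With keys like these, each request adds the visitor to that day's key, and a weekly or monthly unique count is obtained by counting across the corresponding daily keys.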

Statement
This article is reproduced from 亿速云. If there is any infringement, please contact admin@php.cn for removal.