
What is a bloom filter? How to use it in Redis?

Jun 24, 2021 pm 07:10 PM
redis, bloom filter

A Bloom filter is a remarkably useful data structure. This article explains how Bloom filters work and shows how to use one in Redis.


What is a "Bloom filter"?

A Bloom filter is a data structure used to determine whether an element is in a set. One very common use is deduplication. Crawlers have a typical requirement: given thousands of target URLs, how do you determine whether the crawler has already visited a given URL? The simple approach is to store every URL in a database as the crawler collects it, and to query the database each time a new URL arrives to see whether it has been visited before.

select id from table where url = 'https://jaychen.cc'

But as the crawler collects more and more URLs, it has to query the database before every request, and this kind of string lookup in SQL is not very efficient. Besides a database, a Redis set can also meet this requirement, and it performs better than the database. But Redis has a problem of its own: it consumes too much memory. This is where the Bloom filter steps in: let me handle this one.

Compared with a database or a Redis set, a Bloom filter avoids much of the query cost and memory usage.

A Bloom filter is essentially a bit array, that is, an array in which each element occupies only 1 bit and can only be 0 or 1. Allocating a bit array of 10000 elements therefore takes only 10000 / 8 = 1250 bytes. In addition to the bit array, a Bloom filter has K hash functions. When an element is added to the Bloom filter, the following steps are performed:

  • Hash the element with each of the K hash functions to obtain K hash values.
  • Using each hash value as an index, set the corresponding position in the bit array to 1.
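
The bit array on its own can be sketched in a few lines of Python. This is only an illustration and is not taken from the original article; the class name `BitArray` and its methods are hypothetical.

```python
class BitArray:
    """A fixed-size array of bits packed into a bytearray (illustrative sketch)."""

    def __init__(self, size: int) -> None:
        self.size = size
        # Each byte holds 8 bits, so round up to a whole number of bytes.
        self.data = bytearray((size + 7) // 8)

    def set(self, index: int) -> None:
        """Set the bit at `index` to 1."""
        self.data[index // 8] |= 1 << (index % 8)

    def get(self, index: int) -> bool:
        """Return True if the bit at `index` is 1."""
        return bool(self.data[index // 8] & (1 << (index % 8)))


bits = BitArray(10000)
bits.set(42)
print(len(bits.data))               # 1250 bytes for 10000 bits
print(bits.get(42), bits.get(43))   # True False
```

Packing 8 flags into each byte is what gives the 10000 / 8 = 1250 byte figure.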

For example, assume that the Bloom filter has 3 hash functions: f1, f2, f3 and a bit array arr. Now we need to insert https://jaychen.cc into the Bloom filter:

  • Hash the value with each of the three hash functions to get three values n1, n2, n3.
  • Set the three elements arr[n1], arr[n2], arr[n3] in the bit array to 1.

To determine whether a value is in the Bloom filter, run the same hash calculations on it again, then check whether each of the resulting positions in the bit array is 1. If all of them are 1, the value is reported as being in the Bloom filter. If any of them is not 1, the value is not in the Bloom filter.
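
The insert and lookup steps described above might look like the following in Python. This is a minimal sketch under stated assumptions, not the implementation Redis uses: the class `BloomFilter` and its method names are hypothetical, and SHA-256 with an index prefix stands in for the K separate hash functions.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: a bit array plus k hash functions (illustrative sketch)."""

    def __init__(self, size: int, num_hashes: int) -> None:
        self.size = size
        self.num_hashes = num_hashes
        self.bits = bytearray((size + 7) // 8)

    def _positions(self, item: str):
        # Assumption: simulate k different hash functions by prefixing the
        # item with the function's index before hashing with SHA-256.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        """Set the k bit positions derived from `item` to 1."""
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        """True means "possibly present"; False means "definitely absent"."""
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


bf = BloomFilter(size=10000, num_hashes=3)
bf.add("https://jaychen.cc")
print(bf.might_contain("https://jaychen.cc"))  # True
```

An item that was added always maps to bits that are already set, so the lookup can never miss it.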

If the text is hard to follow, the hand-drawn illustration below shows the same process.

[Figure: What is a bloom filter? How to use it in Redis?]

After reading the explanation above, one problem will certainly occur to you: the more elements are inserted, the more positions in the bit array are set to 1. When an element that is not in the Bloom filter is hashed and the resulting positions are checked, those positions may already have been set to 1 by other elements. An element that was never added may therefore be misjudged as present. However, if the Bloom filter reports that an element is absent, the element is definitely not in the Bloom filter. To put it simply:

  • If the Bloom filter says an element is present, it may be wrong (a false positive).
  • If the Bloom filter says an element is not present, it is definitely not present.

Applied to the crawler requirement above, this limitation means that some unvisited URLs may be misjudged as visited, but a URL that has been visited will never be misjudged as unvisited.
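
How likely such a misjudgment is can be estimated with the standard approximation for Bloom filters, (1 - e^(-kn/m))^k. The sketch below is not from the original article; the function name `false_positive_rate` is hypothetical, and the formula assumes the hash functions are independent and uniformly distributed.

```python
import math


def false_positive_rate(m: int, k: int, n: int) -> float:
    """Approximate false-positive probability of a Bloom filter.

    m: number of bits, k: number of hash functions, n: inserted elements.
    Assumes independent, uniformly distributed hash functions.
    """
    return (1.0 - math.exp(-k * n / m)) ** k


# The rate grows as more elements are inserted into the same bit array.
for n in (100, 1000, 5000):
    print(n, false_positive_rate(m=10000, k=3, n=n))
```

The estimate is 0 for an empty filter and rises toward 1 as the bit array fills up, which matches the behavior described above.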

Bloom filters in Redis

Redis added module support in version 4.0, and a Bloom filter can be added to Redis as a module. So with Redis 4.0 or above, you can use Bloom filters by loading the module. That is not the simplest route, though: with Docker you can try Bloom filters in Redis directly.

> docker run -d -p 6379:6379 --name bloomfilter redislabs/rebloom
> docker exec -it bloomfilter redis-cli

The Redis Bloom filter has two main commands:

  • bf.add adds an element to the Bloom filter: bf.add urls https://jaychen.cc
  • bf.exists checks whether an element is in the filter: bf.exists urls https://jaychen.cc
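
From application code, the same two commands can be sent through a Redis client. The sketch below is an illustration rather than part of the original article: the helper names `mark_visited` and `maybe_visited` are hypothetical, and `client` is assumed to be any object with an `execute_command` method, such as a redis-py `Redis` instance connected to a server with the Bloom filter module loaded.

```python
def mark_visited(client, key: str, url: str) -> bool:
    """Add `url` to the Bloom filter stored at `key` using BF.ADD.

    Returns True if the URL was newly added, False if the filter reports
    that it may already have been present.
    """
    return bool(client.execute_command("BF.ADD", key, url))


def maybe_visited(client, key: str, url: str) -> bool:
    """Return True if BF.EXISTS reports that `url` may be in the filter."""
    return bool(client.execute_command("BF.EXISTS", key, url))


# Usage, assuming the redis-py package and the Docker container started above:
#   import redis
#   client = redis.Redis(host="localhost", port=6379)
#   mark_visited(client, "urls", "https://jaychen.cc")
#   maybe_visited(client, "urls", "https://jaychen.cc")
```

Passing the client in as an argument keeps the helpers independent of how the connection is created.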

As mentioned above, a Bloom filter can misjudge. In Redis, two values determine the accuracy of a Bloom filter:

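  • error_rate: the error rate allowed for the Bloom filter. The lower this value, the larger the filter's bit array and the more space it occupies.
  • initial_size: the number of elements the Bloom filter can store. Once the number of elements actually stored exceeds this value, the filter's accuracy declines.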
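
The trade-off between error rate and space can be made concrete with the textbook sizing formulas for a classic Bloom filter. This is a sketch under that assumption only: the function name `optimal_parameters` is hypothetical, and the sizes Redis actually allocates may differ from these figures.

```python
import math
from typing import Tuple


def optimal_parameters(capacity: int, error_rate: float) -> Tuple[int, int]:
    """Return (bits, hash_count) for a classic Bloom filter.

    Uses the standard formulas m = -n * ln(p) / (ln 2)^2 and k = (m / n) * ln 2,
    where n is the capacity and p is the target error rate.
    """
    bits = math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2))
    hash_count = max(1, round(bits / capacity * math.log(2)))
    return bits, hash_count


print(optimal_parameters(capacity=100, error_rate=0.01))  # (959, 7)
```

Lowering the error rate or raising the capacity both increase the number of bits required.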

Redis has a command for setting these two values:

bf.reserve urls 0.01 100

The three arguments mean:

  • The first value is the name of the filter.
  • The second value is the error_rate.
  • The third value is the initial_size.

One caveat when using this command: the filter name must not exist before the command is run. If it already exists, an error is returned: (error) ERR item exists
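
One way to avoid that error from application code is to check for the key before reserving it. The helper below is a hypothetical sketch, not part of the original article: `client` is assumed to be a redis-py style object exposing `exists` and `execute_command`, and because the check and the reserve are two separate commands, the sequence is not atomic.

```python
def reserve_if_missing(client, key: str, error_rate: float, capacity: int) -> bool:
    """Create the Bloom filter only when `key` does not exist yet.

    Returns True if BF.RESERVE was issued, False if the key already existed.
    Note: the existence check and the reserve are separate commands, so two
    clients running this at the same time can still race.
    """
    if client.exists(key):
        return False
    client.execute_command("BF.RESERVE", key, error_rate, capacity)
    return True


# Usage, assuming a redis-py client connected to the container started above:
#   reserve_if_missing(client, "urls", 0.01, 100)
```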


Statement
This article is reproduced from the Juejin community (掘金社区). If there is any infringement, please contact admin@php.cn to have it removed.