A Bloom filter is a clever data structure. This article explains how Bloom filters work and then shows how to use one in Redis.
What is a "Bloom filter"?
A Bloom filter is a data structure that can be used to determine whether an element is in a set. One very common use is deduplication. Crawlers have a typical requirement: there are thousands of target website URLs, so how do you determine whether the crawler has already visited a given URL? The simplest approach is to store each URL in a database as the crawler collects it. Every time a new URL comes in, the crawler queries the database to check whether that URL has been visited before.
select id from table where url = 'https://jaychen.cc'
But as the crawler collects more and more URLs, the database has to be queried before every request, and SQL lookups on strings like this are not very efficient. Besides a database, the set structure in Redis can also meet this requirement, and it performs better than the database. But Redis has its own problem: it consumes too much memory. This is where the Bloom filter steps forward and says: let me handle this one.
Compared with a database or a Redis set, a Bloom filter largely avoids both the performance problem and the memory problem.
A Bloom filter is essentially a bit array: each element of the array occupies only 1 bit and can only be 0 or 1. Allocating a bit array of 10000 elements therefore takes only 10000 / 8 = 1250 B of space. In addition to the bit array, a Bloom filter has K hash functions. When an element is added to the Bloom filter, the following steps are performed:
- Apply the K hash functions to the element to obtain K hash values.
- For each hash value, set the bit at the corresponding index of the bit array to 1.
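The bit array itself can be sketched in a few lines of Python. This is only an illustration of the storage arithmetic above; the class name `BitArray` and its methods are made up for this example and are not part of any library.

```python
class BitArray:
    """Fixed-size array of bits, packed eight to a byte (illustrative only)."""

    def __init__(self, size: int) -> None:
        self.size = size
        # Round up so a size that is not a multiple of 8 still fits.
        self.data = bytearray((size + 7) // 8)

    def set(self, index: int) -> None:
        """Set the bit at `index` to 1."""
        self.data[index // 8] |= 1 << (index % 8)

    def get(self, index: int) -> int:
        """Return the bit at `index` as 0 or 1."""
        return (self.data[index // 8] >> (index % 8)) & 1


bits = BitArray(10000)
bits.set(42)
print(len(bits.data))              # 1250 bytes, matching 10000 / 8
print(bits.get(42), bits.get(43))  # 1 0
```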
For example, assume that the Bloom filter has 3 hash functions f1, f2, f3 and a bit array arr. Now we need to insert https://jaychen.cc into the Bloom filter:
- Perform three hash calculations on the value to get three values n1, n2, n3.
- Set the three elements arr[n1], arr[n2], arr[n3] in the bit array to 1.
To determine whether a value is in the Bloom filter, apply the same K hash functions to the element again and check whether the bit at each resulting index is 1. If all of them are 1, the value is considered to be in the Bloom filter. If any of them is not 1, the element is not in the Bloom filter.
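The insert and lookup steps can be put together in a minimal sketch. The article does not say which hash functions to use, so this example assumes the i-th hash function is SHA-256 applied to the index i plus the element; the class name `BloomFilter` and its parameters `size` and `k` are likewise hypothetical.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: a bit array plus k hash functions (illustrative only)."""

    def __init__(self, size: int, k: int) -> None:
        self.size = size
        self.k = k
        self.bits = bytearray((size + 7) // 8)

    def _positions(self, item: str):
        # Assumption: hash function i is SHA-256 over "i:item", reduced modulo size.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        # Set the bit at every position produced by the k hash functions.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # Present only if every one of the k bits is 1.
        return all((self.bits[pos // 8] >> (pos % 8)) & 1 for pos in self._positions(item))


bf = BloomFilter(size=10000, k=3)
bf.add("https://jaychen.cc")
print("https://jaychen.cc" in bf)  # True
```

Because `add` and `__contains__` share `_positions`, an element that was added always maps to bits that are already 1.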
(Figure: the author's hand-drawn illustration of the process described above.)
After reading the explanation above, you will probably spot a problem: the more elements are inserted, the more positions in the bit array are set to 1. When an element that is not in the Bloom filter is hashed and the resulting positions are looked up in the bit array, those positions may already have been set to 1 by other elements. An element that was never added can therefore be misjudged as being in the Bloom filter. But if the Bloom filter determines that an element is not present, then it is definitely not present. To put it simply:
- If the Bloom filter says an element is present, it may be mistaken.
- If the Bloom filter says an element is not present, then it is definitely not present.
Applied to the crawler requirement above, this limitation means that some unvisited URLs may be misjudged as visited, but a URL that has been visited will never be misjudged as unvisited.
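The asymmetry between the two kinds of error can be seen in a small experiment. This sketch uses a deliberately tiny 64-bit array so that collisions are easy to provoke. The SHA-256-based hash functions and the URLs under https://jaychen.cc/page/ and https://jaychen.cc/new/ are assumptions made up for the demonstration.

```python
import hashlib

SIZE = 64  # deliberately tiny bit array so that it fills up quickly
K = 3      # number of hash functions


def positions(item: str) -> list[int]:
    # Assumption: hash function i is SHA-256 over "i:item", reduced modulo SIZE.
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()[:8], "big") % SIZE
        for i in range(K)
    ]


bits = [0] * SIZE
visited = [f"https://jaychen.cc/page/{n}" for n in range(50)]  # hypothetical URLs
for url in visited:
    for pos in positions(url):
        bits[pos] = 1


def maybe_visited(url: str) -> bool:
    return all(bits[pos] for pos in positions(url))


# Visited URLs that the filter fails to recognise: cannot happen.
false_negatives = sum(not maybe_visited(url) for url in visited)

# Unvisited URLs that the filter wrongly reports as visited.
unvisited = [f"https://jaychen.cc/new/{n}" for n in range(1000)]
false_positives = sum(maybe_visited(url) for url in unvisited)

print(false_negatives)  # 0
print(false_positives)  # how many of the 1000 unvisited URLs were misjudged
```

With 50 URLs squeezed into 64 bits, most of the array ends up set to 1, so many unvisited URLs are reported as visited. A real filter is sized so that this rate stays small.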
Bloom filters in Redis
Redis added module support in version 4.0, and Bloom filters can be added to Redis in the form of a module. So if you use Redis 4.0 or above, you can use Bloom filters in Redis by loading the module. But that is not the easiest way to try them out: you can use Docker to experience Bloom filters in Redis directly.
> docker run -d -p 6379:6379 --name bloomfilter redislabs/rebloom
> docker exec -it bloomfilter redis-cli
The Redis Bloom filter has two main commands:
- bf.add adds an element to the Bloom filter: bf.add urls https://jaychen.cc
- bf.exists determines whether an element is in the filter: bf.exists urls https://jaychen.cc
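From application code, the same two commands can be sent through a Redis client. The sketch below assumes a client object that offers an `execute_command` method, as the redis-py client does; the helper names `seen_before` and `is_visited` are hypothetical. It relies on BF.ADD replying 1 when the item was newly added and 0 when it may already have been present.

```python
def seen_before(client, key: str, url: str) -> bool:
    """Record `url` in the Bloom filter at `key`.

    Returns True if the filter reports the URL as probably seen already,
    False if it was newly added.
    """
    newly_added = client.execute_command("BF.ADD", key, url)
    return int(newly_added) == 0


def is_visited(client, key: str, url: str) -> bool:
    """Check `url` against the Bloom filter at `key` without adding it."""
    return int(client.execute_command("BF.EXISTS", key, url)) == 1
```

With the redis-py package installed and the Docker container above running, a client created as `redis.Redis(host="localhost", port=6379)` could be passed in as `client`, with `"urls"` as the key.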
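As mentioned above, the Bloom filter can produce misjudgments. In Redis, two values determine the accuracy of the Bloom filter:
- error_rate: the error rate allowed for the Bloom filter. The lower this value, the larger the filter's bit array and the more space it occupies.
- initial_size: the number of elements the Bloom filter can store. Once the number of elements actually stored exceeds this value, the filter's accuracy drops.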
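Redis has a command for setting these two values:
bf.reserve urls 0.01 100
The meaning of the three parameters:
- The first value is the name of the filter.
- The second value is error_rate.
- The third value is initial_size.
One thing to note when using this command: the filter name must not exist before the command is run. If it already exists, an error is reported: (error) ERR item exists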
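To get a feel for how error_rate and initial_size drive memory usage, the textbook sizing formulas for a Bloom filter can be evaluated directly: for n elements and target error rate p, the bit count is m = -n ln p / (ln 2)^2 and the number of hash functions is k = (m / n) ln 2. This is the standard theoretical estimate, not necessarily the exact sizing the Redis module uses internally; the function name `bloom_parameters` is made up for this sketch.

```python
import math


def bloom_parameters(capacity: int, error_rate: float) -> tuple[int, int]:
    """Return (bits, hash_functions) from the textbook Bloom filter formulas."""
    bits = math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2))
    hashes = max(1, round(bits / capacity * math.log(2)))
    return bits, hashes


# The values from the bf.reserve example: error_rate 0.01, initial_size 100.
print(bloom_parameters(100, 0.01))  # (959, 7)
```

Lowering the error rate while keeping the capacity fixed makes the bit count grow, which matches the description of error_rate above.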