1. What is a Bloom filter?
The Bloom Filter was proposed by a guy named Bloom in 1970.
It can actually be viewed as a data structure consisting of a binary vector (or bit array) and a series of random mapping functions (hash functions).
Its advantage is that space efficiency and query time are much better than ordinary algorithms. Its disadvantage is that it has a certain misrecognition rate and difficulty in deletion.
2. Implementation principle
Let’s take a picture first
Bloom filter algorithm The main idea is to use n hash functions to perform hashing to obtain different hash values. According to the hash, they are mapped to different index positions of the array (the length of this array may be very long), and then the corresponding index bits are The value on is set to 1.
To determine whether the element appears in the set is to use k different hash functions to calculate the hash value and see whether the value at the corresponding index position of the hash value is 1. If there is one that is not 1 , indicating that the element does not exist in the collection.
But it is also possible to judge that the element is in the set, but the element is not. The 1s above all index positions of this element are set by other elements, which leads to a certain probability of misjudgment (this is why the above is live The root cause may be in a collection, because there will be certain hash conflicts).
Note: The lower the false positive rate, the lower the corresponding performance will be.
3. Function
The Bloom filter can be used to determine whether an element is (possibly) in a set, and compared to other data structures, the Bloom filter There are huge advantages in both space and time.
Note the word above: possible. There is a suspense reserved here, which will be analyzed in detail below.
Judge whether the given data exists
Prevent cache penetration (judge whether the requested data is valid to avoid directly bypassing the cache to request the database), etc., mailbox spam filtering, blacklist function, etc. wait.
4. Specific implementation
After reading the algorithm idea of Bloom filter, let’s start to explain the specific implementation.
Let me first give an example. Suppose there are two strings, Wangcai and Xiaoqiang. They have been hashed three times respectively, and then the index of the corresponding array (assuming the array length is 16) is calculated based on the hash result. The value of the position is set to 1. Let’s first look at the phrase Wangcai:
After three hashes of Wangcai, the values are 2, 4, and 6 respectively. Then we can get The index values are 2, 4, and 6 respectively, so the values of the index (2, 4, 6) positions of the array are set to 1, and the rest are regarded as 0. Now suppose that you need to find Wangcai, and also go through these three hashes and then It is found that the values of the positions corresponding to indexes 2, 4, and 6 are all 1, then it can be judged that prosperous wealth may exist.
Then insert Xiaoqiang into the Bloom filter. The actual process is the same as above. Assume that the obtained subscripts are 1, 3, 5
Putting aside the existence of Wangcai, Xiaoqiang looks like this in the Bloom filter at this time. The actual array combining Wangcai and Xiaoqiang looks like this:
Now there is a data: 9527. The current requirement is to determine whether 9527 exists. Assume that the subscripts obtained by hashing 9527 three times are: 5, 6, and 7. It turns out that the value of the position with subscript 7 is 0, so it can be definitely judged that 9527 must not exist.
Then came another domestic 007. After three hashes, the subscripts obtained were: 2, 3, and 5. It turned out that the values corresponding to the subscripts 2, 3, and 5 were all 1, so it can be roughly It is judged that domestic 007 may exist. But in fact, after our demonstration just now, domestic 007 does not exist at all. The reason why the values of index positions 2, 3, and 5 are 1 is because of other data settings.
Speaking of which, I wonder if everyone understands the role of the Bloom filter.
5. Code implementation
As Java programmers, we are really happy. We use many frameworks and tools, and they are basically encapsulated. Bloom filters , we will use the tool class packaged by Google. Of course there are other methods that you can explore.
First add dependencies
<!--布隆过滤依赖--> <dependency> <groupId>com.google.guava</groupId> <artifactId>guava</artifactId> <version>25.1-jre</version> </dependency>
Code implementation
import com.google.common.hash.BloomFilter; import com.google.common.hash.Funnels; import java.nio.charset.Charset; public class BloomFilterDemo { public static void main(String[] args) { /** * 创建一个插入对象为一亿,误报率为0.01%的布隆过滤器 * 不存在一定不存在 * 存在不一定存在 * ---------------- * Funnel 对象:预估的元素个数,误判率 * mightContain :方法判断元素是否存在 */ BloomFilter<CharSequence> bloomFilter = BloomFilter.create(Funnels.stringFunnel(Charset.forName("utf-8")), 100000000, 0.0001); bloomFilter.put("死"); bloomFilter.put("磕"); bloomFilter.put("Redis"); System.out.println(bloomFilter.mightContain("Redis")); System.out.println(bloomFilter.mightContain("Java")); } }
The specific explanation has been written in the comments. By now I believe everyone must understand the Bloom filter and how to use it.
6. Practical combat
Let’s simulate this scenario: solving cache penetration through Bloom filters.
First of all, do you know what cache penetration is?
Cache penetration means that the user accesses a data that is not in the cache or the database. Because it does not exist in the cache, it will access the database if the concurrency is high. It is easy to defeat the database
So how does the Bloom filter solve this problem? he
的原理是这样子的:将数据库中所有的查询条件,放入布隆过滤器中,当一个查询请求过来时,先经过布隆过滤器进行查,如果判断请求查询值存在,则继续查;如果判断请求查询不存在,直接丢弃。
其代码如下:
String get(String key) { String value = redis.get(key); if (value == null) { if(!bloomfilter.mightContain(key)){ return null; }else{ value = db.get(key); redis.set(key, value); } } return value; }
The above is the detailed content of How to quickly determine whether an element is in a collection in java. For more information, please follow other related articles on the PHP Chinese website!

本篇文章给大家带来了关于java的相关知识,其中主要介绍了关于结构化数据处理开源库SPL的相关问题,下面就一起来看一下java下理想的结构化数据处理类库,希望对大家有帮助。

本篇文章给大家带来了关于java的相关知识,其中主要介绍了关于PriorityQueue优先级队列的相关知识,Java集合框架中提供了PriorityQueue和PriorityBlockingQueue两种类型的优先级队列,PriorityQueue是线程不安全的,PriorityBlockingQueue是线程安全的,下面一起来看一下,希望对大家有帮助。

本篇文章给大家带来了关于java的相关知识,其中主要介绍了关于java锁的相关问题,包括了独占锁、悲观锁、乐观锁、共享锁等等内容,下面一起来看一下,希望对大家有帮助。

本篇文章给大家带来了关于java的相关知识,其中主要介绍了关于多线程的相关问题,包括了线程安装、线程加锁与线程不安全的原因、线程安全的标准类等等内容,希望对大家有帮助。

本篇文章给大家带来了关于java的相关知识,其中主要介绍了关于枚举的相关问题,包括了枚举的基本操作、集合类对枚举的支持等等内容,下面一起来看一下,希望对大家有帮助。

本篇文章给大家带来了关于Java的相关知识,其中主要介绍了关于关键字中this和super的相关问题,以及他们的一些区别,下面一起来看一下,希望对大家有帮助。

封装是一种信息隐藏技术,是指一种将抽象性函式接口的实现细节部分包装、隐藏起来的方法;封装可以被认为是一个保护屏障,防止指定类的代码和数据被外部类定义的代码随机访问。封装可以通过关键字private,protected和public实现。

本篇文章给大家带来了关于java的相关知识,其中主要介绍了关于平衡二叉树(AVL树)的相关知识,AVL树本质上是带了平衡功能的二叉查找树,下面一起来看一下,希望对大家有帮助。


Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

EditPlus Chinese cracked version
Small size, syntax highlighting, does not support code prompt function

VSCode Windows 64-bit Download
A free and powerful IDE editor launched by Microsoft

ZendStudio 13.5.1 Mac
Powerful PHP integrated development environment

MantisBT
Mantis is an easy-to-deploy web-based defect tracking tool designed to aid in product defect tracking. It requires PHP, MySQL and a web server. Check out our demo and hosting services.

SublimeText3 Chinese version
Chinese version, very easy to use
