search
HomeJavajavaTutorialHow to quickly determine whether an element is in a collection in java

1. What is a Bloom filter?

The Bloom Filter was proposed by a guy named Bloom in 1970.

It can actually be viewed as a data structure consisting of a binary vector (or bit array) and a series of random mapping functions (hash functions).

Its advantage is that space efficiency and query time are much better than ordinary algorithms. Its disadvantage is that it has a certain misrecognition rate and difficulty in deletion.

How to quickly determine whether an element is in a collection in java

2. Implementation principle

Let’s take a picture first

How to quickly determine whether an element is in a collection in java

Bloom filter algorithm The main idea is to use n hash functions to perform hashing to obtain different hash values. According to the hash, they are mapped to different index positions of the array (the length of this array may be very long), and then the corresponding index bits are The value on is set to 1.

To determine whether the element appears in the set is to use k different hash functions to calculate the hash value and see whether the value at the corresponding index position of the hash value is 1. If there is one that is not 1 , indicating that the element does not exist in the collection.

But it is also possible to judge that the element is in the set, but the element is not. The 1s above all index positions of this element are set by other elements, which leads to a certain probability of misjudgment (this is why the above is live The root cause may be in a collection, because there will be certain hash conflicts).

Note: The lower the false positive rate, the lower the corresponding performance will be.

3. Function

The Bloom filter can be used to determine whether an element is (possibly) in a set, and compared to other data structures, the Bloom filter There are huge advantages in both space and time.

Note the word above: possible. There is a suspense reserved here, which will be analyzed in detail below.

Judge whether the given data exists

Prevent cache penetration (judge whether the requested data is valid to avoid directly bypassing the cache to request the database), etc., mailbox spam filtering, blacklist function, etc. wait.

4. Specific implementation

After reading the algorithm idea of ​​Bloom filter, let’s start to explain the specific implementation.

Let me first give an example. Suppose there are two strings, Wangcai and Xiaoqiang. They have been hashed three times respectively, and then the index of the corresponding array (assuming the array length is 16) is calculated based on the hash result. The value of the position is set to 1. Let’s first look at the phrase Wangcai:

How to quickly determine whether an element is in a collection in java

After three hashes of Wangcai, the values ​​​​are 2, 4, and 6 respectively. Then we can get The index values ​​are 2, 4, and 6 respectively, so the values ​​of the index (2, 4, 6) positions of the array are set to 1, and the rest are regarded as 0. Now suppose that you need to find Wangcai, and also go through these three hashes and then It is found that the values ​​of the positions corresponding to indexes 2, 4, and 6 are all 1, then it can be judged that prosperous wealth may exist.

Then insert Xiaoqiang into the Bloom filter. The actual process is the same as above. Assume that the obtained subscripts are 1, 3, 5

How to quickly determine whether an element is in a collection in java

Putting aside the existence of Wangcai, Xiaoqiang looks like this in the Bloom filter at this time. The actual array combining Wangcai and Xiaoqiang looks like this:

How to quickly determine whether an element is in a collection in java

Now there is a data: 9527. The current requirement is to determine whether 9527 exists. Assume that the subscripts obtained by hashing 9527 three times are: 5, 6, and 7. It turns out that the value of the position with subscript 7 is 0, so it can be definitely judged that 9527 must not exist.

Then came another domestic 007. After three hashes, the subscripts obtained were: 2, 3, and 5. It turned out that the values ​​corresponding to the subscripts 2, 3, and 5 were all 1, so it can be roughly It is judged that domestic 007 may exist. But in fact, after our demonstration just now, domestic 007 does not exist at all. The reason why the values ​​of index positions 2, 3, and 5 are 1 is because of other data settings.

Speaking of which, I wonder if everyone understands the role of the Bloom filter.

5. Code implementation

As Java programmers, we are really happy. We use many frameworks and tools, and they are basically encapsulated. Bloom filters , we will use the tool class packaged by Google. Of course there are other methods that you can explore.

First add dependencies

<!--布隆过滤依赖-->
<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>25.1-jre</version>
</dependency>

Code implementation

import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.Charset;
public class BloomFilterDemo {
        public static void main(String[] args) {
        /**
         * 创建一个插入对象为一亿,误报率为0.01%的布隆过滤器
         * 不存在一定不存在
         * 存在不一定存在
         * ----------------
         *  Funnel 对象:预估的元素个数,误判率
         *  mightContain :方法判断元素是否存在
         */
        BloomFilter<CharSequence> bloomFilter = BloomFilter.create(Funnels.stringFunnel(Charset.forName("utf-8")), 100000000, 0.0001);
        bloomFilter.put("死");
        bloomFilter.put("磕");
        bloomFilter.put("Redis");
        System.out.println(bloomFilter.mightContain("Redis"));
        System.out.println(bloomFilter.mightContain("Java"));
    }
}

The specific explanation has been written in the comments. By now I believe everyone must understand the Bloom filter and how to use it.

6. Practical combat

Let’s simulate this scenario: solving cache penetration through Bloom filters.

First of all, do you know what cache penetration is?

Cache penetration means that the user accesses a data that is not in the cache or the database. Because it does not exist in the cache, it will access the database if the concurrency is high. It is easy to defeat the database

So how does the Bloom filter solve this problem? he

的原理是这样子的:将数据库中所有的查询条件,放入布隆过滤器中,当一个查询请求过来时,先经过布隆过滤器进行查,如果判断请求查询值存在,则继续查;如果判断请求查询不存在,直接丢弃。

其代码如下:

String get(String key) {
    String value = redis.get(key);     
    if (value  == null) {
        if(!bloomfilter.mightContain(key)){
            return null; 
        }else{
            value = db.get(key); 
            redis.set(key, value); 
        }    
    }
    return value;
}

The above is the detailed content of How to quickly determine whether an element is in a collection in java. For more information, please follow other related articles on the PHP Chinese website!

Statement
This article is reproduced at:亿速云. If there is any infringement, please contact admin@php.cn delete
JVM performance vs other languagesJVM performance vs other languagesMay 14, 2025 am 12:16 AM

JVM'sperformanceiscompetitivewithotherruntimes,offeringabalanceofspeed,safety,andproductivity.1)JVMusesJITcompilationfordynamicoptimizations.2)C offersnativeperformancebutlacksJVM'ssafetyfeatures.3)Pythonisslowerbuteasiertouse.4)JavaScript'sJITisles

Java Platform Independence: Examples of useJava Platform Independence: Examples of useMay 14, 2025 am 12:14 AM

JavaachievesplatformindependencethroughtheJavaVirtualMachine(JVM),allowingcodetorunonanyplatformwithaJVM.1)Codeiscompiledintobytecode,notmachine-specificcode.2)BytecodeisinterpretedbytheJVM,enablingcross-platformexecution.3)Developersshouldtestacross

JVM Architecture: A Deep Dive into the Java Virtual MachineJVM Architecture: A Deep Dive into the Java Virtual MachineMay 14, 2025 am 12:12 AM

TheJVMisanabstractcomputingmachinecrucialforrunningJavaprogramsduetoitsplatform-independentarchitecture.Itincludes:1)ClassLoaderforloadingclasses,2)RuntimeDataAreafordatastorage,3)ExecutionEnginewithInterpreter,JITCompiler,andGarbageCollectorforbytec

JVM: Is JVM related to the OS?JVM: Is JVM related to the OS?May 14, 2025 am 12:11 AM

JVMhasacloserelationshipwiththeOSasittranslatesJavabytecodeintomachine-specificinstructions,managesmemory,andhandlesgarbagecollection.ThisrelationshipallowsJavatorunonvariousOSenvironments,butitalsopresentschallengeslikedifferentJVMbehaviorsandOS-spe

Java: Write Once, Run Anywhere (WORA) - A Deep Dive into Platform IndependenceJava: Write Once, Run Anywhere (WORA) - A Deep Dive into Platform IndependenceMay 14, 2025 am 12:05 AM

Java implementation "write once, run everywhere" is compiled into bytecode and run on a Java virtual machine (JVM). 1) Write Java code and compile it into bytecode. 2) Bytecode runs on any platform with JVM installed. 3) Use Java native interface (JNI) to handle platform-specific functions. Despite challenges such as JVM consistency and the use of platform-specific libraries, WORA greatly improves development efficiency and deployment flexibility.

Java Platform Independence: Compatibility with different OSJava Platform Independence: Compatibility with different OSMay 13, 2025 am 12:11 AM

JavaachievesplatformindependencethroughtheJavaVirtualMachine(JVM),allowingcodetorunondifferentoperatingsystemswithoutmodification.TheJVMcompilesJavacodeintoplatform-independentbytecode,whichittheninterpretsandexecutesonthespecificOS,abstractingawayOS

What features make java still powerfulWhat features make java still powerfulMay 13, 2025 am 12:05 AM

Javaispowerfulduetoitsplatformindependence,object-orientednature,richstandardlibrary,performancecapabilities,andstrongsecurityfeatures.1)PlatformindependenceallowsapplicationstorunonanydevicesupportingJava.2)Object-orientedprogrammingpromotesmodulara

Top Java Features: A Comprehensive Guide for DevelopersTop Java Features: A Comprehensive Guide for DevelopersMay 13, 2025 am 12:04 AM

The top Java functions include: 1) object-oriented programming, supporting polymorphism, improving code flexibility and maintainability; 2) exception handling mechanism, improving code robustness through try-catch-finally blocks; 3) garbage collection, simplifying memory management; 4) generics, enhancing type safety; 5) ambda expressions and functional programming to make the code more concise and expressive; 6) rich standard libraries, providing optimized data structures and algorithms.

See all articles

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

SublimeText3 English version

SublimeText3 English version

Recommended: Win version, supports code prompts!

SecLists

SecLists

SecLists is the ultimate security tester's companion. It is a collection of various types of lists that are frequently used during security assessments, all in one place. SecLists helps make security testing more efficient and productive by conveniently providing all the lists a security tester might need. List types include usernames, passwords, URLs, fuzzing payloads, sensitive data patterns, web shells, and more. The tester can simply pull this repository onto a new test machine and he will have access to every type of list he needs.

Safe Exam Browser

Safe Exam Browser

Safe Exam Browser is a secure browser environment for taking online exams securely. This software turns any computer into a secure workstation. It controls access to any utility and prevents students from using unauthorized resources.

Atom editor mac version download

Atom editor mac version download

The most popular open source editor

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor