How to implement Redis BloomFilter Bloom filter-Redis-php.cn

Home

Database

Redis

How to implement Redis BloomFilter Bloom filter

WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWB

May 30, 2023 pm 01:41 PM

redisbloomfilter

Bloom Filter Concept

A man named Bloom proposed the Bloom filter (English name: Bloom Filter) in 1970. It's actually a long binary vector and a series of random mapping functions. Bloom filters can be used to retrieve whether an element is in a collection. Its advantage is that space efficiency and query time are far higher than those of ordinary algorithms. Its disadvantage is that it has a certain misrecognition rate and difficulty in deletion.

Bloom Filter Principle

The principle of the Bloom filter is that when an element is added to the set, the element is mapped into K points in a bit array through K hash functions. , set them to 1. When retrieving, we only need to look at whether these points are all 1 to (approximately) know whether it is in the set: if any of these points has 0, then the checked element must not be there; if they are all 1, then the checked element Most likely. This is the basic idea of Bloom filter.

The difference between Bloom Filter and single hash function Bit-Map is that Bloom Filter uses k hash functions, and each string corresponds to k bits. Thereby reducing the probability of conflict

How to implement Redis BloomFilter Bloom filter

Cache penetration

How to implement Redis BloomFilter Bloom filter

##Every query will be directly hit to the DB

In short, in a nutshell, we first load all the data from our database into our filter. For example, the id of the database now has: 1, 2, 3

Then use id: 1 is an example. After hashing three times in the above picture, he changed the three places where the original value was 0 to 1.

The next time the data comes in for query, if the value of id is 1, then I will change it to 1. Take three hashes and find that the values of the three hashes are exactly the same as the three positions above, which can prove that there is 1 in the filter

. On the contrary, if they are different, it means that it does not exist

So where are the application scenarios? Generally we will use it to prevent cache breakdown

To put it simply, the id of your database starts with 1 and then increases by itself. Then I know that your interface is queried by id, so I will use negative numbers to query. At this time, you will find that the data is not in the cache, and I go to the database to check it, but it is not found. One request is like this, what about 100, 1,000, or 10,000? Basically your DB can't handle it. If you add this to the cache, it will no longer exist. If you judge that there is no such data, you won't check it. Wouldn't it be better to just return the data as empty?

If this thing is so good, what are the drawbacks? Yes, let’s go on to see

Disadvantages of Bloom Filter

The reason why bloom filter can be more efficient in time and space is because it sacrifices the accuracy of judgment. Convenience of deletion

Although the container may not contain the elements that should be searched, due to the hash operation, the values of these elements in k hash positions are all 1, so it may lead to misjudgment. By establishing a whitelist to store elements that may be misjudged, when the Bloom Filter stores a blacklist, the misjudgement rate can be reduced.

Deletion is difficult. An element placed in the container is mapped to 1 in the k positions of the bit array. When deleting, it cannot be simply set to 0 directly, as it may affect the judgment of other elements. You can use Counting Bloom Filter

FAQ

1. Why use multiple hash functions?

If only one hash function is used, the Hash itself will often conflict. For example, for an array with a length of 100, if only one hash function is used, after adding one element, the probability of conflict when adding the second element is 1%, and the probability of conflict when adding the third element is 2%... But if two elements are used, the probability of collision is 1%. A hash function, after adding an element, the probability of conflict when adding the second element is reduced to 4 out of 10,000 (four possible conflict situations, total number of situations 100x100)

go language implementation

package main
import (
	"fmt"
	"github.com/bits-and-blooms/bitset"
)
//设置哈希数组默认大小为16
const DefaultSize = 16
//设置种子，保证不同哈希函数有不同的计算方式
var seeds = []uint{7, 11, 13, 31, 37, 61}
//布隆过滤器结构，包括二进制数组和多个哈希函数
type BloomFilter struct {
	//使用第三方库
	set *bitset.BitSet
	//指定长度为6
	hashFuncs [6]func(seed uint, value string) uint
}
//构造一个布隆过滤器，包括数组和哈希函数的初始化
func NewBloomFilter() *BloomFilter {
	bf := new(BloomFilter)
	bf.set = bitset.New(DefaultSize)

	for i := 0; i < len(bf.hashFuncs); i++ {
		bf.hashFuncs[i] = createHash()
	}
	return bf
}
//构造6个哈希函数，每个哈希函数有参数seed保证计算方式的不同
func createHash() func(seed uint, value string) uint {
	return func(seed uint, value string) uint {
		var result uint = 0
		for i := 0; i < len(value); i++ {
			result = result*seed + uint(value[i])
		}
		//length = 2^n 时，X % length = X & (length - 1)
		return result & (DefaultSize - 1)
	}
}
//添加元素
func (b *BloomFilter) add(value string) {
	for i, f := range b.hashFuncs {
		//将哈希函数计算结果对应的数组位置1
		b.set.Set(f(seeds[i], value))
	}
}
//判断元素是否存在
func (b *BloomFilter) contains(value string) bool {
	//调用每个哈希函数，并且判断数组对应位是否为1
	//如果不为1，直接返回false，表明一定不存在
	for i, f := range b.hashFuncs {
		//result = result && b.set.Test(f(seeds[i], value))
		if !b.set.Test(f(seeds[i], value)) {
			return false
		}
	}
	return true
}
func main() {
	filter := NewBloomFilter()
	filter.add("asd")
	fmt.Println(filter.contains("asd"))
	fmt.Println(filter.contains("2222"))
	fmt.Println(filter.contains("155343"))
}

The output results are as follows:

true
false
false

The above is the detailed content of How to implement Redis BloomFilter Bloom filter. For more information, please follow other related articles on the PHP Chinese website!

Statement

This article is reproduced at:亿速云. If there is any infringement, please contact admin@php.cn delete

When Should I Use Redis Instead of a Traditional Database?May 13, 2025 pm 04:01 PM

UseRedisinsteadofatraditionaldatabasewhenyourapplicationrequiresspeedandreal-timedataprocessing,suchasforcaching,sessionmanagement,orreal-timeanalytics.Redisexcelsin:1)Caching,reducingloadonprimarydatabases;2)Sessionmanagement,simplifyingdatahandling

Redis: Beyond SQL - The NoSQL PerspectiveMay 08, 2025 am 12:25 AM

Redis goes beyond SQL databases because of its high performance and flexibility. 1) Redis achieves extremely fast read and write speed through memory storage. 2) It supports a variety of data structures, such as lists and collections, suitable for complex data processing. 3) Single-threaded model simplifies development, but high concurrency may become a bottleneck.

Redis: A Comparison to Traditional Database ServersMay 07, 2025 am 12:09 AM

Redis is superior to traditional databases in high concurrency and low latency scenarios, but is not suitable for complex queries and transaction processing. 1.Redis uses memory storage, fast read and write speed, suitable for high concurrency and low latency requirements. 2. Traditional databases are based on disk, support complex queries and transaction processing, and have strong data consistency and persistence. 3. Redis is suitable as a supplement or substitute for traditional databases, but it needs to be selected according to specific business needs.

Redis: Introduction to a Powerful In-Memory Data StoreMay 06, 2025 am 12:08 AM

Redisisahigh-performancein-memorydatastructurestorethatexcelsinspeedandversatility.1)Itsupportsvariousdatastructureslikestrings,lists,andsets.2)Redisisanin-memorydatabasewithpersistenceoptions,ensuringfastperformanceanddatasafety.3)Itoffersatomicoper

Is Redis Primarily a Database?May 05, 2025 am 12:07 AM

Redis is primarily a database, but it is more than just a database. 1. As a database, Redis supports persistence and is suitable for high-performance needs. 2. As a cache, Redis improves application response speed. 3. As a message broker, Redis supports publish-subscribe mode, suitable for real-time communication.

Redis: Database, Server, or Something Else?May 04, 2025 am 12:08 AM

Redisisamultifacetedtoolthatservesasadatabase,server,andmore.Itfunctionsasanin-memorydatastructurestore,supportsvariousdatastructures,andcanbeusedasacache,messagebroker,sessionstorage,andfordistributedlocking.

Redis: Unveiling Its Purpose and Key ApplicationsMay 03, 2025 am 12:11 AM

Redisisanopen-source,in-memorydatastructurestoreusedasadatabase,cache,andmessagebroker,excellinginspeedandversatility.Itiswidelyusedforcaching,real-timeanalytics,sessionmanagement,andleaderboardsduetoitssupportforvariousdatastructuresandfastdataacces

Redis: A Guide to Key-Value Data StoresMay 02, 2025 am 12:10 AM

Redis is an open source memory data structure storage used as a database, cache and message broker, suitable for scenarios where fast response and high concurrency are required. 1.Redis uses memory to store data and provides microsecond read and write speed. 2. It supports a variety of data structures, such as strings, lists, collections, etc. 3. Redis realizes data persistence through RDB and AOF mechanisms. 4. Use single-threaded model and multiplexing technology to handle requests efficiently. 5. Performance optimization strategies include LRU algorithm and cluster mode.

See all articles

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Roblox: Grow A Garden - Complete Mutation Guide

3 weeks agoByDDD

How to fix KB5055612 fails to install in Windows 10?

3 weeks agoByDDD

Roblox: Bubble Gum Simulator Infinity - How To Get And Use Royal Keys

3 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

Mandragora: Whispers Of The Witch Tree - How To Unlock The Grappling Hook

3 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

Nordhold: Fusion System, Explained

3 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

Hot Tools

WebStorm Mac version

Useful JavaScript development tools

EditPlus Chinese cracked version

Small size, syntax highlighting, does not support code prompt function

SecLists

SecLists is the ultimate security tester's companion. It is a collection of various types of lists that are frequently used during security assessments, all in one place. SecLists helps make security testing more efficient and productive by conveniently providing all the lists a security tester might need. List types include usernames, passwords, URLs, fuzzing payloads, sensitive data patterns, web shells, and more. The tester can simply pull this repository onto a new test machine and he will have access to every type of list he needs.