search
HomeBackend DevelopmentPHP TutorialPHP data structure: clever use of Bloom filters to achieve efficient collection retrieval

The Bloom filter is a space-efficient data structure used to determine whether an element belongs to a set. It uses a hash function and a bit array to efficiently find if the element is present, possibly with false positives. It is suitable for scenarios where a large number of elements need to be retrieved quickly, such as URL duplicate detection.

PHP data structure: clever use of Bloom filters to achieve efficient collection retrieval

PHP data structure: clever use of Bloom filters to achieve efficient collection retrieval

Introduction

The Bloom filter is a highly space-efficient data structure used to determine whether an element belongs to a set. It uses a hash function and a bit array to efficiently find whether the element is present.

Principle

The Bloom filter initializes a bit array, and each position is initially 0. Then, the elements are hashed using multiple hash functions, the bit array is indexed with the hash value, and the value at that position is set to 1.

If an element belongs to the set, its hash value will find at least one position in the bit array that is 1. However, it is also possible to find a position of 1 for an element that does not belong to the set, called a false positive.

Implementation

class BloomFilter {

    // 过滤器大小 (位数)
    private $size;

    // 哈希函数个数
    private $numHashes;

    // 哈希函数
    private $hashFunctions;

    // 过滤器位数组
    private $filter;

    // 初始化布隆过滤器
    public function __construct($size, $numHashes) {
        $this->size = $size;
        $this->numHashes = $numHashes;
        $this->initHashFunctions();
        $this->filter = array_fill(0, $this->size, 0);
    }

    // 初始化哈希函数
    private function initHashFunctions() {
        $this->hashFunctions = [];
        for ($i = 0; $i < $this->numHashes; $i++) {
            $this->hashFunctions[] = function ($key) use ($i) {
                return abs(crc32($key . $i));
            };
        }
    }

    // 向过滤器中添加元素
    public function add($element) {
        foreach ($this->hashFunctions as $hashFunction) {
            $index = $hashFunction($element) % $this->size;
            $this->filter[$index] = 1;
        }
    }

    // 检查元素是否存在过滤器中
    public function isExists($element) {
        foreach ($this->hashFunctions as $hashFunction) {
            $index = $hashFunction($element) % $this->size;
            if ($this->filter[$index] == 0) {
                return false;
            }
        }
        // 找到了元素的所有哈希值对应的位,但可能是假阳性
        return true;
    }
}

Practical case: detecting URL duplication

Goal:Use Bloom filtering The server quickly detects whether a large number of URLs have been crawled.

Implementation:

  1. Initialize the Bloom filter, set the size and number of hash functions.
  2. Call the add() method for each crawled URL to add it to the filter.
  3. When encountering a new URL, use the isExists() method to quickly check whether it already exists in the filter. If it exists, the URL is skipped; otherwise, the new URL is added to the filter.

Advantages:

  • Space efficient: Bloom filter size has nothing to do with the number of elements that need to be detected.
  • Fast retrieval: By using hash functions, retrieval operations do not require traversing the entire collection.
  • Acceptable error rate: Bloom filters allow some false positives, but the size and number of hash functions can be adjusted as needed to optimize the error rate.

The above is the detailed content of PHP data structure: clever use of Bloom filters to achieve efficient collection retrieval. For more information, please follow other related articles on the PHP Chinese website!

Statement
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
PHP Email: Step-by-Step Sending GuidePHP Email: Step-by-Step Sending GuideMay 09, 2025 am 12:14 AM

PHPisusedforsendingemailsduetoitsintegrationwithservermailservicesandexternalSMTPproviders,automatingnotificationsandmarketingcampaigns.1)SetupyourPHPenvironmentwithawebserverandPHP,ensuringthemailfunctionisenabled.2)UseabasicscriptwithPHP'smailfunct

How to Send Email via PHP: Examples & CodeHow to Send Email via PHP: Examples & CodeMay 09, 2025 am 12:13 AM

The best way to send emails is to use the PHPMailer library. 1) Using the mail() function is simple but unreliable, which may cause emails to enter spam or cannot be delivered. 2) PHPMailer provides better control and reliability, and supports HTML mail, attachments and SMTP authentication. 3) Make sure SMTP settings are configured correctly and encryption (such as STARTTLS or SSL/TLS) is used to enhance security. 4) For large amounts of emails, consider using a mail queue system to optimize performance.

Advanced PHP Email: Custom Headers & FeaturesAdvanced PHP Email: Custom Headers & FeaturesMay 09, 2025 am 12:13 AM

CustomheadersandadvancedfeaturesinPHPemailenhancefunctionalityandreliability.1)Customheadersaddmetadatafortrackingandcategorization.2)HTMLemailsallowformattingandinteractivity.3)AttachmentscanbesentusinglibrarieslikePHPMailer.4)SMTPauthenticationimpr

Guide to Sending Emails with PHP & SMTPGuide to Sending Emails with PHP & SMTPMay 09, 2025 am 12:06 AM

Sending mail using PHP and SMTP can be achieved through the PHPMailer library. 1) Install and configure PHPMailer, 2) Set SMTP server details, 3) Define the email content, 4) Send emails and handle errors. Use this method to ensure the reliability and security of emails.

What is the best way to send an email using PHP?What is the best way to send an email using PHP?May 08, 2025 am 12:21 AM

ThebestapproachforsendingemailsinPHPisusingthePHPMailerlibraryduetoitsreliability,featurerichness,andeaseofuse.PHPMailersupportsSMTP,providesdetailederrorhandling,allowssendingHTMLandplaintextemails,supportsattachments,andenhancessecurity.Foroptimalu

Best Practices for Dependency Injection in PHPBest Practices for Dependency Injection in PHPMay 08, 2025 am 12:21 AM

The reason for using Dependency Injection (DI) is that it promotes loose coupling, testability, and maintainability of the code. 1) Use constructor to inject dependencies, 2) Avoid using service locators, 3) Use dependency injection containers to manage dependencies, 4) Improve testability through injecting dependencies, 5) Avoid over-injection dependencies, 6) Consider the impact of DI on performance.

PHP performance tuning tips and tricksPHP performance tuning tips and tricksMay 08, 2025 am 12:20 AM

PHPperformancetuningiscrucialbecauseitenhancesspeedandefficiency,whicharevitalforwebapplications.1)CachingwithAPCureducesdatabaseloadandimprovesresponsetimes.2)Optimizingdatabasequeriesbyselectingnecessarycolumnsandusingindexingspeedsupdataretrieval.

PHP Email Security: Best Practices for Sending EmailsPHP Email Security: Best Practices for Sending EmailsMay 08, 2025 am 12:16 AM

ThebestpracticesforsendingemailssecurelyinPHPinclude:1)UsingsecureconfigurationswithSMTPandSTARTTLSencryption,2)Validatingandsanitizinginputstopreventinjectionattacks,3)EncryptingsensitivedatawithinemailsusingOpenSSL,4)Properlyhandlingemailheaderstoa

See all articles

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

Atom editor mac version download

Atom editor mac version download

The most popular open source editor

SAP NetWeaver Server Adapter for Eclipse

SAP NetWeaver Server Adapter for Eclipse

Integrate Eclipse with SAP NetWeaver application server.

PhpStorm Mac version

PhpStorm Mac version

The latest (2018.2.1) professional PHP integrated development tool

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

SublimeText3 Linux new version

SublimeText3 Linux new version

SublimeText3 Linux latest version