Home  >  Article  >  Backend Development  >  How to query non-duplicate data structures in php

How to query non-duplicate data structures in php

PHPz
PHPzOriginal
2023-04-04 10:44:39644browse

With the continuous development of the Internet, the use of databases in websites and applications has become more and more common. When developing and maintaining applications, querying data is a very critical task. How to query and process data efficiently has become an important issue faced by developers. This article will introduce a method of querying non-duplicate data structures with PHP to solve this problem.

  1. Case Introduction

Suppose there is an advertising system. Each advertisement has a unique ID number and can be displayed on different pages. If you want to display the advertisement on a certain page, you can query the advertisement data in the MySQL database and filter the results according to the following three conditions:

1) Display status: only display is "displaying" Advertisements with status (status=1).

2) Display probability: Each advertisement has a display probability (show_ratio), and it is decided whether to display the advertisement based on the probability.

3) Eliminate duplicate display: Do not display duplicate ads on the same page.

How to efficiently query advertising data that meets the conditions? This requires an efficient non-duplicate data structure to complete.

  1. Introduction to non-duplicate data structures

In order to satisfy the above query conditions, this article introduces a non-duplicate data structure based on Redis - HyperLogLog (HLL for short), HLL can Efficiently estimate the cardinality, i.e. the number of distinct elements, of a data set. Using HLL, you can quickly count the number of advertisements with display status "In Display" and display probability that meet the requirements, and remove duplicate displays.

HLL estimates the cardinality of a data set by using a set of hash functions. Its implementation principle is similar to the Bloom filter, but its error rate is lower. In Redis, the HLL type provides the pfadd command to add elements and the pfcount command to calculate the cardinality. The following is an example of PHP code:

$redis = new Redis();
$redis->connect('127.0.0.1', 6379);
$redis->pfadd('ad', 'ad1', 'ad2', 'ad3'); // 添加广告 ID
$redis->pfadd('ad', 'ad3', 'ad4', 'ad5'); // 添加广告 ID
$count = $redis->pfcount('ad'); // 获取基数

The above code uses HLL in Redis to store the advertising ID, and determines whether an advertisement has been displayed by adding the advertising ID and calculating the base.

  1. Case implementation

In this case, first query all advertisements whose display status is "displaying", then calculate the number of advertisements that meet the display probability requirements, and finally based on HLL prevents duplicate presentations. The following is a PHP query code:

$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

// 查询展示状态为“展示中”的所有广告信息
$sql = "SELECT * FROM ad WHERE status=1";
$result = $mysqli->query($sql);

$total = 0;
while ($row = $result->fetch_assoc()) {
    $show_ratio = $row['show_ratio']; // 广告展示概率
    $ad_id = $row['ad_id']; // 广告 ID

    // 判断是否需要展示该广告
    $rand_num = mt_rand(1, 10000);
    if ($rand_num <= $show_ratio * 10000) {
        $redis->pfadd('ad', $ad_id); // 添加广告 ID
        $total++;
    }
}

$count = $redis->pfcount('ad'); // 获取基数
if ($total != $count) {
    // 如果总数量不等于 HLL 的基数,则有重复广告
    // 再次处理广告展示逻辑
}

The above code uses a while loop to calculate and add probability to each advertisement. The code for removing duplicate displays based on HLL is located outside the while loop. It determines whether there are duplicate ads by judging whether the number of elements added to the HLL is equal to the number of ads that have been calculated.

  1. Summary

This article introduces a method of using Redis to implement HLL data structure to achieve efficient query of non-duplicate data. In actual projects, it can be improved and expanded according to specific needs. For example, you can add an expiration time to regularly clear out expired elements, or add a layer of bloom filters to the HLL to improve the accuracy of deduplication, etc. It is believed that these methods can solve the deduplication problems often encountered when querying data and improve the efficiency and performance of applications.

The above is the detailed content of How to query non-duplicate data structures in php. For more information, please follow other related articles on the PHP Chinese website!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn