This article mainly introduces you to the method of using PHP to implement the TopK-algorithm using a binary heap. The introduction in the article is very detailed and has certain reference and learning value for everyone. Friends who need it can follow the editor to learn together. Bar.
Preface
In the past, I often encountered a problem when working or interviewing. How to achieve massive TopN is to achieve a very large result. To quickly find the largest 10 or 100 numbers in a set, while ensuring memory and speed efficiency, our first idea may be to use sorting, and then intercept the top 10 or 100, and sorting is not suitable when the amount is not particularly large. There is no problem, but as long as the amount is extremely large, it is impossible to complete this task. For example, if there are hundreds of millions of numbers in an array or text file, it is impossible to read them all into the memory, so using sorting to solve this problem is not The best, so here we use PHP to implement a small top heap to solve this problem. A heap is a special kind of heap. A binary heap is a complete binary tree or an approximately complete binary tree. There are two types of binary heaps, the maximum heap and the minimum heap. The maximum heap: the key value of the parent node is always greater than or equal to any child. The key value of the node; the minimum heap: the key value of the parent node is always less than or equal to the key value of any child node
小头 heap-(picture from the Internet)
Binary heaps are generally represented by arrays (see the figure above). For example, the position of the root node in the array is 0, and the child nodes at the nth position are respectively 2n 1 and 2n 2. Therefore, the The child nodes of position 0 are at 1 and 2, the child nodes of 1 are at 3 and 4, and so on. This storage method makes it easy to find parent nodes and child nodes. I won’t go into details about the specific conceptual issues here. If you have any questions about the binary heap, you can have a good understanding of this data structure. Next, we will use PHP code to implement and solve the above topN problems. In order to see To find out the difference, let’s first use the sorting method to implement it and see how it works.//为了测试运行内存调大一点 ini_set('memory_limit', '2024M'); //实现一个快速排序函数 function quick_sort(array $array){ $length = count($array); $left_array = array(); $right_array = array(); if($length <= 1){ return $array; } $key = $array[0]; for($i=1;$i<$length;$i++){ if($array[$i] > $key){ $right_array[] = $array[$i]; }else{ $left_array[] = $array[$i]; } } $left_array = quick_sort($left_array); $right_array = quick_sort($right_array); return array_merge($right_array,array($key),$left_array); } //构造500w不重复数 for($i=0;$i<5000000;$i++){ $numArr[] = $i; } //打乱它们 shuffle($numArr); //现在我们从里面找到top10最大的数 var_dump(time()); print_r(array_slice(quick_sort($all),0,10)); var_dump(time());
Result after running
You can see that the top10 results are printed above, and the running time is output, which is about 99s, but this is only a 5 million number and If everything can be loaded into the memory, if we have a file with 5kw or 500 million numbers, there will definitely be some problems.
The implementation process is: 1. First read 10 or 100 numbers into the array, this is Our topN number.
2. Call the generate small top heap function to generate a small top heap structure from this array. At this time, the top of the heap must be the smallest.
3. Traverse all the remaining numbers from the file or array in sequence.
4. Each time one is traversed, compare the size with the element on the top of the heap. If it is smaller than the element on the top of the heap, discard it. If it is larger than the element on the top of the heap, it will be discarded. Replace the top element.
5. After replacing the top element of the heap, call the generate small top heap function to continue generating the small top heap, because you need to find the smallest one.
6. Repeat the above steps 4~5, so that after all traversals are completed, the largest topN will be in our small top heap, because our small top heap will always exclude the smallest and leave the largest , and the speed of adjusting the small top heap is also very fast. It is just a relative adjustment, as long as the root node is smaller than the left and right nodes.
7. In terms of algorithm complexity, according to top10 in the worst case, That is, every time a number is traversed, if it is replaced with the top of the heap, it needs to be adjusted 10 times, which is faster than the sorting speed, and it does not read all the contents into the memory. It can be understood that the achievement is a linear traversal.
//生成小顶堆函数 function Heap(&$arr,$idx){ $left = ($idx << 1) + 1; $right = ($idx << 1) + 2; if (!$arr[$left]){ return; } if($arr[$right] && $arr[$right] < $arr[$left]){ $l = $right; }else{ $l = $left; } if ($arr[$idx] > $arr[$l]){ $tmp = $arr[$idx]; $arr[$idx] = $arr[$l]; $arr[$l] = $tmp; Heap($arr,$l); } } //这里为了保证跟上面一致,也构造500w不重复数 /* 当然这个数据集并不一定全放在内存,也可以在 文件里面,因为我们并不是全部加载到内存去进 行排序 */ for($i=0;$i<5000000;$i++){ $numArr[] = $i; } //打乱它们 shuffle($numArr); //先取出10个到数组 $topArr = array_slice($numArr,0,10); //获取最后一个有子节点的索引位置 //因为在构造小顶堆的时候是从最后一个有左或右节点的位置 //开始从下往上不断的进行移动构造(具体可看上面的图去理解) $idx = floor(count($topArr) / 2) - 1; //生成小顶堆 for($i=$idx;$i>=0;$i--){ Heap($topArr,$i); } var_dump(time()); //这里可以看到,就是开始遍历剩下的所有元素 for($i = count($topArr); $i < count($numArr); $i++){ //每遍历一个则跟堆顶元素进行比较大小 if ($numArr[$i] > $topArr[0]){ //如果大于堆顶元素则替换 $topArr[0] = $numArr[$i]; /* 重新调用生成小顶堆函数进行维护,只不过这次是从堆顶 的索引位置开始自上往下进行维护,因为我们只是把堆顶 的元素给替换掉了而其余的还是按照根节点小于左右节点 的顺序摆放这也就是我们上面说的,只是相对调整下,并 不是全部调整一遍 */ Heap($topArr,0); } } var_dump(time());
You can see that the final result is also top10, only However, it only took about 1 second, and both memory and time efficiency met our requirements. And the best thing about sorting is that we don’t need to read all the data sets into the memory, because we don’t need to sort. The above is for demonstration, so 5 million elements are constructed directly in the memory. However, we can transfer all of them to a file, and then read them line by line for comparison, because the core point of our data structure is linear traversal and there are many elements in the memory. Compare the small top heap structures and finally get TopN.
The above is the entire content of this article, I hope it will be helpful to everyone's study.
Related recommendations:
php Implement hexadecimal conversion_php skills
How to compile redis and phpredis under Linux_php skills
##PHPHow to use the Mysqli class library to achieve perfect paging effect_php skills
The above is the detailed content of PHP uses binary heap to implement TopK-algorithm. For more information, please follow other related articles on the PHP Chinese website!

The main advantages of using database storage sessions include persistence, scalability, and security. 1. Persistence: Even if the server restarts, the session data can remain unchanged. 2. Scalability: Applicable to distributed systems, ensuring that session data is synchronized between multiple servers. 3. Security: The database provides encrypted storage to protect sensitive information.

Implementing custom session processing in PHP can be done by implementing the SessionHandlerInterface interface. The specific steps include: 1) Creating a class that implements SessionHandlerInterface, such as CustomSessionHandler; 2) Rewriting methods in the interface (such as open, close, read, write, destroy, gc) to define the life cycle and storage method of session data; 3) Register a custom session processor in a PHP script and start the session. This allows data to be stored in media such as MySQL and Redis to improve performance, security and scalability.

SessionID is a mechanism used in web applications to track user session status. 1. It is a randomly generated string used to maintain user's identity information during multiple interactions between the user and the server. 2. The server generates and sends it to the client through cookies or URL parameters to help identify and associate these requests in multiple requests of the user. 3. Generation usually uses random algorithms to ensure uniqueness and unpredictability. 4. In actual development, in-memory databases such as Redis can be used to store session data to improve performance and security.

Managing sessions in stateless environments such as APIs can be achieved by using JWT or cookies. 1. JWT is suitable for statelessness and scalability, but it is large in size when it comes to big data. 2.Cookies are more traditional and easy to implement, but they need to be configured with caution to ensure security.

To protect the application from session-related XSS attacks, the following measures are required: 1. Set the HttpOnly and Secure flags to protect the session cookies. 2. Export codes for all user inputs. 3. Implement content security policy (CSP) to limit script sources. Through these policies, session-related XSS attacks can be effectively protected and user data can be ensured.

Methods to optimize PHP session performance include: 1. Delay session start, 2. Use database to store sessions, 3. Compress session data, 4. Manage session life cycle, and 5. Implement session sharing. These strategies can significantly improve the efficiency of applications in high concurrency environments.

Thesession.gc_maxlifetimesettinginPHPdeterminesthelifespanofsessiondata,setinseconds.1)It'sconfiguredinphp.iniorviaini_set().2)Abalanceisneededtoavoidperformanceissuesandunexpectedlogouts.3)PHP'sgarbagecollectionisprobabilistic,influencedbygc_probabi

In PHP, you can use the session_name() function to configure the session name. The specific steps are as follows: 1. Use the session_name() function to set the session name, such as session_name("my_session"). 2. After setting the session name, call session_start() to start the session. Configuring session names can avoid session data conflicts between multiple applications and enhance security, but pay attention to the uniqueness, security, length and setting timing of session names.


Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

VSCode Windows 64-bit Download
A free and powerful IDE editor launched by Microsoft

Atom editor mac version download
The most popular open source editor

EditPlus Chinese cracked version
Small size, syntax highlighting, does not support code prompt function

Dreamweaver CS6
Visual web development tools

SublimeText3 English version
Recommended: Win version, supports code prompts!
