Home >Java >javaTutorial >How to use heap to solve Top-k problem in Java?

How to use heap to solve Top-k problem in Java?

王林
王林forward
2023-04-21 10:19:161052browse

1. What is a heap?

Heap structure

Heap is actually a kind of binary tree, but ordinary binary trees store data in a chain structure, while heaps store data sequentially in arrays. So what kind of binary tree is suitable for sequential storage?

We assume that an ordinary binary tree can be stored in an array, then we can get the following structure:

How to use heap to solve Top-k problem in Java?

We can see that when there is space in the middle of the binary tree value, the storage space of the array will be wasted, so under what circumstances will the space not be wasted? That is a complete binary tree.

How to use heap to solve Top-k problem in Java?

From the above structure, we cannot use the pointer of the chain structure to access the child node or parent node. We can only access it through the corresponding subscript, which is actually relatively simple.

For example, in the picture below:

It is known that the subscript of node 2 is 1, then

The subscript of its left child is: 2 * 2 1 = 3

The subscript of his right child is: 2 * 2 2 = 4

On the contrary, it is known that the subscript of node 1 is 3 and the subscript of node 3 is 4, then

The parent node index of node 1 is: (3 - 1) / 2 = 1

The parent node index of node 3 is: (4 - 1) / 2 = 1

How to use heap to solve Top-k problem in Java?

Big root heap VS small root heap

Big root heap (maximum heap)

The big root heap guarantees that the root node of each binary tree is larger than the left and right child nodes

Start adjusting from the root node of the last subtree to the root node of each subtree, so that each subtree is adjusted downwards into a large root heap, and finally make the final adjustment downward to ensure that the entire binary tree is a large root heap ( This adjustment is mainly for the later heap sorting).

How to use heap to solve Top-k problem in Java?

The specific adjustment process is as follows:

How to use heap to solve Top-k problem in Java?

How to use heap to solve Top-k problem in Java?

How to implement it with code?

We first adjust from the last subtree, then we need to get the root node parent of the last subtree. We know that the last node subscript of the array is len - 1, and this node is the last subtree. For the left or right child of the tree, you can get the root node subscript (parent) based on the child subscript. Parent-- allows each subtree to be adjusted until it reaches the root node, and then adjust downward for the last time. , you can get a big root pile.

How to use heap to solve Top-k problem in Java?

// 将数组变成大根堆结构
public void createHeap(int[] arr){
    for (int i = 0; i < arr.length; i++) {
        elem[i] = arr[i];// 放入elem[],假设不需要扩容
        usedSize++;
    }
    // 得到根节点parent, parent--依次来到每颗子树的根节点,
    for (int parent = (usedSize-1-1)/2; parent >= 0; parent--) {
        // 依次向下搜索,使得每颗子树都变成大根堆
        shiftDown(parent,usedSize);
    }
}
// 向下搜索变成大根堆
public void shiftDown(int parent,int len){
    int child = parent*2+1;// 拿到左孩子
    while (child < len){
        // 如果有右孩子,比较左右孩子大小,得到较大的值和父节点比较 
        if (child+1 < len && (elem[child] < elem[child+1])){
            child++;
        }
        // 比较较大的孩子和父节点,看是否要交换
        int max = elem[parent] >= elem[child] ? parent : child;
        if (max == parent) break;// 如果不需要调整了,说明当前子树已经是大根堆了,直接 break
        swap(elem,parent,child);
        parent = child;// 继续向下检测,看是否要调整
        child = parent*2+1;
    }
}
public void swap(int[] arr,int i,int j){
  	int temp = arr[i];
    arr[i] = arr[j];
    arr[j] = temp;
}
Small root heap (minimum heap)

The small root heap ensures that the root node of each binary tree is smaller than the left and right child nodes

The adjustment process is the same as above.

How to use heap to solve Top-k problem in Java?

Priority Queue (PriorityQueue)

In java, a heap data structure (PriorityQueue) is provided, also called a priority queue. When we When creating such an object, we get a small root heap with no added data. We can add or delete elements to it. Every time an element is deleted or added to it, the system will make an overall adjustment and adjust it to the small root heap again. heap.

// 默认得到一个小根堆
PriorityQueue<Integer> smallHeap = new PriorityQueue<>();
smallHeap.offer(23);
smallHeap.offer(2);
smallHeap.offer(11);
System.out.println(smallHeap.poll());// 弹出2,剩余最小的元素就是11,会被调整到堆顶,下一次弹出
System.out.println(smallHeap.poll());// 弹出11

 // 如果需要得到大根堆,在里面传一个比较器
 PriorityQueue<Integer> BigHeap = new PriorityQueue<>(new Comparator<Integer>() {
     @Override
     public int compare(Integer o1, Integer o2) {
         return o2 - o1;
     }
 });

2. Ideas for solving top-k problems

Example: There are a bunch of elements and you are asked to find the first three smallest elements.

Idea 1: Sort the array from small to large and get the first 3 elements of the array. However, it can be found that the time complexity of this method is too high and is not advisable.

Idea 2: Put all the elements into a heap structure, and then pop up three elements. Each pop-up element is the smallest in the current heap, then the three pop-up elements are the first three smallest elements. .

This idea can be done, but assuming I have 1,000,000 elements and only pop up the first three smallest elements, then a heap of size 1,000,000 will be used. The space complexity of doing this is too high, so this method is not recommended.

Three ideas:

We need to get the three smallest elements, then build a heap with a size of 3. Assume that the current heap structure is filled with exactly 3 elements, then this The three elements are the current three smallest elements. Assuming that the fourth element is one of the elements we want, then at least one of the first three elements is not what we want, and it needs to be popped. So who should pop up?

What we want to get is the first three smallest elements, so the largest element in the current heap structure must not be what we want, so here we build a large root heap. Pop the element and then put in the fourth element until the entire array is traversed.

How to use heap to solve Top-k problem in Java?

How to use heap to solve Top-k problem in Java?

How to use heap to solve Top-k problem in Java?

这样我们就得到了只含有前三个最小元素的堆,并且可以看到堆的大小一直都是3,而不是有多少数据就建多大的堆,然后再依次弹出元素就行了。

// 找前 k个最小的元素
public static int[] topK(int[] arr,int k){
     // 创建一个大小为 k的大根堆
     PriorityQueue<Integer> maxHeap = new PriorityQueue<>(k,new Comparator<Integer>() {
         @Override
         public int compare(Integer o1, Integer o2) {
             return o2 - o1;
         }
     });
     for (int i = 0; i < arr.length; i++) {
         if (i < k){
             // 放入前 k 个元素
             maxHeap.offer(arr[i]);
         }else{
             // 从第 k+1个元素开始进行判断是否要入堆
             if (maxHeap.peek() > arr[i]){
                 maxHeap.poll();
                 maxHeap.offer(arr[i]);
             }
         }
     }
     int[] ret = new int[k];
     for (int i = 0; i < k; i++) {
         ret[i] = maxHeap.poll();
     }
     return ret;
 }

The above is the detailed content of How to use heap to solve Top-k problem in Java?. For more information, please follow other related articles on the PHP Chinese website!

Statement:
This article is reproduced at:yisu.com. If there is any infringement, please contact admin@php.cn delete