Home  >  Article  >  Backend Development  >  Python solution to the maximum K number problem

Python solution to the maximum K number problem

高洛峰
高洛峰Original
2017-03-02 11:16:161132browse

TopK problem, that is, finding the largest K number. This problem is very common, such as finding the 10 most popular keywords from 10 million search records.

Method 1:
Sort first, and then intercept the first k numbers.
Time complexity: O(n*logn)+O(k)=O(n*logn).
This method is relatively simple and crude, just mention it.

Method 2: Maximum Heap

We can create a data container of size K to store the smallest K numbers, and then traverse the entire array and add each number Compare it with the maximum number in the container. If this number is greater than the maximum value in the container, continue traversing, otherwise replace the maximum value in the container with this number. The understanding of this method is also very simple. As for the choice of container, many people's first reaction is the maximum heap. But how to implement the maximum heap in Python? We can use the heapq library that implements the minimum heap, because in an array, if each number is inverted, the maximum number becomes the minimum number, and the order of the entire number changes, so each number in the array can be inverted. , and then use the minimum heap to negate the result when finally returning it. The code is as follows:

import heapq
def get_least_numbers_big_data(self, alist, k):
  max_heap = []
  length = len(alist)
  if not alist or k <= 0 or k > length:
    return
  k = k - 1
  for ele in alist:
    ele = -ele
    if len(max_heap) <= k:
      heapq.heappush(max_heap, ele)
    else:
      heapq.heappushpop(max_heap, ele)

  return map(lambda x:-x, max_heap)


if __name__ == "__main__":
  l = [1, 9, 2, 4, 7, 6, 3]
  min_k = get_least_numbers_big_data(l, 3)

Method 3: quick select

quick select algorithm. In fact, it is similar to quick sort. The difference is that quick select only needs to go in one direction for each trip.
Time complexity: O(n).

def qselect(A,k): 
  if len(A)<k:return A 
  pivot = A[-1] 
  right = [pivot] + [x for x in A[:-1] if x>=pivot] 
  rlen = len(right) 
  if rlen==k: 
    return right 
  if rlen>k: 
    return qselect(right, k) 
  else: 
    left = [x for x in A[:-1] if x<pivot] 
    return qselect(left, k-rlen) + right 
 
for i in range(1, 10): 
  print qselect([11,8,4,1,5,2,7,9], i)


For more articles related to the Python version of the maximum K number problem, please pay attention to the PHP Chinese website!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn