Home  >  Article  >  Backend Development  >  “Be wary of time complexity pitfalls\"

“Be wary of time complexity pitfalls\"

WBOY
WBOYOriginal
2024-08-01 20:45:47541browse

“Be wary of time complexity pitfalls

Be wary of time complexity pitfalls

Write Here

A bilibili vedio also shows this : [Bilibili Video][https://www.bilibili.com/video/BV16u4m1c7cU/?spm_id_from=333.999.0.0] and I think it's a good vedio, but its language is Chinese.

time complexity

  • What does time complexity mean?
  • Time complexity is a measure of how much time an algorithm takes to run as a function of the size of its input. It is a way to describe the efficiency of an algorithm, and it is used to compare different algorithms and determine which one is the most efficient.

  • How to calculate time complexity?

  • To calculate time complexity, we need to consider the number of operations performed by the algorithm as a function of the size of the input. The most common way to measure the number of operations is by counting the number of times a particular operation is performed.

  • What are some common pitfalls when calculating time complexity?

    1. Not considering the input size: If we only consider the number of operations performed by the algorithm, we may underestimate the time complexity. For example, if we count the number of times a loop runs, but we don't take into account the size of the input, then the time complexity may be too high.
    1. Not considering the algorithm's efficiency: Some algorithms may perform many operations even when the input size is small, which can lead to high time complexity. For example, sorting algorithms like bubble sort and selection sort have quadratic time complexity, even though they may only need to swap two elements in a small array.
    1. Not considering the algorithm's worst-case scenario: Some algorithms have a best-case scenario where they perform fewer operations than the worst-case scenario. For example, searching algorithms like binary search have a best-case scenario where they find the target element in logarithmic time, but they have a worst-case scenario where they need to examine all elements in the array.

Let's see some examples of time complexity

Here is a question:
Find the max 10 integers in a list.

import random
ls = [random.randint(1, 100) for _ in range(n)]

Here is the test function:

import time
def benchmark(func, *args) -> float:
    start = time.perf_counter()
    func(*args)
    end = time.perf_counter()
    return end - start

Solution1: Use heap

Here is the solution which uses the heapq module:

def find_max_n(ls, n):
    import heapq
    return heapq.nlargest(n, ls)

Or we write it in the python way:

def find_largest_n(nums, n):
    if n <= 0:
        return []

    max_heap = []

    for num in nums:
        if len(max_heap) < n:
            max_heap.append(num)
            # call sift_down
            for i in range((len(max_heap) - 2) // 2, -1, -1):
                _sift_down(max_heap, i)
        elif num > max_heap[0]:
            max_heap[0] = num
            _sift_down(max_heap, 0)

    return max_heap

def _sift_down(heap, index):
    left = 2 * index + 1
    right = 2 * index + 2
    largest = index

    if left < len(heap) and heap[left] > heap[largest]:
        largest = left

    if right < len(heap) and heap[right] > heap[largest]:
        largest = right

    if largest != index:
        heap[index], heap[largest] = heap[largest], heap[index]
        _sift_down(heap, largest)

Solution2: Use sort

Here is the solution which uses the sort function:

def find_max_n(ls, n):
    return sorted(ls, reverse=True)[:n]

Alomost everyone know that The time complexity of the heap, is O(n log k), and the time complexity of the sort function is O(n log n).

It seems that the First solution is better than the second one. But it is not true in python.

Look at the final code:

import time
def benchmark(func, *args) -> float:
    start = time.perf_counter()
    func(*args)
    end = time.perf_counter()
    return end - start

def find_max_n_heapq(ls, n):
    import heapq
    return heapq.nlargest(n, ls)

def find_max_n_python_heap(nums, n):
    if n <= 0:
        return []

    max_heap = []

    for num in nums:
        if len(max_heap) < n:
            max_heap.append(num)
            # call sift_down
            for i in range((len(max_heap) - 2) // 2, -1, -1):
                _sift_down(max_heap, i)
        elif num > max_heap[0]:
            max_heap[0] = num
            _sift_down(max_heap, 0)

    return max_heap

def _sift_down(heap, index):
    left = 2 * index + 1
    right = 2 * index + 2
    largest = index

    if left < len(heap) and heap[left] > heap[largest]:
        largest = left

    if right < len(heap) and heap[right] > heap[largest]:
        largest = right

    if largest != index:
        heap[index], heap[largest] = heap[largest], heap[index]
        _sift_down(heap, largest)


def find_max_n_sorted(ls, n):
    return sorted(ls, reverse=True)[:n]

# test
import random
for n in range(10, 10000, 100):
    ls = [random.randint(1, 100) for _ in range(n)]
    print(f"n = {n}")
    print(f"Use    Heapq: {benchmark(find_max_n_heapq, ls, n)}")
    print(f"Python Heapq: {benchmark(find_max_n_python_heap, ls, n)}")
    print(f"Sorted      : {benchmark(find_max_n_sorted, ls, n)}")

I run it in Python3.11 VScode, Here is the result:

n is not big

Use Heapq: 0.002430099993944168
Python Heapq: 0.06343129999004304
Sorted : 9.280000813305378e-05
n = 910
Use Heapq: 9.220000356435776e-05
Python Heapq: 0.07715500006452203
Sorted : 9.360001422464848e-05
n = 920
Use Heapq: 9.440002031624317e-05
Python Heapq: 0.06573969998862594
Sorted : 0.00012450001668184996
n = 930
Use Heapq: 9.689992293715477e-05
Python Heapq: 0.06760239996947348
Sorted : 9.66999214142561e-05
n = 940
Use Heapq: 9.600003249943256e-05
Python Heapq: 0.07372559991199523
Sorted : 9.680003859102726e-05
n = 950
Use Heapq: 9.770004544407129e-05
Python Heapq: 0.07306570000946522
Sorted : 0.00011979998089373112
n = 960
Use Heapq: 9.980006143450737e-05
Python Heapq: 0.0771204000338912
Sorted : 0.00022829999215900898
n = 970
Use Heapq: 0.0001601999392732978
Python Heapq: 0.07493270002305508
Sorted : 0.00010840001050382853
n = 980
Use Heapq: 9.949994273483753e-05
Python Heapq: 0.07698320003692061
Sorted : 0.00010300008580088615
n = 990
Use Heapq: 9.979994501918554e-05
Python Heapq: 0.0848745999392122
Sorted : 0.00012620002962648869

n이 크면?

n = 10000
Heapq 사용: 0.003642000025138259
파이썬 힙q: 9.698883199947886
정렬 : 0.00107999995816499
n = 11000
Heapq 사용: 0.0014836000045761466
파이썬 힙q: 10.537632800056599
정렬 : 0.0012236000038683414
n = 12000
Heapq 사용: 0.001384599949233234
파이썬 힙q: 12.328411899972707
정렬 : 0.0013226999435573816
n = 13000
Heapq 사용: 0.0020017001079395413
파이썬 힙q: 15.637207800056785
정렬 : 0.0015075999544933438
n = 14000
Heapq 사용: 0.0017026999266818166
파이썬 힙q: 17.298848500009626
정렬 : 0.0016967999981716275
n = 15000
Heapq 사용: 0.0017773000290617347
파이썬 힙q: 20.780625900020823
정렬 : 0.0017105999868363142

내가 찾은 내용 및 개선 방법

n이 큰 경우 Sorted는 약간의 시간이 소요되지만(때로는 heapq를 사용하는 것보다 더 나음) Python Heapq은 많은 시간이 소요됩니다.

  • Sorted는 시간이 조금 걸리지만 Python Heapq은 시간이 많이 걸리는 이유는 무엇입니까?
  • sorted()는 Python에 내장된 함수이기 때문에 이에 대한 Python 공식 문서를 찾을 수 있습니다.

Bulit-in 기능은 컴파일 언어인 C로 작성되었기 때문에 heapq보다 빠릅니다.

  • 어떻게 개선할 수 있나요?
  • heapq.sort() 대신 내장 함수 sorted()를 사용하여 코드 성능을 향상시킬 수 있습니다. sorted() 함수는 Python에 내장된 함수로 C로 구현되므로 heapq.sort()보다 훨씬 빠릅니다

경련

대량 데이터를 처리할 때는 코드 성능을 향상시키기 위해 heapq.sort() 대신 내장 함수를 사용해야 합니다. 대용량 데이터를 다룰 때 시간 복잡도 문제에 주의해야 합니다. 때로는 시간 복잡성의 함정을 피할 수 없지만 가능한 한 이를 피하도록 노력해야 합니다.

나에 대해서

안녕하세요 멍친위안 입니다. 나는 학생입니다. 나는 새로운 것을 배우는 것을 좋아합니다.
내 github을 볼 수 있습니다: [MengQinYuan의 Github][https://github.com/mengqinyuan]

The above is the detailed content of “Be wary of time complexity pitfalls\". For more information, please follow other related articles on the PHP Chinese website!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn