Home > Article > Backend Development > “Be wary of time complexity pitfalls\"
A bilibili vedio also shows this : [Bilibili Video][https://www.bilibili.com/video/BV16u4m1c7cU/?spm_id_from=333.999.0.0] and I think it's a good vedio, but its language is Chinese.
Time complexity is a measure of how much time an algorithm takes to run as a function of the size of its input. It is a way to describe the efficiency of an algorithm, and it is used to compare different algorithms and determine which one is the most efficient.
How to calculate time complexity?
To calculate time complexity, we need to consider the number of operations performed by the algorithm as a function of the size of the input. The most common way to measure the number of operations is by counting the number of times a particular operation is performed.
What are some common pitfalls when calculating time complexity?
Here is a question:
Find the max 10 integers in a list.
import random ls = [random.randint(1, 100) for _ in range(n)]
Here is the test function:
import time def benchmark(func, *args) -> float: start = time.perf_counter() func(*args) end = time.perf_counter() return end - start
Here is the solution which uses the heapq module:
def find_max_n(ls, n): import heapq return heapq.nlargest(n, ls)
Or we write it in the python way:
def find_largest_n(nums, n): if n <= 0: return [] max_heap = [] for num in nums: if len(max_heap) < n: max_heap.append(num) # call sift_down for i in range((len(max_heap) - 2) // 2, -1, -1): _sift_down(max_heap, i) elif num > max_heap[0]: max_heap[0] = num _sift_down(max_heap, 0) return max_heap def _sift_down(heap, index): left = 2 * index + 1 right = 2 * index + 2 largest = index if left < len(heap) and heap[left] > heap[largest]: largest = left if right < len(heap) and heap[right] > heap[largest]: largest = right if largest != index: heap[index], heap[largest] = heap[largest], heap[index] _sift_down(heap, largest)
Here is the solution which uses the sort function:
def find_max_n(ls, n): return sorted(ls, reverse=True)[:n]
Alomost everyone know that The time complexity of the heap, is O(n log k), and the time complexity of the sort function is O(n log n).
It seems that the First solution is better than the second one. But it is not true in python.
Look at the final code:
import time def benchmark(func, *args) -> float: start = time.perf_counter() func(*args) end = time.perf_counter() return end - start def find_max_n_heapq(ls, n): import heapq return heapq.nlargest(n, ls) def find_max_n_python_heap(nums, n): if n <= 0: return [] max_heap = [] for num in nums: if len(max_heap) < n: max_heap.append(num) # call sift_down for i in range((len(max_heap) - 2) // 2, -1, -1): _sift_down(max_heap, i) elif num > max_heap[0]: max_heap[0] = num _sift_down(max_heap, 0) return max_heap def _sift_down(heap, index): left = 2 * index + 1 right = 2 * index + 2 largest = index if left < len(heap) and heap[left] > heap[largest]: largest = left if right < len(heap) and heap[right] > heap[largest]: largest = right if largest != index: heap[index], heap[largest] = heap[largest], heap[index] _sift_down(heap, largest) def find_max_n_sorted(ls, n): return sorted(ls, reverse=True)[:n] # test import random for n in range(10, 10000, 100): ls = [random.randint(1, 100) for _ in range(n)] print(f"n = {n}") print(f"Use Heapq: {benchmark(find_max_n_heapq, ls, n)}") print(f"Python Heapq: {benchmark(find_max_n_python_heap, ls, n)}") print(f"Sorted : {benchmark(find_max_n_sorted, ls, n)}")
I run it in Python3.11 VScode, Here is the result:
Use Heapq: 0.002430099993944168
Python Heapq: 0.06343129999004304
Sorted : 9.280000813305378e-05
n = 910
Use Heapq: 9.220000356435776e-05
Python Heapq: 0.07715500006452203
Sorted : 9.360001422464848e-05
n = 920
Use Heapq: 9.440002031624317e-05
Python Heapq: 0.06573969998862594
Sorted : 0.00012450001668184996
n = 930
Use Heapq: 9.689992293715477e-05
Python Heapq: 0.06760239996947348
Sorted : 9.66999214142561e-05
n = 940
Use Heapq: 9.600003249943256e-05
Python Heapq: 0.07372559991199523
Sorted : 9.680003859102726e-05
n = 950
Use Heapq: 9.770004544407129e-05
Python Heapq: 0.07306570000946522
Sorted : 0.00011979998089373112
n = 960
Use Heapq: 9.980006143450737e-05
Python Heapq: 0.0771204000338912
Sorted : 0.00022829999215900898
n = 970
Use Heapq: 0.0001601999392732978
Python Heapq: 0.07493270002305508
Sorted : 0.00010840001050382853
n = 980
Use Heapq: 9.949994273483753e-05
Python Heapq: 0.07698320003692061
Sorted : 0.00010300008580088615
n = 990
Use Heapq: 9.979994501918554e-05
Python Heapq: 0.0848745999392122
Sorted : 0.00012620002962648869
n = 10000
Heapq 사용: 0.003642000025138259
파이썬 힙q: 9.698883199947886
정렬 : 0.00107999995816499
n = 11000
Heapq 사용: 0.0014836000045761466
파이썬 힙q: 10.537632800056599
정렬 : 0.0012236000038683414
n = 12000
Heapq 사용: 0.001384599949233234
파이썬 힙q: 12.328411899972707
정렬 : 0.0013226999435573816
n = 13000
Heapq 사용: 0.0020017001079395413
파이썬 힙q: 15.637207800056785
정렬 : 0.0015075999544933438
n = 14000
Heapq 사용: 0.0017026999266818166
파이썬 힙q: 17.298848500009626
정렬 : 0.0016967999981716275
n = 15000
Heapq 사용: 0.0017773000290617347
파이썬 힙q: 20.780625900020823
정렬 : 0.0017105999868363142
n이 큰 경우 Sorted는 약간의 시간이 소요되지만(때로는 heapq를 사용하는 것보다 더 나음) Python Heapq은 많은 시간이 소요됩니다.
Bulit-in 기능은 컴파일 언어인 C로 작성되었기 때문에 heapq보다 빠릅니다.
대량 데이터를 처리할 때는 코드 성능을 향상시키기 위해 heapq.sort() 대신 내장 함수를 사용해야 합니다. 대용량 데이터를 다룰 때 시간 복잡도 문제에 주의해야 합니다. 때로는 시간 복잡성의 함정을 피할 수 없지만 가능한 한 이를 피하도록 노력해야 합니다.
안녕하세요 멍친위안 입니다. 나는 학생입니다. 나는 새로운 것을 배우는 것을 좋아합니다.
내 github을 볼 수 있습니다: [MengQinYuan의 Github][https://github.com/mengqinyuan]
The above is the detailed content of “Be wary of time complexity pitfalls\". For more information, please follow other related articles on the PHP Chinese website!