
20 tips to make your Python fly

高洛峰 (Original) · 2017-02-24 15:34:32

Today's article is light on text and heavy on code, but absolutely informative and straightforward: it shares 20 tips for improving Python performance and teaches you how to say goodbye to slow Python. The original author, Kaiyuan, is a full-stack programmer who works with Python, Java, PHP and C++.

1. Optimize algorithm time complexity

The time complexity of an algorithm has the greatest impact on a program's execution efficiency. In Python, you can choose an appropriate data structure to optimize time complexity; for example, looking up an element in a list takes O(n) time, while looking it up in a set takes O(1). Different scenarios call for different optimizations; in general, the common approaches are divide and conquer, branch and bound, greedy algorithms, and dynamic programming.
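For instance (an illustrative addition, not from the original article), the gap between a list scan and a set lookup is easy to measure with timeit:

a = range(100000)
s = set(a)
%timeit -n 100 99999 in a  # O(n): scans the list
%timeit -n 100 99999 in s  # O(1): hash lookup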

2. Reduce redundant data

For example, store only the upper or lower triangle of a large symmetric matrix, and use a sparse representation for matrices in which zero elements make up the majority.
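As an illustration (a minimal sketch added here, not from the original article), a symmetric matrix can be stored in a dict keyed on (row, column) pairs with row <= column, so each value is kept only once:

# Minimal sketch: store only the upper triangle of a symmetric matrix.
class SymmetricMatrix(object):
  def __init__(self):
    self.data = {}  # keys are (i, j) with i <= j
  def set(self, i, j, value):
    self.data[(min(i, j), max(i, j))] = value
  def get(self, i, j):
    return self.data.get((min(i, j), max(i, j)), 0)

m = SymmetricMatrix()
m.set(3, 1, 2.5)
print(m.get(1, 3))  # 2.5, the same entry as (3, 1)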

3. Reasonable use of copy and deepcopy

For objects of data structures such as dict and list, direct assignment only copies a reference. In some cases you need to copy the entire object; for that, use copy and deepcopy from the copy module. The difference between these two functions is that the latter copies recursively. Their efficiency also differs: (the following programs are run in IPython)

import copy
a = range(100000)
%timeit -n 10 copy.copy(a) # run copy.copy(a) 10 times

%timeit -n 10 copy.deepcopy(a)
10 loops, best of 3: 1.55 ms per loop
10 loops, best of 3: 151 ms per loop

The -n after timeit indicates the number of runs, and the last two lines are the output of the two timeit calls (the same applies below). As you can see, the latter is roughly 100 times slower.

4. Use dict or set to find elements

Python's dict and set are implemented with hash tables (similar to unordered_map in the C++11 standard library), so looking up an element has O(1) time complexity.

a = range(1000)
s = set(a)
d = dict((i,1) for i in a)
%timeit -n 10000 100 in d
%timeit -n 10000 100 in s
10000 loops, best of 3: 43.5 ns per loop
10000 loops, best of 3: 49.6 ns per loop

dict is slightly more efficient (and takes up more space).

5. Proper use of generators and yield

%timeit -n 100 a = (i for i in range(100000))
%timeit -n 100 b = [i for i in range(100000)]
100 loops, best of 3: 1.54 ms per loop
100 loops, best of 3: 4.56 ms per loop

Using () produces a generator object, whose memory footprint is independent of the size of the list, so it is more efficient. In concrete applications, for example, set(i for i in range(100000)) is faster than set([i for i in range(100000)]).

But for situations where loop traversal is required:

%timeit -n 10 for x in (i for i in range(100000)): pass
%timeit -n 10 for x in [i for i in range(100000)]: pass

10 loops, best of 3: 6.51 ms per loop
10 loops, best of 3: 5.54 ms per loop

The latter is more efficient here, but if the loop may break early, the benefit of the generator is obvious. yield is also used to create generators:

def yield_func(ls):
  for i in ls:
    yield i+1

def not_yield_func(ls):
  return [i+1 for i in ls]

ls = range(1000000)
%timeit -n 10 for i in yield_func(ls):pass

%timeit -n 10 for i in not_yield_func(ls):pass

10 loops, best of 3: 63.8 ms per loop
10 loops, best of 3: 62.9 ms per loop

For lists that are not very large, you can simply return a list, but yield is often more readable (a personal preference).
Python 2.x's built-in generator facilities include the xrange function, the itertools package, and so on.
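For instance (an illustrative sketch added here, not from the original article), itertools provides lazy building blocks that avoid materializing intermediate lists:

import itertools

ls = range(1000000)
# islice lazily takes the first 10 squares without building the full list of squares
first_ten = list(itertools.islice((i * i for i in ls), 10))
print(first_ten)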

6. Optimize loops

Do not do work inside a loop that could be done outside it. For example, the following optimization is about twice as fast:

a = range(10000)
size_a = len(a)
%timeit -n 1000 for i in a: k = len(a)
%timeit -n 1000 for i in a: k = size_a
1000 loops, best of 3: 569 µs per loop
1000 loops, best of 3: 256 µs per loop

7. Optimize the order of conditions in compound expressions

For and, put the condition that is satisfied least often first; for or, put the condition that is satisfied most often first. For example:

a = range(2000) 
%timeit -n 100 [i for i in a if 10 < i < 20 or 1000 < i < 2000]
%timeit -n 100 [i for i in a if 1000 < i < 2000 or 10 < i < 20]
%timeit -n 100 [i for i in a if i % 2 == 0 and i > 1900]
%timeit -n 100 [i for i in a if i > 1900 and i % 2 == 0]
100 loops, best of 3: 287 µs per loop
100 loops, best of 3: 214 µs per loop
100 loops, best of 3: 128 µs per loop
100 loops, best of 3: 56.1 µs per loop

8. Use join to merge the strings in the iterator

In [1]: %%timeit
   ...: # a is assumed to be a list of short strings, e.g. a = ['x'] * 1000
   ...: s = ''
   ...: for i in a:
   ...:     s += i
   ...:
10000 loops, best of 3: 59.8 µs per loop

In [2]: %%timeit
   ...: s = ''.join(a)
   ...:
100000 loops, best of 3: 11.8 µs per loop

join gives roughly a 5x improvement over the accumulation method.

9. Choose the appropriate string formatting method

s1, s2 = 'ax', 'bx'

%timeit -n 100000 'abc%s%s' % (s1, s2)
%timeit -n 100000 'abc{0}{1}'.format(s1, s2)
%timeit -n 100000 'abc' + s1 + s2
100000 loops, best of 3: 183 ns per loop
100000 loops, best of 3: 169 ns per loop
100000 loops, best of 3: 103 ns per loop

Of the three approaches, the % method is the slowest, but the differences are small (all are very fast). (Personally, I find % the most readable.)

10. Exchange the values of two variables without using an intermediate variable

In [3]: %%timeit -n 10000
   ....: a, b = 1, 2
   ....: c = a; a = b; b = c
   ....:
10000 loops, best of 3: 172 ns per loop

In [4]: %%timeit -n 10000
   ....: a, b = 1, 2
   ....: a, b = b, a
   ....:
10000 loops, best of 3: 86 ns per loop

Using a, b = b, a instead of c = a; a = b; b = c to exchange the values of a and b is about twice as fast.

11. Use if is

a = range(10000)
%timeit -n 100 [i for i in a if i == True]
%timeit -n 100 [i for i in a if i is True]
100 loops, best of 3: 531 µs per loop
100 loops, best of 3: 362 µs per loop

Using if is True is about 1.5 times as fast as if == True.

12. Use chained comparison x < y < z

x, y, z = 1,2,3

%timeit -n 1000000 if x < y < z:pass
%timeit -n 1000000 if x < y and y < z:pass

1000000 loops, best of 3: 101 ns per loop
1000000 loops, best of 3: 121 ns per loop

x < y < z is slightly more efficient and more readable.

13. while 1 is faster than while True

def while_1():
  n = 100000
  while 1:
    n -= 1
    if n <= 0: break

def while_true():
  n = 100000
  while True:
    n -= 1
    if n <= 0: break

m, n = 1000000, 1000000

%timeit -n 100 while_1()
%timeit -n 100 while_true()
100 loops, best of 3: 3.69 ms per loop
100 loops, best of 3: 5.61 ms per loop

while 1 is much faster than while True; the reason is that in Python 2.x, True is a global variable rather than a keyword.

14. Use ** instead of pow

%timeit -n 10000 c = pow(2,20)
%timeit -n 10000 c = 2**20

10000 loops, best of 3: 284 ns per loop
10000 loops, best of 3: 16.9 ns per loop

** is more than 10 times faster!

15. Use cProfile, cStringIO and cPickle, the C implementations of the corresponding packages (profile, StringIO, pickle)

import cPickle
import pickle
a = range(10000)
%timeit -n 100 x = cPickle.dumps(a)
%timeit -n 100 x = pickle.dumps(a)
100 loops, best of 3: 1.58 ms per loop
100 loops, best of 3: 17 ms per loop

The C-implemented package is more than 10 times faster!

16. Use the best deserialization method

The following compares the efficiency of eval, cPickle, and json methods for deserializing corresponding strings:

import json
import cPickle
a = range(10000)
s1 = str(a)
s2 = cPickle.dumps(a)
s3 = json.dumps(a)
%timeit -n 100 x = eval(s1)
%timeit -n 100 x = cPickle.loads(s2)
%timeit -n 100 x = json.loads(s3)
100 loops, best of 3: 16.8 ms per loop
100 loops, best of 3: 2.02 ms per loop
100 loops, best of 3: 798 µs per loop

It can be seen that json is nearly 3 times faster than cPickle and more than 20 times faster than eval.

17. Use C extension (Extension)

Currently there are four main approaches: the native API of CPython (the most common Python implementation), ctypes, Cython, and cffi. They all allow Python programs to call dynamic link libraries compiled from C. Their characteristics are:

CPython native API: by including the Python.h header file, Python data structures can be used directly in the corresponding C program. The implementation process is relatively cumbersome, but the scope of application is quite broad.
ctypes: usually used to wrap C libraries, allowing a pure Python program to call functions in dynamic link libraries (dll files on Windows, so files on Unix). If you want to use an existing C library from Python, ctypes is a good choice; according to some benchmarks, python2+ctypes is the best-performing approach. (A minimal sketch follows this list.)
Cython: Cython is a superset of the Python language that simplifies the process of writing C extensions. Its advantages are a concise syntax and good compatibility with libraries such as numpy that contain a large number of C extensions. Cython is typically used to optimize a particular algorithm or processing step in a project; in some tests, the performance improvement can be hundreds of times.
cffi: cffi is the implementation of ctypes in PyPy (see below) and is also compatible with CPython. cffi provides a way to use C libraries from Python: you can write C code directly in your Python code and link against existing C libraries.
These optimization methods are generally used to optimize the performance-bottleneck modules of an existing project; with only small changes to the original project, they can greatly improve the running efficiency of the whole program.
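As an illustration of the ctypes approach mentioned above (a minimal sketch, assuming the standard C math library can be located on your platform), you can call functions from an existing shared library without writing any C:

import ctypes
import ctypes.util

# Locate and load the C math library (the exact name is platform dependent).
libm = ctypes.CDLL(ctypes.util.find_library('m'))

# Declare the signature of sqrt so ctypes converts arguments and the result correctly.
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double

print(libm.sqrt(2.0))  # 1.4142135623730951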

18. Parallel Programming

Because of the GIL, it is difficult for Python to take full advantage of multi-core CPUs. However, the following parallel modes can be achieved through the built-in multiprocessing module:

Multiple processes: for CPU-intensive programs, the Process, Pool and related classes in multiprocessing can be used to run computations in parallel across multiple processes. However, because inter-process communication is relatively expensive, programs that require a large amount of data exchange between processes may not see much improvement.
Multi-threading: for IO-intensive programs, the multiprocessing.dummy module wraps threading with the multiprocessing interface, making multi-threaded programming very easy (for example, you can use the Pool map interface: simple and efficient). See the sketch after this list.
Distributed: the Managers class in multiprocessing provides a way to share data between different processes, on top of which distributed programs can be developed.
In different business scenarios, you can choose one of these, or a combination of several, to optimize program performance.
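For example (an illustrative sketch added here, not from the original article; square is a made-up worker function), a CPU-bound function can be mapped over a process pool, and the same Pool interface from multiprocessing.dummy gives a thread pool for IO-bound work:

from multiprocessing import Pool                       # process pool for CPU-bound work
from multiprocessing.dummy import Pool as ThreadPool   # thread pool for IO-bound work

def square(x):
  return x * x

if __name__ == '__main__':
  pool = Pool(4)                        # 4 worker processes
  print(pool.map(square, range(10)))
  pool.close(); pool.join()

  tpool = ThreadPool(4)                 # 4 worker threads, same map interface
  print(tpool.map(square, range(10)))
  tpool.close(); tpool.join()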

19. The ultimate killer: PyPy

PyPy is a Python implementation written in RPython (a restricted subset of Python). According to the benchmark data on its official website, it runs Python code more than 6 times faster than CPython. The speed comes from its Just-in-Time (JIT) compiler, i.e. a dynamic compiler: unlike static compilers (such as gcc or javac), it uses data gathered while the program runs to optimize it. For historical reasons, PyPy still retains the GIL, but the ongoing STM project is attempting to turn PyPy into a Python without the GIL.

If a Python program contains C extensions (not using the cffi approach), the JIT's optimization effect is greatly reduced, and the program may even be slower than under CPython (for example, with Numpy). So in PyPy it is better to use pure Python or cffi extensions.

With the improvement of STM, Numpy and other projects, I believe that PyPy will replace CPython.

20. Use performance analysis tools

In addition to the timeit module used in IPython above, there is also cProfile. Using cProfile is very simple: run python -m cProfile filename.py, where filename.py is the program to be profiled. The standard output shows how many times each function is called and how long it runs, letting you find the program's performance bottlenecks and then optimize them in a targeted way.
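cProfile can also be invoked programmatically (a small illustrative example; slow_sum is a made-up function name):

import cProfile

def slow_sum(n):
  total = 0
  for i in range(n):
    total += i
  return total

# Profile a single call and sort the report by cumulative time.
cProfile.run('slow_sum(1000000)', sort='cumulative')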

The above is the entire content of this article. I hope it will be helpful to everyone's learning.

