First, let's look at when multi-threading is useful: a Python program has several tasks to handle, the tasks are asynchronous in nature, and they need to run as multiple concurrent transactions. The order in which the transactions run may be uncertain, random, and unpredictable. Compute-intensive tasks can be split into subtasks that run one after another, or they can be handled with multiple threads. I/O-intensive tasks, however, are hard to handle in a single thread: without multi-threading, they can only be managed with one or more timers.
Now for processes and threads. A process (sometimes called a heavyweight process) is one execution of a program. For example, when you run ps -aux | grep something on CentOS, the output always includes a process created by the command itself, namely the grep process. Each process has its own address space, memory, data stack, and other auxiliary data that tracks its execution. Processes therefore cannot share information directly and must use inter-process communication (IPC).
The biggest difference between threads (lightweight processes) and processes is that all threads run in the same process, share the same operating environment, and share the same data space. Therefore, threads can share data and communicate with each other more conveniently than processes, and execute transactions concurrently.
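To make the shared data space concrete, here is a minimal sketch in Python 3 syntax. The names `shared`, `lock`, and `worker` are made up for illustration and do not come from the original article. Every thread appends to the same list object, which is only possible because all threads live in one process.

```python
import threading

shared = []                 # one list, visible to every thread in the process
lock = threading.Lock()     # guards the list while a thread modifies it

def worker(name):
    # Each thread appends to the very same `shared` object.
    with lock:
        shared.append(name)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(shared))       # all four appends are visible in the main thread
```

Separate processes could not do this directly; they would have to pass the values through some IPC mechanism instead.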
To make the relationship between processes and threads easier to understand, consider an analogy. Think of the CPU as a moving company that owns only one truck (the process). At first the company is poor and has only one employee (a single thread), so it can move at most 5 households a day. Later the boss makes some money, but instead of buying another truck he hires n more employees (multiple threads). Each employee is assigned one household at a time, then takes a break and hands the truck over so that someone else can move the next household. This does not actually improve efficiency and only adds cost, right? The reason is the GIL (Global Interpreter Lock), which keeps the interpreter thread-safe by allowing only one thread to execute Python bytecode at any moment. The GIL is a mechanism of the CPython interpreter rather than of the Python language itself. Even if your machine has two CPUs, the CPython virtual machine uses only one of them at a time, so CPython cannot use multiple physical cores to speed up compute-intensive work. Threads do release the GIL while they wait on blocking I/O or sleep(), which is why the sleep-based examples later in this article still finish sooner with threads. For a detailed explanation (it is a historical legacy, and hardware has developed too fast), you can refer to this blog:
http://blog.sina.com.cn/s/blog_64ecfc2f0102uzzf.html
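The effect of the GIL on compute-intensive work can be observed with a small experiment. The following is only a sketch in Python 3 syntax; the function `count_down` and the loop size `N` are illustrative choices, not taken from the article. It runs the same pure-Python loop twice in sequence and then in two threads, and prints both timings.

```python
import threading
import time

def count_down(n):
    # Pure-Python, CPU-bound loop: the thread holds the GIL while it runs.
    while n > 0:
        n -= 1

N = 2000000  # arbitrary size, chosen only so that the loop takes measurable time

# Run the loop twice, one call after the other.
start = time.perf_counter()
count_down(N)
count_down(N)
sequential = time.perf_counter() - start

# Run the same two calls in two threads.
start = time.perf_counter()
threads = [threading.Thread(target=count_down, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print('sequential: %.2fs, threaded: %.2fs' % (sequential, threaded))
```

On a standard CPython build the two timings are usually close to each other, because the two threads take turns holding the GIL. The exact numbers depend on the machine and the interpreter build, so no expected output is shown here.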
In Core Python Programming, the author strongly recommends using the threading module instead of the thread module, for the following reasons:
1. When the main thread exits, all other threads exit without any cleanup. The thread module cannot guarantee that all child threads exit safely. In other words, the thread module does not support daemon threads.
2. The attributes of the thread module may conflict with those of threading.
3. The low-level thread module has very few synchronization primitives (in fact only one, the lock).
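For comparison, here is a small sketch of what the threading module offers on the first point, written in Python 3 syntax. The names `task`, `worker`, and `background` are illustrative assumptions. Each thread carries a daemon flag, and join() lets the main thread wait for a thread instead of guessing a sleep time.

```python
import threading
import time

finished = []

def task(name, seconds):
    time.sleep(seconds)
    finished.append(name)

# Non-daemon thread: the interpreter waits for it before exiting.
worker = threading.Thread(target=task, args=('worker', 0.1), daemon=False)

# Daemon thread: it is abandoned if still running when the program exits.
background = threading.Thread(target=task, args=('background', 0.1), daemon=True)

worker.start()
background.start()

# join() blocks until the thread has finished.
worker.join()
background.join()

print(worker.daemon, background.daemon, sorted(finished))
```

Because both threads are joined here, both names end up in `finished`. Without the join() calls, only the non-daemon thread would be guaranteed to complete before the program exits.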
1. Thread module
The following are two code samples, one without locks and one with locks (these are ordinary thread locks, not the GIL):
1. Code sample without locks:
    from time import sleep, ctime
    import thread

    def loop0():
        print 'start loop 0 at: ', ctime()
        sleep(4)
        print 'loop 0 done at: ', ctime()

    def loop1():
        print 'start loop 1 at: ', ctime()
        sleep(2)
        print 'loop 1 done at: ', ctime()

    def main():
        print 'start at: ', ctime()
        thread.start_new_thread(loop0, ())
        thread.start_new_thread(loop1, ())
        sleep(6)  # keep the main thread alive long enough for both loops
        print 'all loop is done, ', ctime()

    if __name__ == '__main__':
        main()

Output:

    start at: Thu Jan 28 10:46:27 2016
    start loop 0 at: Thu Jan 28 10:46:27 2016
    start loop 1 at: Thu Jan 28 10:46:27 2016
    loop 1 done at: Thu Jan 28 10:46:29 2016
    loop 0 done at: Thu Jan 28 10:46:31 2016
    all loop is done, Thu Jan 28 10:46:33 2016
As the output shows, we successfully started two threads and kept the main thread in step with them by letting it sleep. loop1 finishes first, at 2 s, loop0 finishes at 4 s, and the main thread finishes another 2 s later. The main thread runs for 6 seconds in total, while loop0 and loop1 run concurrently.
2. Code sample using locks:
    import thread
    from time import sleep, ctime

    loops = [4, 2]

    def loop(nloop, nsec, lock):
        print 'start loop', nloop, 'at: ', ctime()
        sleep(nsec)
        print 'loop', nloop, 'done at:', ctime()
        lock.release()

    def main():
        print 'starting at:', ctime()
        locks = []
        nloops = range(len(loops))
        for i in nloops:
            lock = thread.allocate_lock()  # create the locks and store them in locks
            lock.acquire()
            locks.append(lock)
        for i in nloops:
            # start a thread; the arguments are the loop number, sleep time, and lock
            thread.start_new_thread(loop, (i, loops[i], locks[i]))
        for i in nloops:
            while locks[i].locked():  # wait until the loop finishes and releases its lock
                pass
        print 'all DONE at:', ctime()

    if __name__ == '__main__':
        main()

The output is as follows:

    starting at: Thu Jan 28 14:59:22 2016
    start loop 0 at: Thu Jan 28 14:59:22 2016
    start loop 1 at: Thu Jan 28 14:59:22 2016
    loop 1 done at: Thu Jan 28 14:59:24 2016
    loop 0 done at: Thu Jan 28 14:59:26 2016
    all DONE at: Thu Jan 28 14:59:26 2016
This run takes only 4 seconds, which is more efficient and more sensible than timing the wait with a sleep() call in the main thread.
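The lock-based sample relies on a busy-wait loop (while ... locked(): pass), which keeps the main thread spinning while it waits. As a possible variation, the sketch below ports the idea to Python 3, where the thread module is named `_thread`, and lets the main thread block in acquire() instead. The shortened sleep times and the `done` list are assumptions added for illustration.

```python
import _thread              # Python 3 name of the old `thread` module
from time import sleep

loops = [0.2, 0.1]          # shortened sleep times, for illustration only
done = []

def loop(nloop, nsec, lock):
    sleep(nsec)
    done.append(nloop)
    lock.release()          # signal the main thread that this loop is finished

# Create one lock per loop and acquire it up front.
locks = []
for i in range(len(loops)):
    lock = _thread.allocate_lock()
    lock.acquire()
    locks.append(lock)

for i in range(len(loops)):
    _thread.start_new_thread(loop, (i, loops[i], locks[i]))

# acquire() blocks until the worker releases the lock, so no busy-waiting is needed.
for lock in locks:
    lock.acquire()

print(sorted(done))
```

The waiting logic is the same as in the original sample; only the way the main thread waits has changed.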
2. Threading module
1. Thread class
With the Thread class, you can create threads in the following three ways:
(1) Create a Thread instance and pass it a function
(2) Create a Thread instance and pass it a callable class object
(3) Derive a subclass from Thread and create an object of this subclass
Method (1)
    __author__ = 'dell'
    import threading
    from time import sleep, ctime

    def loop0():
        print 'start loop 0 at:', ctime()
        sleep(4)
        print 'loop 0 done at:', ctime()

    def loop1():
        print 'start loop 1 at:', ctime()
        sleep(2)
        print 'loop 1 done at:', ctime()

    def main():
        print 'starting at:', ctime()
        threads = []
        t1 = threading.Thread(target=loop0, args=())  # create the thread
        threads.append(t1)
        t2 = threading.Thread(target=loop1, args=())
        threads.append(t2)
        for t in threads:
            t.setDaemon(True)  # mark as a daemon thread (must be called before start())
            t.start()          # start running the thread
        for t in threads:
            t.join()           # block until the thread ends; with a timeout, block at most that many seconds

    if __name__ == '__main__':
        main()
        print 'All DONE at:', ctime()

Here there is no need to manage a set of locks (allocating, acquiring, releasing, checking, and so on) as with the thread module. I also dropped the loop code and simply numbered the loop functions by hand. The output is as follows:

    starting at: Thu Jan 28 16:38:14 2016
    start loop 0 at: Thu Jan 28 16:38:14 2016
    start loop 1 at: Thu Jan 28 16:38:14 2016
    loop 1 done at: Thu Jan 28 16:38:16 2016
    loop 0 done at: Thu Jan 28 16:38:18 2016
    All DONE at: Thu Jan 28 16:38:18 2016
The result is the same, but the logic of the code is much clearer. The other two methods are not posted here. The biggest difference between instantiating a Thread and calling thread.start_new_thread is that the new thread does not start executing immediately. With the Thread class of the threading module, the threads we have instantiated begin to run only when .start() is called on them, which gives the program good synchronization characteristics.
The following example compares single-threaded and multi-threaded execution. Two sets of operations, one using multiplication and one using division, are run both ways to show how multi-threading improves efficiency.
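For readers who want to see methods (2) and (3), the following is a rough sketch in Python 3 syntax. The class names `LoopTask` and `LoopThread`, the `results` list, and the short sleep times are all hypothetical choices for illustration, not the article's code. It also checks that neither thread is running before start() is called.

```python
import threading
from time import sleep

results = []

def loop(nloop, nsec):
    sleep(nsec)
    results.append(nloop)

class LoopTask(object):
    """Method (2): a callable object that wraps a function and its arguments."""
    def __init__(self, func, args):
        self.func = func
        self.args = args

    def __call__(self):
        self.func(*self.args)

class LoopThread(threading.Thread):
    """Method (3): a Thread subclass that overrides run()."""
    def __init__(self, func, args):
        threading.Thread.__init__(self)
        self.func = func
        self.args = args

    def run(self):
        self.func(*self.args)

t2 = threading.Thread(target=LoopTask(loop, (0, 0.02)))  # method (2)
t3 = LoopThread(loop, (1, 0.01))                         # method (3)

# Nothing runs until start() is called, so neither thread is alive yet.
started_early = t2.is_alive() or t3.is_alive()

for t in (t2, t3):
    t.start()
for t in (t2, t3):
    t.join()

print(started_early, sorted(results))
```

Method (3) is often the more convenient of the two when a thread needs to keep its own state, since the state can live on the subclass instance.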
    from time import ctime, sleep
    import threading

    def multi():
        num1 = 1
        print 'start mutiple at:', ctime()
        for i in range(1, 10):
            num1 = i * num1
            sleep(0.2)
        print 'mutiple finished at:', ctime()
        return num1

    def divide():
        num2 = 100
        print 'start division at:', ctime()
        for i in range(1, 10):
            num2 = num2 / i
            sleep(0.4)
        print 'division finished at:', ctime()
        return num2

    def main():
        print '---->single Thread'
        x1 = multi()
        x2 = divide()
        print 'The sum is ', sum([x1, x2]), '\nfinished singe thread', ctime()
        print '----->Multi Thread'
        threads = []
        t1 = threading.Thread(target=multi, args=())
        threads.append(t1)
        t2 = threading.Thread(target=divide, args=())
        threads.append(t2)
        for t in threads:
            t.setDaemon(True)
            t.start()
        for t in threads:
            t.join()

    if __name__ == '__main__':
        main()

The results are as follows:

    ---->single Thread
    start mutiple at: Thu Jan 28 21:41:18 2016
    mutiple finished at: Thu Jan 28 21:41:20 2016
    start division at: Thu Jan 28 21:41:20 2016
    division finished at: Thu Jan 28 21:41:24 2016
    The sum is 362880
    finished singe thread Thu Jan 28 21:41:24 2016
    ----->Multi Thread
    start mutiple at: Thu Jan 28 21:41:24 2016
    start division at: Thu Jan 28 21:41:24 2016
    mutiple finished at: Thu Jan 28 21:41:26 2016
    division finished at: Thu Jan 28 21:41:27 2016
    The sum is : 362880
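In the multi-threaded half of the listing, the values returned by multi() and divide() are discarded, because Thread does not hand a target's return value back to the caller. One possible way to collect them is sketched below in Python 3 syntax. The `results` dictionary, the `run_and_store` helper, and the `delay` parameter are assumptions added for illustration, and `//` is used so that the division matches Python 2's integer division.

```python
import threading
from time import sleep

def multi(delay=0.01):
    num1 = 1
    for i in range(1, 10):
        num1 = i * num1
        sleep(delay)
    return num1

def divide(delay=0.01):
    num2 = 100
    for i in range(1, 10):
        num2 = num2 // i    # integer division, like `/` on ints in Python 2
        sleep(delay)
    return num2

results = {}

def run_and_store(name, func):
    # Store the return value under the given name so the main thread can read it.
    results[name] = func()

threads = [
    threading.Thread(target=run_and_store, args=('multi', multi)),
    threading.Thread(target=run_and_store, args=('divide', divide)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = results['multi'] + results['divide']
print('The sum is', total)  # 362880 + 0
```

Each thread writes to its own key, so no extra lock is needed in this sketch.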