An in-depth understanding of multithreading in Python: a must-read for newbies
Example 1
We are going to request five different URLs:
Single-threaded
```python
import time
import urllib.request  # Python 3 replacement for urllib2

def get_responses():
    urls = [
        'http://www.baidu.com',
        'http://www.amazon.com',
        'http://www.ebay.com',
        'http://www.alibaba.com',
        'http://www.jb51.net',
    ]
    start = time.time()
    for url in urls:
        print(url)
        # Blocks until this URL's response arrives before moving on
        resp = urllib.request.urlopen(url)
        print(resp.getcode())
    print("Elapsed time: %s" % (time.time() - start))

get_responses()
```
The output is:
```
http://www.baidu.com
200
http://www.amazon.com
200
http://www.ebay.com
200
http://www.alibaba.com
200
http://www.jb51.net
200
Elapsed time: 3.0814409256
```
Explanation:
The URLs are requested one after another, in order.
The program does not request the next URL until it has received the response to the current one.
Network requests are slow, so the CPU sits idle for most of the run while it waits for each response to come back.
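Since the real URLs need network access, the sketch below swaps urlopen() for a hypothetical fake_request() that simply sleeps for an assumed DELAY of 0.2 seconds. It is not the author's code. It is only a stand-in that reproduces the same timing pattern: when the five requests run one after another, they take roughly five times the delay.

```python
import time

DELAY = 0.2  # assumed per-request latency in seconds (stand-in for network I/O)

def fake_request(url):
    """Hypothetical stand-in for urlopen(): sleeps instead of doing network I/O."""
    time.sleep(DELAY)
    return 200

def get_responses_sequential(urls):
    start = time.perf_counter()
    codes = []
    for url in urls:
        # The next request starts only after this one has returned
        codes.append(fake_request(url))
    elapsed = time.perf_counter() - start
    return codes, elapsed

urls = ["url-%d" % i for i in range(5)]
codes, elapsed = get_responses_sequential(urls)
print(codes, "elapsed: %.2f s" % elapsed)  # roughly 5 * DELAY
```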
Multi-threading
```python
import time
import urllib.request  # Python 3 replacement for urllib2
from threading import Thread

class GetUrlThread(Thread):
    def __init__(self, url):
        super().__init__()
        self.url = url  # each thread is responsible for one URL

    def run(self):
        # Executed in the new thread after start() is called
        resp = urllib.request.urlopen(self.url)
        print(self.url, resp.getcode())

def get_responses():
    urls = [
        'http://www.baidu.com',
        'http://www.amazon.com',
        'http://www.ebay.com',
        'http://www.alibaba.com',
        'http://www.jb51.net',
    ]
    start = time.time()
    threads = []
    for url in urls:
        t = GetUrlThread(url)
        threads.append(t)
        t.start()
    # Wait for every thread to finish before measuring the elapsed time
    for t in threads:
        t.join()
    print("Elapsed time: %s" % (time.time() - start))

get_responses()
```
Output:
```
http://www.jb51.net 200
http://www.baidu.com 200
http://www.amazon.com 200
http://www.alibaba.com 200
http://www.ebay.com 200
Elapsed time: 0.689890861511
```
Explanation:
Notice the improvement in execution time: about 0.69 seconds instead of about 3.08 seconds.
We wrote a multithreaded program to cut down the time the CPU spends waiting. While one thread waits for its network request to return, the CPU can switch to other threads and let them send their own requests.
We want each thread to handle one URL, so we pass a URL in when we instantiate the thread class.
Running a thread means executing the run() method of the class.
So whatever work we want each thread to do goes in run().
We create a thread for each URL and call its start() method. This arranges for that thread's run() method to be executed in a separate thread of control.
We want to measure the elapsed time only after all threads have finished, so we call the join() method.
join() makes the calling thread (here, the main thread) wait until the thread it was called on has ended. Only then does it execute the next statement.
Because we call join() on every thread, the running time is calculated only after all threads have completed.
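The same hypothetical fake_request() can be run in threads to reproduce the speedup without touching the network. This sketch assumes that time.sleep() lets other threads run while one thread is waiting, which is also how blocking network I/O behaves. It follows the start()/join() pattern described above and stores each status code on the thread object.

```python
import time
from threading import Thread

DELAY = 0.2  # assumed per-request latency in seconds (stand-in for network I/O)

def fake_request(url):
    """Hypothetical stand-in for urlopen(): sleeps instead of doing network I/O."""
    time.sleep(DELAY)
    return 200

class FakeGetUrlThread(Thread):
    def __init__(self, url):
        super().__init__()
        self.url = url
        self.code = None  # filled in by run()

    def run(self):
        # Executed in the new thread once start() has been called
        self.code = fake_request(self.url)

def get_responses_threaded(urls):
    start = time.perf_counter()
    threads = [FakeGetUrlThread(url) for url in urls]
    for t in threads:
        t.start()
    # Wait for every thread before reading the clock
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return [t.code for t in threads], elapsed

urls = ["url-%d" % i for i in range(5)]
codes, elapsed = get_responses_threaded(urls)
print(codes, "elapsed: %.2f s" % elapsed)  # close to one DELAY, not five
```

Reading t.code after join() is safe here, because join() returns only once that thread's run() has finished.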
About threads:
The run() method may not begin executing immediately after start() is called.
You cannot predict the order in which different threads execute their run() methods.
Within a single thread, however, the statements in run() are guaranteed to execute in order.
That is why, in each thread, the URL is always requested first and the returned result is printed afterwards.
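The small sketch below is not from the original, and worker and log are illustrative names. It shows both rules at once: three threads record their steps in a shared list. How the threads interleave can change between runs, but each thread's own steps always come out in order.

```python
from threading import Lock, Thread

log = []           # shared record of (thread name, step) pairs
log_lock = Lock()  # protects appends to the shared log

def worker(name):
    for step in range(5):
        with log_lock:
            log.append((name, step))

threads = [Thread(target=worker, args=("t%d" % i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The interleaving across threads may differ from run to run,
# but each thread's own steps always appear in order.
for i in range(3):
    steps = [step for name, step in log if name == "t%d" % i]
    print("t%d" % i, steps)  # each line shows [0, 1, 2, 3, 4]
```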
Example 2
Next, we use a program to demonstrate resource contention (a race condition) between threads, and then fix the problem.
```python
from threading import Thread

# Define a global variable shared by all threads
some_var = 0

class IncrementThread(Thread):
    def run(self):
        # We want to read the global variable
        # and then increment it
        global some_var
        read_value = some_var
        print("some_var in %s is %d" % (self.name, read_value))
        # Another thread may run between the read above and the write below
        some_var = read_value + 1
        print("some_var in %s after increment is %d" % (self.name, some_var))

def use_increment_thread():
    threads = []
    for i in range(50):
        t = IncrementThread()
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    print("After 50 modifications, some_var should have become 50")
    print("After 50 modifications, some_var is %d" % (some_var,))

use_increment_thread()
```
Run this program several times and you may see different results.
Explanation:
There is a global variable, and every thread wants to modify it.
Each thread should add 1 to this global variable.
With 50 threads, the final value should be 50, but often it is not.
Why didn’t it reach 50?
Suppose some_var is 15 when thread t1 reads it, and at that moment the CPU switches to another thread, t2.
t2 also reads some_var as 15.
Both t1 and t2 then set some_var to 16.
What we expected was that t1 and t2 together would add 2 to some_var, making it 17.
This is resource contention, also known as a race condition.
The same situation may also occur in other threads, so the final result may be less than 50.
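The interleaving described above is hard to trigger on purpose. This sketch forces it with threading.Barrier, a standard-library synchronization primitive used here only for demonstration. Both threads read some_var while it is 15, each waits until the other has also read it, and only then do they write.

```python
from threading import Barrier, Thread

some_var = 15         # the value both threads read in the scenario above
barrier = Barrier(2)  # holds each thread until both have read some_var

def increment():
    global some_var
    read_value = some_var       # both threads read 15
    barrier.wait()              # force the unlucky interleaving
    some_var = read_value + 1   # both threads write 16

t1 = Thread(target=increment, name="t1")
t2 = Thread(target=increment, name="t2")
for t in (t1, t2):
    t.start()
for t in (t1, t2):
    t.join()
print(some_var)  # 16, not the expected 17
```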
Resolving resource contention
```python
from threading import Lock, Thread

lock = Lock()
some_var = 0

class IncrementThread(Thread):
    def run(self):
        # We want to read the global variable
        # and then increment it
        global some_var
        lock.acquire()
        try:
            # Only the thread holding the lock can read and update some_var
            read_value = some_var
            print("some_var in %s is %d" % (self.name, read_value))
            some_var = read_value + 1
            print("some_var in %s after increment is %d" % (self.name, some_var))
        finally:
            lock.release()  # always release, even if an exception occurs

def use_increment_thread():
    threads = []
    for i in range(50):
        t = IncrementThread()
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    print("After 50 modifications, some_var should have become 50")
    print("After 50 modifications, some_var is %d" % (some_var,))

use_increment_thread()
```
Running the program again now gives the expected result: after 50 modifications, some_var is 50.
Explanation:
A Lock is used to prevent race conditions.
If thread t1 acquires the lock before performing some operations, no other thread can acquire the same lock until t1 releases it. Any other thread that guards those operations with this lock therefore has to wait.
Here we want to make sure that once thread t1 has read some_var, no other thread can read some_var until t1 has finished modifying it.
In this way, reading and modifying some_var becomes a single, logically atomic operation.
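A lock-protected counter can be sketched as follows. The names counter, counter_lock and add_many are hypothetical. Instead of calling acquire() and release() by hand, the block uses `with counter_lock:`, which acquires the lock on entry and releases it on exit, even if an exception is raised. With 50 threads each adding 1000, the result is always 50000.

```python
from threading import Lock, Thread

counter = 0
counter_lock = Lock()

def add_many(n):
    global counter
    for _ in range(n):
        # The read-modify-write below runs as one logically atomic step
        with counter_lock:
            read_value = counter
            counter = read_value + 1

threads = [Thread(target=add_many, args=(1000,)) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 50000
```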
Example 3
Let us use an example to show that one thread cannot affect another thread's own (non-global) variables.
time.sleep() suspends the current thread, which gives the CPU a chance to switch to another thread.
```python
import time
from threading import Thread

class CreateListThread(Thread):
    def run(self):
        # self.entries belongs to this thread object only
        self.entries = []
        for i in range(10):
            time.sleep(1)  # suspend this thread so others get a chance to run
            self.entries.append(i)
        print(self.entries)

def use_create_list_thread():
    for i in range(3):
        t = CreateListThread()
        t.start()

use_create_list_thread()
```
Run it a few times and you may find that the output is not what we expected. While one thread is printing, the CPU can switch to another thread, so the printed output gets mixed up. We need to make printing self.entries a logically atomic operation so that other threads cannot interrupt it.
We use a Lock() for this, as in the example below.
```python
import time
from threading import Lock, Thread

lock = Lock()

class CreateListThread(Thread):
    def run(self):
        self.entries = []
        for i in range(10):
            time.sleep(1)
            self.entries.append(i)
        # Acquire the lock so only one thread prints at a time;
        # it is released when the block exits
        with lock:
            print(self.entries)

def use_create_list_thread():
    for i in range(3):
        t = CreateListThread()
        t.start()

use_create_list_thread()
```
This time the results are printed correctly: each thread prints its own list, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]. This shows that one thread cannot modify another thread's own (non-global) variables.
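Printing is not the only way to check this result. This variant is an illustrative sketch, with the sleep shortened to an arbitrary 0.01 seconds. It skips printing inside run() and inspects each thread's entries after join(), which also removes the need for a printing lock.

```python
import time
from threading import Thread

class CreateListThread(Thread):
    def run(self):
        # self.entries belongs to this thread object only
        self.entries = []
        for i in range(10):
            time.sleep(0.01)  # still lets the CPU switch between threads
            self.entries.append(i)

threads = [CreateListThread() for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Each thread built its own list; none saw another thread's appends
for t in threads:
    print(t.name, t.entries)
```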