
What is Python multithreading and how to use it

2023-05-13 11:55:14

What is a thread? Why do you want it?

Python code normally runs sequentially, one statement after another, but the threading module comes in handy when a script has to do several things at once. Although threads in CPython cannot run Python code on several CPU cores in parallel, they are well suited to I/O operations such as web scraping, because the processor would otherwise sit idle while waiting for data.

Threads change the game because many network and data I/O scripts spend most of their time waiting for data from remote sources. Since the downloads may be independent of one another (for example, crawling separate websites), the program can download from different data sources concurrently and merge the results at the end. For CPU-intensive work, there is little benefit to using the threading module.
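
To make the I/O argument concrete, here is a minimal sketch that uses time.sleep as a stand-in for a slow download. The names fake_download, download_all and SOURCES are hypothetical and do not come from the article; ThreadPoolExecutor is part of the standard library's concurrent.futures module.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for independent remote sources.
SOURCES = ["site-a", "site-b", "site-c", "site-d"]

def fake_download(source, delay=0.2):
    """Pretend to fetch data; the sleep stands in for network latency."""
    time.sleep(delay)  # the thread is idle here, so other threads can run
    return f"data from {source}"

def download_all(sources):
    """Download from every source concurrently and merge the results."""
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        # map() yields results in the same order as the inputs
        return list(pool.map(fake_download, sources))

start = time.perf_counter()
results = download_all(SOURCES)
elapsed = time.perf_counter() - start
print(results)
print(f"took {elapsed:.2f}s")
```

With four sources and a 0.2-second delay each, a sequential loop would wait about 0.8 seconds in total, while the threaded version should finish in roughly the time of a single delay.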


Fortunately, threads are included in the standard library:

import threading
from queue import Queue
import time

Pass a callable as target, pass that function's arguments as a tuple through args, and call start to start the thread.

def testThread(num):
    print(num)

if __name__ == '__main__':
    for i in range(5):
        # args must be a tuple, hence the trailing comma
        t = threading.Thread(target=testThread, args=(i,))
        t.start()

If you have never seen if __name__ == '__main__': before, it is basically a way to ensure that the code nested within it runs only when the script is executed directly, and not when the file is imported as a module.
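
Starting a thread does not wait for it to finish. A minimal sketch of the usual start/join pattern follows; worker and run_workers are hypothetical names introduced only for illustration.

```python
import threading

def worker(num, results):
    # Each thread appends its own result to the shared list.
    results.append(num * num)

def run_workers(count):
    results = []
    threads = [
        threading.Thread(target=worker, args=(i, results))
        for i in range(count)
    ]
    for t in threads:
        t.start()   # begin executing worker(i, results) in a new thread
    for t in threads:
        t.join()    # block until that thread has finished
    # Threads may finish in any order, so sort before returning.
    return sorted(results)

if __name__ == '__main__':
    print(run_workers(5))  # prints [0, 1, 4, 9, 16]
```

Calling join before reading the results is what guarantees that every thread has done its work.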


In languages such as C and Java, threads of the same operating system process can spread the computing workload across multiple cores. Normally, Python uses a single process, from which a main thread is spawned to execute the runtime. Because of a locking mechanism called the global interpreter lock (GIL), only one thread executes Python code at any moment, regardless of how many cores the computer has or how many new threads are spawned. This mechanism exists to prevent so-called race conditions on the interpreter's own internal state.


When I think of racing, I think of NASCAR and Formula 1. Let's use this analogy and imagine all Formula 1 drivers trying to race in one car at the same time. Sounds ridiculous, right? It only works if each driver has access to his or her own car or, better still, if they run one lap at a time, handing the car off to the next driver each time.

This is very similar to what happens with threads. New threads are spawned from the "main" thread, and they all exist in the same process "context" (the event or race), so all resources (such as memory) allocated to the process are shared. For example, in a typical python interpreter session:

>>> a = 8

Here, a consumes very little memory (RAM): some arbitrary location in memory temporarily holds the value 8.

So far so good. Let's start some threads and observe their behavior when adding two numbers, x and y:
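
To look at this from the interpreter, sys.getsizeof reports how many bytes an object occupies, and id returns an identifier that, in CPython, is the object's address in memory. This is only a small sketch; the variable names other than a are hypothetical, and the numbers printed depend on the platform, so no output is shown.

```python
import sys

a = 8
size_in_bytes = sys.getsizeof(a)   # size of the int object, in bytes
identity = id(a)                   # in CPython, the object's memory address

b = a                              # a second name for the same object
print(size_in_bytes, hex(identity), b is a)
```

Every thread in the process can reach this same object through the name a, which is what makes sharing both convenient and risky.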

import time
import threading
from threading import Thread

a = 8

def threaded_add(x, y):
    global a
    # simulate a more complex task by asking
    # python to sleep, since adding happens so quickly!
    for i in range(2):
        print("computing task in a different thread!")
        time.sleep(1)
        # this is not okay! but python will force sync, more on that later!
        a = 10
        print(a)

# when the file is imported rather than run directly,
# just look up the thread that is doing the import
if __name__ != "__main__":
    current_thread = threading.current_thread()

# here we tell python, from the main
# thread of execution, to make another one
if __name__ == "__main__":

    thread = Thread(target = threaded_add, args = (1, 2))
    thread.start()
    thread.join()
    print(a)
    print("main thread finished...exiting")
>>> computing task in a different thread!
>>> 10
>>> computing task in a different thread!
>>> 10
>>> 10
>>> main thread finished...exiting

Two threads are now involved. Let's call them thread_one and thread_two. If thread_one wants to set a to 10 while thread_two simultaneously tries to update the same variable, we have a problem! A condition called a data race occurs, and the resulting value of a will be inconsistent.

It is like a race you did not watch, but heard two conflicting results about from two of your friends: thread_one tells you one thing, and thread_two contradicts it. Here's a pseudocode snippet to illustrate:

a = 8
# spawn two different threads, 1 and 2

# thread_one updates the value of a to 10
a = 10
if a == 10:
    pass  # a check that assumes a is still 10

# thread_two updates the value of a to 15
a = 15
b = a * 2

# if thread_one's update is the one in effect, b will be 20
# if thread_two's update is the one in effect, b will be 30
# who is right?
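
A common way to settle the "who is right?" question is to let only one thread at a time touch the shared variable, using threading.Lock. The sketch below is an illustration under that assumption rather than code from the article; counter, counter_lock, add_many and run are hypothetical names.

```python
import threading

counter = 0
counter_lock = threading.Lock()

def add_many(times):
    """Increment the shared counter, holding the lock for each update."""
    global counter
    for _ in range(times):
        # Only one thread can be inside this block at a time, so the
        # read-modify-write of counter cannot be interleaved.
        with counter_lock:
            counter += 1

def run(thread_count=4, times=10000):
    """Start the threads, wait for them, and return the final count."""
    global counter
    counter = 0
    threads = [
        threading.Thread(target=add_many, args=(times,))
        for _ in range(thread_count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

print(run())  # prints 40000
```

Because every update happens while the lock is held, the final value is the same on every run, whichever thread gets scheduled first.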

What the hell is going on?

Python is an interpreted language, which means it comes with an interpreter: a program, written in another language, that parses and runs Python source code. Python interpreters include CPython, PyPy, Jython and IronPython; of these, CPython is the original implementation of Python.

CPython is an interpreter that provides a foreign function interface to C and other programming languages. It compiles Python source code into intermediate bytecode, which is then interpreted by the CPython virtual machine. The discussion so far, and from here on, is about CPython and how threads behave in that environment.
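
The bytecode step can be inspected with the standard dis module, which lists the instructions CPython generates for a function. This is a small sketch; increment and counter are hypothetical names, and the exact instruction names differ between CPython versions, so no output is shown.

```python
import dis

counter = 0

def increment():
    global counter
    counter += 1

# Collect the bytecode instructions CPython compiled for increment().
instructions = list(dis.get_instructions(increment))
for ins in instructions:
    print(ins.opname, ins.argrepr)

step_count = len(instructions)
print(f"{step_count} bytecode instructions for one line of source")
```

A single source line such as counter += 1 becomes several instructions (load, add, store), which is why two threads updating the same variable can interleave in the middle of what looks like one operation.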





import sys
import gc

hello = "world" # the count printed below also includes temporary and internal references
print (sys.getrefcount(hello))

bye = "world" 
other_bye = bye 
print(sys.getrefcount(bye))
print(gc.get_referrers(other_bye))
>>> 4
>>> 6
>>> [['sys', 'gc', 'hello', 'world', 'print', 'sys', 'getrefcount', 'hello', 'bye', 'world', 'other_bye', 'bye', 'print', 'sys', 'getrefcount', 'bye', 'print', 'gc', 'get_referrers', 'other_bye'], (0, None, 'world'), {'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <_frozen_importlib_external.sourcefileloader>, '__spec__': None, '__annotations__': {}, '__builtins__': <module>, '__file__': 'test.py', '__cached__': None, 'sys': <module>, 'gc': <module>, 'hello': 'world', 'bye': 'world', 'other_bye': 'world'}]


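CPython's GIL governs the Python interpreter by allowing only one thread at a time to control it. It gives single-threaded programs a performance boost, because only one lock has to be managed, but the cost is that it prevents multithreaded CPython programs from taking full advantage of multiprocessor systems in some situations.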
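
One visible setting related to this hand-over between threads is the switch interval, which the sys module exposes. The sketch below assumes a standard CPython build; the variable names are hypothetical, and the default value may differ between versions, so no output is shown.

```python
import sys

# Thread switch interval, in seconds: the time slice CPython aims to give
# a running thread before asking it to let another waiting thread run.
interval = sys.getswitchinterval()
print(f"switch interval: {interval} seconds")

# The value can be changed with sys.setswitchinterval; restore it afterwards.
sys.setswitchinterval(interval * 2)
doubled = sys.getswitchinterval()
sys.setswitchinterval(interval)
print(f"temporarily doubled to: {doubled} seconds")
```

Changing the interval only affects how often threads take turns; it does not let two threads run Python code at the same moment.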



import time, os
from threading import Thread, current_thread
from multiprocessing import current_process

COUNT = 200000000
SLEEP = 10

def io_bound(sec):
   pid = os.getpid()
   threadName = current_thread().name
   processName = current_process().name
   print(f"{pid} * {processName} * {threadName} \
           ---> Start sleeping...")
   # sleeping stands in for waiting on I/O
   time.sleep(sec)
   print(f"{pid} * {processName} * {threadName} \
           ---> Finished sleeping...")

def cpu_bound(n):
   pid = os.getpid()
   threadName = current_thread().name
   processName = current_process().name
   print(f"{pid} * {processName} * {threadName} \
           ---> Start counting...")
   while n>0:
          n -= 1
   print(f"{pid} * {processName} * {threadName} \
       ---> Finished counting...")

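def timeit(function,args,threaded=False):
      start = time.time()
      if threaded:
         # run the function twice, once in each of two threads
         t1 = Thread(target = function, args =(args, ))
         t2 = Thread(target = function, args =(args, ))
         t1.start()
         t2.start()
         t1.join()
         t2.join()
      else:
         # run the function twice, one call after the other
         function(args)
         function(args)
      end = time.time()
      print('Time taken in seconds for running {} on Argument {} is {}s -{}'.format(function,args,end - start,"Threaded" if threaded else "None Threaded"))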

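if __name__=="__main__":
      #Running io_bound task
      timeit(io_bound, SLEEP)

      #Running io_bound task in Thread
      timeit(io_bound, SLEEP, threaded=True)

      #Running cpu_bound task
      timeit(cpu_bound, COUNT)

      #Running cpu_bound task in Thread
      timeit(cpu_bound, COUNT, threaded=True)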
>>> 17244 * MainProcess * MainThread            ---> Start sleeping...
>>> 17244 * MainProcess * MainThread            ---> Finished sleeping...
>>> 17244 * MainProcess * MainThread            ---> Start sleeping...
>>> 17244 * MainProcess * MainThread            ---> Finished sleeping...
>>> Time taken in seconds for running <function> on Argument 10 is 20.036664724349976s -None Threaded
>>> 10180 * MainProcess * Thread-1            ---> Start sleeping...
>>> 10180 * MainProcess * Thread-2            ---> Start sleeping...
>>> 10180 * MainProcess * Thread-1            ---> Finished sleeping...
>>> 10180 * MainProcess * Thread-2            ---> Finished sleeping...
>>> Time taken in seconds for running <function> on Argument 10 is 10.01464056968689s -Threaded
>>> 14172 * MainProcess * MainThread            ---> Start counting...
>>> 14172 * MainProcess * MainThread        ---> Finished counting...
>>> 14172 * MainProcess * MainThread            ---> Start counting...
>>> 14172 * MainProcess * MainThread        ---> Finished counting...
>>> Time taken in seconds for running <function> on Argument 200000000 is 44.90199875831604s -None Threaded
>>> 15616 * MainProcess * Thread-1            ---> Start counting...
>>> 15616 * MainProcess * Thread-2            ---> Start counting...
>>> 15616 * MainProcess * Thread-1        ---> Finished counting...
>>> 15616 * MainProcess * Thread-2        ---> Finished counting...
>>> Time taken in seconds for running <function> on Argument 200000000 is 106.09711360931396s -Threaded





import os
import time
from multiprocessing import Process, current_process
from threading import current_thread

SLEEP = 10
COUNT = 200000000

def count_down(cnt):
   pid = os.getpid()
   processName = current_process().name
   print(f"{pid} * {processName} \
           ---> Start counting...")
   while cnt > 0:
       cnt -= 1

def io_bound(sec):
   pid = os.getpid()
   threadName = current_thread().name
   processName = current_process().name
   print(f"{pid} * {processName} * {threadName} \
           ---> Start sleeping...")
   # sleeping stands in for waiting on I/O
   time.sleep(sec)
   print(f"{pid} * {processName} * {threadName} \
           ---> Finished sleeping...")

if __name__ == '__main__':
    # creating processes
    start = time.time()

    p1 = Process(target=count_down, args=(COUNT, ))
    p2 = Process(target=count_down, args=(COUNT, ))

    # to time the I/O-bound task instead, use:
    #p1 = Process(target=io_bound, args=(SLEEP, ))
    #p2 = Process(target=io_bound, args=(SLEEP, ))

    # starting the processes
    p1.start()
    p2.start()

    # wait until finished
    p1.join()
    p2.join()

    stop = time.time()
    elapsed = stop - start

    print ("The time taken in seconds is :", elapsed)
>>> 1660 * Process-2            ---> Start counting...
>>> 10184 * Process-1            ---> Start counting...
>>> The time taken in seconds is : 12.815475225448608




import time
import asyncio

COUNT = 200000000

# asynchronous function definition
async def func_name(cnt):
       while cnt > 0:
           cnt -= 1

# asynchronous main function definition
async def main ():
  # Creating 2 tasks.....You could create as many tasks (n tasks)
  task1 = asyncio.create_task(func_name(COUNT))
  task2 = asyncio.create_task(func_name(COUNT))

  # wait for both tasks to finish before handing control back to the program
  await asyncio.wait([task1, task2])

if __name__ =='__main__':
  start_time = time.time()
  # create an event loop and run all tasks in it until completion
  asyncio.run(main())
  print("--- %s seconds ---" % (time.time() - start_time))
>>> --- 41.74118399620056 seconds ---
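
The countdown coroutines above never await anything, so the event loop has no opportunity to switch between them and they run one after the other on a single thread. That is a likely reason the timing stays close to the plain single-threaded result. For contrast, here is a minimal sketch in which each coroutine waits on asyncio.sleep, a stand-in for real I/O; fake_io is a hypothetical name.

```python
import asyncio
import time

async def fake_io(name, delay):
    # await hands control back to the event loop while this coroutine waits
    await asyncio.sleep(delay)
    return name

async def main():
    # gather() runs both coroutines concurrently and keeps the input order
    return await asyncio.gather(fake_io("first", 0.2), fake_io("second", 0.2))

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")
```

Both waits overlap here, so the total should be close to 0.2 seconds rather than 0.4.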




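  1. Shuffling data between processes incurs I/O overhead

  2. The entire memory is copied into each child process, which can be a lot of overhead for more substantial programs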
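
The first cost in the list above can be made visible with pickle, which multiprocessing uses to serialize Python objects that travel between processes. This is a rough sketch rather than a benchmark; payload is a hypothetical example value, and the numbers vary by machine, so no output is shown.

```python
import pickle
import time

# Hypothetical data that a parent process might hand to a worker.
payload = list(range(100_000))

start = time.perf_counter()
blob = pickle.dumps(payload)     # bytes that would travel to the other process
restored = pickle.loads(blob)    # what the receiving side has to rebuild
elapsed = time.perf_counter() - start

print(f"{len(blob)} bytes serialized, round trip took {elapsed:.4f}s")
print("same contents:", restored == payload)
print("same object:", restored is payload)
```

The receiving side ends up with an equal but separate copy, so large arguments and results are paid for both in time and in memory.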


This article is reproduced from yisu.com. If there is any infringement, please contact admin@php.cn to have it removed.