Home  >  Article  >  Backend Development  >  Is multi-processing or multi-threading faster in python?

Is multi-processing or multi-threading faster in python?

零下一度
零下一度Original
2017-06-01 10:04:493049browse

The following editor will bring you an article on who is faster between python multi-process and multi-thread (detailed explanation). The editor thinks it’s pretty good, so I’ll share it with you now and give it as a reference. Let’s follow the editor and take a look.

Is multi-processing or multi-threading faster in python?

python3.6

threading and multiprocessing

##Quad-core + Samsung 250G-850-SSD

Since used For multi-process and multi-thread programming, I still don’t understand which one is faster. Many people on the Internet say that Python multi-process is faster because of GIL (Global Interpreter Lock). But when I write code, the test time is faster with multi-threading, so what is going on? Recently I have been doing word segmentation work again. The original code is too slow and I want to speed it up, so let’s explore effective methods (there are codes and renderings at the end of the article)

Here is the result of the program. Diagram to illustrate which thread or process is faster

Some definitions

Parallelism means two or more events occur at the same time. Concurrency means that two or more events occur within the same time interval.

Threads are the smallest units that the operating system can perform computing scheduling. It is included in the process and is the actual operating unit in the process. An execution instance of a program is a process.

Implementation process

The multi-threads in python obviously have to get the GIL, execute the code, and finally release the GIL. So because of GIL, you can't get it when using multiple threads. In fact, it is a concurrent implementation, that is, multiple events occur at the same time interval.

But the process has an independent GIL, so it can be implemented in parallel. Therefore, for multi-core CPUs, theoretically using multiple processes can more effectively utilize resources.

Real-life problems

In online tutorials, you can often see python multi-threading. For example, web crawler tutorials and port scanning tutorials.

Take port scanning as an example. You can use multi-process to implement the following script, and you will find that python multi-process is faster. So isn’t it contrary to our analysis?

import sys,threading
from socket import *

host = "127.0.0.1" if len(sys.argv)==1 else sys.argv[1]
portList = [i for i in range(1,1000)]
scanList = []
lock = threading.Lock()
print('Please waiting... From ',host)


def scanPort(port):
  try:
    tcp = socket(AF_INET,SOCK_STREAM)
    tcp.connect((host,port))
  except:
    pass
  else:
    if lock.acquire():
      print('[+]port',port,'open')
      lock.release()
  finally:
    tcp.close()

for p in portList:
  t = threading.Thread(target=scanPort,args=(p,))
  scanList.append(t)
for i in range(len(portList)):
  scanList[i].start()
for i in range(len(portList)):
  scanList[i].join()

Who is faster

Because of the python lock problem, threads competing for locks and switching threads will consume resources. So, let’s take a bold guess:

In CPU-intensive tasks, multi-processing is faster or more effective; while in IO-intensive tasks, multi-threading can effectively improve efficiency.

Let’s take a look at the following code:

import time
import threading
import multiprocessing

max_process = 4
max_thread = max_process

def fun(n,n2):
  #cpu密集型
  for i in range(0,n):
    for j in range(0,(int)(n*n*n*n2)):
      t = i*j

def thread_main(n2):
  thread_list = []
  for i in range(0,max_thread):
    t = threading.Thread(target=fun,args=(50,n2))
    thread_list.append(t)

  start = time.time()
  print(' [+] much thread start')
  for i in thread_list:
    i.start()
  for i in thread_list:
    i.join()
  print(' [-] much thread use ',time.time()-start,'s')

def process_main(n2):
  p = multiprocessing.Pool(max_process)
  for i in range(0,max_process):
    p.apply_async(func = fun,args=(50,n2))
  start = time.time()
  print(' [+] much process start')
  p.close()#关闭进程池
  p.join()#等待所有子进程完毕
  print(' [-] much process use ',time.time()-start,'s')

if name=='main':
  print("[++]When n=50,n2=0.1:")
  thread_main(0.1)
  process_main(0.1)
  print("[++]When n=50,n2=1:")
  thread_main(1)
  process_main(1)
  print("[++]When n=50,n2=10:")
  thread_main(10)
  process_main(10)

The results are as follows:

It can be seen that when the When the CPU usage becomes higher and higher (the more code cycles there are), the gap becomes wider and wider. Verify our conjecture

CPU and IO intensive

1, CPU intensive code (various loop processing, counting, etc.)


2. IO-intensive code (file processing, web crawler, etc.)

Judgment method:

1. Directly look at the CPU usage, hard disk IO read and write speed


2. More calculations->CPU; more time waiting (such as web crawlers)->IO


3. Please Baidu

[Related recommendations]

1.

Examples of multi-process and multi-threading in Python (1)

2.

Recommended to use multi-process instead of multi-threading in Python? Share the reasons for recommending the use of multi-process

3.

Examples of multi-process and multi-threading in Python (2) Programming methods

4.

About Python Detailed introduction to processes, threads, and coroutines

5.

Python concurrent programming thread pool/process pool

The above is the detailed content of Is multi-processing or multi-threading faster in python?. For more information, please follow other related articles on the PHP Chinese website!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn