
Summary of Python multi-process knowledge points


This article brings you relevant knowledge about Python multiprocessing, covering what multiprocessing is, how processes are created, inter-process synchronization and communication, and process pools. Let's take a look together; I hope it helps everyone.


1. What is multi-process?

1. Process

Program: for example, xxx.py is a program; a program is static

Process: once a program runs, the code together with the resources it uses is called a process. A process is the basic unit by which the operating system allocates resources. Multitasking can be accomplished not only with threads but also with processes

2. Process status

At work, the number of tasks is often greater than the number of CPU cores, so at any moment some tasks are running while others wait for the CPU, which gives rise to different process states:

  • Ready state: the conditions for running are met, and the process is waiting for the CPU to execute it
  • Running state: the CPU is currently executing the process
  • Waiting state: the process is waiting for some condition to be met; for example, a sleeping program is in the waiting state

2. Creating processes - multiprocessing

1. Process class syntax description

The multiprocessing module spawns a process by creating a Process object and then calling its start() method. Process offers the same API as threading.Thread.

Syntax: multiprocessing.Process(group=None, target=None, name=None, args=(), kwargs={}, *, daemon=None)

Parameter description:

  • group: specifies the process group; rarely used
  • target: a callable object (such as a function) that the child process will execute
  • name: a name for the process; optional
  • args: positional arguments passed to the target callable, as a tuple
  • kwargs: keyword arguments passed to the target callable, as a dict

The multiprocessing.Process object has the following methods and attributes:

Method/attribute Description
run() The method actually executed by the process
start() Start the child process instance (i.e. create the child process)
join([timeout]) If the optional timeout is the default None, block until the process terminates; if timeout is a positive number, block for at most timeout seconds
name The process's name; defaults to Process-N, where N is an integer counting up from 1
pid The pid (process number) of the process
is_alive() Whether the process is still alive
exitcode The exit code of the child process
daemon The daemon flag of the process, a boolean value
authkey The authentication key of the process
sentinel A numeric handle to a system object that becomes ready when the process ends
terminate() Immediately terminate the child process, whether or not its task is finished
kill() Same as terminate(), but uses the SIGKILL signal on Unix
close() Close the Process object and release all resources associated with it
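A few of these methods and attributes in action, as a minimal sketch (the `Greeter` class name is invented for illustration). Overriding `run()` in a Process subclass is an alternative to passing `target=`:

```python
from multiprocessing import Process
import os

class Greeter(Process):
    """Subclassing Process: start() runs run() in a new process."""
    def run(self):
        print('child pid=%d, parent pid=%d' % (os.getpid(), os.getppid()))

if __name__ == '__main__':
    p = Greeter(name='greeter-1')
    p.start()
    p.join()             # block until the child terminates
    print(p.name)        # greeter-1
    print(p.exitcode)    # 0 on a clean exit
    print(p.is_alive())  # False after join()
```

After join() returns, exitcode is set and is_alive() reports False, which is a convenient way to check that a child finished cleanly.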

2. Running two while loops at the same time

# -*- coding:utf-8 -*-
from multiprocessing import Process
import time

def run_proc():
    """Code executed by the child process"""
    while True:
        print("----2----")
        time.sleep(1)

if __name__ == '__main__':
    p = Process(target=run_proc)
    p.start()
    while True:
        print("----1----")
        time.sleep(1)

Running result: ----1---- and ----2---- are printed alternately, roughly once per second each.
Note: to create a child process, just pass in the function to execute and its arguments, create a Process instance, and start it with start().

3. Process pid

# -*- coding:utf-8 -*-
from multiprocessing import Process
import os

def run_proc():
    """Code executed by the child process"""
    print('Child process running, pid=%d...' % os.getpid())  # os.getpid() returns the current process's pid
    print('Child process about to end...')

if __name__ == '__main__':
    print('Parent process pid: %d' % os.getpid())
    p = Process(target=run_proc)
    p.start()

Running result: the parent prints its pid, and the child prints a different pid of its own before ending.

4. Passing arguments to the function run by the child process

# -*- coding:utf-8 -*-
from multiprocessing import Process
import os
from time import sleep

def run_proc(name, age, **kwargs):
    for i in range(10):
        print('Child process running, name=%s, age=%d, pid=%d...' % (name, age, os.getpid()))
        print(kwargs)
        sleep(0.2)

if __name__ == '__main__':
    p = Process(target=run_proc, args=('test', 18), kwargs={"m": 20})
    p.start()
    sleep(1)  # end the child process after 1 second
    p.terminate()
    p.join()

Running result: the child prints name=test, age=18 and {'m': 20} about five times before it is terminated.

5. Global variables are not shared between processes

# -*- coding:utf-8 -*-
from multiprocessing import Process
import os
import time

nums = [11, 22]

def work1():
    """Code executed by the first child process"""
    print("in process1 pid=%d, nums=%s" % (os.getpid(), nums))
    for i in range(3):
        nums.append(i)
        time.sleep(1)
        print("in process1 pid=%d, nums=%s" % (os.getpid(), nums))

def work2():
    """Code executed by the second child process"""
    print("in process2 pid=%d, nums=%s" % (os.getpid(), nums))

if __name__ == '__main__':
    p1 = Process(target=work1)
    p1.start()
    p1.join()

    p2 = Process(target=work2)
    p2.start()

Running result:

in process1 pid=11349, nums=[11, 22]
in process1 pid=11349, nums=[11, 22, 0]
in process1 pid=11349, nums=[11, 22, 0, 1]
in process1 pid=11349, nums=[11, 22, 0, 1, 2]
in process2 pid=11350, nums=[11, 22]
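Because each process gets its own copy of global variables, sharing state requires explicit shared-memory objects. A hedged sketch using multiprocessing.Value and Array (not covered in this article's examples; the `work` helper is made up here):

```python
from multiprocessing import Process, Value, Array

def work(counter, arr):
    # Value and Array live in shared memory, so the parent sees these changes
    counter.value += 1
    for i in range(len(arr)):
        arr[i] *= 2

if __name__ == '__main__':
    counter = Value('i', 0)    # 'i' = C signed int
    arr = Array('i', [11, 22])
    p = Process(target=work, args=(counter, arr))
    p.start()
    p.join()
    print(counter.value)  # 1
    print(list(arr))      # [22, 44]
```

Unlike the nums list above, the child's modifications to counter and arr are visible in the parent after join().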

3. Inter-process synchronization - Queue

Sometimes processes need to communicate with each other; the operating system provides many mechanisms for inter-process communication.

1. Queue class syntax description

Method name Description
q = Queue([maxsize]) Initialize a Queue object. If maxsize is omitted or negative, there is no upper limit on the number of messages (until memory runs out)
Queue.qsize() Return the number of messages currently in the queue
Queue.empty() Return True if the queue is empty, otherwise False
Queue.full() Return True if the queue is full, otherwise False
Queue.get([block[, timeout]]) Get a message and remove it from the queue; block defaults to True. 1. If block is True and no timeout (in seconds) is set, the call blocks while the queue is empty until a message can be read; if timeout is set, it waits up to timeout seconds and then raises queue.Empty. 2. If block is False and the queue is empty, queue.Empty is raised immediately
Queue.get_nowait() Equivalent to Queue.get(False)
Queue.put(item[, block[, timeout]]) Write item to the queue; block defaults to True. 1. If block is True and no timeout is set, the call blocks while the queue is full until space becomes available; if timeout is set, it waits up to timeout seconds and then raises queue.Full. 2. If block is False and the queue is full, queue.Full is raised immediately
Queue.put_nowait(item) Equivalent to Queue.put(item, False)

2. Using Queue

You can use the Queue class from the multiprocessing module to pass data between processes. Queue itself is a message queue. A small example first demonstrates how Queue works:

#coding=utf-8
from multiprocessing import Queue

q = Queue(3)  # initialize a Queue that holds at most 3 messages
q.put("message 1")
q.put("message 2")
print(q.full())  # False
q.put("message 3")
print(q.full())  # True

# The queue is now full, so both try blocks below raise an exception:
# the first waits 2 seconds before raising, the second raises immediately
try:
    q.put("message 4", True, 2)
except:
    print("The queue is full; current message count: %s" % q.qsize())
try:
    q.put_nowait("message 4")
except:
    print("The queue is full; current message count: %s" % q.qsize())

# Recommended: check whether the queue is full before writing
if not q.full():
    q.put_nowait("message 4")
# When reading, check whether the queue is empty first
if not q.empty():
    for i in range(q.qsize()):
        print(q.get_nowait())

Running result:

False
True
The queue is full; current message count: 3
The queue is full; current message count: 3
message 1
message 2
message 3

3. A Queue example

Taking Queue as an example, create two child processes in the parent: one writes data into the Queue and the other reads data out of it:

from multiprocessing import Process, Queue
import os, time, random

# Code executed by the writer process:
def write(q):
    for value in ['A', 'B', 'C']:
        print('Put %s to queue...' % value)
        q.put(value)
        time.sleep(random.random())

# Code executed by the reader process:
def read(q):
    while True:
        if not q.empty():
            value = q.get(True)
            print('Get %s from queue.' % value)
            time.sleep(random.random())
        else:
            break

if __name__ == '__main__':
    # The parent creates the Queue and passes it to each child:
    q = Queue()
    pw = Process(target=write, args=(q,))
    pr = Process(target=read, args=(q,))
    # Start the writer pw:
    pw.start()
    # Wait for pw to finish:
    pw.join()
    # Start the reader pr, which exits once the queue is drained:
    pr.start()
    pr.join()
    print('')
    print('All data has been written and read')

Running result: the writer puts A, B and C into the queue one by one, and the reader then gets each value back out.

4. Inter-process synchronization - Lock

Locks exist to keep data consistent. Suppose each of two processes reads a shared variable, adds 1, and writes it back: if the second process reads the value after the first has read it but before the first has written back, the value finally written is wrong. A lock is needed to keep the data consistent.

A Lock ensures that a piece of code can be executed by only one process at a time. A Lock object has two methods: acquire() obtains the lock and release() releases it. When a process calls acquire(), if the lock's state is unlocked, the state is immediately changed to locked and the call returns, and the process now holds the lock. If the lock's state is locked, the call to acquire() blocks.

1. Lock syntax

  • lock = multiprocessing.Lock(): create a lock

  • lock.acquire(): acquire the lock

  • lock.release(): release the lock

  • with lock: acquire and release the lock automatically, similar to with open() as f:
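The `with lock:` form from the last bullet can replace the manual acquire()/release() pairing, and it releases the lock even if the body raises; a minimal sketch (the `add` helper here is a simplified variant of the examples below):

```python
import multiprocessing
import time

def add(value, lock):
    # `with lock:` acquires on entry and releases on exit, even on an exception
    with lock:
        for i in range(2):
            print('add{0}: step {1}'.format(value, i))
            time.sleep(0.1)

if __name__ == '__main__':
    lock = multiprocessing.Lock()
    p1 = multiprocessing.Process(target=add, args=(1, lock))
    p2 = multiprocessing.Process(target=add, args=(2, lock))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
```

Whichever process enters the `with` block first runs both of its steps before the other can start.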

2. Without a lock

import multiprocessing
import time

def add(num, value):
    print('add{0}:num={1}'.format(value, num))
    for i in range(0, 2):
        num += value
        print('add{0}:num={1}'.format(value, num))
        time.sleep(1)

if __name__ == '__main__':
    num = 0
    p1 = multiprocessing.Process(target=add, args=(num, 1))
    p2 = multiprocessing.Process(target=add, args=(num, 2))
    p1.start()
    p2.start()

Running result: there is no fixed order; the two processes run interleaved

add1:num=0
add1:num=1
add2:num=0
add2:num=2
add1:num=2
add2:num=4

3. With a lock

import multiprocessing
import time

def add(num, value, lock):
    try:
        lock.acquire()
        print('add{0}:num={1}'.format(value, num))
        for i in range(0, 2):
            num += value
            print('add{0}:num={1}'.format(value, num))
            time.sleep(1)
    except Exception as err:
        raise err
    finally:
        lock.release()

if __name__ == '__main__':
    lock = multiprocessing.Lock()
    num = 0
    p1 = multiprocessing.Process(target=add, args=(num, 1, lock))
    p2 = multiprocessing.Process(target=add, args=(num, 2, lock))
    p1.start()
    p2.start()

Running result: only after one process finishes does the other start running, and whichever grabs the lock first runs first

add1:num=0
add1:num=1
add1:num=2
add2:num=0
add2:num=2
add2:num=4

5. Process pool - Pool

When only a few child processes are needed, you can create them directly with Process from multiprocessing; but with hundreds or even thousands of targets, creating processes by hand is an enormous amount of work, and that is where the Pool class provided by the multiprocessing module comes in.

1. Pool class syntax

Syntax: multiprocessing.pool.Pool([processes[, initializer[, initargs[, maxtasksperchild[, context]]]]])

Parameters:

  • processes: the number of worker processes; if processes is None, the value returned by os.cpu_count() is used.

  • initializer: if initializer is not None, each worker process calls initializer(*initargs) when it starts.

  • maxtasksperchild: the number of tasks a worker process can complete before it exits and is replaced by a new worker process, in order to free unused resources.

  • context: the context used to start the worker processes.
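For instance, initializer/initargs can set up per-worker state as each worker starts. A sketch under stated assumptions: `init_worker` and `square` are made-up names, and Pool.map is used here for brevity:

```python
from multiprocessing import Pool
import os

def init_worker(prefix):
    # Called once in each worker process as it starts up
    print('%s: worker %d ready' % (prefix, os.getpid()))

def square(x):
    return x * x

if __name__ == '__main__':
    with Pool(processes=2, initializer=init_worker, initargs=('demo',)) as po:
        print(po.map(square, [1, 2, 3]))  # [1, 4, 9]
```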

Two ways to submit tasks to the pool:

  • apply(func[, args[, kwds]]): blocking. Each call must wait for the previous task to finish before the next one runs.

  • apply_async(func[, args[, kwds]]): non-blocking. Calls func in parallel; args is the tuple of positional arguments passed to func, and kwds is the dict of keyword arguments passed to func.
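The difference shows up clearly when collecting results: apply() returns the value directly, while apply_async() immediately returns an AsyncResult whose get() waits for the value. A small sketch (`slow_double` is a made-up helper):

```python
from multiprocessing import Pool
import time

def slow_double(x):
    time.sleep(0.1)
    return x * 2

if __name__ == '__main__':
    with Pool(3) as po:
        # Blocking: waits for the result before returning
        print(po.apply(slow_double, (1,)))   # 2
        # Non-blocking: the three tasks run in parallel
        results = [po.apply_async(slow_double, (i,)) for i in range(3)]
        print([r.get() for r in results])    # [0, 2, 4]
```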

Common multiprocessing.Pool methods:

Method name Description
close() Close the pool so that it accepts no new tasks
terminate() Terminate immediately, whether or not tasks have finished
join() Block the main process and wait for the child processes to exit; must be called after close() or terminate()

2. A Pool example

When initializing a Pool, you can specify a maximum number of processes. When a new request is submitted to the Pool and the pool is not yet full, a new process is created to run the request; if the number of processes in the pool has already reached the specified maximum, the request waits until a process in the pool finishes, and that existing process then runs the new task. See the example below:

# -*- coding:utf-8 -*-
from multiprocessing import Pool
import os, time, random

def worker(msg):
    t_start = time.time()
    print("%s starts, pid=%d" % (msg, os.getpid()))
    # random.random() returns a random float between 0 and 1
    time.sleep(random.random() * 2)
    t_stop = time.time()
    print(msg, "done, took %0.2f seconds" % (t_stop - t_start))

if __name__ == '__main__':
    po = Pool(3)  # a process pool with at most 3 processes
    for i in range(0, 10):
        # apply_async(target, (argument tuple for the target,))
        # each iteration hands the task to whichever child process is free
        po.apply_async(worker, (i,))
    print("----start----")
    po.close()  # close the pool; po accepts no new requests
    po.join()   # wait for all child processes in po to finish; must come after close()
    print("-----end-----")

Running result:

----start----
0 starts, pid=21466
1 starts, pid=21468
2 starts, pid=21467
0 done, took 1.01 seconds
3 starts, pid=21466
2 done, took 1.24 seconds
4 starts, pid=21467
3 done, took 0.56 seconds
5 starts, pid=21466
1 done, took 1.68 seconds
6 starts, pid=21468
4 done, took 0.67 seconds
7 starts, pid=21467
5 done, took 0.83 seconds
8 starts, pid=21466
6 done, took 0.75 seconds
9 starts, pid=21468
7 done, took 1.03 seconds
8 done, took 1.05 seconds
9 done, took 1.69 seconds
-----end-----

3. Queue in a process pool

To let processes created by a Pool communicate, use the Queue() from multiprocessing.Manager() rather than multiprocessing.Queue(); otherwise you get an error like this: RuntimeError: Queue objects should only be shared between processes through inheritance.

The following example shows how processes in a process pool communicate:

# -*- coding:utf-8 -*-
# Note: import Manager rather than Queue
from multiprocessing import Manager, Pool
import os, time, random

def reader(q):
    print("reader starts (%s), parent is (%s)" % (os.getpid(), os.getppid()))
    for i in range(q.qsize()):
        print("reader got message from Queue: %s" % q.get(True))

def writer(q):
    print("writer starts (%s), parent is (%s)" % (os.getpid(), os.getppid()))
    for i in "itcast":
        q.put(i)

if __name__ == "__main__":
    print("(%s) start" % os.getpid())
    q = Manager().Queue()  # use the Queue from Manager
    po = Pool()
    po.apply_async(writer, (q,))

    time.sleep(1)  # let the writer fill the Queue before the reader starts draining it

    po.apply_async(reader, (q,))
    po.close()
    po.join()
    print("(%s) End" % os.getpid())

Running result:

(11095) start
writer starts (11097), parent is (11095)
reader starts (11098), parent is (11095)
reader got message from Queue: i
reader got message from Queue: t
reader got message from Queue: c
reader got message from Queue: a
reader got message from Queue: s
reader got message from Queue: t
(11095) End

6. Processes vs. threads

1. Function

Process: can handle multiple tasks, e.g. running several QQ clients on one computer at the same time
Thread: can handle multiple tasks, e.g. several chat windows within a single QQ client

Different definitions

  • A process is an independent unit of resource allocation and scheduling in the system.

  • A thread is an entity of a process and the basic unit of CPU scheduling and dispatch; it is a smaller unit than a process that can run independently. A thread owns essentially no system resources of its own (only essentials for running, such as a program counter, a set of registers, and a stack), but it shares all the resources owned by the process with the other threads belonging to the same process.

2. Differences

  • A program has at least one process, and a process has at least one thread.
    - Threads are finer-grained than processes (they hold fewer resources), which gives multithreaded programs higher concurrency.
    - A process has its own independent memory space during execution, while threads share memory, which greatly improves a program's running efficiency.
  • Threads cannot run independently; they must exist within a process.
  • A process can be pictured as an assembly line in a factory, with its threads as the workers on that line.

3. Pros and cons

  • Threads: low execution overhead, but weaker at resource management and protection
  • Processes: higher execution overhead, but better at resource management and protection

