Home  >  Article  >  Backend Development  >  How to Share Large Readonly Data Efficiently in Python Multiprocessing?

How to Share Large Readonly Data Efficiently in Python Multiprocessing?

Linda Hamilton
Linda HamiltonOriginal
2024-10-24 18:45:50705browse

How to Share Large Readonly Data Efficiently in Python Multiprocessing?

Maintaining Shared Readonly Data in Multiprocessing

Question:

In a Python multiprocessing environment, how to ensure that a sizeable readonly array (e.g., 3 Gb) is shared among multiple processes without creating copies?

Answer:

Utilizing shared memory capabilities provided by the multiprocessing module in conjunction with NumPy allows for efficient sharing of data between processes.

<code class="python">import multiprocessing
import ctypes
import numpy as np

shared_array_base = multiprocessing.Array(ctypes.c_double, 10*10)
shared_array = np.ctypeslib.as_array(shared_array_base.get_obj())
shared_array = shared_array.reshape(10, 10)</code>

This approach leverages the fact that Linux employs copy-on-write semantics for fork(), ensuring that data is only duplicated when modified. As a result, even without explicitly using the multiprocessing.Array, the data is effectively shared between processes unless altered.

<code class="python"># Parallel processing
def my_func(i, def_param=shared_array):
    shared_array[i,:] = i

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    pool.map(my_func, range(10))

    print(shared_array)</code>

This code concurrently modifies the shared array and demonstrates the successful sharing of data among multiple processes:

[[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
 [ 2.  2.  2.  2.  2.  2.  2.  2.  2.  2.]
 [ 3.  3.  3.  3.  3.  3.  3.  3.  3.  3.]
 [ 4.  4.  4.  4.  4.  4.  4.  4.  4.  4.]
 [ 5.  5.  5.  5.  5.  5.  5.  5.  5.  5.]
 [ 6.  6.  6.  6.  6.  6.  6.  6.  6.  6.]
 [ 7.  7.  7.  7.  7.  7.  7.  7.  7.  7.]
 [ 8.  8.  8.  8.  8.  8.  8.  8.  8.  8.]
 [ 9.  9.  9.  9.  9.  9.  9.  9.  9.  9.]]

By leveraging shared memory and copy-on-write semantics, this approach provides an efficient solution for sharing large amounts of readonly data between processes in a multiprocessing environment.

The above is the detailed content of How to Share Large Readonly Data Efficiently in Python Multiprocessing?. For more information, please follow other related articles on the PHP Chinese website!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn