**Process Management in Python: Fundamentals of Parallel Programming**
Parallel programming is a programming model that allows a program to run multiple tasks simultaneously on multiple processors or cores. This model aims to use processor resources more efficiently, reduce processing time and increase performance.
To picture how parallel programming works, imagine we have a problem to solve. Before parallel processing starts, we divide the problem into smaller sub-parts and assume these sub-parts are independent of one another, with no knowledge of each other. Each sub-problem is then translated into smaller tasks or instructions, organized in a way that suits parallel execution; for example, many instructions can be created to perform the same operation on a dataset. The tasks are then distributed to different processors, and each processor handles its assigned instructions independently and in parallel. This significantly reduces the total processing time and lets us use resources more efficiently.
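As a rough sketch of this decomposition (not from the original article), the same operation can be applied to the pieces of a dataset by a pool of worker processes; the function and data below are invented for the illustration:

```python
from multiprocessing import Pool

def square(n):
    # The same operation is applied to every element of the dataset.
    return n * n

if __name__ == "__main__":
    data = list(range(10))
    # The pool distributes the elements to worker processes, which run the
    # same instruction on different pieces of the data in parallel.
    with Pool(processes=4) as pool:
        results = pool.map(square, data)
    print(results)  # [0, 1, 4, 9, ...]
```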
Python offers several tools and modules for parallel programming.
**Multiprocessing**
It allows a program to achieve true parallelism by running multiple processes at the same time. The multiprocessing module overcomes the limitation of the GIL (Global Interpreter Lock), making it possible to get full performance out of multi-core processors.
The Global Interpreter Lock (GIL) is a mechanism used in CPython, the most popular implementation of Python. The GIL allows only one thread to execute Python bytecode at a time, which limits true parallelism when multithreading is used in Python.
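A rough way to see this limitation (a sketch not taken from the original article) is to time the same CPU-bound function with threads and with processes; the actual numbers depend on the machine and Python version:

```python
import time
from threading import Thread
from multiprocessing import Process

def cpu_bound(n=5_000_000):
    total = 0
    for i in range(n):
        total += i * i
    return total

def run(workers, label):
    start = time.perf_counter()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(f"{label}: {time.perf_counter() - start:.2f} s")

if __name__ == "__main__":
    # Threads share one interpreter, so the GIL serializes bytecode execution.
    run([Thread(target=cpu_bound) for _ in range(4)], "threads")
    # Each process gets its own interpreter and GIL, so the work runs in parallel.
    run([Process(target=cpu_bound) for _ in range(4)], "processes")
```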
*Example: Square and Cube Calculation*
```python
from multiprocessing import Process

def print_square(numbers):
    for n in numbers:
        print(f"Square of {n} is {n * n}")

def print_cube(numbers):
    for n in numbers:
        print(f"Cube of {n} is {n * n * n}")

if __name__ == "__main__":
    numbers = [2, 3, 4, 5]

    # Create the processes
    process1 = Process(target=print_square, args=(numbers,))
    process2 = Process(target=print_cube, args=(numbers,))

    # Start the processes
    process1.start()
    process2.start()

    # Wait for the processes to finish
    process1.join()
    process2.join()
```
**Why We Need Multiprocessing**
We can explain the need for multiprocessing with the analogy of cooks in a kitchen. A single cook working alone in a kitchen is like a single-process program; several cooks working together in the same kitchen are like multiprocessing.
Single Process - Single Cook
There is only one cook in a kitchen. This cook will make three different dishes: a starter, a main course and a dessert. Each dish is made in turn:
He prepares and completes the starter.
He moves on to the main course and finishes it.
Finally, he makes the dessert.
The problem:
No matter how fast the cook is, the dishes are prepared one after another, so time in the kitchen is wasted.
If three different dishes need to be ready at the same time, the total time grows.
Multiprocessing - Many Cooks
Now imagine that there are three cooks in the same kitchen. Each is preparing a different dish:
One cook makes the starter.
The second cook prepares the main course.
The third cook makes the dessert.
Advantage:
Three dishes are made at the same time, which significantly reduces the total time.
Each cook does his or her own work independently and is not affected by the others. (The sketch below maps this analogy to code.)
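Here is a rough sketch of the analogy (not from the original article); the sleep calls simply stand in for cooking time:

```python
import time
from multiprocessing import Process

def cook(dish, minutes):
    # One simulated "minute" is scaled down to 0.1 s for the demo.
    time.sleep(minutes * 0.1)
    print(f"{dish} is ready")

if __name__ == "__main__":
    menu = [("starter", 2), ("main course", 5), ("dessert", 3)]

    # Single cook: dishes are prepared one after another.
    start = time.perf_counter()
    for dish, minutes in menu:
        cook(dish, minutes)
    print(f"One cook: {time.perf_counter() - start:.1f} s")

    # Three cooks: each dish is prepared in its own process.
    start = time.perf_counter()
    cooks = [Process(target=cook, args=(dish, minutes)) for dish, minutes in menu]
    for p in cooks:
        p.start()
    for p in cooks:
        p.join()
    print(f"Three cooks: {time.perf_counter() - start:.1f} s")
```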
Sharing Data Between Processes in Python
In Python, it is possible to share data between different processes using the multiprocessing module. However, each process uses its own memory space. Therefore, special mechanisms are used to share data between processes.
```python
import multiprocessing

result = []

def square_of_list(mylist):
    for num in mylist:
        result.append(num ** 2)
    return result

mylist = [1, 3, 4, 5]

p1 = multiprocessing.Process(target=square_of_list, args=(mylist,))
p1.start()
p1.join()

print(result)  # [] -- empty list
```
When we examine the code sample, we see that the result list is empty. The main reason for this is that the processes created with multiprocessing work in their own memory space, independent of the main process. Because of this independence, changes made in the child process are not directly reflected in the variables in the main process.
Python provides the following methods for sharing data:
**1. Shared Memory**
Value and Array objects are used to share data between processes.
Value: Shares a single data type (for example, a number).
Array: Used for sharing an array of data.
```python
from multiprocessing import Process, Value

def increment(shared_value):
    for _ in range(1000):
        # += on a shared Value is not atomic, so some increments may be
        # lost (see the Lock example in the synchronization section below).
        shared_value.value += 1

if __name__ == "__main__":
    shared_value = Value('i', 0)  # shared integer, initialized to 0

    processes = [Process(target=increment, args=(shared_value,)) for _ in range(5)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()

    print(f"Result: {shared_value.value}")
```
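Since the list above also mentions Array, here is a brief sketch (not from the original article) of sharing an array of doubles between processes:

```python
from multiprocessing import Process, Array

def double_all(shared_arr):
    # The elements live in shared memory, so the parent sees the changes.
    for i in range(len(shared_arr)):
        shared_arr[i] *= 2

if __name__ == "__main__":
    shared_arr = Array('d', [1.0, 2.0, 3.0])  # 'd' = C double
    p = Process(target=double_all, args=(shared_arr,))
    p.start()
    p.join()
    print(list(shared_arr))  # [2.0, 4.0, 6.0]
```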
**2. Queue**
It uses the FIFO (First In, First Out) structure to transfer data between processes.
multiprocessing.Queue allows multiple processes to send and receive data.
```python
from multiprocessing import Process, Queue

def producer(queue):
    for i in range(5):
        queue.put(i)  # add data to the queue
        print(f"Produced: {i}")

def consumer(queue):
    while not queue.empty():
        item = queue.get()
        print(f"Consumed: {item}")

if __name__ == "__main__":
    queue = Queue()

    producer_process = Process(target=producer, args=(queue,))
    consumer_process = Process(target=consumer, args=(queue,))

    # The producer is started and joined first, so the queue is filled
    # before the consumer starts draining it.
    producer_process.start()
    producer_process.join()

    consumer_process.start()
    consumer_process.join()
```
**3. Pipe**
multiprocessing.Pipe provides two-way data transfer between two processes.
It can be used for both sending and receiving data.
```python
from multiprocessing import Process, Pipe

def send_data(conn):
    conn.send([1, 2, 3, 4])  # send data through the pipe
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()

    process = Process(target=send_data, args=(child_conn,))
    process.start()

    print(f"Received data: {parent_conn.recv()}")  # receive the data
    process.join()
```
**Padding Between Processes**
"Padding between processes" usually refers to organizing process memory, or to avoiding data-alignment and contention problems when data is shared between multiple processes.
The concept matters most in cases of cache-line false sharing. False sharing can cause performance loss when multiple processes repeatedly write to data that happens to sit on the same cache line of a modern processor, even though they are not logically sharing anything.
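As a rough illustration of the idea (not from the original article), the sketch below pads a shared array so that each worker's counter lands on its own cache line; it assumes a 64-byte cache line, and any real benefit depends heavily on the platform:

```python
from multiprocessing import Process, Array

CACHE_LINE = 64                       # assumed cache-line size in bytes
DOUBLES_PER_LINE = CACHE_LINE // 8    # 8 bytes per C double

def work(shared, slot):
    # Each worker writes only to the first element of its own padded slot,
    # so no two workers touch the same cache line.
    index = slot * DOUBLES_PER_LINE
    for _ in range(100_000):
        shared[index] += 1.0

if __name__ == "__main__":
    n_workers = 4
    # One cache line's worth of doubles per worker; lock=False because the
    # slots never overlap.
    shared = Array('d', n_workers * DOUBLES_PER_LINE, lock=False)

    workers = [Process(target=work, args=(shared, i)) for i in range(n_workers)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()

    print([shared[i * DOUBLES_PER_LINE] for i in range(n_workers)])
```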
**Synchronization Between Processes**
With the multiprocessing module in Python, multiple processes can run simultaneously. However, it is important to use synchronization when multiple processes need to access the same data. This is necessary to ensure data consistency and avoid issues such as race conditions.
Lock allows only one process at a time to access shared data.
Other processes wait until the process holding the lock releases it, as the sketch below shows.
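A minimal sketch (not from the original article) of protecting the shared counter from the earlier Value example with multiprocessing.Lock:

```python
from multiprocessing import Process, Value, Lock

def increment(shared_value, lock):
    for _ in range(1000):
        with lock:  # only one process may update the counter at a time
            shared_value.value += 1

if __name__ == "__main__":
    lock = Lock()
    shared_value = Value('i', 0)

    processes = [Process(target=increment, args=(shared_value, lock)) for _ in range(5)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()

    # With the lock, the result is reliably 5000.
    print(f"Result: {shared_value.value}")
```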
**Multithreading**
Multithreading is a parallel programming model that allows a program to run multiple threads simultaneously. Threads are smaller independent units of code that run within the same process and aim for faster and more efficient processing by sharing resources.
In Python, the threading module is used to develop multithreading applications. However, due to Python's Global Interpreter Lock (GIL) mechanism, multithreading provides limited performance on CPU-bound tasks. Therefore, multithreading is generally preferred for I/O-bound tasks.
A thread is a sequence of instructions within a program that can be scheduled and executed independently.
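A minimal threading sketch (not from the original article), using simulated I/O-style waits, which is where threads shine:

```python
import time
from threading import Thread

def download(name, delay):
    # Simulated I/O-bound work: the thread spends most of its time waiting.
    print(f"{name} started")
    time.sleep(delay)
    print(f"{name} finished")

if __name__ == "__main__":
    tasks = [("file-1", 1), ("file-2", 2), ("file-3", 1)]
    threads = [Thread(target=download, args=(name, delay)) for name, delay in tasks]

    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Total time is roughly the longest single wait (~2 s), not the sum (~4 s),
    # because the GIL is released while a thread sleeps or waits on I/O.
    print("All downloads finished")
```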
**Thread Synchronization**
Thread synchronization is a technique used to ensure data consistency and ordering when multiple threads access the same resources simultaneously. In Python, the threading module provides several tools for synchronization.
**Why Do We Need Thread Synchronization?**
*Race Conditions:*
When two or more threads access a shared resource at the same time, data inconsistencies can occur.
For example, one thread may read data while another thread updates the same data (a short sketch after this list illustrates the problem).
*Data Consistency:*
Coordination between threads is required to ensure that shared resources are updated correctly.
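As a rough illustration (a sketch not taken from the original article), the unsynchronized withdrawal below can leave the balance in an inconsistent state, because both threads pass the check before either one updates the value:

```python
import time
from threading import Thread

balance = 100

def withdraw(amount):
    global balance
    # Check-then-act without a lock: both threads can pass the check
    # before either one updates the balance.
    if balance >= amount:
        time.sleep(0.01)      # simulate some processing time
        balance -= amount

if __name__ == "__main__":
    threads = [Thread(target=withdraw, args=(80,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Both withdrawals can succeed, leaving a negative balance.
    print(f"Balance: {balance}")
```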
**Synchronization Tool Examples in Python**
**1. Lock**
When a thread acquires the lock, other threads must wait for the lock to be released before they can access the same resource.
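A minimal sketch (not from the original article) that fixes the unsynchronized withdrawal above with threading.Lock:

```python
import time
from threading import Thread, Lock

balance = 100
lock = Lock()

def withdraw(amount):
    global balance
    with lock:  # only one thread can run the check-then-act sequence at a time
        if balance >= amount:
            time.sleep(0.01)
            balance -= amount

if __name__ == "__main__":
    threads = [Thread(target=withdraw, args=(80,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Only one withdrawal succeeds; the balance never goes negative.
    print(f"Balance: {balance}")
```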
**2. Event**
An Event object keeps an internal flag: threads call wait() to block until another thread sets the flag with set().
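A minimal sketch (not from the original article) of one thread signalling another with threading.Event:

```python
import time
from threading import Thread, Event

ready = Event()

def waiter():
    print("Waiter: waiting for the event...")
    ready.wait()                 # blocks until the flag is set
    print("Waiter: event received, continuing")

def setter():
    time.sleep(1)                # simulate some preparation work
    print("Setter: setting the event")
    ready.set()                  # wakes up every waiting thread

if __name__ == "__main__":
    t1 = Thread(target=waiter)
    t2 = Thread(target=setter)
    t1.start(); t2.start()
    t1.join(); t2.join()
```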
**Conclusion:**
Thread synchronization is critical to prevent data inconsistencies when threads access shared resources. In Python, tools such as Lock, RLock, Semaphore, Event, and Condition provide effective solutions according to synchronization needs. Which tool to use depends on the needs of the application and its synchronization requirements.