Concurrency is a crucial concept in modern programming that allows a program to make progress on multiple tasks at the same time, improving application performance.
There are several ways to achieve concurrency in Python, with threading and multiprocessing being the most well-known.
In this article, we'll explore these two methods in detail, understand how they work, and discuss when to use each, along with practical code examples.
What is Concurrency?
Before we talk about threading and multiprocessing, it’s important to understand what concurrency means.
Concurrency is when a program can make progress on multiple tasks during the same period of time.
This can improve resource utilization and responsiveness, especially when the program spends much of its time waiting on I/O (such as reading files) or performing heavy calculations.
Two related terms are worth distinguishing:
- Parallelism: Running multiple tasks at the exact same instant on different processor cores.
- Concurrency: Making progress on multiple tasks during the same time period, but not necessarily at the exact same instant.
Python offers two main ways to achieve concurrency:
- Threading: For interleaving tasks within a single process, best suited to I/O-bound work.
- Multiprocessing: For tasks that need to run truly simultaneously on different processor cores.
Threading in Python
Threading allows you to run multiple smaller units of a process, called threads, within the same process, sharing the same memory space.
Threads are lighter than processes, and switching between them is faster.
However, threading in CPython is subject to the Global Interpreter Lock (GIL), which allows only one thread to execute Python bytecode at a time within a process.
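The GIL's effect on CPU-bound code can be demonstrated directly. The sketch below (the helper name `count_down` and the loop size are arbitrary illustrative choices, and the timings are machine-dependent) runs the same pure-Python loop twice sequentially and then in two threads:

```python
import threading
import time

def count_down(n):
    # Pure-Python CPU-bound loop; on CPython the GIL prevents two such
    # loops from running in parallel on separate cores.
    while n > 0:
        n -= 1

N = 2_000_000

# Run the work twice, one call after the other
start = time.perf_counter()
count_down(N)
count_down(N)
sequential = time.perf_counter() - start

# Run the same work in two threads
start = time.perf_counter()
t1 = threading.Thread(target=count_down, args=(N,))
t2 = threading.Thread(target=count_down, args=(N,))
t1.start()
t2.start()
t1.join()
t2.join()
threaded = time.perf_counter() - start

print(f"Sequential: {sequential:.2f}s, Threaded: {threaded:.2f}s")
```

On CPython the threaded run is typically no faster, and often slightly slower, than the sequential run, because the two threads take turns holding the GIL rather than running in parallel.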
How Threading Works
Python's threading module provides a simple and flexible way to create and manage threads.
Let’s start with a basic example:
```python
import threading
import time

def print_numbers():
    for i in range(5):
        print(f"Number: {i}")
        time.sleep(1)

# Creating a thread
thread = threading.Thread(target=print_numbers)

# Starting the thread
thread.start()

# Wait for the thread to complete
thread.join()

print("Thread has finished executing")

# Output:
# Number: 0
# Number: 1
# Number: 2
# Number: 3
# Number: 4
# Thread has finished executing
```
In this example:
- We define a function print_numbers() that prints numbers from 0 to 4 with a one-second delay between prints.
- We create a thread using threading.Thread() and pass print_numbers() as the target function.
- The start() method begins the thread's execution, and join() ensures that the main program waits for the thread to finish before proceeding.
Example: Threading for I/O-Bound Tasks
Threading is especially useful for I/O-bound tasks, such as file operations, network requests, or database queries, where the program spends most of its time waiting for external resources.
Here’s an example that simulates downloading files using threads:
```python
import threading
import time

def download_file(file_name):
    print(f"Starting download of {file_name}...")
    time.sleep(2)  # Simulate download time
    print(f"Finished downloading {file_name}")

files = ["file1.zip", "file2.zip", "file3.zip"]
threads = []

# Create and start threads
for file in files:
    thread = threading.Thread(target=download_file, args=(file,))
    thread.start()
    threads.append(thread)

# Ensure all threads have finished
for thread in threads:
    thread.join()

print("All files have been downloaded.")

# Output (line order may vary, since the threads run concurrently):
# Starting download of file1.zip...
# Starting download of file2.zip...
# Starting download of file3.zip...
# Finished downloading file1.zip
# Finished downloading file2.zip
# Finished downloading file3.zip
# All files have been downloaded.
```
By creating and managing separate threads for each file download, the program can handle multiple tasks simultaneously, improving overall efficiency.
The key steps in the code are as follows:
- A function download_file is defined to simulate the downloading process.
- A list of file names is created to represent the files that need to be downloaded.
- For each file in the list, a new thread is created with download_file as its target function. Each thread is started immediately after creation and added to a list of threads.
- The main program waits for all threads to finish using the join() method, ensuring that the program does not proceed until all downloads are complete.
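The create/start/join pattern above is common enough that the standard library wraps it: `concurrent.futures.ThreadPoolExecutor` manages the threads for you. A rough equivalent of the download example (with a shortened, arbitrary simulated delay, and returning a string instead of printing the completion message):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def download_file(file_name):
    print(f"Starting download of {file_name}...")
    time.sleep(0.2)  # Shortened simulated download time for this sketch
    return f"Finished downloading {file_name}"

files = ["file1.zip", "file2.zip", "file3.zip"]

# The executor creates the threads, starts them, and joins them for us;
# map() returns results in the same order as the input list.
with ThreadPoolExecutor(max_workers=3) as executor:
    results = list(executor.map(download_file, files))

for line in results:
    print(line)
```

The executor approach also makes it easy to collect return values from worker functions, which the bare `threading.Thread` pattern does not provide directly.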
Limitations of Threading
While threading can improve performance for I/O-bound tasks, it has limitations:
- Global Interpreter Lock (GIL): The GIL allows only one thread to execute Python bytecode at a time, so threading offers little speedup for CPU-bound tasks even on multi-core processors.
- Race Conditions: Since threads share the same memory space, improper synchronization can lead to race conditions, where the outcome of a program depends on the timing of threads.
- Deadlocks: Threads waiting on each other to release resources can lead to deadlocks, where no progress is made.
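A minimal sketch of guarding against a race condition with `threading.Lock` (the counter size and thread count are arbitrary): four threads increment a shared counter, and the lock makes each read-modify-write atomic. Without it, concurrent increments could interleave and lose updates.

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        # The lock serializes the read-modify-write; without it,
        # two threads could read the same value and lose an update.
        with lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000 (always correct because of the lock)
```

Acquiring the lock via `with` also guarantees it is released even if the guarded code raises an exception, which is one common way deadlocks are avoided in practice.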
Multiprocessing in Python
Multiprocessing addresses the limitations of threading by using separate processes instead of threads.
Each process has its own memory space and Python interpreter, allowing true parallelism on multi-core systems.
This makes multiprocessing ideal for tasks that require heavy computation.
How Multiprocessing Works
The multiprocessing module in Python allows you to create and manage processes easily.
Let’s start with a basic example:
```python
import multiprocessing
import time

def print_numbers():
    for i in range(5):
        print(f"Number: {i}")
        time.sleep(1)

if __name__ == "__main__":
    # Creating a process
    process = multiprocessing.Process(target=print_numbers)

    # Starting the process
    process.start()

    # Wait for the process to complete
    process.join()

    print("Process has finished executing")

# Output:
# Number: 0
# Number: 1
# Number: 2
# Number: 3
# Number: 4
# Process has finished executing
```
This example is similar to the threading example, but with processes.
Notice that process creation and management mirror the threading API, but because each process runs in its own memory space with its own interpreter, processes can execute truly in parallel on different CPU cores.
Example: Multiprocessing for CPU-Bound Tasks
Multiprocessing is particularly beneficial for tasks that are CPU-bound, such as numerical computations or data processing.
Here’s an example that calculates the square of numbers using multiple processes:
```python
import multiprocessing

def compute_square(number):
    return number * number

if __name__ == "__main__":
    numbers = [1, 2, 3, 4, 5]

    # Create a pool of processes
    with multiprocessing.Pool() as pool:
        # Map function to numbers using multiple processes
        results = pool.map(compute_square, numbers)

    print("Squares:", results)

# Output:
# Squares: [1, 4, 9, 16, 25]
```
Here are the key steps in the code:
- A function compute_square is defined to take a number as input and return its square.
- The code within the if __name__ == "__main__": block ensures that it runs only when the script is executed directly.
- A list of numbers is defined, which will be squared.
- A pool of worker processes is created using multiprocessing.Pool().
- The map method is used to apply the compute_square function to each number in the list, distributing the workload across multiple processes.
Inter-Process Communication (IPC)
Since each process has its own memory space, sharing data between processes requires inter-process communication (IPC) mechanisms.
The multiprocessing module provides several tools for IPC, such as Queue, Pipe, and Value.
Here’s an example using Queue to share data between processes:
```python
import multiprocessing
from queue import Empty

def worker(queue):
    # Retrieve and process data from the queue until it is drained.
    # Using get() with a timeout avoids the race between checking
    # empty() and calling get() while other processes consume items.
    while True:
        try:
            item = queue.get(timeout=0.1)
        except Empty:
            break
        print(f"Processing {item}")

if __name__ == "__main__":
    queue = multiprocessing.Queue()

    # Add items to the queue
    for i in range(10):
        queue.put(i)

    # Create worker processes to drain the queue
    processes = []
    for _ in range(4):
        process = multiprocessing.Process(target=worker, args=(queue,))
        processes.append(process)
        process.start()

    # Wait for all processes to complete
    for process in processes:
        process.join()

    print("All processes have finished.")

# Output (order may vary, and the items are split across the four processes):
# Processing 0
# Processing 1
# ...
# Processing 9
# All processes have finished.
```
In this example:
- def worker(queue): Defines a function worker that takes a queue as an argument. The function retrieves and processes items from the queue until it is empty.
- if __name__ == "__main__": Ensures that the following code runs only if the script is executed directly, not if it is imported as a module.
- queue = multiprocessing.Queue(): Creates a queue object for inter-process communication.
- for i in range(10): queue.put(i): Adds items (numbers 0 through 9) to the queue.
- processes = []: Initializes an empty list to store process objects.
- The loop for _ in range(4): creates four worker processes.
- process = multiprocessing.Process(target=worker, args=(queue,)): Creates a new process with worker as the target function and passes the queue as an argument.
- processes.append(process): Adds the process object to the processes list.
- process.start(): Starts the process.
- The final loop, for process in processes:, waits for each process to complete using the join() method.
Challenges of Multiprocessing
While multiprocessing provides true parallelism, it comes with its own set of challenges:
- Higher Overhead: Creating and managing processes is more resource-intensive than threads due to separate memory spaces.
- Complexity: Communication and synchronization between processes are more complex than threading, requiring IPC mechanisms.
- Memory Usage: Each process has its own memory space, leading to higher memory usage compared to threading.
When to Use Threading vs. Multiprocessing
Choosing between threading and multiprocessing depends on the type of task you're dealing with:
Use Threading:
- For tasks that involve a lot of waiting, such as network operations or reading/writing files (I/O-bound tasks).
- When you need to share memory between tasks and can manage potential issues like race conditions.
- For lightweight concurrency without the extra overhead of creating multiple processes.
Use Multiprocessing:
- For tasks that require heavy computations or data processing (CPU-bound tasks) and can benefit from running on multiple CPU cores at the same time.
- When you need true parallelism and the Global Interpreter Lock (GIL) in threading becomes a limitation.
- For tasks that can run independently and don’t require frequent communication or shared memory.
Conclusion
Concurrency in Python is a powerful way to make your applications run faster.
Threading is great for tasks that involve a lot of waiting, like network operations or reading/writing files, but it is less effective for tasks that require heavy computation because of the Global Interpreter Lock (GIL).
On the other hand, multiprocessing allows for true parallelism, making it perfect for CPU-intensive tasks, although it comes with higher overhead and complexity.
Whether you're processing data, handling multiple network requests, or doing complex calculations, Python's threading and multiprocessing tools give you what you need to make your program as efficient and fast as possible.
The above is the detailed content of Concurrency in Python with Threading and Multiprocessing. For more information, please follow other related articles on the PHP Chinese website!
