How to Create a Basic Parallel Program in Python
Python's Global Interpreter Lock (GIL) allows only one thread to execute Python bytecode at a time, so standard threads cannot run CPU-bound work on multiple cores in parallel, which creates a bottleneck for computationally intensive tasks. To bypass this and utilize all available CPU cores, Python provides the multiprocessing module. It lets you spawn separate processes, each with its own Python interpreter and memory space.
This guide explains how to create basic processes, manage them using a pool, and handle the critical entry-point protection required for Windows and macOS.
Understanding the Process Model
Before writing code, it is important to understand the difference between Threading and Multiprocessing in Python:
- Threading: Runs multiple threads within a single process. They share memory but are limited by the GIL. Best for I/O-bound tasks such as network requests and file reading (a brief threading sketch follows this list).
- Multiprocessing: Spawns new processes. They have separate memory spaces and overhead but run in true parallel on different CPU cores. Best for CPU-bound tasks (calculations, data processing).
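For contrast, here is a minimal threading sketch for an I/O-bound workload (the fetch function and URLs are purely illustrative). Because each thread spends most of its time waiting, the GIL is not a bottleneck here:
import threading
import time
def fetch(url):
    """Simulates an I/O-bound task such as a network request."""
    time.sleep(1)  # Waiting releases the GIL, so other threads can run
    print(f"Fetched {url}")
if __name__ == "__main__":
    urls = ["https://example.com/a", "https://example.com/b"]  # placeholder URLs
    threads = [threading.Thread(target=fetch, args=(u,)) for u in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
The rest of this guide focuses on the multiprocessing side of that trade-off.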
Method 1: Using the Process Class (Manual)
The most direct way to create a parallel task is using the multiprocessing.Process class. You define a target function, create a process object, start it, and wait for it to finish.
import multiprocessing
import time
def worker_task(name):
"""A simple task that simulates work."""
print(f"Worker {name} starting...")
time.sleep(1) # Simulate computation
print(f"Worker {name} finished.")
if __name__ == "__main__":
# 1. Create process objects
p1 = multiprocessing.Process(target=worker_task, args=("A",))
p2 = multiprocessing.Process(target=worker_task, args=("B",))
# 2. Start the processes (This spawns them)
p1.start()
p2.start()
print("Main process waiting...")
# 3. Wait for processes to complete
p1.join()
p2.join()
print("All tasks completed.")
Output (worker order may vary):
Main process waiting...
Worker A starting...
Worker B starting...
Worker A finished.
Worker B finished.
All tasks completed.
The join() method blocks the main program until the specific process terminates. Without it, the main program might finish before the workers do.
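If you do not want to block indefinitely, join() also accepts an optional timeout. The snippet below is a minimal sketch (the slow_task function is hypothetical) showing how to check whether a worker is still alive once the timeout expires:
import multiprocessing
import time
def slow_task():
    time.sleep(5)  # Simulates a long-running computation
if __name__ == "__main__":
    p = multiprocessing.Process(target=slow_task)
    p.start()
    p.join(timeout=2)  # Wait at most 2 seconds
    if p.is_alive():
        print("Worker still running; terminating it.")
        p.terminate()  # Forcefully stop the process
        p.join()       # Reap the terminated process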
Crucial Step: The Main Execution Guard
One of the most common errors in Python multiprocessing occurs when users forget to protect the entry point of the script.
Error
On Windows and macOS (which use the 'spawn' start method by default), a new process re-imports the main script file in order to run the target function. If the code that creates the process is not inside an if __name__ == "__main__": block, the child process will try to create another process during that import, leading to a RuntimeError or infinite recursion.
import multiprocessing
def task():
print("Working")
# ⛔️ Error: Missing the __main__ guard
# On Windows/macOS, this causes a RuntimeError or infinite recursion
p = multiprocessing.Process(target=task)
p.start()
p.join()
Output (Error):
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase...
Solution
Always wrap your process creation logic.
import multiprocessing
def task():
print("Working")
# ✅ Correct: Protects the entry point
if __name__ == "__main__":
p = multiprocessing.Process(target=task)
p.start()
p.join()
Method 2: Using a Process Pool (Recommended)
Manually creating and joining processes is inefficient if you have hundreds of tasks. The multiprocessing.Pool class manages a fixed number of worker processes (usually equal to your CPU count) and distributes tasks among them.
This is the standard approach for parallelizing a loop.
import multiprocessing
def square_number(x):
return x * x
if __name__ == "__main__":
data = [1, 2, 3, 4, 5]
# Check available CPUs
cpu_count = multiprocessing.cpu_count()
print(f"Distributing work across {cpu_count} cores.")
# ✅ Correct: Using a context manager to manage the pool
with multiprocessing.Pool(processes=cpu_count) as pool:
# Map inputs to the function in parallel
# This is equivalent to list(map(square_number, data))
results = pool.map(square_number, data)
print(f"Results: {results}")
Output (the core count depends on your machine):
Distributing work across 8 cores.
Results: [1, 4, 9, 16, 25]
Use pool.map() when you are applying one function to an iterable of inputs and want the results back in input order. Use pool.apply_async() if you need more control over individual task submission.
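As a rough sketch of the apply_async() style (reusing the square_number function from above), each call returns an AsyncResult immediately, and .get() blocks until that individual result is ready:
import multiprocessing
def square_number(x):
    return x * x
if __name__ == "__main__":
    data = [1, 2, 3, 4, 5]
    with multiprocessing.Pool() as pool:
        # Submit each task individually; apply_async returns without waiting
        async_results = [pool.apply_async(square_number, (x,)) for x in data]
        # .get() blocks until each result is available
        results = [r.get() for r in async_results]
    print(f"Results: {results}")
Note that the results are collected inside the with block: the context manager terminates the pool on exit, so results requested afterwards may never arrive.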
Sharing Data with Queues
Because processes have separate memory, they cannot share global variables. If Process A changes a variable, Process B sees no change. To pass data between processes, use a multiprocessing.Queue.
import multiprocessing
def producer(queue):
"""Puts data into the queue."""
items = ["Apple", "Banana", "Cherry"]
for item in items:
queue.put(item)
print(f"Produced: {item}")
queue.put(None) # Sentinel value to signal 'done'
def consumer(queue):
"""Gets data from the queue."""
while True:
item = queue.get()
if item is None: # Check for sentinel
break
print(f"Consumed: {item}")
if __name__ == "__main__":
# Create a thread/process-safe queue
q = multiprocessing.Queue()
p1 = multiprocessing.Process(target=producer, args=(q,))
p2 = multiprocessing.Process(target=consumer, args=(q,))
p1.start()
p2.start()
p1.join()
p2.join()
Output (produced and consumed lines may interleave):
Produced: Apple
Produced: Banana
Produced: Cherry
Consumed: Apple
Consumed: Banana
Consumed: Cherry
Conclusion
To create a basic multiprocessing program in Python:
- Import the multiprocessing module.
- Define your worker function (the task to run in parallel).
- Protect your entry point using if __name__ == "__main__":.
- Execute using Process() for simple tasks or Pool() for processing lists of data.
- Communicate using Queue() if processes need to exchange data.