How to Create a Basic Multiprocessing Program in Python
Python's Global Interpreter Lock (GIL) allows only one thread to execute Python bytecode at a time, so standard threads cannot run Python code in parallel across CPU cores. This makes them inefficient for CPU-intensive tasks like data processing or complex calculations. The multiprocessing module overcomes this by creating separate processes, each with its own memory space and Python interpreter, allowing you to fully utilize multi-core processors.
This guide explains how to build a basic multiprocessing program, handle inter-process communication, and avoid common pitfalls like recursive process spawning.
Understanding Multiprocessing
Before coding, it is crucial to distinguish between Multiprocessing and Multithreading:
- Multithreading: Runs multiple threads within a single process. They share memory. Best for I/O-bound tasks (waiting for network/disk).
- Multiprocessing: Runs multiple independent processes. They have separate memory spaces. Best for CPU-bound tasks (heavy calculation).
Step 1: Creating and Launching Processes
To create a parallel program, you define a target function and wrap it in a multiprocessing.Process object.
The Basic Workflow:
- Import `multiprocessing`.
- Define the worker function.
- Instantiate `Process` objects with `target` arguments.
- Start the processes using `.start()`.
- Wait for them to finish using `.join()`.
```python
import multiprocessing
import time

def worker_function(number):
    """A simple task that simulates computation."""
    print(f"Worker {number}: Starting task...")
    result = number * number
    time.sleep(1)  # Simulate work
    print(f"Worker {number}: Result is {result}")

if __name__ == "__main__":
    # ✅ Correct: Creating two separate processes
    # 'target' is the function to run
    # 'args' is a tuple of arguments for that function
    process1 = multiprocessing.Process(target=worker_function, args=(2,))
    process2 = multiprocessing.Process(target=worker_function, args=(3,))

    # Start the processes (parallel execution begins here)
    process1.start()
    process2.start()

    # Wait for processes to complete before continuing the main script
    process1.join()
    process2.join()

    print("All processes finished.")
```
Output (the workers run in parallel, so the exact line order may vary):

```
Worker 2: Starting task...
Worker 3: Starting task...
Worker 2: Result is 4
Worker 3: Result is 9
All processes finished.
```
The args parameter must be a tuple. If passing a single argument, remember the trailing comma: args=(2,).
Step 2: Inter-Process Communication (Queues)
Because processes have separate memory spaces, they cannot share global variables like threads do. If Process A modifies a global list, Process B will not see the change. To exchange data, use a Queue.
```python
import multiprocessing

def producer(queue):
    """Adds items to the queue."""
    print("Producer: Adding items...")
    queue.put("Hello")
    queue.put("World")

def consumer(queue):
    """Retrieves items from the queue."""
    print("Consumer: Waiting for items...")
    # .get() blocks until an item is available
    item1 = queue.get()
    item2 = queue.get()
    print(f"Consumer got: {item1} {item2}")

if __name__ == "__main__":
    # Create a thread- and process-safe Queue
    shared_queue = multiprocessing.Queue()

    p1 = multiprocessing.Process(target=producer, args=(shared_queue,))
    p2 = multiprocessing.Process(target=consumer, args=(shared_queue,))

    p1.start()
    p2.start()
    p1.join()
    p2.join()
```
Output (the order of the first two lines may vary):

```
Producer: Adding items...
Consumer: Waiting for items...
Consumer got: Hello World
```
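The separate-memory behavior that makes a Queue necessary is easy to demonstrate: a child process that appends to a global list changes only its own copy, and the parent never sees the modification. A minimal sketch:

```python
import multiprocessing

shared_list = []  # Global in the parent process

def appender():
    # This modifies the CHILD's copy of the global,
    # not the parent's
    shared_list.append("from child")
    print(f"Child sees: {shared_list}")

if __name__ == "__main__":
    p = multiprocessing.Process(target=appender)
    p.start()
    p.join()
    # The parent's list is still empty: the append happened
    # in the child's separate memory space
    print(f"Parent sees: {shared_list}")
```

The parent prints an empty list, which is exactly why Queue (or Pipe) is needed to move data back.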
Step 3: Optimizing Process Count
Spawning too many processes can slow down your system due to context-switching overhead. Ideally, the number of active processes should match the number of available CPU cores.
```python
import multiprocessing

def get_optimal_count():
    # ✅ Correct: Dynamically determine core count
    try:
        count = multiprocessing.cpu_count()
        print(f"Available CPU cores: {count}")
        return count
    except NotImplementedError:
        return 1  # Fallback
```
For heavy CPU tasks, usually processes = cpu_count(). For I/O tasks, you might benefit from cpu_count() * 2 or more.
Common Error: Missing the Entry Point Guard
On Windows and macOS, the default "spawn" start method launches each new process by importing the main script in a fresh interpreter. If the process-spawning code is not protected by an entry-point guard, every child re-executes it and spawns children of its own, producing a RuntimeError or an endless cascade of processes until the program crashes.
Infinite Recursion Error
```python
import multiprocessing

def worker():
    print("Working")

# ⛔️ Incorrect: This code runs immediately upon import/spawn
# On Windows, this causes a RuntimeError or infinite loop.
p = multiprocessing.Process(target=worker)
p.start()
p.join()
```
Solution: if __name__ == "__main__":
```python
import multiprocessing

def worker():
    print("Working")

# ✅ Correct: Ensures code only runs when the script is executed directly
if __name__ == "__main__":
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join()
```
Conclusion
To create a basic multiprocessing program in Python:
- Use `multiprocessing.Process` to spawn independent tasks.
- Use `Queue` or `Pipe` to pass data between processes, as they do not share global variables.
- Always use `if __name__ == "__main__":` to prevent recursive spawning errors.
- Use `.join()` to ensure the main program waits for background tasks to complete.