How to Create a Basic Parallel Program in Python
Python's Global Interpreter Lock (GIL) allows only one thread to execute Python bytecode at a time, so standard threads cannot run CPU-bound work on multiple cores in parallel, which creates a bottleneck for computationally intensive tasks. To bypass this and utilize all available CPU cores, Python provides the multiprocessing module. It lets you spawn separate processes, each with its own Python interpreter and memory space.
This guide explains how to create basic processes, manage them using a pool, and handle the critical entry-point protection required for Windows and macOS.
Understanding the Process Model
Before writing code, it is important to understand the difference between Threading and Multiprocessing in Python:
- Threading: Runs multiple threads within a single process. They share memory but are limited by the GIL. Best for I/O-bound tasks such as network requests and file reading (a brief threading sketch follows this list).
- Multiprocessing: Spawns new processes. They have separate memory spaces and overhead but run in true parallel on different CPU cores. Best for CPU-bound tasks (calculations, data processing).
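For contrast, here is a minimal threading sketch for an I/O-bound workload (the fetch function and URLs are purely illustrative). Because each thread spends most of its time waiting, the GIL is not a bottleneck here:
import threading
import time
def fetch(url):
    """Simulates an I/O-bound task such as a network request."""
    time.sleep(1)  # Waiting releases the GIL, so other threads can run
    print(f"Fetched {url}")
if __name__ == "__main__":
    urls = ["https://example.com/a", "https://example.com/b"]  # placeholder URLs
    threads = [threading.Thread(target=fetch, args=(u,)) for u in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
The rest of this guide focuses on the multiprocessing side of that trade-off.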
Method 1: Using the Process Class (Manual)
The most direct way to create a parallel task is using the multiprocessing.Process class. You define a target function, create a process object, start it, and wait for it to finish.
import multiprocessing
import time
def worker_task(name):
"""A simple task that simulates work."""
print(f"Worker {name} starting...")
time.sleep(1) # Simulate computation
print(f"Worker {name} finished.")
if __name__ == "__main__":
# 1. Create process objects
p1 = multiprocessing.Process(target=worker_task, args=("A",))
p2 = multiprocessing.Process(target=worker_task, args=("B",))
# 2. Start the processes (This spawns them)
p1.start()
p2.start()
print("Main process waiting...")
# 3. Wait for processes to complete
p1.join()
p2.join()
print("All tasks completed.")
Output (worker order may vary):
Main process waiting...
Worker A starting...
Worker B starting...
Worker A finished.
Worker B finished.
All tasks completed.
The join() method blocks the main program until the specific process terminates. Without it, the main program might finish before the workers do.
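If you do not want to block indefinitely, join() also accepts an optional timeout. The snippet below is a minimal sketch (the slow_task function is hypothetical) showing how to check whether a worker is still alive once the timeout expires:
import multiprocessing
import time
def slow_task():
    time.sleep(5)  # Simulates a long-running computation
if __name__ == "__main__":
    p = multiprocessing.Process(target=slow_task)
    p.start()
    p.join(timeout=2)  # Wait at most 2 seconds
    if p.is_alive():
        print("Worker still running; terminating it.")
        p.terminate()  # Forcefully stop the process
        p.join()       # Reap the terminated process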
Crucial Step: The Main Execution Guard
One of the most common errors in Python multiprocessing occurs when users forget to protect the entry point of the script.
Error
On Windows and macOS (which use the 'spawn' start method by default), a new process re-imports the main script file in order to run the target function. If the code that creates the process is not inside an if __name__ == "__main__": block, the child process will try to create another process during that import, leading to a RuntimeError or infinite recursion.
import multiprocessing
def task():
print("Working")
# ⛔️ Error: Missing the __main__ guard
# On Windows/macOS, this causes a RuntimeError or infinite recursion
p = multiprocessing.Process(target=task)
p.start()
p.join()
Output (Error):
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase...
Solution
Always wrap your process creation logic.
import multiprocessing
def task():
print("Working")
# ✅ Correct: Protects the entry point
if __name__ == "__main__":
p = multiprocessing.Process(target=task)
p.start()
p.join()
Method 2: Using a Process Pool (Recommended)
Manually creating and joining processes is inefficient if you have hundreds of tasks. The multiprocessing.Pool class manages a fixed number of worker processes (usually equal to your CPU count) and distributes tasks among them.
This is the standard approach for parallelizing a loop.
import multiprocessing
def square_number(x):
return x * x
if __name__ == "__main__":
data = [1, 2, 3, 4, 5]
# Check available CPUs
cpu_count = multiprocessing.cpu_count()
print(f"Distributing work across {cpu_count} cores.")
# ✅ Correct: Using a context manager to manage the pool
with multiprocessing.Pool(processes=cpu_count) as pool:
# Map inputs to the function in parallel
# This is equivalent to list(map(square_number, data))
results = pool.map(square_number, data)
print(f"Results: {results}")
Output (the core count depends on your machine):
Distributing work across 8 cores.
Results: [1, 4, 9, 16, 25]
Use pool.map() when you are applying one function to an iterable of inputs and want the results back in input order. Use pool.apply_async() if you need more control over individual task submission.
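As a rough sketch of the apply_async() style (reusing the square_number function from above), each call returns an AsyncResult immediately, and .get() blocks until that individual result is ready:
import multiprocessing
def square_number(x):
    return x * x
if __name__ == "__main__":
    data = [1, 2, 3, 4, 5]
    with multiprocessing.Pool() as pool:
        # Submit each task individually; apply_async returns without waiting
        async_results = [pool.apply_async(square_number, (x,)) for x in data]
        # .get() blocks until each result is available
        results = [r.get() for r in async_results]
    print(f"Results: {results}")
Note that the results are collected inside the with block: the context manager terminates the pool on exit, so results requested afterwards may never arrive.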
Sharing Data with Queues
Because processes have separate memory, they cannot share global variables. If Process A changes a variable, Process B sees no change. To pass data between processes, use a multiprocessing.Queue.
import multiprocessing
def producer(queue):
"""Puts data into the queue."""
items = ["Apple", "Banana", "Cherry"]
for item in items:
queue.put(item)
print(f"Produced: {item}")
queue.put(None) # Sentinel value to signal 'done'
def consumer(queue):
"""Gets data from the queue."""
while True:
item = queue.get()
if item is None: # Check for sentinel
break
print(f"Consumed: {item}")
if __name__ == "__main__":
# Create a thread/process-safe queue
q = multiprocessing.Queue()
p1 = multiprocessing.Process(target=producer, args=(q,))
p2 = multiprocessing.Process(target=consumer, args=(q,))
p1.start()
p2.start()
p1.join()
p2.join()
Output (produced and consumed lines may interleave):
Produced: Apple
Produced: Banana
Produced: Cherry
Consumed: Apple
Consumed: Banana
Consumed: Cherry
Conclusion
To create a basic multiprocessing program in Python:
- Import the multiprocessing module.
- Define your worker function (the task to run in parallel).
- Protect your entry point using if __name__ == "__main__":.
- Execute using Process() for simple tasks or Pool() for processing lists of data.
- Communicate using Queue() if processes need to exchange data.