What is the difference between Multithreading and Multiprocessing in Python?
Understanding when to use multithreading versus multiprocessing is crucial for writing efficient concurrent Python code. The Global Interpreter Lock (GIL) fundamentally shapes this decision: use threads for I/O-bound tasks and processes for CPU-bound work.
Understanding the GIL
The Global Interpreter Lock is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecode simultaneously. This design choice simplifies memory management but limits true parallelism in CPU-bound tasks.
- The GIL means only one thread executes Python code at a time
- But threads CAN run concurrently during I/O operations
The GIL only affects CPython (the standard Python implementation). Alternative implementations like Jython or IronPython don't have this limitation.
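Since the GIL caveat is implementation-specific, it can be worth checking which interpreter your code is actually running on — a small sketch:

```python
import platform
import sys

# Identify the Python implementation; the GIL discussion above
# applies to CPython specifically.
impl = platform.python_implementation()
print(f"Implementation: {impl}")
print(f"Version: {sys.version_info.major}.{sys.version_info.minor}")
```
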
Quick Comparison
| Feature | Multithreading | Multiprocessing |
|---|---|---|
| GIL Impact | Limited by GIL | Bypasses GIL completely |
| Memory | Shared between threads | Separate per process |
| Best For | I/O-bound (network, files) | CPU-bound (calculations) |
| Overhead | Low (lightweight) | High (process creation) |
| Communication | Direct variable access | Queues, pipes, shared memory |
| Debugging | More complex (race conditions) | Easier isolation |
Multithreading for I/O-Bound Tasks
When your code spends its time waiting on network responses, file operations, or database queries, threads excel because the GIL is released during these waiting periods.
Basic Threading Example
```python
import threading
import time

def download(url):
    thread_name = threading.current_thread().name
    print(f"[{thread_name}] Starting download: {url}")
    time.sleep(2)  # Simulates network wait
    print(f"[{thread_name}] Completed: {url}")

# Create threads
t1 = threading.Thread(target=download, args=("file1.zip",), name="Thread-1")
t2 = threading.Thread(target=download, args=("file2.zip",), name="Thread-2")

# Start both threads
t1.start()
t2.start()

# Wait for completion
t1.join()
t2.join()

print("All downloads complete")
```

Output:

```
[Thread-1] Starting download: file1.zip
[Thread-2] Starting download: file2.zip
[Thread-1] Completed: file1.zip
[Thread-2] Completed: file2.zip
All downloads complete
```
Using ThreadPoolExecutor
For managing multiple threads cleanly, use the concurrent.futures module:
```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def fetch_data(url):
    time.sleep(1)  # Simulate network delay
    return f"Data from {url}"

urls = [f"https://api.example.com/data/{i}" for i in range(5)]

with ThreadPoolExecutor(max_workers=3) as executor:
    # Submit all tasks
    future_to_url = {executor.submit(fetch_data, url): url for url in urls}
    # Process results as they complete
    for future in as_completed(future_to_url):
        url = future_to_url[future]
        result = future.result()
        print(f"{url}: {result}")
```

Output (completion order may vary between runs):

```
https://api.example.com/data/0: Data from https://api.example.com/data/0
https://api.example.com/data/1: Data from https://api.example.com/data/1
https://api.example.com/data/2: Data from https://api.example.com/data/2
https://api.example.com/data/3: Data from https://api.example.com/data/3
https://api.example.com/data/4: Data from https://api.example.com/data/4
```
ThreadPoolExecutor handles thread lifecycle automatically and limits concurrent threads to prevent resource exhaustion.
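If you want results back in submission order rather than completion order, executor.map is a convenient alternative to submit plus as_completed — a minimal sketch reusing the fetch_data helper from the example above:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fetch_data(url):
    time.sleep(0.1)  # Simulate network delay
    return f"Data from {url}"

urls = [f"https://api.example.com/data/{i}" for i in range(5)]

with ThreadPoolExecutor(max_workers=3) as executor:
    # map() yields results in the order tasks were submitted,
    # regardless of which thread happens to finish first.
    results = list(executor.map(fetch_data, urls))

for url, result in zip(urls, results):
    print(f"{url}: {result}")
```

Use as_completed when you want to react to the fastest responses first; use map when downstream code depends on ordering.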
Multiprocessing for CPU-Bound Tasks
When your code performs heavy computations, multiprocessing bypasses the GIL by running separate Python interpreters, each with its own memory space.
Basic Multiprocessing Example
```python
import multiprocessing
import time

def heavy_calculation(n):
    """CPU-intensive task."""
    result = sum(i * i for i in range(n))
    print(f"Process {multiprocessing.current_process().name}: Result = {result}")
    return result

if __name__ == "__main__":  # Required guard for Windows
    start = time.time()
    p1 = multiprocessing.Process(target=heavy_calculation, args=(10**7,))
    p2 = multiprocessing.Process(target=heavy_calculation, args=(10**7,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    print(f"Total time: {time.time() - start:.2f}s")
```

Output:

```
Process Process-1: Result = 333333283333335000000
Process Process-2: Result = 333333283333335000000
Total time: 2.01s
```
Always wrap multiprocessing code in an if __name__ == "__main__": guard. On Windows (and anywhere the spawn start method is used), each child process re-imports the main module; without the guard, that re-import would recursively spawn new processes.
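How processes are created depends on the platform's start method, which is why the guard matters. A small sketch for inspecting it (the work function is a stand-in for illustration):

```python
import multiprocessing

def work(x):
    # Runs in a separate interpreter with its own memory space.
    return x * x

if __name__ == "__main__":
    # The default start method varies: "fork" on Linux,
    # "spawn" on Windows and on macOS since Python 3.8.
    print(multiprocessing.get_start_method())

    with multiprocessing.Pool(processes=2) as pool:
        print(pool.map(work, [1, 2, 3]))  # [1, 4, 9]
```

Under spawn, anything outside the guard runs again in every child, so module-level process creation leads to infinite recursion.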
Using ProcessPoolExecutor
The high-level interface simplifies process management:
```python
from concurrent.futures import ProcessPoolExecutor
import time

def cpu_intensive_task(n):
    """Simulate heavy computation."""
    return sum(i ** 2 for i in range(n))

if __name__ == "__main__":
    numbers = [10**6, 10**7, 10**6, 10**7]
    start = time.time()
    with ProcessPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(cpu_intensive_task, numbers))
    print(f"Results: {results}")
    print(f"Time: {time.time() - start:.2f}s")
```

Output:

```
Results: [333332833333500000, 333333283333335000000, 333332833333500000, 333333283333335000000]
Time: 2.71s
```
Sharing Data Between Processes
Since processes have separate memory, sharing data requires special mechanisms:
```python
from multiprocessing import Process, Queue, Value, Array

def worker(queue, counter, shared_array):
    # Get data from queue
    item = queue.get()
    # Modify shared counter
    with counter.get_lock():
        counter.value += 1
    # Modify shared array
    for i in range(len(shared_array)):
        shared_array[i] *= 2

if __name__ == "__main__":
    # Create shared data structures
    queue = Queue()
    counter = Value('i', 0)  # 'i' = integer
    shared_array = Array('d', [1.0, 2.0, 3.0])  # 'd' = double
    queue.put("task data")
    p = Process(target=worker, args=(queue, counter, shared_array))
    p.start()
    p.join()
    print(f"Counter: {counter.value}")
    print(f"Array: {list(shared_array)}")
```

Output:

```
Counter: 1
Array: [2.0, 4.0, 6.0]
```
Performance Comparison
```python
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def cpu_task(n):
    """CPU-bound: Calculate sum of squares."""
    return sum(i * i for i in range(n))

def io_task(seconds):
    """I/O-bound: Simulate waiting."""
    time.sleep(seconds)
    return seconds

def benchmark_cpu():
    """Compare threading vs multiprocessing for CPU work."""
    n = 10**7
    tasks = 4

    # Sequential
    start = time.time()
    for _ in range(tasks):
        cpu_task(n)
    sequential_time = time.time() - start

    # Threaded
    start = time.time()
    with ThreadPoolExecutor(max_workers=tasks) as executor:
        list(executor.map(cpu_task, [n] * tasks))
    threaded_time = time.time() - start

    # Multiprocessing
    start = time.time()
    with ProcessPoolExecutor(max_workers=tasks) as executor:
        list(executor.map(cpu_task, [n] * tasks))
    process_time = time.time() - start

    print("CPU-Bound Results:")
    print(f"  Sequential: {sequential_time:.2f}s")
    print(f"  Threaded: {threaded_time:.2f}s")
    print(f"  Multiprocessing: {process_time:.2f}s")

if __name__ == "__main__":
    benchmark_cpu()
```

Typical output:

```
CPU-Bound Results:
  Sequential: 4.12s
  Threaded: 4.08s          (GIL limits parallelism)
  Multiprocessing: 1.15s   (True parallelism)
```
Decision Guide
| Task Type | Examples | Use |
|---|---|---|
| Web scraping | Fetching 1000 URLs | Threads |
| File downloads | Downloading multiple files | Threads |
| Database queries | Multiple concurrent queries | Threads |
| Image processing | Resizing 1000 images | Processes |
| Data analysis | Number crunching on large datasets | Processes |
| Machine learning | Training models | Processes |
| Mixed workloads | API calls + data processing | Both (threads for I/O, processes for CPU) |
Common Patterns
Thread-Safe Counter
```python
import threading

class ThreadSafeCounter:
    def __init__(self):
        self.value = 0
        self.lock = threading.Lock()

    def increment(self):
        with self.lock:
            self.value += 1

counter = ThreadSafeCounter()

def worker():
    for _ in range(1000):
        counter.increment()

threads = [threading.Thread(target=worker) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"Final count: {counter.value}")  # 10000
```
Producer-Consumer with Queue
```python
import threading
import queue
import time

def producer(q, items):
    for item in items:
        q.put(item)
        print(f"Produced: {item}")
        time.sleep(0.1)
    q.put(None)  # Sentinel to stop consumer

def consumer(q):
    while True:
        item = q.get()
        if item is None:
            break
        print(f"Consumed: {item}")
        q.task_done()

q = queue.Queue()
items = ['task1', 'task2', 'task3', 'task4']

producer_thread = threading.Thread(target=producer, args=(q, items))
consumer_thread = threading.Thread(target=consumer, args=(q,))

producer_thread.start()
consumer_thread.start()

producer_thread.join()
consumer_thread.join()
```
Summary
The GIL dictates your concurrency strategy in Python.
- Threads shine when your code waits on external resources; they share memory efficiently and have low overhead.
- Processes unlock true parallelism for computation-heavy work by running separate Python interpreters.
Choose based on where your code spends its time: waiting means threads, computing means processes.