How to Break a List into Chunks of Size N in Python
Chunking a list, i.e. splitting it into smaller pieces of a fixed size, is a common task when batch processing data, paginating results, or working with APIs that have rate limits.
This guide covers the standard approaches from basic list comprehension to Python 3.12's built-in solution.
Using List Comprehension
The most compatible approach works with all Python versions:
def chunk_list(data: list, chunk_size: int) -> list:
    """Split a list into chunks of specified size."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

data = list(range(25))
chunks = chunk_list(data, 10)
for i, chunk in enumerate(chunks):
    print(f"Chunk {i}: {chunk}")
Output:
Chunk 0: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Chunk 1: [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
Chunk 2: [20, 21, 22, 23, 24]
The last chunk automatically contains the remaining elements, even if fewer than chunk_size.
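If you instead need every chunk to be exactly chunk_size, one option is to stop the range early so the short tail is discarded. This is a sketch, not part of the original guide, and the helper name chunk_list_exact is invented for illustration:

```python
def chunk_list_exact(data: list, chunk_size: int) -> list:
    """Split a list into chunks, discarding a final chunk shorter than chunk_size."""
    # Stopping the range at len(data) - chunk_size + 1 skips any partial tail
    return [data[i:i + chunk_size]
            for i in range(0, len(data) - chunk_size + 1, chunk_size)]

print(chunk_list_exact(list(range(25)), 10))
# [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]]
```

Here the final five elements are dropped because they cannot fill a complete chunk.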
Using itertools.batched (Python 3.12+)
Python 3.12 introduced itertools.batched, a built-in function specifically for this purpose:
from itertools import batched

data = [1, 2, 3, 4, 5, 6, 7]
for batch in batched(data, 3):
    print(batch)
Output:
(1, 2, 3)
(4, 5, 6)
(7,)
Converting to Lists
Note that batched returns tuples. Convert to lists if needed:
from itertools import batched
data = list(range(10))
chunks = [list(batch) for batch in batched(data, 3)]
print(chunks)
Output:
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
Using a Generator (Memory Efficient)
For very large datasets, avoid creating all chunks in memory at once. Use a generator to yield chunks one at a time:
def chunk_generator(data: list, chunk_size: int):
    """Yield successive chunks from data."""
    for i in range(0, len(data), chunk_size):
        yield data[i:i + chunk_size]

def process(chunk):
    # Example: just print the sum of each chunk
    print(sum(chunk))

# Process chunks one at a time
large_data = list(range(1_000_000))
for chunk in chunk_generator(large_data, 10_000):
    process(chunk)  # Only one chunk in memory at a time
Output (sum of numbers in each chunk):
49995000
149995000
249995000
349995000
449995000
549995000
649995000
...
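Because chunk_generator yields lazily, you can also pull chunks on demand with next() without materializing the rest. A minimal demonstration (the function is repeated here so the snippet stands alone):

```python
def chunk_generator(data: list, chunk_size: int):
    """Yield successive chunks from data."""
    for i in range(0, len(data), chunk_size):
        yield data[i:i + chunk_size]

gen = chunk_generator(list(range(10)), 4)
print(next(gen))  # [0, 1, 2, 3]
print(next(gen))  # [4, 5, 6, 7]
# Remaining chunks are never built unless requested
```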
Generator for Any Iterable
This version works with any iterable, not just lists:
from itertools import islice

def chunk_iterable(iterable, chunk_size: int):
    """Yield chunks from any iterable."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            break
        yield chunk

# Works with generators, files, etc.
for chunk in chunk_iterable(range(25), 10):
    print(chunk)
Output:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
[20, 21, 22, 23, 24]
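To show that this really does work on file-like objects, here is a small sketch using io.StringIO to stand in for a real file; a file object returned by open() behaves the same way, since iterating over it yields lines:

```python
import io
from itertools import islice

def chunk_iterable(iterable, chunk_size: int):
    """Yield chunks from any iterable."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            break
        yield chunk

# io.StringIO simulates a text file; iterating yields one line at a time
fake_file = io.StringIO("line1\nline2\nline3\nline4\nline5\n")
for chunk in chunk_iterable(fake_file, 2):
    print([line.strip() for line in chunk])
# ['line1', 'line2']
# ['line3', 'line4']
# ['line5']
```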
Practical Examples
Batch Database Inserts
def batch_insert(records: list, batch_size: int = 100):
    """Insert records in batches to avoid memory issues."""
    for i in range(0, len(records), batch_size):
        batch = records[i:i + batch_size]
        # a fake db insert ...
        print("db.insert_many(batch)")
        print(f"Inserted batch of {len(batch)} records")

records = [{"id": i, "value": i * 10} for i in range(1000)]
batch_insert(records, batch_size=250)
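To make the fake insert concrete, here is one possible version using the standard-library sqlite3 module with an in-memory database and executemany. The items table and its schema are invented for illustration:

```python
import sqlite3

def batch_insert(conn: sqlite3.Connection, records: list, batch_size: int = 100) -> None:
    """Insert records in batches using executemany."""
    for i in range(0, len(records), batch_size):
        batch = records[i:i + batch_size]
        # Named placeholders map onto the dict keys of each record
        conn.executemany(
            "INSERT INTO items (id, value) VALUES (:id, :value)", batch
        )
        conn.commit()  # One commit per batch keeps transactions bounded

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, value INTEGER)")
records = [{"id": i, "value": i * 10} for i in range(1000)]
batch_insert(conn, records, batch_size=250)
print(conn.execute("SELECT COUNT(*) FROM items").fetchone()[0])  # 1000
```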
API Rate Limiting
import time

def process_with_rate_limit(items: list, batch_size: int, delay: float):
    """Process items in batches with delay between batches."""
    chunks = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
    for i, chunk in enumerate(chunks):
        print(f"Processing batch {i + 1}/{len(chunks)}")
        for item in chunk:
            # a fake api call ...
            print("api.send(item)")
        if i < len(chunks) - 1:
            time.sleep(delay)

urls = [f"https://api.example.com/{i}" for i in range(100)]
process_with_rate_limit(urls, batch_size=10, delay=1.0)
Parallel Processing
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk: list) -> list:
    """Process a single chunk."""
    return [item * 2 for item in chunk]

data = list(range(1000))
chunks = [data[i:i + 100] for i in range(0, len(data), 100)]

with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(process_chunk, chunks))

# Flatten results
flat_results = [item for chunk in results for item in chunk]
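The flattening step can also be written with itertools.chain.from_iterable, which avoids the nested comprehension. A small stand-in list of per-chunk results keeps the snippet self-contained:

```python
from itertools import chain

results = [[0, 2, 4], [6, 8, 10]]  # stand-in for per-chunk results
flat_results = list(chain.from_iterable(results))
print(flat_results)  # [0, 2, 4, 6, 8, 10]
```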
Using NumPy for Numerical Data
For numerical arrays, NumPy provides efficient chunking:
import numpy as np

def chunk_array(arr: np.ndarray, chunk_size: int) -> list:
    """Split NumPy array into chunks."""
    return np.array_split(arr, max(1, len(arr) // chunk_size))

data = np.arange(25)
chunks = chunk_array(data, 10)
for chunk in chunks:
    print(chunk)
Output:
[ 0 1 2 3 4 5 6 7 8 9 10 11 12]
[13 14 15 16 17 18 19 20 21 22 23 24]
np.array_split divides into roughly equal parts, which may not match exact chunk sizes. Use list slicing for precise control.
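For that precise control, plain slicing works on NumPy arrays too. Slicing an ndarray returns a view rather than a copy, so this is cheap; the helper name chunk_array_exact is invented for this sketch:

```python
import numpy as np

def chunk_array_exact(arr: np.ndarray, chunk_size: int) -> list:
    """Split an array into slices of exactly chunk_size (last may be shorter)."""
    return [arr[i:i + chunk_size] for i in range(0, len(arr), chunk_size)]

data = np.arange(25)
for chunk in chunk_array_exact(data, 10):
    print(chunk)
# [0 1 2 3 4 5 6 7 8 9]
# [10 11 12 13 14 15 16 17 18 19]
# [20 21 22 23 24]
```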
Method Comparison
| Method | Python Version | Returns | Memory | Best For |
|---|---|---|---|---|
| itertools.batched | 3.12+ | Tuples | Lazy | New projects |
| List comprehension | All | Lists | Eager | Compatibility |
| Generator | All | Iterator | Lazy | Large datasets |
| NumPy | All (with NumPy) | Arrays | Varies | Numerical data |
Edge Cases
def chunk_list(data: list, chunk_size: int) -> list:
    """Split list into chunks with validation."""
    if chunk_size <= 0:
        raise ValueError("Chunk size must be positive")
    if not data:
        return []
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

# Empty list
print(chunk_list([], 5))  # []
# Chunk size larger than list
print(chunk_list([1, 2, 3], 10))  # [[1, 2, 3]]
# Chunk size of 1
print(chunk_list([1, 2, 3], 1))  # [[1], [2], [3]]
Output:
[]
[[1, 2, 3]]
[[1], [2], [3]]
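One more edge worth knowing: the slicing pattern is not limited to lists. It works on any sequence that supports slicing, such as strings and tuples, and each chunk keeps the input's type. A quick check (the helper is repeated without the list annotation so the snippet stands alone):

```python
def chunk_list(data, chunk_size: int) -> list:
    """Split any sliceable sequence into chunks."""
    if chunk_size <= 0:
        raise ValueError("Chunk size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

print(chunk_list("abcdefg", 3))  # ['abc', 'def', 'g']
print(chunk_list((1, 2, 3, 4), 2))  # [(1, 2), (3, 4)]
```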
Summary
- Use itertools.batched on Python 3.12+ for the cleanest solution.
- Use list comprehension for maximum compatibility with older Python versions.
- Use generators when processing large datasets to minimize memory usage.
- Always validate chunk size and handle empty inputs gracefully.