How to Break a List into Chunks of Size N in Python

Chunking a list, i.e., splitting it into smaller pieces of a fixed size, is a common task when batch processing data, paginating results, or working with rate-limited APIs.

This guide covers the standard approaches, from a basic list comprehension to Python 3.12's built-in solution.

Using List Comprehension

The most compatible approach is a list comprehension over slices; it works on every Python version and needs no imports:

def chunk_list(data: list, chunk_size: int) -> list:
    """Split a list into chunks of specified size."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


data = list(range(25))
chunks = chunk_list(data, 10)

for i, chunk in enumerate(chunks):
    print(f"Chunk {i}: {chunk}")

Output:

Chunk 0: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Chunk 1: [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
Chunk 2: [20, 21, 22, 23, 24]

Automatic Remainder Handling

The last chunk automatically contains whatever elements remain, even if there are fewer than chunk_size of them.

Using itertools.batched (Python 3.12+)

Python 3.12 introduced itertools.batched, a built-in function specifically for this purpose:

from itertools import batched

data = [1, 2, 3, 4, 5, 6, 7]

for batch in batched(data, 3):
    print(batch)

Output:

(1, 2, 3)
(4, 5, 6)
(7,)
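
On Python 3.13 and newer, batched also accepts a strict flag that raises a ValueError if the final batch is shorter than n, which is handy when every batch must be complete:

from itertools import batched

# strict=True (Python 3.13+) raises ValueError on an incomplete final batch
try:
    list(batched([1, 2, 3, 4, 5], 2, strict=True))
except ValueError as exc:
    print(exc)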

Converting to Lists

Note that batched returns tuples. Convert to lists if needed:

from itertools import batched

data = list(range(10))
chunks = [list(batch) for batch in batched(data, 3)]

print(chunks)

Output:

[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

Using a Generator (Memory Efficient)

For very large datasets, avoid creating all chunks in memory at once. Use a generator to yield chunks one at a time:

def chunk_generator(data: list, chunk_size: int):
    """Yield successive chunks from data."""
    for i in range(0, len(data), chunk_size):
        yield data[i:i + chunk_size]


def process(chunk):
    # Example: just print the sum of each chunk
    print(sum(chunk))


# Process chunks one at a time
large_data = list(range(1_000_000))

for chunk in chunk_generator(large_data, 10_000):
    process(chunk)  # Only one chunk in memory at a time

Output (sum of numbers in each chunk):

49995000
149995000
249995000
349995000
449995000
549995000
649995000
...

Generator for Any Iterable

This version works with any iterable, not just lists:

from itertools import islice


def chunk_iterable(iterable, chunk_size: int):
    """Yield chunks from any iterable."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            break
        yield chunk


# Works with generators, files, etc.
for chunk in chunk_iterable(range(25), 10):
    print(chunk)

Output:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
[20, 21, 22, 23, 24]
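
Because it consumes an iterator rather than indexing, chunk_iterable can also walk a file object line by line. A minimal sketch, using io.StringIO as a self-contained stand-in for a real file:

import io

# io.StringIO stands in for an open file; any line iterator works the same way
fake_file = io.StringIO("".join(f"line {i}\n" for i in range(7)))

for lines in chunk_iterable(fake_file, 3):
    print(f"read {len(lines)} lines")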

Practical Examples

Batch Database Inserts

def batch_insert(records: list, batch_size: int = 100):
    """Insert records in batches to avoid memory issues."""
    for i in range(0, len(records), batch_size):
        batch = records[i:i + batch_size]
        # A fake DB insert ...
        print("db.insert_many(batch)")
        print(f"Inserted batch of {len(batch)} records")


records = [{"id": i, "value": i * 10} for i in range(1000)]
batch_insert(records, batch_size=250)

API Rate Limiting

import time


def process_with_rate_limit(items: list, batch_size: int, delay: float):
    """Process items in batches with a delay between batches."""
    chunks = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

    for i, chunk in enumerate(chunks):
        print(f"Processing batch {i + 1}/{len(chunks)}")

        for item in chunk:
            # A fake API call ...
            print("api.send(item)")

        if i < len(chunks) - 1:
            time.sleep(delay)


urls = [f"https://api.example.com/{i}" for i in range(100)]
process_with_rate_limit(urls, batch_size=10, delay=1.0)

Parallel Processing

from concurrent.futures import ThreadPoolExecutor


def process_chunk(chunk: list) -> list:
    """Process a single chunk."""
    return [item * 2 for item in chunk]


data = list(range(1000))
chunks = [data[i:i + 100] for i in range(0, len(data), 100)]

with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(process_chunk, chunks))

# Flatten results
flat_results = [item for chunk in results for item in chunk]
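
The nested comprehension flattens the per-chunk results; itertools.chain.from_iterable expresses the same operation and can be easier to read:

from itertools import chain

flat_results = list(chain.from_iterable(results))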

Using NumPy for Numerical Data

For numerical arrays, NumPy provides efficient chunking:

import numpy as np


def chunk_array(arr: np.ndarray, chunk_size: int) -> list:
    """Split NumPy array into chunks."""
    return np.array_split(arr, max(1, len(arr) // chunk_size))


data = np.arange(25)
chunks = chunk_array(data, 10)

for chunk in chunks:
    print(chunk)

Output:

[ 0  1  2  3  4  5  6  7  8  9 10 11 12]
[13 14 15 16 17 18 19 20 21 22 23 24]

NumPy Behavior

np.array_split divides the array into roughly equal parts, which may not match the exact chunk size you asked for. Use slicing for precise control, as sketched below.
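
A minimal sketch of exact-size chunking via array slicing (the helper name chunk_array_exact is ours, not a NumPy API); as with the list version, the last chunk may be shorter:

import numpy as np


def chunk_array_exact(arr: np.ndarray, chunk_size: int) -> list:
    """Slice an array into chunks of exactly chunk_size (last may be shorter)."""
    return [arr[i:i + chunk_size] for i in range(0, len(arr), chunk_size)]


for chunk in chunk_array_exact(np.arange(25), 10):
    print(chunk)

Output:

[0 1 2 3 4 5 6 7 8 9]
[10 11 12 13 14 15 16 17 18 19]
[20 21 22 23 24]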

Method Comparison

Method               Python Version      Returns    Memory   Best For
itertools.batched    3.12+               Tuples     Lazy     New projects
List comprehension   All                 Lists      Eager    Compatibility
Generator            All                 Iterator   Lazy     Large datasets
NumPy                All (with NumPy)    Arrays     Varies   Numerical data

Edge Cases

def chunk_list(data: list, chunk_size: int) -> list:
    """Split list into chunks with validation."""
    if chunk_size <= 0:
        raise ValueError("Chunk size must be positive")

    if not data:
        return []

    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


# Empty list
print(chunk_list([], 5))  # []

# Chunk size larger than the list
print(chunk_list([1, 2, 3], 10))  # [[1, 2, 3]]

# Chunk size of 1
print(chunk_list([1, 2, 3], 1))  # [[1], [2], [3]]

Output:

[]
[[1, 2, 3]]
[[1], [2], [3]]
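
The validation branch can be exercised the same way:

# Invalid chunk size
try:
    chunk_list([1, 2, 3], 0)
except ValueError as exc:
    print(exc)  # Chunk size must be positive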

Summary

  • Use itertools.batched on Python 3.12+ for the cleanest solution.
  • Use list comprehension for maximum compatibility with older Python versions.
  • Use generators when processing large datasets to minimize memory usage.
  • Always validate chunk size and handle empty inputs gracefully.