How to Break a List into Chunks of Size N in Python
Chunking a list, i.e. splitting it into smaller pieces of a fixed size, is a common task when batch processing data, paginating results, or working with APIs that have rate limits.
This guide covers the standard approaches from basic list comprehension to Python 3.12's built-in solution.
Using List Comprehension
The most compatible approach works with all Python versions:
def chunk_list(data: list, chunk_size: int) -> list:
    """Split a list into chunks of specified size."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

data = list(range(25))
chunks = chunk_list(data, 10)
for i, chunk in enumerate(chunks):
    print(f"Chunk {i}: {chunk}")
Output:
Chunk 0: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Chunk 1: [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
Chunk 2: [20, 21, 22, 23, 24]
The last chunk automatically contains the remaining elements, even if fewer than chunk_size.
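If you instead need every chunk to be exactly chunk_size, one option is to stop the range early so the short tail is discarded. This is a sketch, not part of the original guide, and the helper name chunk_list_exact is invented for illustration:

```python
def chunk_list_exact(data: list, chunk_size: int) -> list:
    """Split a list into chunks, discarding a final chunk shorter than chunk_size."""
    # Stopping the range at len(data) - chunk_size + 1 skips any partial tail
    return [data[i:i + chunk_size]
            for i in range(0, len(data) - chunk_size + 1, chunk_size)]

print(chunk_list_exact(list(range(25)), 10))
# [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]]
```

Here the final five elements are dropped because they cannot fill a complete chunk.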
Using itertools.batched (Python 3.12+)
Python 3.12 introduced itertools.batched, a built-in function specifically for this purpose:
from itertools import batched

data = [1, 2, 3, 4, 5, 6, 7]
for batch in batched(data, 3):
    print(batch)
Output:
(1, 2, 3)
(4, 5, 6)
(7,)
Converting to Lists
Note that batched returns tuples. Convert to lists if needed:
from itertools import batched
data = list(range(10))
chunks = [list(batch) for batch in batched(data, 3)]
print(chunks)
Output:
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
Using a Generator (Memory Efficient)
For very large datasets, avoid creating all chunks in memory at once. Use a generator to yield chunks one at a time:
def chunk_generator(data: list, chunk_size: int):
    """Yield successive chunks from data."""
    for i in range(0, len(data), chunk_size):
        yield data[i:i + chunk_size]

def process(chunk):
    # Example: just print the sum of each chunk
    print(sum(chunk))

# Process chunks one at a time
large_data = list(range(1_000_000))
for chunk in chunk_generator(large_data, 10_000):
    process(chunk)  # Only one chunk in memory at a time
Output (sum of numbers in each chunk):
49995000
149995000
249995000
349995000
449995000
549995000
649995000
...
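Because chunk_generator yields lazily, you can also pull chunks on demand with next() without materializing the rest. A minimal demonstration (the function is repeated here so the snippet stands alone):

```python
def chunk_generator(data: list, chunk_size: int):
    """Yield successive chunks from data."""
    for i in range(0, len(data), chunk_size):
        yield data[i:i + chunk_size]

gen = chunk_generator(list(range(10)), 4)
print(next(gen))  # [0, 1, 2, 3]
print(next(gen))  # [4, 5, 6, 7]
# Remaining chunks are never built unless requested
```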
Generator for Any Iterable
This version works with any iterable, not just lists:
from itertools import islice

def chunk_iterable(iterable, chunk_size: int):
    """Yield chunks from any iterable."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            break
        yield chunk

# Works with generators, files, etc.
for chunk in chunk_iterable(range(25), 10):
    print(chunk)
Output:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
[20, 21, 22, 23, 24]
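To show that this really does work on file-like objects, here is a small sketch using io.StringIO to stand in for a real file; a file object returned by open() behaves the same way, since iterating over it yields lines:

```python
import io
from itertools import islice

def chunk_iterable(iterable, chunk_size: int):
    """Yield chunks from any iterable."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            break
        yield chunk

# io.StringIO simulates a text file; iterating yields one line at a time
fake_file = io.StringIO("line1\nline2\nline3\nline4\nline5\n")
for chunk in chunk_iterable(fake_file, 2):
    print([line.strip() for line in chunk])
# ['line1', 'line2']
# ['line3', 'line4']
# ['line5']
```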
Practical Examples
Batch Database Inserts
def batch_insert(records: list, batch_size: int = 100):
    """Insert records in batches to avoid memory issues."""
    for i in range(0, len(records), batch_size):
        batch = records[i:i + batch_size]
        # a fake db insert ...
        print("db.insert_many(batch)")
        print(f"Inserted batch of {len(batch)} records")

records = [{"id": i, "value": i * 10} for i in range(1000)]
batch_insert(records, batch_size=250)
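To make the fake insert concrete, here is one possible version using the standard-library sqlite3 module with an in-memory database and executemany. The items table and its schema are invented for illustration:

```python
import sqlite3

def batch_insert(conn: sqlite3.Connection, records: list, batch_size: int = 100) -> None:
    """Insert records in batches using executemany."""
    for i in range(0, len(records), batch_size):
        batch = records[i:i + batch_size]
        # Named placeholders map onto the dict keys of each record
        conn.executemany(
            "INSERT INTO items (id, value) VALUES (:id, :value)", batch
        )
        conn.commit()  # One commit per batch keeps transactions bounded

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, value INTEGER)")
records = [{"id": i, "value": i * 10} for i in range(1000)]
batch_insert(conn, records, batch_size=250)
print(conn.execute("SELECT COUNT(*) FROM items").fetchone()[0])  # 1000
```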
API Rate Limiting
import time

def process_with_rate_limit(items: list, batch_size: int, delay: float):
    """Process items in batches with delay between batches."""
    chunks = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
    for i, chunk in enumerate(chunks):
        print(f"Processing batch {i + 1}/{len(chunks)}")
        for item in chunk:
            # a fake api call ...
            print("api.send(item)")
        if i < len(chunks) - 1:
            time.sleep(delay)

urls = [f"https://api.example.com/{i}" for i in range(100)]
process_with_rate_limit(urls, batch_size=10, delay=1.0)
Parallel Processing
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk: list) -> list:
    """Process a single chunk."""
    return [item * 2 for item in chunk]

data = list(range(1000))
chunks = [data[i:i + 100] for i in range(0, len(data), 100)]

with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(process_chunk, chunks))

# Flatten results
flat_results = [item for chunk in results for item in chunk]
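The flattening step can also be written with itertools.chain.from_iterable, which avoids the nested comprehension. A small stand-in list of per-chunk results keeps the snippet self-contained:

```python
from itertools import chain

results = [[0, 2, 4], [6, 8, 10]]  # stand-in for per-chunk results
flat_results = list(chain.from_iterable(results))
print(flat_results)  # [0, 2, 4, 6, 8, 10]
```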
Using NumPy for Numerical Data
For numerical arrays, NumPy provides efficient chunking:
import numpy as np

def chunk_array(arr: np.ndarray, chunk_size: int) -> list:
    """Split NumPy array into chunks."""
    return np.array_split(arr, max(1, len(arr) // chunk_size))

data = np.arange(25)
chunks = chunk_array(data, 10)
for chunk in chunks:
    print(chunk)
Output:
[ 0 1 2 3 4 5 6 7 8 9 10 11 12]
[13 14 15 16 17 18 19 20 21 22 23 24]
np.array_split divides into roughly equal parts, which may not match exact chunk sizes. Use list slicing for precise control.
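For that precise control, plain slicing works on NumPy arrays too. Slicing an ndarray returns a view rather than a copy, so this is cheap; the helper name chunk_array_exact is invented for this sketch:

```python
import numpy as np

def chunk_array_exact(arr: np.ndarray, chunk_size: int) -> list:
    """Split an array into slices of exactly chunk_size (last may be shorter)."""
    return [arr[i:i + chunk_size] for i in range(0, len(arr), chunk_size)]

data = np.arange(25)
for chunk in chunk_array_exact(data, 10):
    print(chunk)
# [0 1 2 3 4 5 6 7 8 9]
# [10 11 12 13 14 15 16 17 18 19]
# [20 21 22 23 24]
```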
Method Comparison
| Method | Python Version | Returns | Memory | Best For |
|---|---|---|---|---|
| itertools.batched | 3.12+ | Tuples | Lazy | New projects |
| List comprehension | All | Lists | Eager | Compatibility |
| Generator | All | Iterator | Lazy | Large datasets |
| NumPy | All (with NumPy) | Arrays | Varies | Numerical data |
Edge Cases
def chunk_list(data: list, chunk_size: int) -> list:
    """Split list into chunks with validation."""
    if chunk_size <= 0:
        raise ValueError("Chunk size must be positive")
    if not data:
        return []
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

# Empty list
print(chunk_list([], 5))  # []
# Chunk size larger than list
print(chunk_list([1, 2, 3], 10))  # [[1, 2, 3]]
# Chunk size of 1
print(chunk_list([1, 2, 3], 1))  # [[1], [2], [3]]
Output:
[]
[[1, 2, 3]]
[[1], [2], [3]]
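One more edge worth knowing: the slicing pattern is not limited to lists. It works on any sequence that supports slicing, such as strings and tuples, and each chunk keeps the input's type. A quick check (the helper is repeated without the list annotation so the snippet stands alone):

```python
def chunk_list(data, chunk_size: int) -> list:
    """Split any sliceable sequence into chunks."""
    if chunk_size <= 0:
        raise ValueError("Chunk size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

print(chunk_list("abcdefg", 3))  # ['abc', 'def', 'g']
print(chunk_list((1, 2, 3, 4), 2))  # [(1, 2), (3, 4)]
```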
Summary
- Use itertools.batched on Python 3.12+ for the cleanest solution.
- Use list comprehension for maximum compatibility with older Python versions.
- Use generators when processing large datasets to minimize memory usage.
- Always validate chunk size and handle empty inputs gracefully.