How to Group a List of Tuples into a Dictionary by Key in Python
Converting a list of key-value tuples into a dictionary where each key maps to a list of associated values is a common data transformation task.
Using defaultdict (Recommended)
The defaultdict class from the collections module gives a concise solution that groups the data in a single pass:
from collections import defaultdict
data = [(1, 'A'), (1, 'B'), (2, 'C'), (2, 'D'), (3, 'E')]
grouped = defaultdict(list)
for key, value in data:
    grouped[key].append(value)
print(dict(grouped))
# {1: ['A', 'B'], 2: ['C', 'D'], 3: ['E']}
The defaultdict(list) automatically creates an empty list for any new key, eliminating the need for existence checks.
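To make the automatic list creation concrete, the sketch below compares a plain dict that performs the existence check by hand with defaultdict(list). It also shows a side effect worth knowing: merely reading a missing key on a defaultdict inserts it. The function name group_with_check is illustrative only and not part of any library.

```python
from collections import defaultdict

data = [(1, 'A'), (1, 'B'), (2, 'C')]

def group_with_check(pairs):
    """Plain-dict version: the existence check that defaultdict makes unnecessary."""
    result = {}
    for key, value in pairs:
        if key not in result:
            result[key] = []
        result[key].append(value)
    return result

grouped = defaultdict(list)
for key, value in data:
    grouped[key].append(value)

# Both approaches build the same mapping.
same_result = dict(grouped) == group_with_check(data)

# Reading a missing key calls list() and stores the result under that key.
keys_before = sorted(grouped)
_ = grouped[99]
keys_after = sorted(grouped)

# Converting to a plain dict restores the usual KeyError for missing keys.
plain = dict(group_with_check(data))
```

If stray keys would be a problem, converting the result with dict(...) before handing it to other code is one way to avoid accidental insertions.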
Using setdefault (No Imports)
When avoiding imports, setdefault provides similar functionality:
data = [(1, 'A'), (1, 'B'), (2, 'C'), (2, 'D'), (3, 'E')]
grouped = {}
for key, value in data:
    grouped.setdefault(key, []).append(value)
print(grouped)
# {1: ['A', 'B'], 2: ['C', 'D'], 3: ['E']}
The setdefault method returns the existing value for a key. If the key is missing, it first stores the default under that key and then returns it.
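A small sketch of that behavior in isolation, using a throwaway key 'x' chosen purely for illustration:

```python
grouped = {}

# Key is missing: the default list is stored and returned.
first = grouped.setdefault('x', [])
first.append(1)

# Key exists: the stored list is returned and the new default is ignored.
second = grouped.setdefault('x', ['ignored'])
second.append(2)

print(grouped)
# {'x': [1, 2]}
print(first is second)
# True
```

Note that the default argument is evaluated on every call, so a new empty list is constructed even when the key already exists. For most inputs this cost is small.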
Grouping with Sets (Unique Values)
To collect only unique values per key, use sets instead of lists:
from collections import defaultdict
data = [(1, 'A'), (1, 'A'), (1, 'B'), (2, 'C')]
grouped = defaultdict(set)
for key, value in data:
    grouped[key].add(value)
print(dict(grouped))
# {1: {'B', 'A'}, 2: {'C'}}  (element order within a set may vary)
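Because sets are unordered, the printed order of values inside each group can differ between runs. As a minimal sketch, assuming the values are mutually comparable, each set can be sorted into a list when stable output is needed:

```python
from collections import defaultdict

data = [(1, 'A'), (1, 'A'), (1, 'B'), (2, 'C')]

grouped = defaultdict(set)
for key, value in data:
    grouped[key].add(value)

# Sets are unordered, so sort each one to get stable, comparable output.
stable = {key: sorted(values) for key, values in grouped.items()}
print(stable)
# {1: ['A', 'B'], 2: ['C']}
```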
Grouping Objects by Attribute
Apply the same pattern to group objects:
from collections import defaultdict
class Product:
    def __init__(self, name, category):
        self.name = name
        self.category = category
products = [
    Product("Apple", "Fruit"),
    Product("Banana", "Fruit"),
    Product("Carrot", "Vegetable"),
]
]
by_category = defaultdict(list)
for product in products:
    by_category[product.category].append(product.name)
print(dict(by_category))
# {'Fruit': ['Apple', 'Banana'], 'Vegetable': ['Carrot']}
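When the same pattern is needed in several places, it can be wrapped in a small reusable function. The sketch below is one possible shape: group_by is a hypothetical helper name, and it keeps the whole objects in each group rather than only their names.

```python
from collections import defaultdict
from operator import attrgetter

def group_by(items, key_func):
    """Group items into lists keyed by key_func(item) (hypothetical helper)."""
    groups = defaultdict(list)
    for item in items:
        groups[key_func(item)].append(item)
    return dict(groups)

class Product:
    def __init__(self, name, category):
        self.name = name
        self.category = category

products = [
    Product("Apple", "Fruit"),
    Product("Banana", "Fruit"),
    Product("Carrot", "Vegetable"),
]

# attrgetter("category") is equivalent to lambda p: p.category
by_category = group_by(products, attrgetter("category"))
names = {cat: [p.name for p in items] for cat, items in by_category.items()}
print(names)
# {'Fruit': ['Apple', 'Banana'], 'Vegetable': ['Carrot']}
```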
Using itertools.groupby
For pre-sorted data, groupby offers a functional approach:
from itertools import groupby
data = [(1, 'A'), (1, 'B'), (2, 'C'), (2, 'D')]
# Data MUST be sorted by key first
data_sorted = sorted(data, key=lambda x: x[0])
grouped = {
    key: [v for _, v in group]
    for key, group in groupby(data_sorted, key=lambda x: x[0])
}
print(grouped)
# {1: ['A', 'B'], 2: ['C', 'D']}
groupby only groups consecutive elements with the same key. Always sort your data first, or you'll get unexpected results:
from itertools import groupby
unsorted = [(1, 'A'), (2, 'B'), (1, 'C')] # Not sorted
# Wrong result: does not combine all 1s
result = {k: list(g) for k, g in groupby(unsorted, key=lambda x: x[0])}
print(result)
# {1: [(1, 'C')], 2: [(2, 'B')]} # Missing (1, 'A')!
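To see why the entry goes missing, it helps to look at the runs groupby produces before they are collapsed into a dict. The sketch below materializes each run as a (key, values) pair:

```python
from itertools import groupby

unsorted = [(1, 'A'), (2, 'B'), (1, 'C')]

# Materialize each run as (key, values) to see what groupby really yields.
runs = [(k, [v for _, v in g]) for k, g in groupby(unsorted, key=lambda x: x[0])]
print(runs)
# [(1, ['A']), (2, ['B']), (1, ['C'])]
```

Key 1 appears in two separate runs. In a dict comprehension the second run overwrites the first, which is how (1, 'A') is lost.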
Avoid the O(N²) Trap
A nested comprehension approach looks elegant but performs terribly:
data = [(1, 'A'), (1, 'B'), (2, 'C')]
# ❌ O(N²): rescans the entire list once per tuple
grouped = {
    k: [v for k2, v in data if k2 == k]
    for k, _ in data
}
print(grouped)
# {1: ['A', 'B'], 2: ['C']}
This scans the entire list once for every tuple, so the running time grows quadratically with the input size.
# ✅ O(N): single pass through the data
from collections import defaultdict
data = [(1, 'A'), (1, 'B'), (2, 'C')]
grouped = defaultdict(list)
for k, v in data:
    grouped[k].append(v)
print(dict(grouped))
# {1: ['A', 'B'], 2: ['C']}
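The difference can be measured with timeit. The following is a rough sketch rather than a rigorous benchmark: the input size, the number of keys, and the function names nested and single_pass are arbitrary choices for illustration, and absolute timings depend on the machine.

```python
import timeit
from collections import defaultdict

# Synthetic input: 1,000 tuples spread over 100 keys (sizes are arbitrary).
data = [(i % 100, i) for i in range(1000)]

def nested(pairs):
    """Quadratic version: one full scan of pairs per tuple."""
    return {k: [v for k2, v in pairs if k2 == k] for k, _ in pairs}

def single_pass(pairs):
    """Linear version: each tuple is visited exactly once."""
    grouped = defaultdict(list)
    for k, v in pairs:
        grouped[k].append(v)
    return dict(grouped)

# Both produce the same grouping; only the cost differs.
assert nested(data) == single_pass(data)

t_nested = timeit.timeit(lambda: nested(data), number=3)
t_single = timeit.timeit(lambda: single_pass(data), number=3)
print(f"nested: {t_nested:.4f}s, single pass: {t_single:.4f}s")
```

Doubling the input size should roughly quadruple the nested time while only doubling the single-pass time.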
Grouping Multiple Values
When tuples have more than two elements:
from collections import defaultdict
# (category, product, price)
data = [
    ('Fruit', 'Apple', 1.50),
    ('Fruit', 'Banana', 0.75),
    ('Vegetable', 'Carrot', 0.50),
]
# Group by category, keeping product-price pairs
grouped = defaultdict(list)
for category, product, price in data:
    grouped[category].append({'product': product, 'price': price})
print(dict(grouped))
# {'Fruit': [{'product': 'Apple', 'price': 1.5},
# {'product': 'Banana', 'price': 0.75}],
# 'Vegetable': [{'product': 'Carrot', 'price': 0.5}]}
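If named fields are not needed, a lighter variant is to keep the remaining elements as a tuple. This sketch uses star unpacking, so it works for tuples of any length as long as the first element is the grouping key:

```python
from collections import defaultdict

# (category, product, price)
data = [
    ('Fruit', 'Apple', 1.50),
    ('Fruit', 'Banana', 0.75),
    ('Vegetable', 'Carrot', 0.50),
]

grouped = defaultdict(list)
for category, *rest in data:
    # rest is a list of the remaining fields; store it as a tuple
    grouped[category].append(tuple(rest))

print(dict(grouped))
# {'Fruit': [('Apple', 1.5), ('Banana', 0.75)], 'Vegetable': [('Carrot', 0.5)]}
```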
Aggregating Instead of Collecting
Sometimes you want to aggregate values rather than collect them:
from collections import defaultdict
sales = [('Jan', 100), ('Jan', 150), ('Feb', 200), ('Feb', 50)]
# Sum values by key
totals = defaultdict(int)
for month, amount in sales:
    totals[month] += amount
print(dict(totals))
# {'Jan': 250, 'Feb': 250}
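The same single-pass idea extends to other aggregates. As a sketch, the example below counts entries per key with collections.Counter and tracks the largest value per key with a plain dict; the variable names counts and largest are illustrative.

```python
from collections import Counter

sales = [('Jan', 100), ('Jan', 150), ('Feb', 200), ('Feb', 50)]

# Count how many entries each key has.
counts = Counter(month for month, _ in sales)
print(dict(counts))
# {'Jan': 2, 'Feb': 2}

# Track the largest value per key with a plain dict.
largest = {}
for month, amount in sales:
    largest[month] = max(amount, largest.get(month, amount))
print(largest)
# {'Jan': 150, 'Feb': 200}
```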
Method Comparison
| Method | Time Complexity | Imports | Best For |
|---|---|---|---|
| defaultdict(list) | O(N) | Yes | General use |
| setdefault | O(N) | No | Import-free code |
| groupby | O(N log N)* | Yes | Pre-sorted data |
| Nested comprehension | O(N²) | No | Never use |

*Includes sorting time; grouping data that is already sorted is O(N).
Summary
- Use defaultdict(list) as your default approach for grouping; it is efficient, readable, and handles missing keys automatically.
- Use setdefault when you want to avoid imports.
- Avoid nested comprehensions for grouping as they create O(N²) performance problems.