
How to Aggregate Lists of Dictionaries in Python

Aggregating data (summarizing, grouping, or transforming it) is a daily task for Python developers. When working with lists of dictionaries, a common shape for JSON data and database records, choosing the right aggregation tool matters for both performance and readability.

This guide demonstrates efficient methods to sum values, group items, and calculate statistics from lists of dictionaries.

Sample Data

We will use a list of sales records for our examples.

sales_data = [
{"product": "Laptop", "category": "Electronics", "price": 1000, "quantity": 2},
{"product": "Phone", "category": "Electronics", "price": 500, "quantity": 5},
{"product": "Book", "category": "Books", "price": 20, "quantity": 10},
{"product": "Tablet", "category": "Electronics", "price": 300, "quantity": 3},
{"product": "Pen", "category": "Stationery", "price": 5, "quantity": 50}
]

Method 1: Using sum() and Generator Expressions

For simple totals such as revenue or items sold, Python's built-in sum() combined with a generator expression is the most Pythonic approach. It is also memory-efficient, because the generator produces one value at a time instead of building an intermediate list.

sales_data = [
{"product": "Laptop", "category": "Electronics", "price": 1000, "quantity": 2},
{"product": "Phone", "category": "Electronics", "price": 500, "quantity": 5},
{"product": "Book", "category": "Books", "price": 20, "quantity": 10},
{"product": "Tablet", "category": "Electronics", "price": 300, "quantity": 3},
{"product": "Pen", "category": "Stationery", "price": 5, "quantity": 50}
]

# Calculate Total Revenue: sum(price * quantity)
total_revenue = sum(item['price'] * item['quantity'] for item in sales_data)

# Calculate Total Quantity Sold
total_quantity = sum(item['quantity'] for item in sales_data)

print(f"Total Revenue: ${total_revenue}")
print(f"Total Items Sold: {total_quantity}")

Output:

Total Revenue: $5850
Total Items Sold: 70
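sum() is only one of several built-ins that accept a generator expression. The sketch below, assuming the same sales_data records, computes a few other simple statistics: a plain average with statistics.mean, the highest-revenue record with max() and a key function, and an average price per unit sold weighted by quantity. The variable names and the default=None choice (which makes max() return None instead of raising ValueError on an empty list) are illustrative.

```python
import statistics

sales_data = [
    {"product": "Laptop", "category": "Electronics", "price": 1000, "quantity": 2},
    {"product": "Phone", "category": "Electronics", "price": 500, "quantity": 5},
    {"product": "Book", "category": "Books", "price": 20, "quantity": 10},
    {"product": "Tablet", "category": "Electronics", "price": 300, "quantity": 3},
    {"product": "Pen", "category": "Stationery", "price": 5, "quantity": 50},
]

# Unweighted mean of the list prices (statistics.mean accepts any iterable)
average_price = statistics.mean(item["price"] for item in sales_data)

# Record with the highest revenue; default=None avoids ValueError on an empty list
top_seller = max(
    sales_data,
    key=lambda item: item["price"] * item["quantity"],
    default=None,
)

# Average price per unit actually sold, i.e. revenue weighted by quantity
total_revenue = sum(item["price"] * item["quantity"] for item in sales_data)
total_quantity = sum(item["quantity"] for item in sales_data)
weighted_price = total_revenue / total_quantity if total_quantity else 0.0

print(f"Average list price: ${average_price}")
print(f"Top seller by revenue: {top_seller['product']}")
print(f"Average price per unit sold: ${weighted_price:.2f}")
```

With the sample data, this reports an average list price of 365, Phone as the top seller (2 500 in revenue versus 2 000 for Laptop), and roughly 83.57 per unit sold. The gap between the two averages comes from the 50 cheap pens dominating the unit count.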

Method 2: Grouping with collections.defaultdict

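If you need to group data (e.g., total sales per category), defaultdict is often cleaner than a standard dictionary, because it supplies a default value for a missing key automatically instead of requiring an explicit "is this key present?" check before each update.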

from collections import defaultdict

sales_data = [
{"product": "Laptop", "category": "Electronics", "price": 1000, "quantity": 2},
{"product": "Phone", "category": "Electronics", "price": 500, "quantity": 5},
{"product": "Book", "category": "Books", "price": 20, "quantity": 10},
{"product": "Tablet", "category": "Electronics", "price": 300, "quantity": 3},
{"product": "Pen", "category": "Stationery", "price": 5, "quantity": 50}
]

# Accumulator dictionary
category_sales = defaultdict(int)

for item in sales_data:
    category = item['category']
    revenue = item['price'] * item['quantity']

    # Automatically initializes category to 0 if it doesn't exist
    category_sales[category] += revenue

print("Revenue by Category:")
for category, amount in category_sales.items():
    print(f"{category}: ${amount}")

Output:

Revenue by Category:
Electronics: $5400
Books: $200
Stationery: $250
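For comparison, here is a sketch of the same per-category total written with a standard dict. Every update has to supply a starting value itself, here via dict.get(key, 0); defaultdict(int) moves that default into the container. The name category_sales_plain is illustrative.

```python
sales_data = [
    {"product": "Laptop", "category": "Electronics", "price": 1000, "quantity": 2},
    {"product": "Phone", "category": "Electronics", "price": 500, "quantity": 5},
    {"product": "Book", "category": "Books", "price": 20, "quantity": 10},
    {"product": "Tablet", "category": "Electronics", "price": 300, "quantity": 3},
    {"product": "Pen", "category": "Stationery", "price": 5, "quantity": 50},
]

category_sales_plain = {}

for item in sales_data:
    category = item["category"]
    revenue = item["price"] * item["quantity"]
    # dict.get returns 0 the first time a category is seen
    category_sales_plain[category] = category_sales_plain.get(category, 0) + revenue

print("Revenue by Category:")
for category, amount in category_sales_plain.items():
    print(f"{category}: ${amount}")
```

Both versions produce the same totals in the same first-seen order; the defaultdict version simply removes the repeated .get(category, 0) bookkeeping.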

Method 3: Grouping with itertools.groupby

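itertools.groupby is powerful, but it only groups consecutive items that share a key, so the input list must be sorted by the grouping key first; otherwise the same key can show up as several separate groups. It is useful when you need to iterate through each group's items rather than just sum them.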

from itertools import groupby

sales_data = [
{"product": "Laptop", "category": "Electronics", "price": 1000, "quantity": 2},
{"product": "Phone", "category": "Electronics", "price": 500, "quantity": 5},
{"product": "Book", "category": "Books", "price": 20, "quantity": 10},
{"product": "Tablet", "category": "Electronics", "price": 300, "quantity": 3},
{"product": "Pen", "category": "Stationery", "price": 5, "quantity": 50}
]

# 1. Sort data by category first (Crucial step!)
sorted_sales = sorted(sales_data, key=lambda x: x['category'])

print("Items in each category:")
# 2. Group by category
for category, group in groupby(sorted_sales, key=lambda x: x['category']):
    # 'group' is a one-shot iterator that becomes invalid once groupby advances,
    # so convert it to a list before using it more than once
    items = list(group)
    item_names = [item['product'] for item in items]
    print(f"{category}: {item_names}")

Output:

Items in each category:
Books: ['Book']
Electronics: ['Laptop', 'Phone', 'Tablet']
Stationery: ['Pen']
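To see why the sort step matters, the sketch below runs groupby on the unsorted records, where the non-adjacent "Electronics" rows end up in two separate groups. It then shows a single-pass alternative with defaultdict(list) that needs no sorting and keeps groups in first-seen order. operator.itemgetter("category") is used as the key function here; it behaves like the lambda above.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

sales_data = [
    {"product": "Laptop", "category": "Electronics", "price": 1000, "quantity": 2},
    {"product": "Phone", "category": "Electronics", "price": 500, "quantity": 5},
    {"product": "Book", "category": "Books", "price": 20, "quantity": 10},
    {"product": "Tablet", "category": "Electronics", "price": 300, "quantity": 3},
    {"product": "Pen", "category": "Stationery", "price": 5, "quantity": 50},
]

by_category = itemgetter("category")

# Without sorting, groupby starts a new group every time the key changes
unsorted_groups = [
    (category, [item["product"] for item in group])
    for category, group in groupby(sales_data, key=by_category)
]
print(unsorted_groups)

# Single-pass alternative: no sort needed, each key collects all its items
products_by_category = defaultdict(list)
for item in sales_data:
    products_by_category[by_category(item)].append(item["product"])
print(dict(products_by_category))
```

The first print lists Electronics twice (['Laptop', 'Phone'] and then ['Tablet']), while the defaultdict version gathers all three Electronics products under one key. Sorting costs O(n log n), so when you only need the groups and not sorted output, the defaultdict approach is usually the simpler choice.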

Method 4: Using Pandas (For Large Datasets)

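For complex aggregations or large datasets, pandas is the de facto standard. Its groupby() method expresses grouping and aggregation much like SQL's GROUP BY, and its vectorized operations are typically much faster than pure-Python loops on large data.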

import pandas as pd

sales_data = [
{"product": "Laptop", "category": "Electronics", "price": 1000, "quantity": 2},
{"product": "Phone", "category": "Electronics", "price": 500, "quantity": 5},
{"product": "Book", "category": "Books", "price": 20, "quantity": 10},
{"product": "Tablet", "category": "Electronics", "price": 300, "quantity": 3},
{"product": "Pen", "category": "Stationery", "price": 5, "quantity": 50}
]

# Create DataFrame
df = pd.DataFrame(sales_data)

# Calculate revenue per row
df['revenue'] = df['price'] * df['quantity']

# Aggregate: Group by category and sum revenue
category_summary = df.groupby('category')['revenue'].sum()

print(category_summary)

Output:

category
Books           200
Electronics    5400
Stationery      250
Name: revenue, dtype: int64
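A grouped DataFrame can also compute several aggregates in one call. The sketch below uses pandas named aggregation, where each keyword argument to .agg() has the form output_column=(source_column, function). The output column names (total_revenue, total_quantity, product_count, average_price) are illustrative choices, not anything pandas requires.

```python
import pandas as pd

sales_data = [
    {"product": "Laptop", "category": "Electronics", "price": 1000, "quantity": 2},
    {"product": "Phone", "category": "Electronics", "price": 500, "quantity": 5},
    {"product": "Book", "category": "Books", "price": 20, "quantity": 10},
    {"product": "Tablet", "category": "Electronics", "price": 300, "quantity": 3},
    {"product": "Pen", "category": "Stationery", "price": 5, "quantity": 50},
]

df = pd.DataFrame(sales_data)
df["revenue"] = df["price"] * df["quantity"]

# Named aggregation: each keyword becomes one column of the result
summary = df.groupby("category").agg(
    total_revenue=("revenue", "sum"),
    total_quantity=("quantity", "sum"),
    product_count=("product", "count"),
    average_price=("price", "mean"),
)

print(summary)
```

The result has one row per category (sorted by category, as groupby does by default) and one column per keyword, for example 5400 in revenue, 10 units, and 3 products for Electronics.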

Conclusion

Choosing the right aggregation method depends on your needs:

  1. Simple Sums: Use sum() with generator expressions.
  2. Grouping & Summing: Use collections.defaultdict.
  3. Grouping & Iterating: Use itertools.groupby (remember to sort first!).
  4. Complex Analysis or Large Datasets: Use pandas DataFrames with groupby().