How to Get Unique Values from a List of Dictionaries in Python
When working with lists of dictionaries, a common data structure for representing records, API responses, or database rows, you'll often need to extract all unique values across the dictionaries. This is useful for identifying distinct entries, building filter options, or deduplicating data.
For example, given:
data = [{'name': 'Alice', 'age': 25}, {'name': 'Bob', 'age': 30}, {'name': 'Alice', 'age': 22}]
The unique values across all dictionaries are: {'Alice', 'Bob', 25, 30, 22}.
This guide covers several approaches, from extracting all unique values to targeting specific keys.
Using a Set with a Generator Expression (Recommended)
The most concise and efficient way to collect unique values is combining a set with a nested generator expression:
data = [
    {'name': 'Alice', 'age': 25},
    {'name': 'Bob', 'age': 30},
    {'name': 'Alice', 'age': 22}
]
unique_values = set(val for d in data for val in d.values())
print(unique_values)
Output:
{'Alice', 'Bob', 22, 25, 30}
The generator expression iterates over each dictionary in data, then over each value in that dictionary. The set automatically discards duplicates.
Sets are unordered, so the output order is not guaranteed. If you need to preserve the order of first appearance, see the dict.fromkeys() approach below.
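The same collection step can also be written as a set comprehension, which drops the explicit set() call. A minimal sketch, reusing the sample data from above and comparing against a literal set rather than relying on print order:

```python
data = [
    {'name': 'Alice', 'age': 25},
    {'name': 'Bob', 'age': 30},
    {'name': 'Alice', 'age': 22},
]

# Set comprehension: equivalent to set(<generator expression>)
unique_values = {val for d in data for val in d.values()}

# Set equality ignores ordering, so this check is deterministic
print(unique_values == {'Alice', 'Bob', 22, 25, 30})  # True
```

Both forms require every value to be hashable; a value such as a list would raise a TypeError when added to the set.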
Getting Unique Values for a Specific Key
More commonly, you need unique values from a specific key rather than all values across all keys. For example, getting all unique names:
data = [
    {'name': 'Alice', 'age': 25, 'city': 'NYC'},
    {'name': 'Bob', 'age': 30, 'city': 'LA'},
    {'name': 'Alice', 'age': 22, 'city': 'Chicago'},
    {'name': 'Charlie', 'age': 30, 'city': 'NYC'}
]
# Unique names
unique_names = set(d['name'] for d in data)
print("Unique names:", unique_names)
# Unique cities
unique_cities = set(d['city'] for d in data)
print("Unique cities:", unique_cities)
# Unique ages
unique_ages = set(d['age'] for d in data)
print("Unique ages:", unique_ages)
Output:
Unique names: {'Charlie', 'Alice', 'Bob'}
Unique cities: {'Chicago', 'NYC', 'LA'}
Unique ages: {25, 30, 22}
If some dictionaries might not contain the target key, d['age'] raises a KeyError. Filter on the key's presence first; d.get('age') alone would avoid the error but add None to the result:
data = [
    {'name': 'Alice', 'age': 25},
    {'name': 'Bob'},  # No 'age' key
    {'name': 'Charlie', 'age': 30}
]
unique_ages = set(d['age'] for d in data if 'age' in d)
print("Unique ages:", unique_ages)
Output:
Unique ages: {25, 30}
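When the same lookup is needed for several keys, it can be wrapped in a small helper. The sketch below is illustrative only: unique_for_key and _MISSING are hypothetical names, not part of any library. It passes a sentinel default to .get() so that a key stored with the value None still counts as a value, while records that lack the key are skipped.

```python
_MISSING = object()  # hypothetical sentinel meaning "key not present"


def unique_for_key(records, key):
    """Return the distinct values stored under `key`, skipping records without it."""
    found = (d.get(key, _MISSING) for d in records)
    return {val for val in found if val is not _MISSING}


data = [
    {'name': 'Alice', 'age': 25},
    {'name': 'Bob'},                    # no 'age' key: skipped
    {'name': 'Charlie', 'age': None},   # 'age' present but None: kept
    {'name': 'Dana', 'age': 30},
]

print(unique_for_key(data, 'age') == {25, None, 30})  # True
print(unique_for_key(data, 'email') == set())         # True
```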
Using dict.fromkeys() to Preserve Order
If you need unique values in the order they first appear, dict.fromkeys() is the best approach (Python 3.7+):
data = [
    {'name': 'Alice', 'age': 25},
    {'name': 'Bob', 'age': 30},
    {'name': 'Alice', 'age': 22}
]
unique_values = list(dict.fromkeys(val for d in data for val in d.values()))
print(unique_values)
Output:
['Alice', 25, 'Bob', 30, 22]
Since dictionary keys are unique and maintain insertion order in Python 3.7+, this approach deduplicates while preserving the order of first occurrence.
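The same idea applies when only one key is of interest. A short sketch, using a hypothetical 'city' field, that keeps each city in the order it first appears:

```python
data = [
    {'name': 'Alice', 'city': 'NYC'},
    {'name': 'Bob', 'city': 'LA'},
    {'name': 'Alice', 'city': 'Chicago'},
    {'name': 'Charlie', 'city': 'NYC'},
]

# dict.fromkeys() keeps the first occurrence of each key and drops repeats
unique_cities = list(dict.fromkeys(d['city'] for d in data))
print(unique_cities)  # ['NYC', 'LA', 'Chicago']
```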
Using itertools.chain.from_iterable()
For large datasets, itertools.chain.from_iterable() efficiently flattens the values from all dictionaries into a single iterator without creating intermediate lists:
from itertools import chain
data = [
    {'name': 'Alice', 'age': 25},
    {'name': 'Bob', 'age': 30},
    {'name': 'Alice', 'age': 22}
]
unique_values = set(chain.from_iterable(d.values() for d in data))
print(unique_values)
Output:
{'Alice', 'Bob', 22, 25, 30}
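chain.from_iterable() lazily iterates through each dictionary's values without building an intermediate collection, which keeps memory use low for very large datasets. The nested generator expression shown earlier is equally lazy, so the difference between the two is mainly one of style.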
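To make the laziness concrete, the sketch below pulls a single value from the chained iterator before consuming the rest, then checks that the result matches the nested generator expression. The variable names are illustrative.

```python
from itertools import chain

data = [
    {'name': 'Alice', 'age': 25},
    {'name': 'Bob', 'age': 30},
    {'name': 'Alice', 'age': 22},
]

# Nothing is read from `data` until the iterator is advanced
flattened = chain.from_iterable(d.values() for d in data)
first = next(flattened)
print(first)  # Alice

# set() consumes the remaining values; add back the one already taken
via_chain = set(flattened) | {first}
via_genexp = set(val for d in data for val in d.values())
print(via_chain == via_genexp)  # True
```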
Using a Loop for Custom Logic
A manual loop gives you the most flexibility to add custom filtering or transformation during the deduplication:
data = [
    {'name': 'Alice', 'age': 25},
    {'name': 'Bob', 'age': 30},
    {'name': 'Alice', 'age': 22}
]
unique_values = []
seen = set()
for d in data:
    for val in d.values():
        if val not in seen:
            seen.add(val)
            unique_values.append(val)
print(unique_values)
Output:
['Alice', 25, 'Bob', 30, 22]
This approach preserves order and runs in O(n) time, where n is the total number of values, because membership checks against the seen set take constant time on average.
Without the seen set, each val not in unique_values check scans the list, which is O(n) per check and makes the overall approach O(n²):
# Slower approach: O(n²)
unique_values = []
for d in data:
    for val in d.values():
        if val not in unique_values:  # O(n) lookup on each iteration
            unique_values.append(val)
Always use a set for membership checks alongside the list to maintain O(n) performance.
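As an illustration of the custom logic a loop allows, the sketch below skips None values and treats strings that differ only by case as duplicates. The sample records and the case-insensitive rule are assumptions chosen for the example, not requirements of the technique.

```python
data = [
    {'name': 'Alice', 'city': 'NYC'},
    {'name': 'alice', 'city': None},
    {'name': 'Bob', 'city': 'nyc'},
]

unique_values = []
seen = set()
for d in data:
    for val in d.values():
        if val is None:  # custom filter: ignore empty values
            continue
        # custom transformation: compare strings case-insensitively
        key = val.casefold() if isinstance(val, str) else val
        if key not in seen:
            seen.add(key)
            unique_values.append(val)  # keep the first spelling encountered

print(unique_values)  # ['Alice', 'NYC', 'Bob']
```

The seen set stores the normalized form while the list keeps the original value, so the output shows the data as it was first written.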
Practical Example: Extracting Unique Values per Key
A real-world scenario is building a summary of all unique values for each key across a list of records:
data = [
    {'product': 'Laptop', 'brand': 'Dell', 'category': 'Electronics'},
    {'product': 'Phone', 'brand': 'Apple', 'category': 'Electronics'},
    {'product': 'Shirt', 'brand': 'Nike', 'category': 'Clothing'},
    {'product': 'Laptop', 'brand': 'HP', 'category': 'Electronics'},
    {'product': 'Shoes', 'brand': 'Nike', 'category': 'Clothing'}
]
# Get unique values for each key
# (assumes every record has the same keys as the first one)
summary = {}
for key in data[0]:
    summary[key] = list(dict.fromkeys(d[key] for d in data))
for key, values in summary.items():
    print(f"{key}: {values}")
Output:
product: ['Laptop', 'Phone', 'Shirt', 'Shoes']
brand: ['Dell', 'Apple', 'Nike', 'HP']
category: ['Electronics', 'Clothing']
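The summary above reads its keys from the first record only. When records may carry different keys, one possible variation is to collect keys as they are encountered. The sketch below assumes a hypothetical dataset with an optional 'color' field and uses an inner dictionary as an ordered set of values:

```python
data = [
    {'product': 'Laptop', 'brand': 'Dell'},
    {'product': 'Phone', 'brand': 'Apple', 'color': 'Black'},
    {'product': 'Laptop', 'color': 'Silver'},
]

summary = {}
for record in data:
    for key, value in record.items():
        # The inner dict acts as an ordered set of values seen for this key
        summary.setdefault(key, {})[value] = None

# Convert each inner dict to a list of its keys (the unique values)
summary = {key: list(values) for key, values in summary.items()}
print(summary)
# {'product': ['Laptop', 'Phone'], 'brand': ['Dell', 'Apple'], 'color': ['Black', 'Silver']}
```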
Comparison of Approaches
| Method | Preserves Order | Time Complexity | Memory Efficient | Best For |
|---|---|---|---|---|
| set() + generator | No | O(n) | Yes | Fast deduplication (recommended) |
| dict.fromkeys() | Yes | O(n) | Yes | Order-preserving deduplication |
| itertools.chain + set | No | O(n) | Yes | Large datasets |
| Loop with seen set | Yes | O(n) | Yes | Custom logic during dedup |
| Loop (list-only check) | Yes | O(n²) | Yes | Small datasets, simplicity |
Conclusion
For extracting unique values from a list of dictionaries, a set with a generator expression is the fastest and most concise approach.
- Use dict.fromkeys() when you need to preserve the order of first appearance.
- For specific keys, target them directly in the generator expression.
- For large datasets, itertools.chain.from_iterable() provides memory-efficient iteration.
Choose the method that best matches your needs for ordering, performance, and flexibility.