How to Count Distinct Elements in a List in Python
Counting the number of unique items in a list is one of the most common operations in data processing and analysis. Whether you are deduplicating records, summarizing survey responses, or profiling a dataset, you need a reliable way to determine how many distinct values a collection contains. For example, the list [1, 1, 2, 3, 2] contains 3 distinct elements.
In this guide, you will learn multiple ways to count distinct elements in a Python list, from the simplest built-in approach to methods suited for complex data structures and large datasets.
Using len(set()) for Simple Counting
The most direct and Pythonic way to count distinct elements is to convert the list into a set, which automatically removes duplicates, and then measure its length:
data = ["apple", "banana", "apple", "cherry"]
unique_count = len(set(data))
print(unique_count)
Output:
3
This approach runs in O(n) time and is both fast and readable. It works with any list whose elements are hashable, including strings, integers, floats, and tuples.
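Because tuples are hashable (as long as their contents are), the same one-liner covers composite values such as coordinate pairs. A quick sketch with an illustrative list of points:

```python
# Tuples are hashable, so len(set()) deduplicates them directly
points = [(0, 0), (1, 2), (0, 0), (3, 4)]
print(len(set(points)))  # 3
```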
Using collections.Counter for Count and Frequency
When you need both the number of unique elements and the frequency of each one, the Counter class from the collections module provides all of that information in a single pass:
from collections import Counter
data = [10, 20, 10, 10, 30]
freq = Counter(data)
print(len(freq))
print(list(freq.keys()))
print(freq[10])
print(freq.most_common(2))
Output:
3
[10, 20, 30]
3
[(10, 3), (20, 1)]
- len(freq) gives the distinct element count.
- freq.keys() returns the unique elements themselves.
- freq[10] returns how many times 10 appears.
- freq.most_common(2) returns the two most frequent elements sorted by count.
This is especially useful when you need to go beyond simple counting and perform frequency analysis.
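For instance, the frequency map makes it easy to separate duplicated values from ones that appear only once. A minimal sketch (the sample data here is illustrative):

```python
from collections import Counter

data = [10, 20, 10, 10, 30, 20]
freq = Counter(data)

# Values that occur more than once
duplicates = [value for value, count in freq.items() if count > 1]
print(duplicates)  # [10, 20]

# Number of values that appear exactly once
singletons = sum(1 for count in freq.values() if count == 1)
print(singletons)  # 1
```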
Using a Manual Loop
If you want full control over the counting process or need to apply custom logic during iteration, a manual loop with a set works well:
data = ["apple", "banana", "apple", "cherry", "banana"]
seen = set()
for item in data:
    seen.add(item)
print(len(seen))
Output:
3
This approach is functionally equivalent to len(set(data)), but it allows you to insert additional logic inside the loop, such as logging, filtering, or conditional counting:
data = ["apple", "banana", "APPLE", "Cherry", "banana"]
seen = set()
for item in data:
    normalized = item.casefold()
    seen.add(normalized)
print(len(seen))
Output:
3
In this example, "apple" and "APPLE" are treated as the same element thanks to case normalization before insertion.
Counting Distinct Values by a Specific Key
When working with a list of dictionaries or complex objects, you often need to count distinct values based on a particular field or attribute rather than the entire object:
users = [
    {"name": "Alice", "dept": "Sales"},
    {"name": "Bob", "dept": "Engineering"},
    {"name": "Carol", "dept": "Sales"}
]
distinct_depts = len(set(user["dept"] for user in users))
print(distinct_depts)
Output:
2
The generator expression extracts the "dept" value from each dictionary, and the set removes duplicates before len() counts the remaining unique departments.
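The same pattern extends to objects with attributes rather than dictionary keys. A minimal sketch, assuming a hypothetical User dataclass:

```python
from dataclasses import dataclass

@dataclass
class User:
    name: str
    dept: str

users = [
    User("Alice", "Sales"),
    User("Bob", "Engineering"),
    User("Carol", "Sales"),
]

# A set comprehension over the attribute deduplicates by department
distinct_depts = len({user.dept for user in users})
print(distinct_depts)  # 2
```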
Using Pandas for DataFrames
For tabular data stored in a Pandas DataFrame, the nunique() method provides an optimized way to count distinct values:
import pandas as pd
df = pd.DataFrame({"category": [1, 1, 2, 3, 2, 1]})
print(df["category"].nunique())
Output:
3
You can also count distinct values across multiple columns at once:
import pandas as pd
df = pd.DataFrame({
    "color": ["red", "blue", "red"],
    "size": ["S", "M", "S"]
})
print(df.nunique())
Output:
color 2
size 2
dtype: int64
Pandas nunique() is highly optimized for large datasets and integrates naturally with other DataFrame operations like grouping, filtering, and aggregation. If your data is already in a DataFrame, prefer nunique() over converting columns to plain Python lists.
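For example, combining nunique() with groupby() counts distinct values within each group. A sketch with illustrative department data:

```python
import pandas as pd

df = pd.DataFrame({
    "dept": ["Sales", "Sales", "Engineering", "Engineering"],
    "name": ["Alice", "Carol", "Bob", "Bob"],
})

# Distinct names per department: Engineering has 1, Sales has 2
print(df.groupby("dept")["name"].nunique())
```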
Handling Unhashable Elements
The set() approach requires all elements to be hashable. Common unhashable types include lists, dictionaries, and sets themselves. Attempting to create a set from a list of lists raises an error:
data = [[1, 2], [3, 4], [1, 2]]
unique_count = len(set(data))
Output:
TypeError: unhashable type: 'list'
To fix this, convert the inner lists to tuples (which are hashable) before building the set:
data = [[1, 2], [3, 4], [1, 2]]
unique_count = len(set(tuple(item) for item in data))
print(unique_count)
Output:
2
If your list contains dictionaries or other complex unhashable types, you need to convert them into a hashable representation first. Common strategies include:
- Lists to tuples: tuple(item)
- Sets to frozensets: frozenset(item)
- Dictionaries to tuples of sorted items: tuple(sorted(item.items()))
Choose a conversion strategy that preserves the identity of each element for accurate deduplication.
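The dictionary strategy can be sketched as follows; sorting the items first ensures that key order does not affect the hashable representation:

```python
data = [
    {"a": 1, "b": 2},
    {"b": 2, "a": 1},  # same content, different key order
    {"a": 3},
]

# tuple(sorted(...)) gives each dict a canonical, hashable form
unique_count = len({tuple(sorted(d.items())) for d in data})
print(unique_count)  # 2
```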
Preserving Order While Counting
The set() approach discards both duplicates and insertion order. If you need to preserve the order of first occurrence while still counting distinct elements, use dict.fromkeys():
data = ["banana", "apple", "cherry", "apple", "banana"]
seen = dict.fromkeys(data)
print(len(seen))
print(list(seen.keys()))
Output:
3
['banana', 'apple', 'cherry']
Since Python 3.7, dictionaries maintain insertion order, so dict.fromkeys() keeps the first occurrence of each element in the original sequence order.
Method Comparison
| Goal | Method | Notes |
|---|---|---|
| Count only | len(set(data)) | Fastest and most Pythonic for hashable items |
| Count with frequencies | Counter(data) | Slightly more overhead, but provides full analysis |
| Custom logic during counting | Manual loop with set | Flexible for filtering or normalization |
| Count by specific key | set() with generator expression | Works well for lists of dicts or objects |
| Large tabular datasets | pd.Series.nunique() | Optimized for DataFrames |
| Unhashable items | Convert to hashable first | Use tuple(), frozenset(), etc. |
| Preserve order | dict.fromkeys(data) | Maintains first-occurrence order in Python 3.7+ |
Conclusion
For most everyday tasks, len(set(data)) is the best choice. It is concise, fast, and immediately recognizable to any Python developer. When you also need frequency information, collections.Counter provides a complete frequency map in a single pass. For large tabular datasets already stored in a DataFrame, Pandas nunique() is the natural and optimized option. And when dealing with unhashable elements, always convert them into a hashable form before using set-based approaches.
By choosing the right method for your data and requirements, you can count distinct elements efficiently and write cleaner, more maintainable Python code.