How to Count Distinct Elements in a List in Python

Counting the number of unique items in a list is one of the most common operations in data processing and analysis. Whether you are deduplicating records, summarizing survey responses, or profiling a dataset, you need a reliable way to determine how many distinct values a collection contains. For example, the list [1, 1, 2, 3, 2] contains 3 distinct elements.

In this guide, you will learn multiple ways to count distinct elements in a Python list, from the simplest built-in approach to methods suited for complex data structures and large datasets.

Using len(set()) for Simple Counting

The most direct and Pythonic way to count distinct elements is to convert the list into a set, which automatically removes duplicates, and then measure its length:

data = ["apple", "banana", "apple", "cherry"]

unique_count = len(set(data))

print(unique_count)

Output:

3

This approach runs in O(n) time and is both fast and readable. It works with any list whose elements are hashable, including strings, integers, floats, and tuples.
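For instance, the same one-liner handles a list of tuples, since tuples are hashable (the coordinate data here is purely illustrative):

```python
# len(set()) works for any hashable element type, including tuples
points = [(0, 0), (1, 2), (0, 0), (3, 4)]

print(len(set(points)))
```

Output:

3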

Using collections.Counter for Count and Frequency

When you need both the number of unique elements and the frequency of each one, the Counter class from the collections module provides all of that information in a single pass:

from collections import Counter

data = [10, 20, 10, 10, 30]

freq = Counter(data)

print(len(freq))
print(list(freq.keys()))
print(freq[10])
print(freq.most_common(2))

Output:

3
[10, 20, 30]
3
[(10, 3), (20, 1)]

  • len(freq) gives the distinct element count.
  • freq.keys() returns the unique elements themselves.
  • freq[10] returns how many times 10 appears.
  • freq.most_common(2) returns the two most frequent elements sorted by count.

This is especially useful when you need to go beyond simple counting and perform frequency analysis.

Using a Manual Loop

If you want full control over the counting process or need to apply custom logic during iteration, a manual loop with a set works well:

data = ["apple", "banana", "apple", "cherry", "banana"]

seen = set()
for item in data:
    seen.add(item)

print(len(seen))

Output:

3

This approach is functionally equivalent to len(set(data)), but it allows you to insert additional logic inside the loop, such as logging, filtering, or conditional counting:

data = ["apple", "banana", "APPLE", "Cherry", "banana"]

seen = set()
for item in data:
    normalized = item.casefold()
    seen.add(normalized)

print(len(seen))

Output:

3

In this example, "apple" and "APPLE" are treated as the same element thanks to case normalization before insertion.
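The loop also accommodates conditional counting. As a small sketch with a hypothetical minimum-length rule, you might only count items longer than three characters:

```python
data = ["apple", "fig", "banana", "fig", "kiwi", "apple"]

# Count distinct items, but only those longer than 3 characters
seen = set()
for item in data:
    if len(item) > 3:
        seen.add(item)

print(len(seen))
```

Output:

3

Here "fig" is skipped entirely, so only "apple", "banana", and "kiwi" contribute to the count.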

Counting Distinct Values by a Specific Key

When working with a list of dictionaries or complex objects, you often need to count distinct values based on a particular field or attribute rather than the entire object:

users = [
    {"name": "Alice", "dept": "Sales"},
    {"name": "Bob", "dept": "Engineering"},
    {"name": "Carol", "dept": "Sales"}
]

distinct_depts = len(set(user["dept"] for user in users))

print(distinct_depts)

Output:

2

The generator expression extracts the "dept" value from each dictionary, and the set removes duplicates before len() counts the remaining unique departments.
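The same pattern extends to objects with attributes: read the attribute instead of indexing into a dictionary. A minimal sketch, using a hypothetical dataclass:

```python
from dataclasses import dataclass

@dataclass
class User:
    name: str
    dept: str

users = [
    User("Alice", "Sales"),
    User("Bob", "Engineering"),
    User("Carol", "Sales"),
]

# A set comprehension over the attribute counts distinct departments
print(len({user.dept for user in users}))
```

Output:

2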

Using Pandas for DataFrames

For tabular data stored in a Pandas DataFrame, the nunique() method provides an optimized way to count distinct values:

import pandas as pd

df = pd.DataFrame({"category": [1, 1, 2, 3, 2, 1]})

print(df["category"].nunique())

Output:

3

You can also count distinct values across multiple columns at once:

import pandas as pd

df = pd.DataFrame({
    "color": ["red", "blue", "red"],
    "size": ["S", "M", "S"]
})

print(df.nunique())

Output:

color    2
size     2
dtype: int64

Tip:

Pandas nunique() is highly optimized for large datasets and integrates naturally with other DataFrame operations like grouping, filtering, and aggregation. If your data is already in a DataFrame, prefer nunique() over converting columns to plain Python lists.
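As one illustration of that integration, nunique() combines with groupby() to count distinct values per group (the column names and data here are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({
    "dept": ["Sales", "Sales", "Engineering", "Sales"],
    "city": ["NY", "NY", "SF", "LA"],
})

# Count distinct cities within each department
print(df.groupby("dept")["city"].nunique())
```

This produces one distinct-city count per department, as a Series indexed by dept.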

Handling Unhashable Elements

The set() approach requires all elements to be hashable. Common unhashable types include lists, dictionaries, and sets themselves. Attempting to create a set from a list of lists raises an error:

data = [[1, 2], [3, 4], [1, 2]]

unique_count = len(set(data))

Output:

TypeError: unhashable type: 'list'

To fix this, convert the inner lists to tuples (which are hashable) before building the set:

data = [[1, 2], [3, 4], [1, 2]]

unique_count = len(set(tuple(item) for item in data))

print(unique_count)

Output:

2

Warning:

If your list contains dictionaries or other complex unhashable types, you need to convert them into a hashable representation first. Common strategies include:

  • Lists to tuples: tuple(item)
  • Sets to frozensets: frozenset(item)
  • Dictionaries to tuples of sorted items: tuple(sorted(item.items()))

Choose a conversion strategy that preserves the identity of each element for accurate deduplication.
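For example, a list of dictionaries can be deduplicated by converting each one to a tuple of its sorted items, which is hashable and independent of key order:

```python
records = [
    {"a": 1, "b": 2},
    {"b": 2, "a": 1},  # same content as the first, different key order
    {"a": 3, "b": 4},
]

# tuple(sorted(d.items())) yields a hashable, order-independent representation
unique_count = len({tuple(sorted(d.items())) for d in records})

print(unique_count)
```

Output:

2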

Preserving Order While Counting

The set() approach discards both duplicates and insertion order. If you need to preserve the order of first occurrence while still counting distinct elements, use dict.fromkeys():

data = ["banana", "apple", "cherry", "apple", "banana"]

seen = dict.fromkeys(data)

print(len(seen))
print(list(seen.keys()))

Output:

3
['banana', 'apple', 'cherry']

Since Python 3.7, dictionaries maintain insertion order, so dict.fromkeys() keeps the first occurrence of each element in the original sequence order.

Method Comparison

Goal                          | Method                           | Notes
Count only                    | len(set(data))                   | Fastest and most Pythonic for hashable items
Count with frequencies        | Counter(data)                    | Slightly more overhead, but provides full analysis
Custom logic during counting  | Manual loop with set             | Flexible for filtering or normalization
Count by specific key         | set() with generator expression  | Works well for lists of dicts or objects
Large tabular datasets        | pd.Series.nunique()              | Optimized for DataFrames
Unhashable items              | Convert to hashable first        | Use tuple(), frozenset(), etc.
Preserve order                | dict.fromkeys(data)              | Maintains first-occurrence order in Python 3.7+

Conclusion

For most everyday tasks, len(set(data)) is the best choice. It is concise, fast, and immediately understandable to any Python developer. When you also need frequency information, collections.Counter provides a complete frequency map in a single pass. For large tabular datasets already stored in a DataFrame, Pandas nunique() is the natural and optimized option. And when dealing with unhashable elements, always convert them into a hashable form before using set-based approaches.

By choosing the right method for your data and requirements, you can count distinct elements efficiently and write cleaner, more maintainable Python code.