How to Count Unique Characters in a String in Python
Counting the number of distinct characters in a string is a common task in text processing, input validation, and data analysis. At its core, this is a set operation: sets automatically eliminate duplicates, so converting a string to a set and measuring its length gives you the unique character count in a single expression.
In this guide, you will learn multiple ways to count unique characters in a Python string, including simple set-based counting, frequency analysis with Counter, various normalization strategies, and techniques for filtering specific character types.
Using set() for Simple Counting
The most direct and Pythonic approach converts the string to a set, which removes all duplicate characters, and then measures the length of the resulting set:
text = "hello"
unique_count = len(set(text))
print(unique_count)
print(set(text))
Output:
4
{'h', 'e', 'l', 'o'}
The string "hello" has five characters, but only four are distinct. The duplicate "l" is automatically eliminated by the set. Because sets are unordered, the characters may print in a different order than shown here. This approach runs in O(n) time and is usually the fastest option when you only need the count.
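If the order of first appearance matters, a set alone does not preserve it. The following is a minimal sketch of one alternative, assuming Python 3.7 or later, where dict keys are guaranteed to keep insertion order. The variable name ordered_unique is only illustrative.

```python
text = "hello"

# dict keys keep insertion order (guaranteed since Python 3.7),
# so this removes duplicates while preserving first-appearance order.
ordered_unique = list(dict.fromkeys(text))

print(ordered_unique)       # ['h', 'e', 'l', 'o']
print(len(ordered_unique))  # 4
```

The count matches len(set(text)); the only difference is that the result has a predictable order.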
Using Counter for Frequency Information
When you need both the number of unique characters and how many times each one appears, the Counter class from the collections module provides a complete frequency map:
from collections import Counter
text = "mississippi"
freq = Counter(text)
print(len(freq))
print(freq)
print(list(freq.keys()))
Output:
4
Counter({'i': 4, 's': 4, 'p': 2, 'm': 1})
['m', 'i', 's', 'p']
len(freq) gives the unique character count, while freq itself holds the full distribution. This is especially useful when your analysis goes beyond simple counting.
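As a brief illustration of what the frequency map offers beyond the count, the sketch below uses a few standard Counter features on the same string. The variable name top_two is only illustrative.

```python
from collections import Counter

freq = Counter("mississippi")

# most_common(n) returns the n highest-count (character, count) pairs.
top_two = freq.most_common(2)
print(top_two)

# A character that never occurs reports 0 instead of raising KeyError.
print(freq["z"])  # 0

# The counts add up to the length of the original string.
print(sum(freq.values()))  # 11
```

Looking up a missing character does not add it to the Counter, so len(freq) is still 4 afterwards.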
Normalizing Before Counting
Different use cases require different normalization strategies. Spaces, punctuation, and letter casing can all affect what counts as a "unique" character:
text = "Hello World!"
# Count all characters including spaces and punctuation
print(len(set(text)))
# Ignore case
print(len(set(text.lower())))
# Ignore spaces
print(len(set(text.replace(" ", ""))))
# Letters only, case-insensitive
letters_only = [c.lower() for c in text if c.isalpha()]
print(len(set(letters_only)))
Output:
9
9
8
7
Each line applies a different normalization:
- No normalization: all 9 distinct characters are counted, including the space and exclamation mark.
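- Lowercase only: case differences are removed, but spaces and punctuation remain. The count stays at 9 here because no letter in this string appears in both upper and lower case.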
- No spaces: the space is excluded, reducing the count by one.
- Letters only, case-insensitive: only alphabetic characters are kept, and "H" and "h" are treated as the same character.
Use .casefold() instead of .lower() for more aggressive case normalization that handles special Unicode characters correctly. For example, the German "ß" is converted to "ss" by .casefold(), whereas .lower() leaves it unchanged.
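The difference is easiest to see on a concrete word. The sketch below uses "Straße" as an assumed example input and shows that .casefold() can change the unique character count because "ß" expands to two characters.

```python
word = "Straße"

lowered = word.lower()      # 'straße'  (ß is kept)
folded = word.casefold()    # 'strasse' (ß becomes 'ss')

print(len(set(lowered)))  # 6: s, t, r, a, ß, e
print(len(set(folded)))   # 5: s, t, r, a, e

# casefold() makes the two spellings compare equal; lower() does not.
print("STRASSE".casefold() == word.casefold())  # True
print("STRASSE".lower() == word.lower())        # False
```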
Counting Specific Character Types
You can filter the string to count only certain categories of characters using Python's built-in string methods:
text = "Hello123 World456!"
# Unique letters only
unique_letters = len(set(c for c in text if c.isalpha()))
print(f"Unique letters: {unique_letters}")
# Unique digits only
unique_digits = len(set(c for c in text if c.isdigit()))
print(f"Unique digits: {unique_digits}")
# Unique alphanumeric characters
unique_alnum = len(set(c for c in text if c.isalnum()))
print(f"Unique alphanumeric: {unique_alnum}")
Output:
Unique letters: 7
Unique digits: 6
Unique alphanumeric: 13
The generator expression inside set() filters characters before deduplication, so only characters matching the condition are included in the final count.
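One caveat worth illustrating: .isalpha() and .isdigit() are Unicode-aware, so they accept more than the ASCII letters and digits. If only ASCII should count, a set intersection with the constants from the string module is one option. The helper name count_unique_ascii below is hypothetical, and this is a sketch rather than a definitive implementation.

```python
import string

def count_unique_ascii(text):
    """Count distinct ASCII letters and ASCII digits in text (hypothetical helper)."""
    chars = set(text)
    return {
        "letters": len(chars & set(string.ascii_letters)),
        "digits": len(chars & set(string.digits)),
    }

print(count_unique_ascii("Hello123 World456!"))  # {'letters': 7, 'digits': 6}

# The superscript two passes isdigit() and the accented e passes isalpha(),
# but neither is an ASCII character.
print("²".isdigit(), "é".isalpha())   # True True
print(count_unique_ascii("x² café"))  # {'letters': 4, 'digits': 0}
```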
Finding Characters That Appear Exactly Once
Sometimes "unique" means characters that occur only a single time in the string, not just the set of distinct characters. Use Counter to identify these:
from collections import Counter
text = "abracadabra"
freq = Counter(text)
truly_unique = [char for char, count in freq.items() if count == 1]
print(truly_unique)
print(len(truly_unique))
Output:
['c', 'd']
2
Out of the five distinct characters in "abracadabra" (a, b, r, c, d), only "c" and "d" appear exactly once.
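A closely related task is finding the first character that occurs exactly once. The sketch below builds on the same Counter idea; the function name first_non_repeated is hypothetical.

```python
from collections import Counter

def first_non_repeated(text):
    """Return the first character that occurs exactly once, or None (hypothetical helper)."""
    freq = Counter(text)
    # Scan in original order so the earliest qualifying character wins.
    return next((char for char in text if freq[char] == 1), None)

print(first_non_repeated("abracadabra"))  # c
print(first_non_repeated("aabb"))         # None
```

This makes two passes over the string, one to count and one to scan, so it remains O(n).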
A Common Confusion: Distinct vs. Non-Repeated
It is important to recognize the difference between two questions: how many distinct characters does the string contain, and how many characters appear exactly once?
from collections import Counter
text = "abracadabra"
# Distinct characters (appear at least once)
distinct = len(set(text))
print(f"Distinct characters: {distinct}")
# Non-repeated characters (appear exactly once)
freq = Counter(text)
non_repeated = sum(1 for count in freq.values() if count == 1)
print(f"Non-repeated characters: {non_repeated}")
Output:
Distinct characters: 5
Non-repeated characters: 2
len(set(text)) counts every distinct character once, no matter how often it appears, while the Counter approach counts only the characters that occur exactly one time.
Working with Unicode Characters
Python's set() handles Unicode characters correctly, treating each code point as a separate character:
text = "café naïve 日本語"
unique_chars = set(text)
print(len(unique_chars))
print(sorted(unique_chars))
Output:
12
[' ', 'a', 'c', 'e', 'f', 'n', 'v', 'é', 'ï', '日', '本', '語']
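Because each code point counts separately, accented characters like "é" are distinct from "e". To treat them as the same character, decompose the text with unicodedata.normalize("NFD", text) and drop the combining marks before counting. Normalization also matters because the same visible character can be stored either as one precomposed code point or as a base letter followed by a combining mark, and the two forms produce different counts.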
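The following is a minimal sketch of accent-insensitive counting with the standard unicodedata module. The helper name strip_accents is hypothetical, and the approach assumes that removing combining marks is acceptable for your text, which is not true for every language.

```python
import unicodedata

def strip_accents(text):
    """Decompose characters (NFD) and drop combining marks (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

# Normalize to NFC first so the literal is in a known, precomposed form.
text = unicodedata.normalize("NFC", "café naïve 日本語")
print(len(set(text)))                 # 12
print(len(set(strip_accents(text))))  # 11: 'é' merges with 'e', 'ï' becomes 'i'

# The same visible "é" can be one code point or two.
composed = "\u00e9"      # precomposed é
decomposed = "e\u0301"   # e followed by a combining acute accent
print(len(set(composed)), len(set(decomposed)))              # 1 2
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```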
Comparing Strings by Their Unique Characters
You can check whether two strings are composed of the same set of characters, regardless of order or repetition:
def same_unique_chars(s1, s2):
    # Compare the case-insensitive sets of characters
    return set(s1.lower()) == set(s2.lower())
print(same_unique_chars("abc", "cba"))
print(same_unique_chars("aabbcc", "abc"))
print(same_unique_chars("abc", "abcd"))
Output:
True
True
False
This is useful for tasks like checking whether two strings are potential anagrams (though a full anagram check also requires matching frequencies, not just the character set).
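For comparison, a full anagram check can compare frequency maps instead of sets. This is a sketch under the assumption that case should be ignored and that spaces and punctuation count as characters; the function name is_anagram is hypothetical.

```python
from collections import Counter

def is_anagram(s1, s2):
    """True when both strings have the same characters with the same counts (hypothetical helper)."""
    return Counter(s1.lower()) == Counter(s2.lower())

print(is_anagram("Listen", "Silent"))  # True
print(is_anagram("aabbcc", "abc"))     # False: same character set, different counts
```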
Method Comparison
| Method | Returns | Best For |
|---|---|---|
| len(set(s)) | Integer count | Simple and fast unique counting |
| len(Counter(s)) | Integer count | When frequencies are also needed |
| set(s) | Set of characters | When you need the actual unique characters |
| Counter(s) | Frequency dictionary | Full distribution analysis |
| Filtered set() with generator | Integer count | Counting specific character types |
Both set() and Counter process the string in O(n) time, but len(set(text)) is typically faster and uses less memory when you only need the count, because it stores just the characters rather than a count for each one.
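To check the relative speed on your own machine, a rough timing sketch with the standard timeit module could look like this. The input string and the repetition count are arbitrary assumptions, and results vary by machine and Python version, so no timings are shown.

```python
import timeit
from collections import Counter

# Hypothetical test input: 26 distinct letters repeated many times.
text = "abcdefghijklmnopqrstuvwxyz" * 1000

# Both expressions produce the same count.
assert len(set(text)) == len(Counter(text)) == 26

set_time = timeit.timeit(lambda: len(set(text)), number=100)
counter_time = timeit.timeit(lambda: len(Counter(text)), number=100)

print(f"set:     {set_time:.4f} s")
print(f"Counter: {counter_time:.4f} s")
```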
Conclusion
For straightforward unique character counting, len(set(text)) is the fastest and most Pythonic approach.
- Use Counter when you also need frequency information or want to identify characters that appear exactly once.
- Apply normalization (lowercasing, removing spaces, filtering by character type) before counting to match the specific definition of "unique" that your use case requires.
- And when working with Unicode text, remember that accented characters and special symbols are treated as distinct code points unless you explicitly normalize them.