How to Count Unique Characters in a String in Python

Counting the number of distinct characters in a string is a common task in text processing, input validation, and data analysis. At its core, this is a set operation: a set stores each element only once, so converting a string to a set and taking its length gives the number of distinct characters in a single expression.

In this guide, you will learn multiple ways to count unique characters in a Python string, including simple set-based counting, frequency analysis with Counter, various normalization strategies, and techniques for filtering specific character types.

Using set() for Simple Counting

The most direct and Pythonic approach converts the string to a set, which removes all duplicate characters, and then measures the length of the resulting set:

text = "hello"

unique_count = len(set(text))

print(unique_count)
print(set(text))

Output:

4
{'h', 'e', 'l', 'o'}

The string "hello" has five characters, but only four are distinct. The duplicate "l" is automatically eliminated by the set. Because sets are unordered, the printed set may list the characters in a different order than shown here. This approach runs in O(n) time and is the fastest option when you only need the count.
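If you need the distinct characters in order of first appearance rather than just their count, one possible sketch relies on dict.fromkeys(), since dictionaries preserve insertion order in Python 3.7 and later. The helper name unique_in_order is hypothetical and used only for illustration.

```python
def unique_in_order(text):
    """Return the distinct characters of text in first-seen order."""
    # Dict keys are unique and keep insertion order (Python 3.7+),
    # so repeated characters are dropped without reshuffling the rest.
    return list(dict.fromkeys(text))


chars = unique_in_order("hello")
print(chars)       # ['h', 'e', 'l', 'o']
print(len(chars))  # 4
```

The count matches len(set(text)); the only difference is that the result is a list with a predictable order.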

Using Counter for Frequency Information

When you need both the number of unique characters and how many times each one appears, the Counter class from the collections module provides a complete frequency map:

from collections import Counter

text = "mississippi"

freq = Counter(text)

print(len(freq))
print(freq)
print(list(freq.keys()))

Output:

4
Counter({'i': 4, 's': 4, 'p': 2, 'm': 1})
['m', 'i', 's', 'p']

len(freq) gives the unique character count, while freq itself holds the full distribution. This is especially useful when your analysis goes beyond simple counting.
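As an illustration of going beyond simple counting, the following sketch uses Counter.most_common() to rank characters and a comprehension to list the ones that repeat. The variable name repeated is just an example.

```python
from collections import Counter

freq = Counter("mississippi")

# The two most frequent characters with their counts.
# Ties keep the order in which the characters were first seen.
print(freq.most_common(2))  # [('i', 4), ('s', 4)]

# Characters that occur more than once, sorted for stable output
repeated = sorted(ch for ch, n in freq.items() if n > 1)
print(repeated)  # ['i', 'p', 's']
```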

Normalizing Before Counting

Different use cases require different normalization strategies. Spaces, punctuation, and letter casing can all affect what counts as a "unique" character:

text = "Hello World!"

# Count all characters including spaces and punctuation
print(len(set(text)))

# Ignore case
print(len(set(text.lower())))

# Ignore spaces
print(len(set(text.replace(" ", ""))))

# Letters only, case-insensitive
letters_only = [c.lower() for c in text if c.isalpha()]
print(len(set(letters_only)))

Output:

9
9
8
7

Each line applies a different normalization:

  • No normalization: all 9 distinct characters are counted, including the space and exclamation mark.
  • Lowercase only: case is removed, but spaces and punctuation remain. The count stays at 9 because no letter in "Hello World!" appears in both upper and lower case.
  • No spaces: the space is excluded, reducing the count by one.
  • Letters only, case-insensitive: only alphabetic characters are kept, and "H" and "h" are treated as the same character.
Tip: Use .casefold() instead of .lower() for more aggressive case normalization intended for caseless matching of Unicode text. For example, the German "ß" is converted to "ss" by .casefold(), whereas .lower() leaves it unchanged.
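A short sketch of how the two methods can produce different unique-character sets, using the German word "Straße" as sample input:

```python
word = "Straße"

# .lower() keeps "ß" as its own character
print(sorted(set(word.lower())))     # ['a', 'e', 'r', 's', 't', 'ß']

# .casefold() expands "ß" to "ss", which merges into the existing "s"
print(sorted(set(word.casefold())))  # ['a', 'e', 'r', 's', 't']

# With casefolding, both spellings share the same set of characters
print(set("Straße".casefold()) == set("STRASSE".casefold()))  # True
```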

Counting Specific Character Types

You can filter the string to count only certain categories of characters using Python's built-in string methods:

text = "Hello123 World456!"

# Unique letters only
unique_letters = len(set(c for c in text if c.isalpha()))
print(f"Unique letters: {unique_letters}")

# Unique digits only
unique_digits = len(set(c for c in text if c.isdigit()))
print(f"Unique digits: {unique_digits}")

# Unique alphanumeric characters
unique_alnum = len(set(c for c in text if c.isalnum()))
print(f"Unique alphanumeric: {unique_alnum}")

Output:

Unique letters: 7
Unique digits: 6
Unique alphanumeric: 13

The generator expression inside set() filters characters before deduplication, so only characters matching the condition are included in the final count.
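The same filtering pattern can be wrapped in a small reusable function. This is a minimal sketch; count_unique and its predicate parameter are hypothetical names, not part of the standard library.

```python
import string


def count_unique(text, predicate):
    """Count distinct characters in text for which predicate(char) is true."""
    # The set comprehension filters first, then deduplicates
    return len({c for c in text if predicate(c)})


text = "Hello123 World456!"

print(count_unique(text, str.isupper))                        # 2
print(count_unique(text, str.isspace))                        # 1
print(count_unique(text, lambda c: c in string.punctuation))  # 1
```

Passing the test as a callable keeps the counting logic in one place while the definition of "which characters count" stays with the caller.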

Finding Characters That Appear Exactly Once

Sometimes "unique" means characters that occur only a single time in the string, not just the set of distinct characters. Use Counter to identify these:

from collections import Counter

text = "abracadabra"

freq = Counter(text)
truly_unique = [char for char, count in freq.items() if count == 1]

print(truly_unique)
print(len(truly_unique))

Output:

['c', 'd']
2

Out of the five distinct characters in "abracadabra" (a, b, r, c, d), only "c" and "d" appear exactly once.

A Common Confusion: Distinct vs. Non-Repeated

It is important to recognize the difference between these two questions:

from collections import Counter

text = "abracadabra"

# Distinct characters (appear at least once)
distinct = len(set(text))
print(f"Distinct characters: {distinct}")

# Non-repeated characters (appear exactly once)
freq = Counter(text)
non_repeated = sum(1 for count in freq.values() if count == 1)
print(f"Non-repeated characters: {non_repeated}")

Output:

Distinct characters: 5
Non-repeated characters: 2

len(set(text)) counts every character that appears at least once, while the Counter approach counts only those that appear exactly once.
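A related question is which non-repeated character comes first in the string. One way to sketch this, assuming a hypothetical helper named first_non_repeated, is to build the frequency map once and then scan the string in its original order:

```python
from collections import Counter


def first_non_repeated(text):
    """Return the first character that occurs exactly once, or None."""
    freq = Counter(text)
    # Iterate over the string itself so the original order is respected
    for char in text:
        if freq[char] == 1:
            return char
    return None


print(first_non_repeated("abracadabra"))  # c
print(first_non_repeated("aabb"))         # None
```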

Working with Unicode Characters

Python's set() handles Unicode characters correctly, treating each code point as a separate character:

text = "café naïve 日本語"

unique_chars = set(text)

print(len(unique_chars))
print(sorted(unique_chars))

Output:

12
[' ', 'a', 'c', 'e', 'f', 'n', 'v', 'é', 'ï', '日', '本', '語']
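Note: Accented characters like "é" are counted as distinct from "e". To treat them as the same character, decompose the text with unicodedata.normalize() (form NFD) and drop the combining marks before counting. Also keep in mind that the same visible character can be stored either as one precomposed code point or as a base letter followed by a combining mark; normalizing to a single form such as NFC prevents the two encodings from being counted separately. The output above assumes precomposed characters.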
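A minimal sketch of accent-insensitive counting with the unicodedata module follows. The helper name strip_accents is hypothetical, and the sample strings use escape sequences so that the exact code points are unambiguous.

```python
import unicodedata


def strip_accents(text):
    """Decompose characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


text = "caf\u00e9 na\u00efve"  # "café naïve" with precomposed accents

print(len(set(text)))                 # 9
print(len(set(strip_accents(text))))  # 8

# The same "é" written two ways: precomposed, and "e" + combining acute
mixed = "\u00e9" + "e\u0301"
print(len(set(mixed)))                                # 3
print(len(set(unicodedata.normalize("NFC", mixed))))  # 1
```

Stripping accents changes the meaning of text in many languages, so whether this is appropriate depends on your use case.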

Comparing Strings by Their Unique Characters

You can check whether two strings are composed of the same set of characters, regardless of order or repetition:

def same_unique_chars(s1, s2):
    # Compare the case-insensitive sets of characters
    return set(s1.lower()) == set(s2.lower())

print(same_unique_chars("abc", "cba"))
print(same_unique_chars("aabbcc", "abc"))
print(same_unique_chars("abc", "abcd"))

Output:

True
True
False

This is useful for tasks like checking whether two strings are potential anagrams (though a full anagram check also requires matching frequencies, not just the character set).
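For comparison, a full anagram check can be sketched by comparing Counter objects instead of sets, so that frequencies must match as well. The function name is_anagram is an illustrative choice.

```python
from collections import Counter


def is_anagram(s1, s2):
    """Return True if both strings contain the same characters
    with the same frequencies, ignoring case."""
    return Counter(s1.lower()) == Counter(s2.lower())


print(is_anagram("listen", "Silent"))  # True
print(is_anagram("aabbcc", "abc"))     # False
```

Note that "aabbcc" and "abc" share the same character set but are not anagrams, which is exactly the case the set-based comparison cannot distinguish.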

Method Comparison

| Method | Returns | Best For |
|---|---|---|
| len(set(s)) | Integer count | Simple and fast unique counting |
| len(Counter(s)) | Integer count | When frequencies are also needed |
| set(s) | Set of characters | When you need the actual unique characters |
| Counter(s) | Frequency dictionary | Full distribution analysis |
| Filtered set() with generator | Integer count | Counting specific character types |

Both set() and Counter run in O(n) time complexity, but len(set(text)) is slightly faster and uses less memory when you only need the count without frequency information.
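If you want to check the relative speed on your own machine, a rough sketch with the timeit module is shown below. The sample text and repetition count are arbitrary assumptions, and the timings will vary by system and Python version, so no expected numbers are given.

```python
import timeit
from collections import Counter

text = "mississippi" * 1000  # arbitrary sample input

# Both approaches must agree on the number of distinct characters
assert len(set(text)) == len(Counter(text))

# Time 200 runs of each approach
set_time = timeit.timeit(lambda: len(set(text)), number=200)
counter_time = timeit.timeit(lambda: len(Counter(text)), number=200)

print(f"set:     {set_time:.4f} s")
print(f"Counter: {counter_time:.4f} s")
```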

Conclusion

For straightforward unique character counting, len(set(text)) is the fastest and most Pythonic approach.

  • Use Counter when you also need frequency information or want to identify characters that appear exactly once.
  • Apply normalization (lowercasing, removing spaces, filtering by character type) before counting to match the specific definition of "unique" that your use case requires.
  • When working with Unicode text, remember that accented characters and special symbols are treated as distinct code points unless you explicitly normalize them.