How to Detect Anagrams in Python Strings
An anagram is a word or phrase formed by rearranging the letters of a different word or phrase, using all the original letters exactly once. For example, "listen" is an anagram of "silent". Detecting anagrams is a common algorithmic problem used to test string manipulation skills and understanding of hash maps.
This guide explores the two most common techniques for detecting anagrams: sorting and character frequency counting, along with strategies for handling real-world data issues like case sensitivity and whitespace.
Understanding Anagram Logic
Two strings are considered anagrams if they satisfy these conditions:
- They have the same length.
- They contain the exact same characters with the exact same frequency.
Examples:
- "secure" ↔ "rescue" (True)
- "rat" ↔ "car" (False)
Method 1: Sorting (The Intuitive Approach)
If two strings are anagrams, sorting their characters alphabetically will result in two identical sequences. This is the easiest method to implement.
Time Complexity: O(N log N) due to sorting.
def is_anagram_sorting(str1, str2):
# sorted() returns a list of characters
# e.g., sorted("silent") -> ['e', 'i', 'l', 'n', 's', 't']
return sorted(str1) == sorted(str2)
word1 = "listen"
word2 = "silent"
word3 = "apple"
print(f"'{word1}' and '{word2}': {is_anagram_sorting(word1, word2)}")
print(f"'{word1}' and '{word3}': {is_anagram_sorting(word1, word3)}")
Output:
'listen' and 'silent': True
'listen' and 'apple': False
Method 2: Frequency Counting (The Efficient Approach)
A more performant approach, especially for longer strings, is to count the occurrences of each character. If the counts for every character match between the two strings, they are anagrams.
Python's collections.Counter makes this trivial.
Time Complexity: O(N) (Linear time).
from collections import Counter
def is_anagram_counter(str1, str2):
# Counter creates a dictionary-like object mapping chars to counts
# e.g., Counter("aabb") -> {'a': 2, 'b': 2}
return Counter(str1) == Counter(str2)
str_a = "triangle"
str_b = "integral"
# ✅ Correct: Comparing frequency maps
result = is_anagram_counter(str_a, str_b)
print(f"Are '{str_a}' and '{str_b}' anagrams? {result}")
Output:
Are 'triangle' and 'integral' anagrams? True
While sorting is often fast enough for small strings, Counter is generally preferred for performance-critical applications involving large datasets because it avoids the overhead of sorting operations.
Practical Example: Grouping Anagrams
A common interview question asks to group a list of words into clusters of anagrams. We can use the sorted string as a dictionary key to group them.
def group_anagrams(words):
anagram_groups = {}
for word in words:
# Create a key by sorting the word
# "listen" -> "eilnst"
sorted_key = "".join(sorted(word))
if sorted_key not in anagram_groups:
anagram_groups[sorted_key] = []
anagram_groups[sorted_key].append(word)
return list(anagram_groups.values())
input_words = ["eat", "tea", "tan", "ate", "nat", "bat"]
grouped = group_anagrams(input_words)
for group in grouped:
print(group)
Output:
['eat', 'tea', 'ate']
['tan', 'nat']
['bat']
Handling Edge Cases (Case & Whitespace)
Real-world text is rarely clean. "Debit card" and "Bad credit" are famous anagrams, but a naive comparison will fail due to spaces and capitalization. You must normalize the strings first.
from collections import Counter
text1 = "Debit card"
text2 = "Bad credit"
# ⛔️ Naive check fails due to space and case
# sorted("Debit card") != sorted("Bad credit")
print(f"Naive check: {Counter(text1) == Counter(text2)}")
# ✅ Correct: Normalize data
def normalize(text):
# Lowercase and remove non-alphanumeric characters (like spaces)
return text.lower().replace(" ", "")
is_anagram = Counter(normalize(text1)) == Counter(normalize(text2))
print(f"Robust check: {is_anagram}")
Output:
Naive check: False
Robust check: True
Conclusion
To detect string anagrams in Python:
- Use
sorted(s1) == sorted(s2)for quick, readable checks on small strings. - Use
Counter(s1) == Counter(s2)for better performance (O(N)) on large strings. - Normalize Inputs by removing spaces and converting to lowercase to handle phrases effectively.