How to Build a Word Frequency Dictionary in Python

Word frequency dictionaries are foundational for text analysis, sentiment detection, keyword extraction, and search engines. Python's collections.Counter provides a high-performance, readable solution.

The Industry Standard: collections.Counter

The Counter class is specifically designed for counting hashable objects:

from collections import Counter

text = "Python is great and Python is fast"

# Split text into words and count
words = text.split()
freq_map = Counter(words)

print(dict(freq_map))

Output:

{'Python': 2, 'is': 2, 'great': 1, 'and': 1, 'fast': 1}
Performance

In CPython, Counter's core counting loop (the helper _count_elements) is implemented in C, making it significantly faster than a hand-written Python counting loop for large text volumes.
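Because Counter subclasses dict, you can look up counts directly, and a missing key returns 0 instead of raising KeyError:

```python
from collections import Counter

freq_map = Counter("Python is great and Python is fast".split())

print(freq_map["Python"])  # 2
print(freq_map["rust"])    # 0 -- missing keys default to zero, no KeyError
```

This makes Counter safe to query for words that never occurred, with no need for .get() or try/except.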

Real-World Cleaning: Punctuation and Case

In practice, "Python," and "python" should count as the same word:

import re
from collections import Counter

messy_text = "Data, data everywhere; DATA is key!"

# Lowercase the text and extract runs of word characters (letters, digits, underscore)
clean_words = re.findall(r'\w+', messy_text.lower())

word_counts = Counter(clean_words)
print(dict(word_counts))

Output:

{'data': 3, 'everywhere': 1, 'is': 1, 'key': 1}
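One caveat with \w+: it splits on apostrophes, so "don't" becomes "don" and "t". If contractions matter for your corpus, a slightly wider pattern (shown here as one possible variant, not a universal rule) keeps them intact:

```python
import re
from collections import Counter

text = "Don't stop, don't quit!"

# \w+ splits contractions: ['don', 't', 'stop', 'don', 't', 'quit']
naive = re.findall(r'\w+', text.lower())

# Allowing an internal apostrophe keeps "don't" whole
with_apostrophes = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

print(Counter(naive))            # includes 'don': 2 and 't': 2
print(Counter(with_apostrophes)) # includes "don't": 2
```

Tune the regex to your data; there is no single pattern that is right for every text source.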

Getting Top N Words

Counter provides built-in ranking:

from collections import Counter

text = "the quick brown fox jumps over the lazy dog the fox"
words = text.lower().split()

word_counts = Counter(words)

# Get 3 most common words
top_three = word_counts.most_common(3)
print(top_three)

Output:

[('the', 3), ('fox', 2), ('quick', 1)]
tip

Use .most_common(n) to get the top N words without manual sorting logic.
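Two related tricks: calling most_common() with no argument returns every word ranked by count, and slicing that result from the end yields the rarest words:

```python
from collections import Counter

word_counts = Counter("the quick brown fox jumps over the lazy dog the fox".split())

# Full ranking, most frequent first
print(word_counts.most_common())

# Two least common words (walk the full ranking backwards)
print(word_counts.most_common()[:-3:-1])  # [('dog', 1), ('lazy', 1)]
```

The [:-n-1:-1] slice idiom comes from the standard library docs for Counter; ties keep their original insertion order because most_common sorts stably.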

Alternative: defaultdict

For custom counting logic within loops:

from collections import defaultdict

text = "Python is great and Python is fast"
words = text.split()

freq = defaultdict(int)
for word in words:
    freq[word] += 1

print(dict(freq))

Output:

{'Python': 2, 'is': 2, 'great': 1, 'and': 1, 'fast': 1}
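The real payoff of defaultdict is when each key accumulates something richer than an integer. As one illustrative example (not from the original text), you can track the positions where each word occurs by defaulting to a list:

```python
from collections import defaultdict

text = "Python is great and Python is fast"
words = text.split()

# Map each word to the list of indices where it appears
positions = defaultdict(list)
for i, word in enumerate(words):
    positions[word].append(i)

print(dict(positions))
# {'Python': [0, 4], 'is': [1, 5], 'great': [2], 'and': [3], 'fast': [6]}
```

Counter can't do this; defaultdict earns its place whenever the per-key value is more than a count.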

Method Comparison

| Method | Speed | Complexity | Use Case |
| --- | --- | --- | --- |
| Counter | 🚀 Fast | O(n) | Standard word counting |
| defaultdict(int) | 🚀 Fast | O(n) | Custom counting logic |
| .count() in loop | 🐢 Slow | O(n²) | Avoid |
The O(n²) Trap

Never use text.count(word) inside a loop:

# āŒ Bad: O(n²) - scans entire text for each unique word
for word in set(words):
count = words.count(word) # Full scan each time!

# āœ… Good: O(n) - single pass
freq = Counter(words)
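A quick timeit sketch (absolute numbers vary by machine; the word list here is synthetic) makes the gap concrete:

```python
import timeit
from collections import Counter

# ~4,500 words built by repeating a short phrase
words = ("the quick brown fox jumps over the lazy dog " * 500).split()

# O(n²): one full scan of `words` per unique word, per run
loop_time = timeit.timeit(
    lambda: {w: words.count(w) for w in set(words)}, number=10
)

# O(n): a single pass per run
counter_time = timeit.timeit(lambda: Counter(words), number=10)

print(f".count() loop: {loop_time:.4f}s")
print(f"Counter:       {counter_time:.4f}s")
```

Both approaches produce identical counts; only the work done to get there differs, and the gap widens as the vocabulary grows.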

Complete Example

import re
from collections import Counter

def word_frequency(text, top_n=None):
    """Build word frequency dictionary from text."""
    # Clean: lowercase, extract words only
    words = re.findall(r'\b[a-z]+\b', text.lower())

    counts = Counter(words)

    if top_n:
        return dict(counts.most_common(top_n))
    return dict(counts)

# Usage
article = """
Python is amazing. Python handles data well.
Data science loves Python!
"""

print(word_frequency(article))
# {'python': 3, 'data': 2, 'is': 1, 'amazing': 1, ...}

print(word_frequency(article, top_n=3))
# {'python': 3, 'data': 2, 'is': 1}

Output:

{'python': 3, 'is': 1, 'amazing': 1, 'handles': 1, 'data': 2, 'well': 1, 'science': 1, 'loves': 1}
{'python': 3, 'data': 2, 'is': 1}
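In real pipelines you usually drop high-frequency filler words ("stop words") before ranking. Here is a minimal sketch extending the idea above; the stop list is hand-picked for illustration, not exhaustive:

```python
import re
from collections import Counter

STOP_WORDS = {"is", "the", "and", "a", "an", "of", "to", "in"}  # illustrative list

def word_frequency_filtered(text, top_n=None):
    """Word frequencies with common stop words removed."""
    words = re.findall(r'\b[a-z]+\b', text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    if top_n:
        return dict(counts.most_common(top_n))
    return dict(counts)

article = "Python is amazing. Python handles data well. Data science loves Python!"
print(word_frequency_filtered(article, top_n=3))
# {'python': 3, 'data': 2, 'amazing': 1}
```

Production systems typically pull stop lists from an NLP library rather than maintaining one by hand.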

Quick Reference

| Goal | Code |
| --- | --- |
| Basic count | Counter(words) |
| Top N words | counter.most_common(n) |
| Clean text | re.findall(r'\w+', text.lower()) |
| To dictionary | dict(counter) |

Summary

Use collections.Counter for efficient single-pass word counting. Clean text with re.findall() to normalize case and remove punctuation. Use .most_common(n) to get top words without manual sorting. Avoid the O(n²) trap of calling .count() inside loops.