How to Build a Word Frequency Dictionary in Python
Word frequency dictionaries are foundational for text analysis, sentiment detection, keyword extraction, and search engines. Python's collections.Counter provides a high-performance, readable solution.
The Industry Standard: collections.Counter
The Counter class is specifically designed for counting hashable objects:
from collections import Counter
text = "Python is great and Python is fast"
# Split text into words and count
words = text.split()
freq_map = Counter(words)
print(dict(freq_map))
Output:
{'Python': 2, 'is': 2, 'great': 1, 'and': 1, 'fast': 1}
In CPython, Counter's counting loop is accelerated by a C helper function, making it noticeably faster than an equivalent pure-Python counting loop on large text volumes.
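As a rough check of that claim, here is a minimal timing sketch. The corpus size and repetition count are illustrative choices, and absolute timings will vary by machine:

```python
from collections import Counter
from timeit import timeit

# Build a modest synthetic corpus (illustrative size)
words = ("python is great and python is fast " * 1000).split()

def manual_count(ws):
    """Pure-Python counting loop for comparison."""
    freq = {}
    for w in ws:
        freq[w] = freq.get(w, 0) + 1
    return freq

# Both approaches produce identical counts
assert Counter(words) == manual_count(words)

counter_time = timeit(lambda: Counter(words), number=100)
manual_time = timeit(lambda: manual_count(words), number=100)
print(f"Counter: {counter_time:.3f}s  manual loop: {manual_time:.3f}s")
```

On a typical CPython build the Counter version comes out ahead, though the gap depends on corpus size and Python version.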
Real-World Cleaning: Punctuation and Case
In practice, "Python," and "python" should count as the same word:
import re
from collections import Counter
messy_text = "Data, data everywhere; DATA is key!"
# Lowercase and extract only alphanumeric words
clean_words = re.findall(r'\w+', messy_text.lower())
word_counts = Counter(clean_words)
print(dict(word_counts))
Output:
{'data': 3, 'everywhere': 1, 'is': 1, 'key': 1}
Getting Top N Words
Counter provides built-in ranking:
from collections import Counter
text = "the quick brown fox jumps over the lazy dog the fox"
words = text.lower().split()
word_counts = Counter(words)
# Get 3 most common words
top_three = word_counts.most_common(3)
print(top_three)
Output:
[('the', 3), ('fox', 2), ('quick', 1)]
Use .most_common(n) to get the top N words without manual sorting logic.
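Called with no argument, most_common() returns every entry sorted by descending count, which is handy when you want the full ranking rather than a fixed top N:

```python
from collections import Counter

word_counts = Counter("the quick brown fox jumps over the lazy dog the fox".split())

# No argument: all words, highest frequency first
ranked = word_counts.most_common()
print(ranked[0])  # ('the', 3)
```

Words with equal counts keep their first-seen order, so ties are resolved deterministically.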
Alternative: defaultdict
For custom counting logic within loops:
from collections import defaultdict
text = "Python is great and Python is fast"
words = text.split()
freq = defaultdict(int)
for word in words:
    freq[word] += 1
print(dict(freq))
Output:
{'Python': 2, 'is': 2, 'great': 1, 'and': 1, 'fast': 1}
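Where defaultdict really earns its place is when each key accumulates something richer than a simple tally. As one illustrative sketch (recording word positions is just an example of custom logic), you can track every index at which a word appears:

```python
from collections import defaultdict

text = "Python is great and Python is fast"

# Each key maps to a list of positions instead of a bare count
positions = defaultdict(list)
for i, word in enumerate(text.split()):
    positions[word].append(i)

print(dict(positions))
# {'Python': [0, 4], 'is': [1, 5], 'great': [2], 'and': [3], 'fast': [6]}
```

The word frequency is then just len(positions[word]), but you also keep where each occurrence happened, which Counter alone does not give you.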
Method Comparison
| Method | Speed | Complexity | Use Case |
|---|---|---|---|
| Counter | Fast | O(n) | Standard word counting |
| defaultdict(int) | Fast | O(n) | Custom counting logic |
| .count() in loop | Slow | O(n²) | Avoid |
Never use text.count(word) inside a loop:
# Bad: O(n²) - scans entire text for each unique word
for word in set(words):
    count = words.count(word)  # Full scan each time!

# Good: O(n) - single pass
freq = Counter(words)
Complete Example
import re
from collections import Counter
def word_frequency(text, top_n=None):
    """Build word frequency dictionary from text."""
    # Clean: lowercase, extract words only
    words = re.findall(r'\b[a-z]+\b', text.lower())
    counts = Counter(words)
    if top_n:
        return dict(counts.most_common(top_n))
    return dict(counts)
# Usage
article = """
Python is amazing. Python handles data well.
Data science loves Python!
"""
print(word_frequency(article))
# {'python': 3, 'is': 1, 'amazing': 1, 'handles': 1, ...}
print(word_frequency(article, top_n=3))
# {'python': 3, 'data': 2, 'is': 1}
Output:
{'python': 3, 'is': 1, 'amazing': 1, 'handles': 1, 'data': 2, 'well': 1, 'science': 1, 'loves': 1}
{'python': 3, 'data': 2, 'is': 1}
Quick Reference
| Goal | Code |
|---|---|
| Basic count | Counter(words) |
| Top N words | counter.most_common(n) |
| Clean text | re.findall(r'\w+', text.lower()) |
| To dictionary | dict(counter) |
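One more Counter feature worth knowing for frequency work: Counters support arithmetic, so merging counts from several documents is a one-liner. A small sketch with two illustrative snippets of text:

```python
from collections import Counter

doc1 = Counter("python is great".split())
doc2 = Counter("python is fast".split())

# Adding Counters sums the counts for each word
combined = doc1 + doc2
print(dict(combined))
# {'python': 2, 'is': 2, 'great': 1, 'fast': 1}
```

Subtraction (doc1 - doc2) and in-place merging with update() work the same way, which makes Counter a natural fit for incremental corpus processing.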
Summary
Use collections.Counter for efficient single-pass word counting. Clean text with re.findall() to normalize case and remove punctuation. Use .most_common(n) to get top words without manual sorting. Avoid the O(n²) trap of calling .count() inside loops.