
How to Get Word Frequency as Percentage in Python

Calculating the percentage share of each word across a collection of text is a common task in natural language processing (NLP), text analytics, content analysis, and search engine optimization. Instead of just counting how many times a word appears, you express each word's frequency as a proportion of the total word count.

The formula is straightforward: (Occurrences of word / Total words) Ɨ 100.

In this guide, you'll learn efficient methods to compute word frequency percentages in Python, with clear examples and best practices for handling real-world text.

Understanding the Problem

Given a list of strings, calculate each unique word's frequency as a fraction (or percentage) of the total number of words across all strings.

Example: For the text ["Python is great", "Python is fun"]:

Word      Count   Total Words   Frequency
Python    2       6             33.33%
is        2       6             33.33%
great     1       6             16.67%
fun       1       6             16.67%
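
The numbers in this table can be reproduced in a few lines (a minimal sketch of the calculation, expanded on below):

```python
from collections import Counter

texts = ["Python is great", "Python is fun"]
words = " ".join(texts).split()
counts = Counter(words)
total = len(words)  # 6 words in total

# Each word's share of the total word count
for word, count in counts.items():
    print(f"{word}: {count}/{total} = {count / total * 100:.2f}%")
```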

Using collections.Counter (Recommended)

The collections.Counter class is purpose-built for counting occurrences. Combined with join() and split(), it provides the cleanest solution:

from collections import Counter

sentences = [
    "Python is great for data science",
    "Data science is fun",
    "Python is the best for learning",
]

# Join all strings and split into individual words
all_words = " ".join(sentences).split()

# Count word frequencies
word_counts = Counter(all_words)
total_words = sum(word_counts.values())

# Calculate percentage for each word
frequency_pct = {word: (count / total_words) * 100 for word, count in word_counts.items()}

# Display results sorted by frequency (descending)
for word, pct in sorted(frequency_pct.items(), key=lambda x: x[1], reverse=True):
    print(f" {word:<12} {pct:>6.2f}%")

Output:

 is            18.75%
 Python        12.50%
 for           12.50%
 science       12.50%
 great          6.25%
 data           6.25%
 Data           6.25%
 fun            6.25%
 the            6.25%
 best           6.25%
 learning       6.25%

How it works:

  1. " ".join(sentences) concatenates all strings into a single string separated by spaces.
  2. .split() breaks it into a list of individual words.
  3. Counter() counts how many times each word appears.
  4. Dividing each count by the total gives the frequency as a decimal; multiplying by 100 converts to a percentage.
Why Counter is the best choice

In CPython, Counter's inner counting loop is implemented in C, making it significantly faster than counting manually with a Python-level loop. It also provides useful methods like .most_common(n) for retrieving the top N words.
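
For instance, .most_common(n) returns the n most frequent words as (word, count) pairs, which pairs naturally with the percentage calculation:

```python
from collections import Counter

word_counts = Counter("the cat sat on the mat the end".split())
total = sum(word_counts.values())  # 8 words

# most_common(2) yields the two most frequent words, highest count first
for word, count in word_counts.most_common(2):
    print(f"{word}: {count / total * 100:.1f}%")
```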

Handling Case Sensitivity

Notice in the output above that "data" and "Data" are treated as separate words. In most text analysis scenarios, you want case-insensitive counting:

from collections import Counter

sentences = [
    "Python is great for data science",
    "Data science is fun",
    "Python is the best for learning",
]

# Normalize to lowercase before splitting
all_words = " ".join(sentences).lower().split()

word_counts = Counter(all_words)
total_words = sum(word_counts.values())

frequency_pct = {word: (count / total_words) * 100 for word, count in word_counts.items()}

for word, pct in sorted(frequency_pct.items(), key=lambda x: x[1], reverse=True):
    print(f" {word:<12} {pct:>6.2f}%")

Output:

 is            18.75%
 python        12.50%
 for           12.50%
 data          12.50%
 science       12.50%
 great          6.25%
 fun            6.25%
 the            6.25%
 best           6.25%
 learning       6.25%

Now "Data" and "data" are correctly merged into a single entry with a combined frequency of 12.50%.
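
If your text may contain non-ASCII characters, str.casefold() is a slightly more aggressive normalization than lower() and is Python's recommended tool for caseless matching:

```python
# casefold() handles cases that lower() misses, e.g. German "ß" -> "ss"
words = ["Straße", "STRASSE"]

print([w.lower() for w in words])     # still two distinct words
print([w.casefold() for w in words])  # merged into one form
```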

Common Mistake: Using list.count() Inside a Loop

A frequent performance pitfall is calling .count() inside a loop, which rescans the entire list for every unique word:

Wrong approach: O(n²) time complexity

sentences = ["Python is great", "Python is fun"]
all_words = " ".join(sentences).split()

frequency = {}
for word in all_words:
    if word not in frequency:
        # .count() scans the ENTIRE list each time: O(n) per call
        frequency[word] = all_words.count(word) / len(all_words)

print(frequency)

Output:

{'Python': 0.3333333333333333, 'is': 0.3333333333333333, 'great': 0.16666666666666666, 'fun': 0.16666666666666666}
Note: This works but is quadratic in time complexity. For a list with 10,000 words, .count() is called once per unique word, and each call scans the entire list.
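
You can measure the gap yourself with the standard timeit module (absolute numbers vary by machine; the ratio is what matters):

```python
import timeit
from collections import Counter

# 10,000 words, 1,000 of them unique
words = [f"word{i % 1000}" for i in range(10_000)]

def with_list_count():
    # Quadratic: one full scan of `words` per unique word
    return {w: words.count(w) / len(words) for w in set(words)}

def with_counter():
    # Linear: a single pass over `words`
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

# Both produce identical frequencies; Counter is dramatically faster
print("list.count():", timeit.timeit(with_list_count, number=3))
print("Counter:     ", timeit.timeit(with_counter, number=3))
```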

Correct approach: O(n) with Counter

from collections import Counter

sentences = ["Python is great", "Python is fun"]
all_words = " ".join(sentences).split()

word_counts = Counter(all_words)
total = sum(word_counts.values())
frequency = {word: count / total for word, count in word_counts.items()}

print(frequency)

Output:

{'Python': 0.3333333333333333, 'is': 0.3333333333333333, 'great': 0.16666666666666666, 'fun': 0.16666666666666666}
Performance matters at scale

For small datasets, both approaches work fine. But with thousands of sentences or millions of words, the Counter approach is orders of magnitude faster. Always prefer Counter over manual .count() loops.

Creating a Reusable Function

A production-ready function with options for case sensitivity, sorting, and percentage formatting:

from collections import Counter

def word_frequency_percentage(
    texts: list[str],
    case_sensitive: bool = False,
    top_n: int | None = None,
    as_percentage: bool = True,
) -> dict[str, float]:
    """Calculate the frequency percentage of each word across a list of strings.

    Args:
        texts: List of strings to analyze.
        case_sensitive: Whether to treat 'Word' and 'word' as different (default: False).
        top_n: Return only the top N most frequent words (default: all).
        as_percentage: If True, values are percentages (0-100). If False, fractions (0-1).

    Returns:
        Dictionary mapping words to their frequency percentages, sorted descending.
    """
    combined = " ".join(texts)
    if not case_sensitive:
        combined = combined.lower()

    word_counts = Counter(combined.split())
    total_words = sum(word_counts.values())

    if total_words == 0:
        return {}

    multiplier = 100 if as_percentage else 1
    items = word_counts.most_common(top_n)

    return {word: (count / total_words) * multiplier for word, count in items}


# Usage
sentences = [
    "Python is great for data science",
    "Data science is fun",
    "Python is the best for learning",
]

# Top 5 words as percentages
result = word_frequency_percentage(sentences, top_n=5)

print("Top 5 words by frequency:")
for word, pct in result.items():
    print(f" {word:<12} {pct:.2f}%")

Output:

Top 5 words by frequency:
 is           18.75%
 python       12.50%
 for          12.50%
 data         12.50%
 science      12.50%

Working with Files

You can easily adapt this approach to analyze text files:

from collections import Counter

def analyze_file(filepath: str, top_n: int = 10) -> dict[str, float]:
    """Analyze word frequency percentages in a text file."""
    with open(filepath, "r", encoding="utf-8") as f:
        text = f.read().lower()

    # Remove basic punctuation
    for char in ".,!?;:\"'()-":
        text = text.replace(char, "")

    word_counts = Counter(text.split())
    total = sum(word_counts.values())

    return {
        word: (count / total) * 100
        for word, count in word_counts.most_common(top_n)
    }


# Usage (assuming a file exists)
result = analyze_file("sample.txt", top_n=10)
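
To try it without an existing file, the same steps can be run against a throwaway temporary file (tempfile here just stands in for a real sample.txt):

```python
import os
import tempfile
from collections import Counter

# Create a throwaway file to analyze
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as f:
    f.write("The cat sat. The cat ran!")
    path = f.name

# Same steps as analyze_file, inlined
with open(path, encoding="utf-8") as f:
    text = f.read().lower()
for char in ".,!?;:\"'()-":
    text = text.replace(char, "")

word_counts = Counter(text.split())
total = sum(word_counts.values())
result = {w: (c / total) * 100 for w, c in word_counts.most_common(3)}

print(result)  # "the" and "cat" each account for a third of the 6 words
os.remove(path)
```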

Visualizing Word Frequencies

For a quick visual representation, you can create a simple bar chart using the results:

from collections import Counter

sentences = [
    "Python is great for data science",
    "Data science is fun with Python",
    "Python is the best language for data analysis",
]

all_words = " ".join(sentences).lower().split()
word_counts = Counter(all_words)
total = sum(word_counts.values())

print("Word Frequency Chart")
print("=" * 50)

for word, count in word_counts.most_common(8):
    pct = (count / total) * 100
    bar = "ā–ˆ" * int(pct * 2)
    print(f" {word:<12} {bar} {pct:.1f}%")

Output:

Word Frequency Chart
==================================================
 python       ā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆ 15.0%
 is           ā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆ 15.0%
 data         ā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆ 15.0%
 for          ā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆ 10.0%
 science      ā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆ 10.0%
 great        ā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆ 5.0%
 fun          ā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆ 5.0%
 with         ā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆ 5.0%

Quick Comparison of Methods

Method                         Time Complexity   Readability     Best For
Counter + dict comprehension   O(n)              ⭐⭐⭐ High      Most use cases (recommended)
Manual loop with Counter       O(n)              ⭐⭐⭐ High      Extra processing during counting
list.count() in loop           O(n²)             ⭐⭐ Medium     Avoid: inefficient
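
The "manual loop with Counter" row refers to updating a Counter yourself, which is still O(n) and lets you do extra per-word processing along the way. A sketch, where the extra processing (dropping very short tokens) is just an illustration:

```python
from collections import Counter

sentences = ["Python is great", "Python is fun"]

counts = Counter()
for sentence in sentences:
    for word in sentence.split():
        if len(word) > 2:  # extra processing: drop very short tokens
            counts[word.lower()] += 1

total = sum(counts.values())
print({w: c / total for w, c in counts.items()})
```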

Conclusion

Calculating word frequency percentages in Python is simple and efficient with the right tools:

  • Counter from collections is the recommended approach: it's fast, clean, and provides useful methods like .most_common().
  • Normalize case with .lower() to avoid treating the same word as different entries.
  • Avoid list.count() inside loops: it creates O(n²) complexity that becomes a bottleneck with large texts.
  • Use the top_n pattern with .most_common(n) to focus on the most significant words.
  • For real-world text analysis, consider removing punctuation and stop words (common words like "the", "is", "and") before computing frequencies.
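
The stop-word filtering mentioned above can be as simple as the sketch below. The stop-word set here is a minimal illustration; libraries such as NLTK ship much fuller lists:

```python
from collections import Counter

# Minimal illustrative stop-word set -- not exhaustive
STOP_WORDS = {"the", "is", "and", "for", "a", "an", "of", "to", "in"}

sentences = ["Python is great for data science", "Data science is fun"]
words = [w for w in " ".join(sentences).lower().split() if w not in STOP_WORDS]

counts = Counter(words)
total = sum(counts.values())
for word, count in counts.most_common():
    print(f"{word:<10} {count / total * 100:.2f}%")
```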