How to Count Substring Occurrences in a Python List

When working with a list of strings, you often need to know how many of those strings contain a particular substring, or how many times that substring appears in total across every string in the list. These are two distinct questions that require slightly different approaches.

In this guide, you will learn how to count substring occurrences in a Python list using generator expressions, list comprehensions, and the built-in .count() method. Each technique is demonstrated with clear examples and output, including case-insensitive matching, positional tracking, and multi-substring counting.

Counting How Many Items Contain a Substring

To find how many list items contain a specific substring at least once, use a generator expression with sum():

data = ["spam", "ham", "spam and eggs", "green eggs"]
target = "spam"

count = sum(1 for item in data if target in item)

print(count)

Output:

2

Note: This counts each matching item once, regardless of how many times the substring appears within that item. The in operator checks for the presence of the substring and returns True or False.
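Because True and False are integers (1 and 0) in Python, the same count can also be written by summing the membership tests directly. This is a minimal sketch reusing the data above; which form reads better is largely a matter of taste:

```python
data = ["spam", "ham", "spam and eggs", "green eggs"]
target = "spam"

# bool is a subclass of int, so each True adds 1 and each False adds 0
count = sum(target in item for item in data)

print(count)  # 2
```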

Counting Total Substring Occurrences Across All Items

When a single string might contain the substring multiple times and you want the total count, sum up the .count() results from every item:

data = ["spam spam bacon", "ham", "spam eggs spam"]
target = "spam"

total = sum(item.count(target) for item in data)

print(total)

Output:

4

The breakdown is:

  • "spam spam bacon" contains 2 occurrences
  • "ham" contains 0 occurrences
  • "spam eggs spam" contains 2 occurrences
  • Total: 4
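One detail worth knowing: str.count() counts non-overlapping occurrences, scanning from left to right. If overlapping matches matter for your data, a small helper built on str.find() is one way to count them. The following is a sketch only; the function name count_overlapping is made up for illustration and is not part of the standard library:

```python
def count_overlapping(text, target):
    """Count occurrences of target in text, allowing overlaps."""
    if not target:
        raise ValueError("target must be a non-empty string")
    count = 0
    start = text.find(target)
    while start != -1:
        count += 1
        # Advance by one character (not len(target)) so overlaps are found
        start = text.find(target, start + 1)
    return count


data = ["aaaa", "banana"]

print(sum(item.count("aa") for item in data))               # 2
print(sum(count_overlapping(item, "aa") for item in data))  # 3
print(count_overlapping("banana", "ana"))                   # 2
```

For most word-like targets such as "spam", the two approaches give the same result, so the simpler .count() version is usually enough.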

Understanding the Difference Between the Two Approaches

It is important to recognize when each method gives a different result. Using the wrong one can lead to incorrect counts:

data = ["aaa", "a", "bb"]
target = "a"

# Counts items that contain "a" at least once
item_count = sum(1 for item in data if target in item)

# Counts every occurrence of "a" across all items
total_count = sum(item.count(target) for item in data)

print(f"Items containing 'a': {item_count}")
print(f"Total occurrences of 'a': {total_count}")

Output:

Items containing 'a': 2
Total occurrences of 'a': 4

Note: The string "aaa" contains the letter "a" three times, but it is only one item. Choose the counting method that matches your actual question.

Counting Exact Full Matches

If you need to count items that are exactly equal to a target string rather than items that merely contain it as a substring, use the list's .count() method:

data = ["spam", "spam and eggs", "spam", "ham"]
target = "spam"

exact_matches = data.count(target)
substring_matches = sum(1 for item in data if target in item)

print(f"Exact matches: {exact_matches}")
print(f"Substring matches: {substring_matches}")

Output:

Exact matches: 2
Substring matches: 3

Note: "spam and eggs" contains "spam" as a substring but is not an exact match. The .count() method uses equality comparison, while the in operator checks for substring presence.

Case-Insensitive Counting

To treat uppercase and lowercase letters as equivalent, normalize both the items and the target before comparing:

data = ["Spam", "SPAM recipe", "spam", "Ham"]
target = "spam"

# Count items containing substring (case-insensitive)
item_count = sum(1 for item in data if target.casefold() in item.casefold())

# Count total occurrences (case-insensitive)
total_count = sum(item.casefold().count(target.casefold()) for item in data)

print(f"Items matching: {item_count}")
print(f"Total occurrences: {total_count}")

Output:

Items matching: 3
Total occurrences: 3

Tip: Prefer .casefold() over .lower() for case-insensitive comparisons. Both behave identically for plain English text, but .casefold() also handles characters whose case-insensitive form differs from their lowercase form. For example, the German "ß" is converted to "ss" by .casefold(), whereas .lower() leaves it unchanged.
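The difference between the two methods only shows up with characters such as "ß". The following sketch uses made-up sample data (three spellings of the German word for "street") to show one case where .lower() misses a match that .casefold() finds:

```python
data = ["Straße", "STRASSE", "strasse"]
target = "strasse"

# .lower() keeps "ß" as-is, so "straße" does not contain "strasse"
with_lower = sum(1 for item in data if target.lower() in item.lower())

# .casefold() expands "ß" to "ss", so all three spellings match
with_casefold = sum(1 for item in data if target.casefold() in item.casefold())

print(with_lower)     # 2
print(with_casefold)  # 3
```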

Retrieving the Matching Items

When you need both the count and the actual matching strings for further processing, use a list comprehension:

data = ["error: disk full", "info: started", "error: timeout", "warning: low memory"]
target = "error"

matches = [item for item in data if target in item]

print(f"Found {len(matches)} matches:")
for match in matches:
    print(f" - {match}")

Output:

Found 2 matches:
 - error: disk full
 - error: timeout

Note: This approach builds a list of the matching items, so you can iterate over them, pass them to other functions, or inspect them individually.

Tracking Occurrence Counts Per Item

To see exactly how many times the substring appears in each matching item along with its position in the list, build a structured result:

def find_all_positions(data, target):
    results = []
    for idx, item in enumerate(data):
        count = item.count(target)
        if count > 0:
            results.append({
                "index": idx,
                "item": item,
                "occurrences": count
            })
    return results

data = ["no match", "spam spam", "also spam"]
positions = find_all_positions(data, "spam")

for entry in positions:
    print(entry)

Output:

{'index': 1, 'item': 'spam spam', 'occurrences': 2}
{'index': 2, 'item': 'also spam', 'occurrences': 1}

Note: This is useful when you need a detailed breakdown rather than just a single total number.
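The function above records where each matching item sits in the list. If you also need the character offsets of the matches inside each item, one possible approach is re.finditer() with an escaped pattern. This is a sketch under that assumption; the helper name find_offsets is hypothetical:

```python
import re


def find_offsets(data, target):
    """Map list index -> start offsets of non-overlapping matches of target."""
    # re.escape makes the target match literally, even if it contains
    # characters such as "." or "*" that are special in regular expressions
    pattern = re.compile(re.escape(target))
    offsets = {}
    for idx, item in enumerate(data):
        starts = [m.start() for m in pattern.finditer(item)]
        if starts:
            offsets[idx] = starts
    return offsets


data = ["no match", "spam spam", "also spam"]
print(find_offsets(data, "spam"))  # {1: [0, 5], 2: [5]}
```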

Counting Multiple Substrings at Once

To count occurrences of several different substrings, loop over the targets and store each total in a Counter. Note that this scans the data once per target:

from collections import Counter

data = ["apple pie", "banana bread", "apple tart", "cherry pie"]
targets = ["apple", "pie", "banana"]

counts = Counter()
for target in targets:
    counts[target] = sum(item.count(target) for item in data)

print(counts)

Output:

Counter({'apple': 2, 'pie': 2, 'banana': 1})

Note: This gives you a complete frequency map for all your target substrings, making it easy to compare their relative frequencies.
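The loop above walks through the list once for every target. If the data can only be traversed once (for example, lines streamed from a file), the loops can be swapped so that each item is visited a single time. The sketch below is one way to do that; it also tracks how many items contain each target, and the variable names occurrences and items_with are chosen here for illustration:

```python
from collections import Counter

data = ["apple pie", "banana bread", "apple tart", "cherry pie"]
targets = ["apple", "pie", "banana"]

occurrences = Counter()  # total occurrences of each target
items_with = Counter()   # number of items containing each target

for item in data:  # the list is traversed only once
    for target in targets:
        n = item.count(target)
        if n:
            occurrences[target] += n
            items_with[target] += 1

print(occurrences["apple"], items_with["apple"])  # 2 2
print(occurrences["banana"])                      # 1
print(occurrences["cherry"])                      # 0 (missing keys count as zero)
```

The total amount of work is the same as before; the benefit is only that the data is consumed in a single traversal.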

Method Comparison

Goal | Code | Returns
--- | --- | ---
Items containing substring | sum(1 for s in data if sub in s) | Count of matching items
Total occurrences across all items | sum(s.count(sub) for s in data) | Sum of all occurrences
Exact full matches | data.count(target) | Items exactly equal to target
Retrieve matching items | [s for s in data if sub in s] | List of matching strings
Multiple substrings | Counter with loop | Frequency map for each target

Note: The sum(1 for ...) pattern is the idiomatic Python way to count items matching a condition without building an intermediate list. It is memory-efficient because the generator yields one value at a time rather than constructing a full list in memory.

Conclusion

When counting substring occurrences in a Python list, first clarify what you are actually counting.

  • Use sum(1 for item in data if target in item) to count how many items contain the substring at least once.
  • Use sum(item.count(target) for item in data) to count every individual occurrence across all items.
  • Use data.count(target) when you need exact full-string matches rather than substring checks.
  • For case-insensitive matching, normalize with .casefold() before comparing.
  • When you need detailed results or the matching items themselves, switch to a list comprehension or a custom function that captures both counts and context.
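As a recap, the counting modes above can be wrapped in a single helper. This is only a sketch of how the pieces fit together; the function name count_substring and its mode and ignore_case parameters are invented for this example:

```python
def count_substring(data, target, *, mode="items", ignore_case=False):
    """Count target in a list of strings.

    mode="items": items containing target at least once
    mode="total": every non-overlapping occurrence across all items
    mode="exact": items equal to target
    """
    if ignore_case:
        # Normalize both sides before comparing
        target = target.casefold()
        data = [item.casefold() for item in data]
    if mode == "items":
        return sum(1 for item in data if target in item)
    if mode == "total":
        return sum(item.count(target) for item in data)
    if mode == "exact":
        return sum(1 for item in data if item == target)
    raise ValueError(f"unknown mode: {mode!r}")


data = ["Spam", "spam spam", "ham", "spam"]

print(count_substring(data, "spam"))                                  # 2
print(count_substring(data, "spam", mode="total"))                    # 3
print(count_substring(data, "spam", mode="exact"))                    # 1
print(count_substring(data, "spam", mode="exact", ignore_case=True))  # 2
```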