How to Count Substring Occurrences in a Python List
When working with a list of strings, you often need to know how many of those strings contain a particular substring, or how many times that substring appears in total across every string in the list. These are two distinct questions that require slightly different approaches.
In this guide, you will learn how to count substring occurrences in a Python list using generator expressions, list comprehensions, and the built-in .count() method. Each technique is demonstrated with clear examples and output, including case-insensitive matching, positional tracking, and multi-substring counting.
Counting How Many Items Contain a Substring
To find how many list items contain a specific substring at least once, use a generator expression with sum():
data = ["spam", "ham", "spam and eggs", "green eggs"]
target = "spam"
count = sum(1 for item in data if target in item)
print(count)
Output:
2
This counts each matching item once, regardless of how many times the substring appears within that item. The in operator checks for the presence of the substring and returns True or False.
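Because the in operator returns a bool, and Python treats True as 1 and False as 0 in arithmetic, the same count can be written by summing the booleans directly. The following is a minimal sketch of that equivalent form; the names count_bools and count_ones are illustrative only.

```python
data = ["spam", "ham", "spam and eggs", "green eggs"]
target = "spam"

# True counts as 1 and False as 0 when summed
count_bools = sum(target in item for item in data)

# The explicit form used above, shown for comparison
count_ones = sum(1 for item in data if target in item)

print(count_bools, count_ones)
```

Both expressions produce the same number. The explicit sum(1 for ... if ...) form is arguably easier to read, so which one to use is largely a matter of taste.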
Counting Total Substring Occurrences Across All Items
When a single string might contain the substring multiple times and you want the total count, sum up the .count() results from every item:
data = ["spam spam bacon", "ham", "spam eggs spam"]
target = "spam"
total = sum(item.count(target) for item in data)
print(total)
Output:
4
The breakdown is:
- "spam spam bacon" contains 2 occurrences
- "ham" contains 0 occurrences
- "spam eggs spam" contains 2 occurrences
- Total: 4
Understanding the Difference Between the Two Approaches
It is important to recognize when each method gives a different result. Using the wrong one can lead to incorrect counts:
data = ["aaa", "a", "bb"]
target = "a"
# Counts items that contain "a" at least once
item_count = sum(1 for item in data if target in item)
# Counts every occurrence of "a" across all items
total_count = sum(item.count(target) for item in data)
print(f"Items containing 'a': {item_count}")
print(f"Total occurrences of 'a': {total_count}")
Output:
Items containing 'a': 2
Total occurrences of 'a': 4
The string "aaa" contains the letter "a" three times, but it is only one item. Choose the counting method that matches your actual question.
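One detail worth knowing when the target is longer than one character: str.count() counts non-overlapping occurrences only. If overlapping matches should be counted as well, one possible approach is a zero-width lookahead with the re module. The helper name count_overlapping below is hypothetical, and treating an empty target as zero matches is an assumption of this sketch.

```python
import re

def count_overlapping(text, target):
    """Count occurrences of target in text, allowing overlaps."""
    if not target:
        return 0  # assumption: an empty target counts as zero matches
    # A zero-width lookahead matches at every start position
    # without consuming characters, so overlaps are included.
    pattern = f"(?={re.escape(target)})"
    return len(re.findall(pattern, text))

print("aaaa".count("aa"))                # non-overlapping count
print(count_overlapping("aaaa", "aa"))   # overlapping count
```

Here "aaaa".count("aa") returns 2, while the lookahead version finds matches starting at positions 0, 1, and 2 and returns 3.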
Counting Exact Full Matches
If you need to count items that are exactly equal to a target string rather than items that merely contain it as a substring, use the list's .count() method:
data = ["spam", "spam and eggs", "spam", "ham"]
target = "spam"
exact_matches = data.count(target)
substring_matches = sum(1 for item in data if target in item)
print(f"Exact matches: {exact_matches}")
print(f"Substring matches: {substring_matches}")
Output:
Exact matches: 2
Substring matches: 3
"spam and eggs" contains "spam" as a substring but is not an exact match. The .count() method uses equality comparison, while the in operator checks for substring presence.
Case-Insensitive Counting
To treat uppercase and lowercase letters as equivalent, normalize both the items and the target before comparing:
data = ["Spam", "SPAM recipe", "spam", "Ham"]
target = "spam"
# Count items containing substring (case-insensitive)
item_count = sum(1 for item in data if target.casefold() in item.casefold())
# Count total occurrences (case-insensitive)
total_count = sum(item.casefold().count(target.casefold()) for item in data)
print(f"Items matching: {item_count}")
print(f"Total occurrences: {total_count}")
Output:
Items matching: 3
Total occurrences: 3
Prefer .casefold() over .lower() for case-insensitive comparisons. While both work identically for English text, .casefold() handles special Unicode characters correctly. For example, the German "ß" is converted to "ss" by .casefold(), whereas .lower() leaves it unchanged.
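The difference can be seen in a short sketch. The word list below is illustrative and not part of the earlier examples.

```python
words = ["Straße", "STRASSE", "strasse"]
target = "strasse"

# .lower() keeps "ß" as-is, so "Straße" does not match "strasse"
lower_hits = sum(1 for w in words if target.lower() in w.lower())

# .casefold() maps "ß" to "ss", so all three spellings match
casefold_hits = sum(1 for w in words if target.casefold() in w.casefold())

print(f"With lower(): {lower_hits}")
print(f"With casefold(): {casefold_hits}")
```

With .lower() only two of the three items match, while .casefold() matches all three.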
Retrieving the Matching Items
When you need both the count and the actual matching strings for further processing, use a list comprehension:
data = ["error: disk full", "info: started", "error: timeout", "warning: low memory"]
target = "error"
matches = [item for item in data if target in item]
print(f"Found {len(matches)} matches:")
for match in matches:
    print(f" - {match}")
Output:
Found 2 matches:
 - error: disk full
 - error: timeout
This approach builds a list of the matching items, so you can iterate over them, pass them to other functions, or inspect them individually.
Tracking Occurrence Counts Per Item
To see exactly how many times the substring appears in each matching item along with its position in the list, build a structured result:
def find_all_positions(data, target):
    results = []
    for idx, item in enumerate(data):
        count = item.count(target)
        if count > 0:
            results.append({
                "index": idx,
                "item": item,
                "occurrences": count
            })
    return results

data = ["no match", "spam spam", "also spam"]
positions = find_all_positions(data, "spam")
for entry in positions:
    print(entry)
Output:
{'index': 1, 'item': 'spam spam', 'occurrences': 2}
{'index': 2, 'item': 'also spam', 'occurrences': 1}
This is useful when you need a detailed breakdown rather than just a single total number.
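The function above records each item's index in the list. If you also need the character offsets of each occurrence inside a string, a loop over str.find() is one way to collect them. This is a minimal sketch: find_offsets is a hypothetical helper, it reports non-overlapping occurrences only (matching the behavior of .count()), and it returns an empty list for an empty target by assumption.

```python
def find_offsets(text, target):
    """Return the start index of each non-overlapping occurrence of target."""
    offsets = []
    if not target:
        return offsets  # assumption: an empty target yields no offsets
    start = text.find(target)
    while start != -1:
        offsets.append(start)
        # Resume the search just past the current match
        start = text.find(target, start + len(target))
    return offsets

data = ["no match", "spam spam", "also spam"]
for idx, item in enumerate(data):
    offsets = find_offsets(item, "spam")
    if offsets:
        print(idx, offsets)
```

For this data, the item at index 1 yields offsets [0, 5] and the item at index 2 yields [5].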
Counting Multiple Substrings at Once
To count occurrences of several different substrings, loop over the targets and store each total in a Counter. Note that this scans the list once per target:
from collections import Counter
data = ["apple pie", "banana bread", "apple tart", "cherry pie"]
targets = ["apple", "pie", "banana"]
counts = Counter()
for target in targets:
counts[target] = sum(item.count(target) for item in data)
print(counts)
Output:
Counter({'apple': 2, 'pie': 2, 'banana': 1})
This gives you a complete frequency map for all your target substrings, making it easy to compare their relative frequencies.
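The loop above walks through the list once for each target. If the data can only be iterated once, for example when it comes from a generator, the loops can be swapped so that each string is visited a single time. The following sketch reuses the same data and targets and is one possible arrangement rather than the only one.

```python
from collections import Counter

data = ["apple pie", "banana bread", "apple tart", "cherry pie"]
targets = ["apple", "pie", "banana"]

counts = Counter()
for item in data:              # each string is visited exactly once
    for target in targets:     # every target is checked against that string
        counts[target] += item.count(target)

print(counts)
```

The totals are the same as before. The amount of string searching is unchanged, so the benefit is the single pass over the data, not a faster search.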
Method Comparison
| Goal | Code | Returns |
|---|---|---|
| Items containing substring | sum(1 for s in data if sub in s) | Count of matching items |
| Total occurrences across all items | sum(s.count(sub) for s in data) | Sum of all occurrences |
| Exact full matches | data.count(target) | Items exactly equal to target |
| Retrieve matching items | [s for s in data if sub in s] | List of matching strings |
| Multiple substrings | Counter with loop | Frequency map for each target |
The sum(1 for ...) pattern is the idiomatic Python way to count items matching a condition without building an intermediate list. It is memory-efficient because the generator yields one value at a time rather than constructing a full list in memory.
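The same pattern also works on lazy sources that have no length, where len() is not available. In the sketch below, read_lines is a hypothetical stand-in for such a source, for instance a file being read line by line.

```python
def read_lines():
    """Hypothetical lazy source; yields one line at a time."""
    yield "error: disk full"
    yield "info: started"
    yield "error: timeout"

# No intermediate list is built; each line is checked and then discarded
error_count = sum(1 for line in read_lines() if "error" in line)
print(error_count)
```

This prints 2, and at no point are all lines held in memory together.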
Conclusion
When counting substring occurrences in a Python list, first clarify what you are actually counting.
- Use sum(1 for item in data if target in item) to count how many items contain the substring at least once.
- Use sum(item.count(target) for item in data) to count every individual occurrence across all items.
- Use data.count(target) when you need exact full-string matches rather than substring checks.
- For case-insensitive matching, normalize with .casefold() before comparing.
- When you need detailed results or the matching items themselves, switch to a list comprehension or a custom function that captures both counts and context.