Skip to main content

How to Find All Overlapping Substrings of a String in Python

When working with strings in Python, you may need to extract every possible overlapping substring grouped by length. This is useful in text analysis, pattern matching, bioinformatics (e.g., analyzing DNA sequences), and computational linguistics where sliding window operations over strings are required.

In this guide, you'll learn how to generate all combinations of overlapping substrings from a string, organized by substring length, with clear explanations and practical examples.

Understanding the Problem​

Given a string, the goal is to produce a list of lists where each inner list contains all overlapping substrings of a specific length. The substrings are generated using a sliding window approach.

For example, with the string "ABCDE":

Window SizeSubstrings
0['', '', '', '', '', '']
1['A', 'B', 'C', 'D', 'E']
2['AB', 'BC', 'CD', 'DE']
3['ABC', 'BCD', 'CDE']
4['ABCD', 'BCDE']
5['ABCDE']

Notice that as the window size increases, the number of substrings decreases. For a string of length n and window size w, there are n - w + 1 substrings.

Using List Comprehension with String Slicing​

The most straightforward approach combines a for loop to control the window size and list comprehension with slicing to extract substrings at each size.

text = "ABCDEFG"

result = []
for size in range(len(text) + 1):
substrings = [text[i:i + size] for i in range(len(text) - size + 1)]
result.append(substrings)

print(result)

Output:

[['', '', '', '', '', '', '', ''], ['A', 'B', 'C', 'D', 'E', 'F', 'G'], ['AB', 'BC', 'CD', 'DE', 'EF', 'FG'], ['ABC', 'BCD', 'CDE', 'DEF', 'EFG'], ['ABCD', 'BCDE', 'CDEF', 'DEFG'], ['ABCDE', 'BCDEF', 'CDEFG'], ['ABCDEF', 'BCDEFG'], ['ABCDEFG']]

How it works:

  1. The outer loop iterates size from 0 to the length of the string (inclusive).
  2. For each size, the list comprehension slides a window across the string: text[i:i + size] extracts a substring of length size starting at index i.
  3. range(len(text) - size + 1) ensures the window doesn't extend past the end of the string.
  4. Each group of same-length substrings is appended to the result list.

Using Nested List Comprehension​

You can condense the entire operation into a single nested list comprehension for a more compact solution:

text = "ABCDEFG"

result = [
[text[i:i + size] for i in range(len(text) - size + 1)]
for size in range(len(text) + 1)
]

print(result)

Output:

[['', '', '', '', '', '', '', ''], ['A', 'B', 'C', 'D', 'E', 'F', 'G'], ['AB', 'BC', 'CD', 'DE', 'EF', 'FG'], ['ABC', 'BCD', 'CDE', 'DEF', 'EFG'], ['ABCD', 'BCDE', 'CDEF', 'DEFG'], ['ABCDE', 'BCDEF', 'CDEFG'], ['ABCDEF', 'BCDEFG'], ['ABCDEFG']]

This produces the same result in fewer lines. The outer comprehension controls the window size, and the inner comprehension generates the substrings.

When to choose which style

Use the explicit loop when you need to add extra logic (like filtering or transforming substrings). Use the nested comprehension when brevity matters and the logic is straightforward.

Excluding Empty Substrings​

The window size of 0 produces a list of empty strings, which is often not useful. You can skip it by starting the range at 1:

text = "ABCDE"

result = [
[text[i:i + size] for i in range(len(text) - size + 1)]
for size in range(1, len(text) + 1)
]

for group in result:
print(group)

Output:

['A', 'B', 'C', 'D', 'E']
['AB', 'BC', 'CD', 'DE']
['ABC', 'BCD', 'CDE']
['ABCD', 'BCDE']
['ABCDE']

Creating a Reusable Function​

For cleaner, reusable code, wrap the logic in a function with options to include or exclude empty substrings:

def get_overlapping_substrings(text: str, include_empty: bool = False) -> list[list[str]]:
"""Generate all overlapping substrings grouped by length.

Args:
text: The input string.
include_empty: Whether to include the group of empty strings (size 0).

Returns:
A list of lists, where each inner list contains substrings of the same length.
"""
start = 0 if include_empty else 1
return [
[text[i:i + size] for i in range(len(text) - size + 1)]
for size in range(start, len(text) + 1)
]


# Usage
result = get_overlapping_substrings("Code")
for group in result:
print(group)

Output:

['C', 'o', 'd', 'e']
['Co', 'od', 'de']
['Cod', 'ode']
['Code']

Generating a Flat List of All Substrings​

If you need all overlapping substrings in a single flat list instead of grouped by length, you can flatten the result:

text = "Code"

all_substrings = [
text[i:i + size]
for size in range(1, len(text) + 1)
for i in range(len(text) - size + 1)
]

print(all_substrings)

Output:

['C', 'o', 'd', 'e', 'Co', 'od', 'de', 'Cod', 'ode', 'Code']
Total number of substrings

For a string of length n, the total number of non-empty substrings is n Ɨ (n + 1) / 2. For "Code" (length 4), that's 4 Ɨ 5 / 2 = 10 substrings.

Common Mistake: Off-by-One Errors in Range Boundaries​

A frequent mistake is using incorrect range boundaries, which either misses the full-length substring or causes index errors.

Wrong approach - missing the full string:

text = "Code"

# Using range(len(text)) instead of range(len(text) + 1)
result = [
[text[i:i + size] for i in range(len(text) - size + 1)]
for size in range(1, len(text)) # Missing size == len(text)
]

for group in result:
print(group)

Output:

['C', 'o', 'd', 'e']
['Co', 'od', 'de']
['Cod', 'ode']

The full string "Code" is missing because range(1, len(text)) stops at 3, never reaching the window size of 4.

Correct approach - use len(text) + 1:

text = "Code"

result = [
[text[i:i + size] for i in range(len(text) - size + 1)]
for size in range(1, len(text) + 1) # Includes the full string
]

for group in result:
print(group)

Output:

['C', 'o', 'd', 'e']
['Co', 'od', 'de']
['Cod', 'ode']
['Code']

Performance Considerations​

The algorithm has a time complexity of O(n²) because for each window size (up to n), it generates a decreasing number of substrings. The total work is proportional to n + (n-1) + (n-2) + ... + 1 = n(n+1)/2.

String LengthTotal SubstringsApproximate Operations
1055~55
1005,050~5,050
1,000500,500~500,500
10,00050,005,000~50 million
Memory usage with large strings

Each substring is a new string object in Python. For very long strings, generating all overlapping substrings can consume significant memory. Consider using generators instead of lists if you're processing substrings one at a time:

def overlapping_substrings_gen(text):
for size in range(1, len(text) + 1):
for i in range(len(text) - size + 1):
yield text[i:i + size]

# Process substrings one at a time without storing all in memory
for substr in overlapping_substrings_gen("LargeString"):
pass # process each substring

Conclusion​

Generating all overlapping substrings of a string in Python is straightforward with list comprehension and string slicing:

  • Use a loop with list comprehension for clarity and the ability to add extra logic.
  • Use nested list comprehension for a concise one-liner.
  • Start the range at 1 instead of 0 to exclude empty substrings.
  • Use a generator for memory-efficient processing of large strings.
  • Watch out for off-by-one errors - use range(len(text) + 1) to include the full string.

The algorithm runs in O(n²) time, which is efficient for typical string lengths but should be used carefully with very long strings.