How to Find All Overlapping Substrings of a String in Python

When working with strings in Python, you may need to extract every possible overlapping substring grouped by length. This is useful in text analysis, pattern matching, bioinformatics (e.g., analyzing DNA sequences), and computational linguistics where sliding window operations over strings are required.

In this guide, you'll learn how to generate all combinations of overlapping substrings from a string, organized by substring length, with clear explanations and practical examples.

Understanding the Problem

Given a string, the goal is to produce a list of lists where each inner list contains all overlapping substrings of a specific length. The substrings are generated using a sliding window approach.

For example, with the string "ABCDE":

Window Size	Substrings
0	`['', '', '', '', '', '']`
1	`['A', 'B', 'C', 'D', 'E']`
2	`['AB', 'BC', 'CD', 'DE']`
3	`['ABC', 'BCD', 'CDE']`
4	`['ABCD', 'BCDE']`
5	`['ABCDE']`

Notice that as the window size increases, the number of substrings decreases. For a string of length n and window size w, there are n - w + 1 substrings.

Using List Comprehension with String Slicing

The most straightforward approach combines a for loop to control the window size and list comprehension with slicing to extract substrings at each size.

text = "ABCDEFG"

result = []
for size in range(len(text) + 1):
    substrings = [text[i:i + size] for i in range(len(text) - size + 1)]
    result.append(substrings)

print(result)

Output:

[['', '', '', '', '', '', '', ''], ['A', 'B', 'C', 'D', 'E', 'F', 'G'], ['AB', 'BC', 'CD', 'DE', 'EF', 'FG'], ['ABC', 'BCD', 'CDE', 'DEF', 'EFG'], ['ABCD', 'BCDE', 'CDEF', 'DEFG'], ['ABCDE', 'BCDEF', 'CDEFG'], ['ABCDEF', 'BCDEFG'], ['ABCDEFG']]

How it works:

The outer loop iterates size from 0 to the length of the string (inclusive).
For each size, the list comprehension slides a window across the string: text[i:i + size] extracts a substring of length size starting at index i.
range(len(text) - size + 1) ensures the window doesn't extend past the end of the string.
Each group of same-length substrings is appended to the result list.

Using Nested List Comprehension

You can condense the entire operation into a single nested list comprehension for a more compact solution:

text = "ABCDEFG"

result = [
    [text[i:i + size] for i in range(len(text) - size + 1)]
    for size in range(len(text) + 1)
]

print(result)

Output:

[['', '', '', '', '', '', '', ''], ['A', 'B', 'C', 'D', 'E', 'F', 'G'], ['AB', 'BC', 'CD', 'DE', 'EF', 'FG'], ['ABC', 'BCD', 'CDE', 'DEF', 'EFG'], ['ABCD', 'BCDE', 'CDEF', 'DEFG'], ['ABCDE', 'BCDEF', 'CDEFG'], ['ABCDEF', 'BCDEFG'], ['ABCDEFG']]

This produces the same result in fewer lines. The outer comprehension controls the window size, and the inner comprehension generates the substrings.

When to choose which style

Use the explicit loop when you need to add extra logic (like filtering or transforming substrings). Use the nested comprehension when brevity matters and the logic is straightforward.

Excluding Empty Substrings

The window size of 0 produces a list of empty strings, which is often not useful. You can skip it by starting the range at 1:

text = "ABCDE"

result = [
    [text[i:i + size] for i in range(len(text) - size + 1)]
    for size in range(1, len(text) + 1)
]

for group in result:
    print(group)

Output:

['A', 'B', 'C', 'D', 'E']
['AB', 'BC', 'CD', 'DE']
['ABC', 'BCD', 'CDE']
['ABCD', 'BCDE']
['ABCDE']

Creating a Reusable Function

For cleaner, reusable code, wrap the logic in a function with options to include or exclude empty substrings:

def get_overlapping_substrings(text: str, include_empty: bool = False) -> list[list[str]]:
    """Generate all overlapping substrings grouped by length.

    Args:
        text: The input string.
        include_empty: Whether to include the group of empty strings (size 0).

    Returns:
        A list of lists, where each inner list contains substrings of the same length.
    """
    start = 0 if include_empty else 1
    return [
        [text[i:i + size] for i in range(len(text) - size + 1)]
        for size in range(start, len(text) + 1)
    ]


# Usage
result = get_overlapping_substrings("Code")
for group in result:
    print(group)

Output:

['C', 'o', 'd', 'e']
['Co', 'od', 'de']
['Cod', 'ode']
['Code']

Generating a Flat List of All Substrings

If you need all overlapping substrings in a single flat list instead of grouped by length, you can flatten the result:

text = "Code"

all_substrings = [
    text[i:i + size]
    for size in range(1, len(text) + 1)
    for i in range(len(text) - size + 1)
]

print(all_substrings)

Output:

['C', 'o', 'd', 'e', 'Co', 'od', 'de', 'Cod', 'ode', 'Code']

Total number of substrings

For a string of length n, the total number of non-empty substrings is n × (n + 1) / 2. For "Code" (length 4), that's 4 × 5 / 2 = 10 substrings.

Common Mistake: Off-by-One Errors in Range Boundaries

A frequent mistake is using incorrect range boundaries, which either misses the full-length substring or causes index errors.

Wrong approach - missing the full string:

text = "Code"

# Using range(len(text)) instead of range(len(text) + 1)
result = [
    [text[i:i + size] for i in range(len(text) - size + 1)]
    for size in range(1, len(text))  # Missing size == len(text)
]

for group in result:
    print(group)

Output:

['C', 'o', 'd', 'e']
['Co', 'od', 'de']
['Cod', 'ode']

The full string "Code" is missing because range(1, len(text)) stops at 3, never reaching the window size of 4.

Correct approach - use len(text) + 1:

text = "Code"

result = [
    [text[i:i + size] for i in range(len(text) - size + 1)]
    for size in range(1, len(text) + 1)  # Includes the full string
]

for group in result:
    print(group)

Output:

['C', 'o', 'd', 'e']
['Co', 'od', 'de']
['Cod', 'ode']
['Code']

Performance Considerations

The algorithm has a time complexity of O(n²) because for each window size (up to n), it generates a decreasing number of substrings. The total work is proportional to n + (n-1) + (n-2) + ... + 1 = n(n+1)/2.

String Length	Total Substrings	Approximate Operations
10	55	~55
100	5,050	~5,050
1,000	500,500	~500,500
10,000	50,005,000	~50 million

Memory usage with large strings

Each substring is a new string object in Python. For very long strings, generating all overlapping substrings can consume significant memory. Consider using generators instead of lists if you're processing substrings one at a time:

def overlapping_substrings_gen(text):
    for size in range(1, len(text) + 1):
        for i in range(len(text) - size + 1):
            yield text[i:i + size]

# Process substrings one at a time without storing all in memory
for substr in overlapping_substrings_gen("LargeString"):
    pass  # process each substring

Conclusion

Generating all overlapping substrings of a string in Python is straightforward with list comprehension and string slicing:

Use a loop with list comprehension for clarity and the ability to add extra logic.
Use nested list comprehension for a concise one-liner.
Start the range at 1 instead of 0 to exclude empty substrings.
Use a generator for memory-efficient processing of large strings.
Watch out for off-by-one errors - use range(len(text) + 1) to include the full string.

The algorithm runs in O(n²) time, which is efficient for typical string lengths but should be used carefully with very long strings.

Understanding the Problem​

Using List Comprehension with String Slicing​

Using Nested List Comprehension​

Excluding Empty Substrings​

Creating a Reusable Function​

Generating a Flat List of All Substrings​

Common Mistake: Off-by-One Errors in Range Boundaries​

Performance Considerations​

Conclusion​

Table of Contents

Understanding the Problem

Using List Comprehension with String Slicing

Using Nested List Comprehension

Excluding Empty Substrings

Creating a Reusable Function

Generating a Flat List of All Substrings

Common Mistake: Off-by-One Errors in Range Boundaries

Performance Considerations

Conclusion