How to Get All Substrings of a String in Python
A substring is any contiguous sequence of characters within a string. Extracting all possible substrings is a fundamental operation used in text searching, pattern matching, DNA sequence analysis, and string algorithm problems. For example, the string "abc" has the substrings: "a", "ab", "abc", "b", "bc", and "c".
In this guide, you'll learn multiple methods to generate all substrings of a given string in Python, along with explanations of how each approach works and when to use it.
Understanding Substring Generationā
For a string of length n, substrings are defined by a start index i and an end index j where 0 ⤠i < j ⤠n. The total number of non-empty substrings is n à (n + 1) / 2.
For the string "abc" (length 3):
Start (i) | End (j) | Substring |
|---|---|---|
| 0 | 1 | "a" |
| 0 | 2 | "ab" |
| 0 | 3 | "abc" |
| 1 | 2 | "b" |
| 1 | 3 | "bc" |
| 2 | 3 | "c" |
Total: 3 Ć 4 / 2 = 6 substrings.
Using List Comprehension (Recommended)ā
List comprehension provides the most concise and Pythonic way to generate all substrings using nested iteration over start and end indices:
s = "abc"
substrings = [s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)]
print(substrings)
Output:
['a', 'ab', 'abc', 'b', 'bc', 'c']
How it works:
- The outer loop
irepresents the starting index of each substring (from0tolen(s) - 1). - The inner loop
jrepresents the ending index (fromi + 1tolen(s)), ensuring every substring has at least one character. s[i:j]extracts the substring from indexiup to (but not including) indexj.
List comprehension is concise, idiomatic Python, and performs the same operations as nested loops with less boilerplate. It's the go-to method for most substring generation tasks.
Using Nested for Loopsā
If you prefer explicit control over the iteration or need to add extra logic (like filtering or transforming substrings), nested loops provide maximum clarity:
s = "abc"
substrings = []
for i in range(len(s)):
for j in range(i + 1, len(s) + 1):
substrings.append(s[i:j])
print(substrings)
Output:
['a', 'ab', 'abc', 'b', 'bc', 'c']
This is functionally identical to the list comprehension approach but makes it easy to add conditions or side effects inside the loop:
s = "abc"
substrings = []
for i in range(len(s)):
for j in range(i + 1, len(s) + 1):
sub = s[i:j]
if len(sub) >= 2: # Only keep substrings with 2+ characters
substrings.append(sub)
print(substrings)
Output:
['ab', 'abc', 'bc']
Using itertools.combinations()ā
The itertools.combinations() function generates all pairs of indices, which naturally correspond to substring boundaries:
from itertools import combinations
s = "abc"
substrings = [s[i:j] for i, j in combinations(range(len(s) + 1), 2)]
print(substrings)
Output:
['a', 'ab', 'abc', 'b', 'bc', 'c']
How it works:
range(len(s) + 1)produces indices[0, 1, 2, 3].combinations(..., 2)generates all pairs wherei < j:(0,1), (0,2), (0,3), (1,2), (1,3), (2,3).- Each pair
(i, j)maps directly to the substrings[i:j].
This approach is elegant and clearly expresses the mathematical relationship between index pairs and substrings.
Grouping Substrings by Lengthā
Sometimes you need substrings organized by their length rather than in a flat list. This is useful for pattern analysis or when processing substrings of a specific size:
s = "code"
grouped = [
[s[i:i + length] for i in range(len(s) - length + 1)]
for length in range(1, len(s) + 1)
]
for group in grouped:
print(group)
Output:
['c', 'o', 'd', 'e']
['co', 'od', 'de']
['cod', 'ode']
['code']
Getting Unique Substringsā
If your string contains repeated characters, you may get duplicate substrings. Use a set to keep only unique ones:
s = "abab"
all_substrings = [s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)]
unique_substrings = sorted(set(all_substrings))
print(f"All substrings ({len(all_substrings)}): {all_substrings}")
print(f"Unique substrings ({len(unique_substrings)}): {unique_substrings}")
Output:
All substrings (10): ['a', 'ab', 'aba', 'abab', 'b', 'ba', 'bab', 'a', 'ab', 'b']
Unique substrings (7): ['a', 'ab', 'aba', 'abab', 'b', 'ba', 'bab']
Common Mistake: Off-by-One Errors in Range Boundariesā
A frequent error is using incorrect range boundaries, resulting in missing substrings or including empty strings.
Wrong approach: missing full-length substring.
s = "abc"
# Using range(len(s)) instead of range(len(s) + 1) for j
substrings = [s[i:j] for i in range(len(s)) for j in range(i + 1, len(s))]
print(substrings)
Output:
['a', 'ab', 'b']
The substrings "abc", "bc", and "c" are missing because j never reaches len(s).
Correct approach: use len(s) + 1.
s = "abc"
substrings = [s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)]
print(substrings)
Output:
['a', 'ab', 'abc', 'b', 'bc', 'c']
Python's s[i:j] extracts characters from index i up to but not including index j. To include the last character of the string, j must be able to reach len(s), which means the range must go up to len(s) + 1.
Using a Generator for Memory Efficiencyā
For long strings, storing all substrings in a list can consume significant memory. A generator yields substrings one at a time:
def all_substrings(s):
"""Yield all non-empty substrings of the given string."""
for i in range(len(s)):
for j in range(i + 1, len(s) + 1):
yield s[i:j]
s = "Python"
# Process substrings without storing all in memory
count = 0
for sub in all_substrings(s):
count += 1
print(f"Total substrings of '{s}': {count}")
Output:
Total substrings of 'Python': 21
For a string of length n, there are n(n+1)/2 substrings. A string of length 1,000 produces 500,500 substrings. If you're only processing them (for example, searching or counting), a generator avoids storing them all in memory simultaneously.
Creating a Reusable Functionā
A complete, production-ready function with options for filtering and uniqueness:
def get_substrings(
s: str,
min_length: int = 1,
max_length: int | None = None,
unique: bool = False,
) -> list[str]:
"""Generate all substrings of a string with optional filtering.
Args:
s: The input string.
min_length: Minimum substring length (default: 1).
max_length: Maximum substring length (default: length of s).
unique: If True, return only unique substrings.
Returns:
A list of substrings matching the criteria.
"""
if max_length is None:
max_length = len(s)
substrings = [
s[i:j]
for i in range(len(s))
for j in range(i + 1, len(s) + 1)
if min_length <= j - i <= max_length
]
if unique:
# Preserve order while removing duplicates
seen = set()
substrings = [x for x in substrings if not (x in seen or seen.add(x))]
return substrings
# All substrings
print(get_substrings("abc"))
# Only substrings of length 2
print(get_substrings("abc", min_length=2, max_length=2))
# Unique substrings only
print(get_substrings("abab", unique=True))
Output:
['a', 'ab', 'abc', 'b', 'bc', 'c']
['ab', 'bc']
['a', 'ab', 'aba', 'abab', 'b', 'ba', 'bab']
Performance Considerationsā
| String Length | Total Substrings | Notes |
|---|---|---|
| 10 | 55 | Instant |
| 100 | 5,050 | Fast |
| 1,000 | 500,500 | Use generators if possible |
| 10,000 | 50,005,000 | Memory-intensive; generators recommended |
All methods have O(n²) time complexity since generating all substrings requires visiting every valid (i, j) pair. Space complexity is O(n²) when storing results in a list, or O(1) when using a generator.
Quick Comparison of Methodsā
| Method | Conciseness | Flexibility | Best For |
|---|---|---|---|
| List comprehension | āāā High | āā Medium | Most use cases (recommended) |
Nested for loops | āā Medium | āāā High | Adding extra logic during iteration |
itertools.combinations() | āāā High | āā Medium | Mathematically expressive code |
| Generator function | āā Medium | āāā High | Large strings, memory efficiency |
Conclusionā
Generating all substrings of a string in Python is straightforward with the right approach:
- List comprehension is the most Pythonic and recommended method for most scenarios.
- Nested loops offer maximum flexibility for adding filtering or transformation logic.
itertools.combinations()elegantly maps index pairs to substrings.- Generators are essential for large strings where memory efficiency matters.
- Always use
range(i + 1, len(s) + 1)for the end index to avoid missing substrings: remember that Python slicing excludes the upper bound. - Use
set()to eliminate duplicates when the string contains repeated characters.