How to Extract a String Between Two Substrings in Python
Extracting a portion of text that lies between two specific substrings is a common task in Python - whether you're parsing log files, scraping HTML content, processing configuration data, or working with templated strings. For example, given the string "Hello [World]!" with delimiters [ and ], the goal is to extract "World".
In this guide, you'll learn three effective methods to extract a string between two substrings in Python, along with edge case handling and best practices.
Using find() for Index-Based Extraction
The str.find() method returns the lowest index at which a substring occurs within a string (or -1 if it is not found). By locating the positions of both delimiters, you can use slicing to extract the content between them.
s = "Hello [world]!"
start = "["
end = "]"
# Find the index of the start delimiter
idx1 = s.find(start)
# Find the index of the end delimiter, searching after the start delimiter
idx2 = s.find(end, idx1 + len(start))
# Extract the substring if both delimiters are found
if idx1 != -1 and idx2 != -1:
    result = s[idx1 + len(start):idx2]
    print(result)
else:
    print("Delimiters not found")
Output:
world
How it works:
- s.find(start) locates the position of [ in the string.
- s.find(end, idx1 + len(start)) locates ], starting the search after the opening delimiter.
- Slicing with s[idx1 + len(start):idx2] extracts everything between the two delimiters, excluding the delimiters themselves.
- The if check ensures both delimiters exist before attempting extraction.
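To see why the second find() call starts its search after the opening delimiter, consider a string in which the closing delimiter also appears earlier. This is a small illustrative sketch; the sample string is made up for the purpose.

```python
s = "a] then [b]"
start = "["
end = "]"

idx1 = s.find(start)

# Without an offset, find() returns the first "]" in the whole string,
# which sits before the opening delimiter.
naive_idx2 = s.find(end)

# With the offset, the search begins just past the opening delimiter.
idx2 = s.find(end, idx1 + len(start))

# The naive slice is empty because its end index precedes its start index.
print(repr(s[idx1 + len(start):naive_idx2]))
print(repr(s[idx1 + len(start):idx2]))
```

The first print shows an empty string and the second shows 'b', so the offset is what keeps the closing delimiter paired with the opening one.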
This approach also works seamlessly with multi-character delimiters:
s = "The price is START100.50END dollars"
start = "START"
end = "END"
idx1 = s.find(start)
idx2 = s.find(end, idx1 + len(start))
if idx1 != -1 and idx2 != -1:
    result = s[idx1 + len(start):idx2]
    print(result)
Output:
100.50
The find() method is ideal when you need a lightweight solution with no imports and your delimiters are simple, fixed strings. It gives you full control over index calculations.
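find() can also collect every delimited section if you advance the search position in a loop. The following is a minimal sketch of that idea; the function name find_all_between is hypothetical, and it assumes both delimiters are non-empty strings.

```python
def find_all_between(text: str, start: str, end: str) -> list[str]:
    """Collect every substring enclosed by start and end, scanning left to right."""
    if not start or not end:
        # An empty delimiter would never advance the search position.
        raise ValueError("delimiters must be non-empty")

    results = []
    pos = 0
    while True:
        idx1 = text.find(start, pos)
        if idx1 == -1:
            break
        content_start = idx1 + len(start)
        idx2 = text.find(end, content_start)
        if idx2 == -1:
            break  # opening delimiter without a closing one: stop here
        results.append(text[content_start:idx2])
        pos = idx2 + len(end)  # continue after the closing delimiter
    return results


print(find_all_between("Items: [apple], [banana], [cherry]", "[", "]"))
```

This prints ['apple', 'banana', 'cherry']. An unmatched opening delimiter at the end of the string is ignored rather than treated as an error, which is a design choice you may want to change.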
Using Regular Expressions (re Module)
The re module provides powerful pattern matching capabilities that make extraction concise and flexible, especially when delimiters contain special characters or patterns are complex.
import re
s = "Hello [world]!"
# Pattern: match content between [ and ] (non-greedy)
pattern = r"\[(.*?)\]"
match = re.search(pattern, s)
if match:
    result = match.group(1)
    print(result)
else:
    print("Delimiters not found")
Output:
world
How it works:
- \[ and \] match the literal bracket characters (escaped because [ and ] are special in regex).
- (.*?) is a non-greedy capture group that matches the shortest possible string between the delimiters.
- re.search() finds the first occurrence of the pattern.
- match.group(1) returns the content captured by the first group.
Extracting All Occurrences
If your string contains multiple pairs of delimiters, use re.findall() to extract all matches at once:
import re
s = "Items: [apple], [banana], [cherry]"
pattern = r"\[(.*?)\]"
results = re.findall(pattern, s)
print(results)
Output:
['apple', 'banana', 'cherry']
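If you also need to know where each match sits in the string, re.finditer() yields match objects instead of plain strings. A brief sketch, reusing the same sample string and pattern:

```python
import re

s = "Items: [apple], [banana], [cherry]"
pattern = r"\[(.*?)\]"

# Each match object exposes the captured text and the span of group 1.
found = [(m.group(1), m.start(1), m.end(1)) for m in re.finditer(pattern, s)]

for text, begin, stop in found:
    # Slicing the original string with the reported span gives back the capture.
    print(text, begin, stop, s[begin:stop] == text)
```

The positions refer to the captured content only, not to the surrounding brackets.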
Greedy vs. Non-Greedy Matching
Understanding the difference between greedy (.*) and non-greedy (.*?) matching is critical to getting correct results.
Common mistake: using greedy matching
import re
s = "Start [first] and [second] End"
# Greedy: matches as much as possible
pattern = r"\[(.*)\]"
match = re.search(pattern, s)
if match:
    print(match.group(1))
Output:
first] and [second
The greedy .* consumed everything between the first [ and the last ], which is usually not what you want.
Correct approach: use non-greedy matching
import re
s = "Start [first] and [second] End"
# Non-greedy: matches as little as possible
pattern = r"\[(.*?)\]"
match = re.search(pattern, s)
if match:
    print(match.group(1))
Output:
first
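A related pitfall: by default, . does not match a newline, so neither .* nor .*? can span content that runs over several lines. The sketch below, using a made-up two-line sample, shows the re.DOTALL flag as one way around that.

```python
import re

s = "Header [first line\nsecond line] Footer"

# Without a flag, "." stops at the newline, so the pattern cannot reach "]".
no_flag = re.search(r"\[(.*?)\]", s)
print(no_flag)

# With re.DOTALL, "." also matches newline characters.
match = re.search(r"\[(.*?)\]", s, flags=re.DOTALL)
if match:
    print(repr(match.group(1)))
```

The first print shows None, and the second shows the two-line content including the embedded newline.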
If your delimiters include characters that are special in regex (like ., *, +, (, ), [, ]), make sure to escape them with \ or use re.escape() to build the pattern dynamically:
import re
start = "[["
end = "]]"
pattern = re.escape(start) + r"(.*?)" + re.escape(end)
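The escaped pattern can be wrapped in a small helper so that any literal delimiters work without manual escaping. This is a sketch only; the name extract_all_regex is hypothetical.

```python
import re


def extract_all_regex(text: str, start: str, end: str) -> list[str]:
    """Return every non-greedy match between two literal delimiters."""
    # re.escape() neutralises regex metacharacters in the delimiters.
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    return re.findall(pattern, text)


print(extract_all_regex("See [[Python]] and [[Regex]] pages", "[[", "]]"))
print(extract_all_regex("cost (12.5) and (7)", "(", ")"))
```

The two calls print ['Python', 'Regex'] and ['12.5', '7']. Without re.escape(), the parentheses in the second call would be read as a regex group rather than as literal characters.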
Using split() for Simple Extraction
The str.split() method offers the most concise approach when you're confident both delimiters exist in the string.
s = "Hello [world]!"
# Split at the first occurrence of "[", then split the remainder at "]"
result = s.split("[", 1)[1].split("]", 1)[0]
print(result)
Output:
world
How it works:
- s.split("[", 1) splits the string into at most two parts at the first [, producing ["Hello ", "world]!"].
- [1] selects the second part: "world]!".
- .split("]", 1)[0] splits that part at ] and takes the first element: "world".
Common Mistake: Missing Delimiters Cause Errors
The split() call itself never raises an error when a delimiter is missing. A missing closing delimiter silently yields everything after the opening one, while a missing opening delimiter leaves a single-element list:
s = "Hello world!"
# The delimiter "[" doesn't exist in the string
parts = s.split("[", 1)
print(parts)
Output:
['Hello world!']
Since there's only one element in the list, accessing [1] would raise an IndexError. Always validate first:
s = "Hello world!"
start = "["
end = "]"
if start in s and end in s:
    result = s.split(start, 1)[1].split(end, 1)[0]
    print(result)
else:
    print("Delimiters not found")
Output:
Delimiters not found
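Note that the membership check above does not confirm that the closing delimiter comes after the opening one. One alternative, swapped in here in place of split(), is str.partition(), which reports whether the separator was actually found. The sketch below assumes non-empty delimiters (partition() raises ValueError for an empty separator); the function name extract_with_partition is hypothetical.

```python
from typing import Optional


def extract_with_partition(text: str, start: str, end: str) -> Optional[str]:
    """Return the text between start and end, or None if either is missing."""
    # partition() returns (before, separator, after); the separator slot
    # is an empty string when the separator does not occur.
    _, found_start, rest = text.partition(start)
    if not found_start:
        return None
    content, found_end, _ = rest.partition(end)
    if not found_end:
        return None
    return content


print(extract_with_partition("Hello [world]!", "[", "]"))
print(extract_with_partition("Hello world!", "[", "]"))
print(extract_with_partition("a ] b [ c", "[", "]"))
```

The three calls print world, None, and None. In the last case the closing delimiter appears only before the opening one, a situation where the split() version with the membership check would return " c".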
Creating a Reusable Function
For production code, wrapping the logic in a reusable function with proper error handling is a best practice:
def extract_between(text: str, start: str, end: str) -> str | None:
    """Extract the substring between two delimiters.

    Returns None if either delimiter is not found.
    """
    idx1 = text.find(start)
    if idx1 == -1:
        return None
    idx2 = text.find(end, idx1 + len(start))
    if idx2 == -1:
        return None
    return text[idx1 + len(start):idx2]
# Usage examples
print(extract_between("Hello [world]!", "[", "]"))
print(extract_between("No delimiters here", "[", "]"))
print(extract_between("key=START_value_END!", "START_", "_END"))
Output:
world
None
value
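A few edge cases are worth checking against this function. The sketch below repeats the same logic so that it runs on its own (using Optional[str] in place of str | None, purely so it also runs on older Python versions); the sample inputs are made up.

```python
from typing import Optional


def extract_between(text: str, start: str, end: str) -> Optional[str]:
    # Same logic as the function above, repeated so this snippet is standalone.
    idx1 = text.find(start)
    if idx1 == -1:
        return None
    idx2 = text.find(end, idx1 + len(start))
    if idx2 == -1:
        return None
    return text[idx1 + len(start):idx2]


# Empty content yields "" rather than None, so compare with "is None"
# instead of relying on truthiness.
print(repr(extract_between("Empty []", "[", "]")))

# Identical start and end delimiters work because the second search
# begins after the first delimiter.
print(extract_between('say "hi" now', '"', '"'))

# A closing delimiter that appears only before the opening one is not matched.
print(extract_between("] only [ here", "[", "]"))

# Nested delimiters are not balanced: extraction stops at the first closer.
print(extract_between("f(g(x))", "(", ")"))
```

These print '', hi, None, and g(x in that order. The last case shows that the function pairs the first opening delimiter with the first closing delimiter after it, so nested structures need a different approach.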
Quick Comparison of Methods
| Method | Best For | Handles Multiple Matches | Requires Import |
|---|---|---|---|
| find() | Simple, explicit extraction | No (manual loop needed) | No |
| re.search() / re.findall() | Complex patterns, multiple matches | Yes (findall) | Yes (re) |
| split() | Quick one-liners | No | No |
Conclusion
Python provides multiple ways to extract a string between two substrings:
- find() gives you fine-grained control with index-based slicing, which suits straightforward cases with no external dependencies.
- Regular expressions (re module) offer the most flexibility, especially for complex patterns, special characters, or extracting multiple matches.
- split() delivers the most concise syntax for simple, one-off extractions.
For robust production code, always handle the case where one or both delimiters are missing to avoid unexpected errors. When working with multiple occurrences, prefer re.findall() for clean and efficient results.