
How to Extract a String Between Two Substrings in Python

Extracting a portion of text that lies between two specific substrings is a common task in Python - whether you're parsing log files, scraping HTML content, processing configuration data, or working with templated strings. For example, given the string "Hello [world]!" with delimiters "[" and "]", the goal is to extract "world".

In this guide, you'll learn three effective methods to extract a string between two substrings in Python, along with edge case handling and best practices.

Using find() for Index-Based Extraction

The str.find() method returns the index of a substring within a string (or -1 if not found). By locating the positions of both delimiters, you can use slicing to extract the content between them.

s = "Hello [world]!"

start = "["
end = "]"

# Find the index of the start delimiter
idx1 = s.find(start)

# Find the index of the end delimiter, searching after the start delimiter
idx2 = s.find(end, idx1 + len(start))

# Extract the substring if both delimiters are found
if idx1 != -1 and idx2 != -1:
    result = s[idx1 + len(start):idx2]
    print(result)
else:
    print("Delimiters not found")

Output:

world

How it works:

  1. s.find(start) locates the position of [ in the string.
  2. s.find(end, idx1 + len(start)) locates ] starting the search after the opening delimiter.
  3. Slicing with s[idx1 + len(start):idx2] extracts everything between the two delimiters, excluding the delimiters themselves.
  4. The if check ensures both delimiters exist before attempting extraction.
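Step 2 matters most when the opening and closing delimiters are the same string. The following is a minimal sketch, assuming a quoted phrase as input (the sample string and the variable names are illustrative, not from the original example):

```python
s = 'He said "hello" and left'
quote = '"'

idx1 = s.find(quote)

# Searching from the beginning again just finds the opening quote a second time.
naive_idx2 = s.find(quote)

# Offsetting the search by len(quote) skips past the opening delimiter.
idx2 = s.find(quote, idx1 + len(quote))

print(naive_idx2 == idx1)          # True
print(s[idx1 + len(quote):idx2])   # hello
```

Without the offset, both indices would point at the same character and the slice would be empty.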

This approach also works seamlessly with multi-character delimiters:

s = "The price is START100.50END dollars"

start = "START"
end = "END"

idx1 = s.find(start)
idx2 = s.find(end, idx1 + len(start))

if idx1 != -1 and idx2 != -1:
    result = s[idx1 + len(start):idx2]
    print(result)

Output:

100.50

When to use find()

This method is ideal when you need a lightweight solution with no imports and your delimiters are simple, fixed strings. It gives you full control over index calculations.

Using Regular Expressions (re Module)

The re module provides pattern matching that makes extraction concise and flexible, especially when the delimiters follow a pattern rather than being fixed strings, or when you need every match in a string.

import re

s = "Hello [world]!"

# Pattern: match content between [ and ] (non-greedy)
pattern = r"\[(.*?)\]"

match = re.search(pattern, s)

if match:
    result = match.group(1)
    print(result)
else:
    print("Delimiters not found")

Output:

world

How it works:

  1. \[ and \] match the literal bracket characters (escaped because [ and ] are special in regex).
  2. (.*?) is a non-greedy capture group that matches the shortest possible string between the delimiters.
  3. re.search() finds the first occurrence of the pattern.
  4. match.group(1) returns the content captured by the first group.
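One detail the steps above do not cover: by default, "." does not match a newline, so (.*?) will not capture content that spans several lines. A short sketch, assuming a two-line sample string made up for illustration, shows the difference the re.DOTALL flag makes:

```python
import re

s = "Header [line one\nline two] footer"
pattern = r"\[(.*?)\]"

# By default "." stops at a newline, so no match is found here.
default_match = re.search(pattern, s)
print(default_match)  # None

# With re.DOTALL, "." also matches newline characters.
dotall_match = re.search(pattern, s, re.DOTALL)
if dotall_match:
    print(repr(dotall_match.group(1)))  # 'line one\nline two'
```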

Extracting All Occurrences

If your string contains multiple pairs of delimiters, use re.findall() to extract all matches at once:

import re

s = "Items: [apple], [banana], [cherry]"

pattern = r"\[(.*?)\]"
results = re.findall(pattern, s)

print(results)

Output:

['apple', 'banana', 'cherry']
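re.findall() returns only the captured text. If you also need to know where each match sits in the string, re.finditer() yields match objects instead. A minimal sketch on the same sample string (the variable name found is illustrative):

```python
import re

s = "Items: [apple], [banana], [cherry]"
pattern = re.compile(r"\[(.*?)\]")

# m.group(1) is the captured text; m.span(1) is its (start, end) index pair.
found = [(m.group(1), m.span(1)) for m in pattern.finditer(s)]

for text, (begin, stop) in found:
    # Slicing with the reported span gives back the same text.
    print(text, begin, stop, s[begin:stop] == text)
```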

Greedy vs. Non-Greedy Matching

Understanding the difference between greedy (.*) and non-greedy (.*?) matching is critical to getting correct results.

Common mistake: using greedy matching

import re

s = "Start [first] and [second] End"

# Greedy: matches as much as possible
pattern = r"\[(.*)\]"
match = re.search(pattern, s)

if match:
    print(match.group(1))

Output:

first] and [second

Note: The greedy .* consumed everything between the first [ and the last ], which is usually not what you want.

Correct approach: use non-greedy matching

import re

s = "Start [first] and [second] End"

# Non-greedy: matches as little as possible
pattern = r"\[(.*?)\]"
match = re.search(pattern, s)

if match:
    print(match.group(1))

Output:

first
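
A common alternative to the non-greedy quantifier, not covered in the examples above, is a negated character class that matches anything except the closing delimiter. This only works for single-character closing delimiters. The sketch below compares all three patterns on the same string:

```python
import re

s = "Start [first] and [second] End"

greedy = re.findall(r"\[(.*)\]", s)
lazy = re.findall(r"\[(.*?)\]", s)
# Negated character class: any run of characters that are not "]".
negated = re.findall(r"\[([^\]]*)\]", s)

print(greedy)   # ['first] and [second']
print(lazy)     # ['first', 'second']
print(negated)  # ['first', 'second']
```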

Escape special regex characters

If your delimiters include characters that are special in regex (like ., *, +, (, ), [, ]), make sure to escape them with \ or use re.escape() to build the pattern dynamically:

import re

start = "[["
end = "]]"
pattern = re.escape(start) + r"(.*?)" + re.escape(end)
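
The escaped pattern can then be passed to re.search() or re.findall() as usual. The following is a sketch of one way to package this; the helper name extract_all_escaped and the sample strings are hypothetical:

```python
import re


def extract_all_escaped(text: str, start: str, end: str) -> list[str]:
    """Return every non-greedy match between literal start and end delimiters."""
    # re.escape() makes the delimiters match literally, whatever they contain.
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    return re.findall(pattern, text)


print(extract_all_escaped("See [[Python]] and [[Regex]] pages", "[[", "]]"))
# ['Python', 'Regex']
print(extract_all_escaped("cost (12.50) or (9.99)", "(", ")"))
# ['12.50', '9.99']
```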

Using split() for Simple Extraction

The str.split() method offers the most concise approach when you're confident both delimiters exist in the string.

s = "Hello [world]!"

# Split at the first occurrence of "[", then split the remainder at "]"
result = s.split("[", 1)[1].split("]", 1)[0]

print(result)

Output:

world

How it works:

  1. s.split("[", 1) splits the string into at most two parts at the first [, producing ["Hello ", "world]!"].
  2. [1] selects the second part: "world]!".
  3. .split("]", 1)[0] splits that part at ] and takes the first element: "world".

Common Mistake: Missing Delimiters Cause Errors

split() itself does not raise an error when the delimiter is missing. It returns a one-element list containing the whole string, so the failure only surfaces when you index into the result:

s = "Hello world!"

# The delimiter "[" doesn't exist in the string
parts = s.split("[", 1)
print(parts)

Output:

['Hello world!']

Since there's only one element in the list, accessing [1] would raise an IndexError. Always validate first:

s = "Hello world!"

start = "["
end = "]"

if start in s and end in s:
    result = s.split(start, 1)[1].split(end, 1)[0]
    print(result)
else:
    print("Delimiters not found")

Output:

Delimiters not found
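
Note that an `in` check confirms only that both delimiters appear somewhere, not that the closing one comes after the opening one. One possible variant, swapping split() for the built-in str.partition(), is sketched below. partition() returns the separator it found, or an empty string if there was none, so each step can be checked. The helper name extract_with_partition is hypothetical:

```python
def extract_with_partition(text, start, end):
    """Hypothetical helper: return the text between start and end, or None."""
    # partition() returns (before, separator, after); separator is "" if absent.
    _, sep1, rest = text.partition(start)
    if not sep1:
        return None
    middle, sep2, _ = rest.partition(end)
    if not sep2:
        return None
    return middle


print(extract_with_partition("Hello [world]!", "[", "]"))  # world
print(extract_with_partition("Hello world!", "[", "]"))    # None
print(extract_with_partition("a] then [b", "[", "]"))      # None
```

In the last call both delimiters are present, but "]" appears only before "[", so no closing delimiter follows the opening one and the helper returns None.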

Creating a Reusable Function

For production code, it helps to wrap the logic in a reusable function that handles missing delimiters explicitly. Note that the str | None return annotation used here requires Python 3.10 or later:

def extract_between(text: str, start: str, end: str) -> str | None:
    """Extract the substring between two delimiters.

    Returns None if either delimiter is not found.
    """
    idx1 = text.find(start)
    if idx1 == -1:
        return None

    idx2 = text.find(end, idx1 + len(start))
    if idx2 == -1:
        return None

    return text[idx1 + len(start):idx2]


# Usage examples
print(extract_between("Hello [world]!", "[", "]"))
print(extract_between("No delimiters here", "[", "]"))
print(extract_between("key=START_value_END!", "START_", "_END"))

Output:

world
None
value
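
The same find()-based logic can be extended with a manual loop to collect every occurrence without importing re. This is a minimal sketch rather than a definitive implementation. The name extract_all_between is hypothetical, and rejecting empty delimiters is an assumption made here to keep the loop from stalling:

```python
def extract_all_between(text: str, start: str, end: str) -> list[str]:
    """Hypothetical helper: collect every substring enclosed by start and end."""
    if not start or not end:
        # Assumption: empty delimiters are rejected so the loop always advances.
        raise ValueError("delimiters must be non-empty")

    results = []
    pos = 0
    while True:
        idx1 = text.find(start, pos)
        if idx1 == -1:
            break
        idx2 = text.find(end, idx1 + len(start))
        if idx2 == -1:
            break
        results.append(text[idx1 + len(start):idx2])
        # Resume the search just past the closing delimiter.
        pos = idx2 + len(end)
    return results


print(extract_all_between("Items: [apple], [banana], [cherry]", "[", "]"))
# ['apple', 'banana', 'cherry']
```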

Quick Comparison of Methods

Method | Best For | Handles Multiple Matches | Requires Import
find() | Simple, explicit extraction | No (manual loop needed) | No
re.search() / re.findall() | Complex patterns, multiple matches | Yes (findall) | Yes (re)
split() | Quick one-liners | No | No

Conclusion

Python provides multiple ways to extract a string between two substrings:

  • find() gives you fine-grained control with index-based slicing - great for straightforward cases with no external dependencies.
  • Regular expressions (re module) offer the most flexibility, especially for complex patterns, special characters, or extracting multiple matches.
  • split() delivers the most concise syntax for simple, one-off extractions.

For robust production code, always handle the case where one or both delimiters are missing to avoid unexpected errors. When working with multiple occurrences, prefer re.findall() for clean and efficient results.