How to Use Pattern Matching With Regex in Python

Regular expressions (regex) are a powerful tool for searching, matching, and manipulating text based on defined patterns. Python's built-in re module provides full regex support, allowing you to validate formats (like phone numbers and emails), extract data from strings, and perform complex text transformations.

In this guide, you will learn the fundamentals of pattern matching with regex in Python: from basic character matching to grouping, quantifiers, and optional patterns, with practical examples and clear explanations.

Getting Started: The Basics

To use regex in Python, import the re module, compile a pattern, and search for matches:

import re

# Step 1: Compile a regex pattern
pattern = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')

# Step 2: Search for the pattern in a string
match = pattern.search('My number is 415-555-4242.')

# Step 3: Extract the matched text
if match:
    print('Phone number found:', match.group())

Output:

Phone number found: 415-555-4242

Key concepts:

  • re.compile(r'pattern') creates a reusable regex object. The r prefix marks a raw string, so Python leaves backslashes untouched and passes them to the regex engine.
  • .search() scans the string for the first match and returns a Match object (or None if no match).
  • .group() returns the actual matched text.
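
The three steps above can be wrapped in a small helper that handles the no-match case. This is a minimal sketch: the names PHONE and first_phone are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical module-level pattern, compiled once and reused
PHONE = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')

def first_phone(text):
    """Return the first phone-like substring, or None if there is none."""
    match = PHONE.search(text)
    # search() returns None on failure, so check before calling .group()
    return match.group() if match else None

found = first_phone('My number is 415-555-4242.')
missing = first_phone('No digits here.')
print(found)    # 415-555-4242
print(missing)  # None

# A Match object also records where the match was found
span = PHONE.search('My number is 415-555-4242.').span()
print(span)     # (13, 25)
```

Checking for None before calling .group() avoids an AttributeError when the text contains no match.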
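
Why use raw strings?

Without the r prefix, Python processes backslash escapes in the string literal before the regex engine ever sees them. '\b', for example, becomes a backspace character instead of a word boundary, and unrecognized escapes such as '\d' trigger a SyntaxWarning in Python 3.12 and later (a DeprecationWarning in Python 3.6 through 3.11). Raw strings (r'\d') pass the backslash directly to the regex engine: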

# RISKY: without a raw string, Python processes backslash escapes first
pattern = re.compile('\d\d\d')  # Works, but warns on newer Pythons; '\b' would silently break

# CORRECT: with raw string, backslashes are preserved
pattern = re.compile(r'\d\d\d')
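
The difference is easiest to see with \b, which is a valid Python escape (backspace) as well as a regex token (word boundary). The snippet below is an illustrative sketch; the variable names are arbitrary.

```python
import re

# In a normal string, '\b' is a single backspace character
plain = '\bcat\b'
# In a raw string, the backslash survives, so the regex engine sees a word boundary
raw = r'\bcat\b'

print(len(plain), len(raw))  # 5 7

text = 'The cat sat.'
plain_match = re.search(plain, text)
raw_match = re.search(raw, text)
print(plain_match)           # None: the text contains no backspace characters
print(raw_match.group())     # cat
```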

Common Regex Symbols

| Symbol | Meaning | Example | Matches |
| --- | --- | --- | --- |
| \d | Any digit (0-9) | \d\d\d | 415, 123 |
| \w | Any word character (letter, digit, underscore) | \w+ | hello_123 |
| \s | Any whitespace (space, tab, newline) | \s+ | spaces, tabs |
| . | Any character except newline | a.b | a1b, a-b |
| ^ | Start of string | ^Hello | Hello world |
| $ | End of string | world$ | Hello world |
| \D | Any non-digit | \D+ | abc, xyz |
| \W | Any non-word character | \W | @, #, - |
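
To see several of these symbols in action, the following sketch runs them against one sample string. The sample text and variable names are made up for illustration.

```python
import re

text = 'Order 42 shipped to bob_smith on 2024-01-15'

digits = re.findall(r'\d+', text)          # runs of digits
words = re.findall(r'\w+', text)           # runs of word characters
non_word = re.findall(r'\W', text)         # each single non-word character
starts = bool(re.search(r'^Order', text))  # anchored at the start of the string
ends = bool(re.search(r'15$', text))       # anchored at the end of the string

print(digits)         # ['42', '2024', '01', '15']
print(words)          # ['Order', '42', 'shipped', 'to', 'bob_smith', 'on', '2024', '01', '15']
print(len(non_word))  # 8 (six spaces and two hyphens)
print(starts, ends)   # True True
```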

Using Curly Braces for Repetition

Instead of writing \d\d\d, use curly braces {n} to specify how many times a pattern should repeat:

import re

# Both patterns are equivalent
pattern1 = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
pattern2 = re.compile(r'\d{3}-\d{3}-\d{4}')

text = 'Call me at 415-555-4242.'
print(pattern2.search(text).group())

Output:

415-555-4242

Range Repetition

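Curly braces also work on groups, and they accept ranges as well as exact counts. The example below repeats a group an exact number of times; the range forms are listed after it: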

import re

# Match 'Ha' repeated 3 times
pattern = re.compile(r'(Ha){3}')

print(pattern.search('HaHaHa').group()) # HaHaHa
print(pattern.search('Ha') is None) # True: only 1 repeat

Output:

HaHaHa
True

| Syntax | Meaning |
| --- | --- |
| {3} | Exactly 3 repetitions |
| {3,5} | Between 3 and 5 repetitions |
| {3,} | 3 or more repetitions |
| {,5} | 0 to 5 repetitions |
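
The range forms can be checked with a short sketch. It assumes a sample string of six 'Ha' repetitions; quantifiers are greedy by default, so each pattern takes as many repetitions as its upper bound allows.

```python
import re

text = 'HaHaHaHaHaHa'  # six repetitions of 'Ha'

bounded = re.search(r'(Ha){3,5}', text).group()    # stops at the upper bound of 5
open_ended = re.search(r'(Ha){3,}', text).group()  # no upper bound: takes all 6
at_most_two = re.search(r'(Ha){,2}', text).group() # 0 to 2 repetitions
too_few = re.search(r'(Ha){3,5}', 'HaHa')          # only 2 repetitions available

print(bounded)      # HaHaHaHaHa
print(open_ended)   # HaHaHaHaHaHa
print(at_most_two)  # HaHa
print(too_few)      # None
```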

Grouping With Parentheses

Parentheses () create groups that let you extract specific parts of a match:

import re

pattern = re.compile(r'(\d{3})-(\d{3}-\d{4})')
match = pattern.search('My number is 415-555-4242.')

# Access individual groups
print("Full match:", match.group()) # Entire match
print("Group 1:", match.group(1)) # Area code
print("Group 2:", match.group(2)) # Number

# Unpack all groups at once
area_code, number = match.groups()
print(f"Area code: {area_code}, Number: {number}")

Output:

Full match: 415-555-4242
Group 1: 415
Group 2: 555-4242
Area code: 415, Number: 555-4242
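
Numbered groups become harder to track as patterns grow. One option, sketched here, is the re module's named-group syntax (?P<name>...); the group names area and number are arbitrary choices for this example.

```python
import re

pattern = re.compile(r'(?P<area>\d{3})-(?P<number>\d{3}-\d{4})')
match = pattern.search('My number is 415-555-4242.')

# Named groups can be read by name; numeric access still works
area = match.group('area')
number = match.group('number')
parts = match.groupdict()

print(area)            # 415
print(number)          # 555-4242
print(parts)           # {'area': '415', 'number': '555-4242'}
print(match.group(1))  # 415
```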

Matching Literal Parentheses

Since parentheses have special meaning in regex, escape them with \ to match actual parentheses in text:

import re

pattern = re.compile(r'\((\d{3})\) (\d{3}-\d{4})')
match = pattern.search('My phone number is (415) 555-4242.')

print("Area code:", match.group(1))
print("Number:", match.group(2))

Output:

Area code: 415
Number: 555-4242
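
When the text to match is a fixed string rather than a pattern, escaping by hand is error-prone. A possible alternative is re.escape(), which backslash-escapes the characters that are special in a regex. The sketch below assumes the goal is to find one exact phone number.

```python
import re

literal = '(415) 555-4242'
text = 'My phone number is (415) 555-4242.'

# re.escape() returns a pattern that matches the literal string exactly
escaped = re.escape(literal)
matched_text = re.search(escaped, text).group()

# Unescaped, the parentheses form a group and no longer match '(' or ')'
unescaped = re.search(literal, text)

print(matched_text)  # (415) 555-4242
print(unescaped)     # None
```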

The Pipe Character (|): Matching Alternatives

The | (pipe) operator matches one of several alternatives:

import re

pattern = re.compile(r'Batman|Tina Fey')

match1 = pattern.search('Batman and Tina Fey.')
print(match1.group()) # Returns the first match found

Output:

Batman

When more than one alternative appears in the string, search() returns whichever occurs earliest in the text, regardless of the order of the alternatives in the pattern.
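
The sketch below illustrates this with the same pattern and a reordered sentence. It also shows a related detail: when two alternatives could match at the same position, the one written first in the pattern wins.

```python
import re

pattern = re.compile(r'Batman|Tina Fey')

# 'Tina Fey' appears earlier in the text, so it is returned first
first = pattern.search('Tina Fey and Batman.').group()

# findall() returns every non-overlapping match, in text order
both = pattern.findall('Batman and Tina Fey.')

# Both alternatives match at index 0; the leftmost one in the pattern is used
ordered = re.search(r'Bat|Batman', 'Batman').group()

print(first)    # Tina Fey
print(both)     # ['Batman', 'Tina Fey']
print(ordered)  # Bat
```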

Using Pipe With Groups

import re

# Match 'Bat' followed by one of several options
pattern = re.compile(r'Bat(man|woman|mobile|copter)')

match = pattern.search('Batmobile lost a wheel')
print(match.group()) # Full match: Batmobile
print(match.group(1)) # Group 1: mobile

Output:

Batmobile
mobile
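
Capturing groups change what findall() returns: with one group in the pattern, it returns the group contents rather than the full matches. If only the alternation is needed, a non-capturing group (?:...) avoids this. The sample text below is made up for illustration.

```python
import re

text = 'Batman, Batwoman, and the Batmobile'

# One capturing group: findall() returns only what the group matched
captured = re.findall(r'Bat(man|woman|mobile|copter)', text)

# Non-capturing group: findall() returns the full matches
whole = re.findall(r'Bat(?:man|woman|mobile|copter)', text)

print(captured)  # ['man', 'woman', 'mobile']
print(whole)     # ['Batman', 'Batwoman', 'Batmobile']
```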

The Question Mark (?): Optional Matching

The ? makes the preceding group optional: it matches zero or one occurrence.

import re

pattern = re.compile(r'Bat(wo)?man')

match1 = pattern.search('The Adventures of Batman')
match2 = pattern.search('The Adventures of Batwoman')

print(match1.group()) # Batman: 'wo' not present (0 occurrences)
print(match2.group()) # Batwoman: 'wo' present (1 occurrence)

Output:

Batman
Batwoman

The (wo)? group matches either zero or one instance of wo.
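
The same idea applies to the earlier phone number pattern. As a sketch, the area code can be made optional; when the optional group does not take part in the match, group(1) is None.

```python
import re

# The area code and its trailing dash are optional
pattern = re.compile(r'(\d{3}-)?\d{3}-\d{4}')

with_area = pattern.search('My number is 415-555-4242')
without_area = pattern.search('My number is 555-4242')

print(with_area.group())      # 415-555-4242
print(with_area.group(1))     # 415-
print(without_area.group())   # 555-4242
print(without_area.group(1))  # None
```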

The Star (*): Zero or More

The * matches zero or more occurrences of the preceding group:

import re

pattern = re.compile(r'Bat(wo)*man')

# Zero occurrences of 'wo'
print(pattern.search('The Adventures of Batman').group())

# One occurrence
print(pattern.search('The Adventures of Batwoman').group())

# Multiple occurrences
print(pattern.search('The Adventures of Batwowowowoman').group())

Output:

Batman
Batwoman
Batwowowowoman
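
The star is greedy: it matches as much text as it can. Adding ? after it makes it lazy, so it matches as little as possible. The following sketch uses a made-up markup string to show the difference.

```python
import re

text = '<b>bold</b> and <i>italic</i>'

# Greedy: .* runs to the last '>' in the string
greedy = re.search(r'<.*>', text).group()

# Lazy: .*? stops at the first '>' it can
lazy = re.search(r'<.*?>', text).group()

print(greedy)  # <b>bold</b> and <i>italic</i>
print(lazy)    # <b>
```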

The Plus (+): One or More

The + matches one or more occurrences: the group must appear at least once.

import re

pattern = re.compile(r'Bat(wo)+man')

# One occurrence: matches
print(pattern.search('The Adventures of Batwoman').group())

# Zero occurrences: does NOT match
print(pattern.search('The Adventures of Batman') is None)

Output:

Batwoman
True
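
The contrast between * and + is clearest on an empty string. This sketch uses re.fullmatch(), which succeeds only if the whole string matches, along with a typical use of \d+ to pull numbers out of text.

```python
import re

# * accepts zero occurrences, so an empty string matches
star_empty = re.fullmatch(r'\d*', '')

# + requires at least one occurrence, so an empty string does not match
plus_empty = re.fullmatch(r'\d+', '')

# \d+ is a common way to extract whole numbers
numbers = re.findall(r'\d+', 'Room 12, floor 3, building 1024')

print(star_empty is not None)  # True
print(plus_empty is None)      # True
print(numbers)                 # ['12', '3', '1024']
```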

Quick Reference: Quantifiers

| Quantifier | Meaning | Example | Matches |
| --- | --- | --- | --- |
| ? | 0 or 1 | colou?r | color, colour |
| * | 0 or more | (Ha)* | "", Ha, HaHa, ... |
| + | 1 or more | (Ha)+ | Ha, HaHa, HaHaHa, ... |
| {n} | Exactly n | \d{3} | 123, 456 |
| {n,m} | Between n and m | \d{2,4} | 12, 123, 1234 |
| {n,} | n or more | \d{2,} | 12, 123, 12345, ... |
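
Each row of the quick reference can be checked mechanically. The sketch below pairs every example pattern with one string it should accept in full and one it should reject; the test strings are chosen for illustration.

```python
import re

# (pattern, string that should match in full, string that should not)
cases = [
    (r'colou?r', 'colour', 'colouur'),
    (r'(Ha)*', '', 'H'),
    (r'(Ha)+', 'HaHa', ''),
    (r'\d{3}', '123', '12'),
    (r'\d{2,4}', '1234', '12345'),
    (r'\d{2,}', '12345', '1'),
]

results = []
for pattern, good, bad in cases:
    # fullmatch() succeeds only when the entire string fits the pattern
    accepted = re.fullmatch(pattern, good) is not None
    rejected = re.fullmatch(pattern, bad) is None
    results.append((pattern, accepted, rejected))
    print(f'{pattern!r}: accepts {good!r}: {accepted}, rejects {bad!r}: {rejected}')
```

Every line printed should report True for both checks.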

Practical Example: Validating Phone Numbers

Combine everything to find phone numbers written in several common formats:

import re

def find_phone_numbers(text):
    """Find all phone numbers in various formats."""
    pattern = re.compile(r'''
        (\d{3}|\(\d{3}\))   # Area code: 415 or (415)
        [\s\-.]?            # Optional separator: space, dash, or dot
        (\d{3})             # First 3 digits
        [\s\-.]?            # Optional separator
        (\d{4})             # Last 4 digits
    ''', re.VERBOSE)

    matches = pattern.findall(text)
    return [f"{area}-{first}-{last}" for area, first, last in matches]


text = """
Call 415-555-4242 or (800) 555.1234.
You can also try 310 555 9876.
"""

numbers = find_phone_numbers(text)
for num in numbers:
    print(f"Found: {num}")

Output:

Found: 415-555-4242
Found: (800)-555-1234
Found: 310-555-9876
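
Use re.VERBOSE for readable patterns

The re.VERBOSE flag lets you add whitespace and comments to your regex, making complex patterns much easier to read and maintain. Because unescaped whitespace in the pattern is ignored in this mode, match a literal space with \s, an escaped space, or a character class.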
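
Finding numbers inside text is different from validating that a whole string is a phone number. For validation, one approach is fullmatch(), which rejects any leading or trailing characters. This is a sketch built on the same pattern; the names PHONE and is_valid_phone are hypothetical, and real-world phone validation usually needs stricter rules.

```python
import re

# Same structure as the pattern above, without the extra capture groups
PHONE = re.compile(r'''
    (\d{3}|\(\d{3}\))   # Area code: 415 or (415)
    [\s\-.]?            # Optional separator: space, dash, or dot
    \d{3}               # First 3 digits
    [\s\-.]?            # Optional separator
    \d{4}               # Last 4 digits
''', re.VERBOSE)

def is_valid_phone(candidate):
    """Return True only if the entire string is a phone number."""
    return PHONE.fullmatch(candidate) is not None

print(is_valid_phone('415-555-4242'))       # True
print(is_valid_phone('(800) 555.1234'))     # True
print(is_valid_phone('415-555-42421'))      # False: one digit too many
print(is_valid_phone('call 415-555-4242'))  # False: extra leading text
```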

Conclusion

Pattern matching with regex in Python provides a powerful, flexible way to search, validate, and extract text.

The key building blocks are character classes (\d, \w, \s) for matching types of characters, quantifiers (?, *, +, {n}) for controlling repetition, groups (()) for capturing parts of a match, and the pipe (|) for matching alternatives. Start with simple patterns and build complexity as needed: regex is one of the most valuable tools in any Python developer's toolkit.