How to Use Pattern Matching With Regex in Python
Regular expressions (regex) are a powerful tool for searching, matching, and manipulating text based on defined patterns. Python's built-in re module provides full regex support, allowing you to validate formats (like phone numbers and emails), extract data from strings, and perform complex text transformations.
In this guide, you will learn the fundamentals of pattern matching with regex in Python: from basic character matching to grouping, quantifiers, and optional patterns, with practical examples and clear explanations.
Getting Started: The Basics
To use regex in Python, import the re module, compile a pattern, and search for matches:
import re
# Step 1: Compile a regex pattern
pattern = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
# Step 2: Search for the pattern in a string
match = pattern.search('My number is 415-555-4242.')
# Step 3: Extract the matched text
if match:
    print('Phone number found:', match.group())
Output:
Phone number found: 415-555-4242
Key concepts:
- re.compile(r'pattern') creates a reusable regex object. The r prefix creates a raw string, preventing Python from interpreting backslashes.
- .search() scans the string for the first match and returns a Match object (or None if there is no match).
- .group() returns the actual matched text.
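Without the r prefix, Python processes backslash escapes itself before the regex engine ever sees the pattern. \d happens to survive because it is not a recognized Python escape, but recent Python versions warn about such invalid escape sequences, and recognized escapes such as \b (backspace) are silently converted. Raw strings (r'\d') pass the backslash directly to the regex engine:
# RISKY: works only because \d is not a Python escape; newer versions emit a warning
pattern = re.compile('\d\d\d') # Other escapes, such as \b, would silently change meaning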
# CORRECT: with raw string, backslashes are preserved
pattern = re.compile(r'\d\d\d')
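The difference is easiest to see with an escape that Python does recognize. The following is a small sketch using \b, which is a backspace character in an ordinary string and a word boundary in regex. \b is not covered elsewhere in this guide and appears here only for illustration:

```python
import re

# In an ordinary string, '\b' is a single backspace character.
plain = '\bcat\b'
# In a raw string, the backslash and the 'b' both reach the regex engine.
raw = r'\bcat\b'

print(len(plain), len(raw))           # 5 7

text = 'a cat sat'
plain_match = re.search(plain, text)  # looks for backspace + 'cat' + backspace
raw_match = re.search(raw, text)      # looks for the whole word 'cat'

print(plain_match)                    # None
print(raw_match.group())              # cat
```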
Common Regex Symbols
| Symbol | Meaning | Example | Matches |
|---|---|---|---|
| \d | Any digit (0-9) | \d\d\d | 415, 123 |
| \w | Any word character (letter, digit, underscore) | \w+ | hello_123 |
| \s | Any whitespace (space, tab, newline) | \s+ | spaces, tabs |
| . | Any character except newline | a.b | a1b, a-b |
| ^ | Start of string | ^Hello | Hello world |
| $ | End of string | world$ | Hello world |
| \D | Any non-digit | \D+ | abc, xyz |
| \W | Any non-word character | \W | @, #, - |
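As a quick illustration of the table, the sketch below runs several of these symbols against one sample string. The sample text and variable names are made up for this example:

```python
import re

text = 'Order 66 shipped'

first_digits = re.search(r'\d+', text).group()  # run of digits
first_word = re.search(r'^\w+', text).group()   # word at the start of the string
last_word = re.search(r'\w+$', text).group()    # word at the end of the string
gap_index = re.search(r'\s', text).start()      # index of the first whitespace
dot_match = re.search(r'r.6', text).group()     # 'r', any one character, '6'

print(first_digits)  # 66
print(first_word)    # Order
print(last_word)     # shipped
print(gap_index)     # 5
print(dot_match)     # r 6
```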
Using Curly Braces for Repetition
Instead of writing \d\d\d, use curly braces {n} to specify how many times a pattern should repeat:
import re
# Both patterns are equivalent
pattern1 = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
pattern2 = re.compile(r'\d{3}-\d{3}-\d{4}')
text = 'Call me at 415-555-4242.'
print(pattern2.search(text).group())
Output:
415-555-4242
Range Repetition
Curly braces can also repeat a whole group, and they accept ranges as well as exact counts:
import re
# Match 'Ha' repeated 3 times
pattern = re.compile(r'(Ha){3}')
print(pattern.search('HaHaHa').group()) # HaHaHa
print(pattern.search('Ha') is None) # True: only 1 repeat
Output:
HaHaHa
True
| Syntax | Meaning |
|---|---|
| {3} | Exactly 3 repetitions |
| {3,5} | Between 3 and 5 repetitions |
| {3,} | 3 or more repetitions |
| {,5} | 0 to 5 repetitions |
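The range forms in the table can be tried out directly. This is a minimal sketch with an illustrative input string. Note that these quantifiers are greedy by default, so they take as many repetitions as the range allows:

```python
import re

laugh = 'HaHaHaHaHaHa'  # six repetitions of 'Ha'

# {3,5} is greedy: it takes as many repetitions as allowed, up to 5.
up_to_five = re.search(r'(Ha){3,5}', laugh).group()
# {3,} has no upper bound, so it consumes all six.
three_or_more = re.search(r'(Ha){3,}', laugh).group()
# {,2} allows zero to two repetitions.
at_most_two = re.search(r'(Ha){,2}', laugh).group()

print(up_to_five)     # HaHaHaHaHa
print(three_or_more)  # HaHaHaHaHaHa
print(at_most_two)    # HaHa
```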
Grouping With Parentheses
Parentheses () create groups that let you extract specific parts of a match:
import re
pattern = re.compile(r'(\d{3})-(\d{3}-\d{4})')
match = pattern.search('My number is 415-555-4242.')
# Access individual groups
print("Full match:", match.group()) # Entire match
print("Group 1:", match.group(1)) # Area code
print("Group 2:", match.group(2)) # Number
# Unpack all groups at once
area_code, number = match.groups()
print(f"Area code: {area_code}, Number: {number}")
Output:
Full match: 415-555-4242
Group 1: 415
Group 2: 555-4242
Area code: 415, Number: 555-4242
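Because search() returns None when nothing matches, calling .group() or .groups() on the result without a check raises an AttributeError. One possible way to guard against that is sketched below. The helper name split_phone is hypothetical:

```python
import re

PHONE = re.compile(r'(\d{3})-(\d{3}-\d{4})')

def split_phone(text):
    """Return (area_code, number), or None if no phone number is found."""
    match = PHONE.search(text)
    if match is None:
        return None
    return match.groups()

print(split_phone('My number is 415-555-4242.'))  # ('415', '555-4242')
print(split_phone('No number here.'))             # None
```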
Matching Literal Parentheses
Since parentheses have special meaning in regex, escape them with \ to match actual parentheses in text:
import re
pattern = re.compile(r'\((\d{3})\) (\d{3}-\d{4})')
match = pattern.search('My phone number is (415) 555-4242.')
print("Area code:", match.group(1))
print("Number:", match.group(2))
Output:
Area code: 415
Number: 555-4242
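When the text to match is entirely literal, the standard library's re.escape() can add the backslashes for you. A short sketch, with the variable names chosen for illustration:

```python
import re

literal = '(415)'
escaped = re.escape(literal)  # backslash-escapes the regex metacharacters
print(escaped)                # \(415\)

match = re.search(escaped, 'My phone number is (415) 555-4242.')
print(match.group())          # (415)
```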
The Pipe Character (|): Matching Alternatives
The | (pipe) operator matches one of several alternatives:
import re
pattern = re.compile(r'Batman|Tina Fey')
match1 = pattern.search('Batman and Tina Fey.')
print(match1.group()) # Returns the first match found
Output:
Batman
When both alternatives exist in the string, search() returns the first occurrence.
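To see that position in the text matters rather than order in the pattern, and to collect every alternative instead of only the first, a sketch along these lines may help. It uses findall(), which returns all non-overlapping matches:

```python
import re

pattern = re.compile(r'Batman|Tina Fey')

# search() reports whichever alternative appears first in the text.
first = pattern.search('Tina Fey and Batman.').group()
# findall() returns every non-overlapping match, in order of appearance.
everyone = pattern.findall('Batman and Tina Fey.')

print(first)     # Tina Fey
print(everyone)  # ['Batman', 'Tina Fey']
```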
Using Pipe With Groups
import re
# Match 'Bat' followed by one of several options
pattern = re.compile(r'Bat(man|woman|mobile|copter)')
match = pattern.search('Batmobile lost a wheel')
print(match.group()) # Full match: Batmobile
print(match.group(1)) # Group 1: mobile
Output:
Batmobile
mobile
The Question Mark (?): Optional Matching
The ? makes the preceding group optional: it matches zero or one occurrence.
import re
pattern = re.compile(r'Bat(wo)?man')
match1 = pattern.search('The Adventures of Batman')
match2 = pattern.search('The Adventures of Batwoman')
print(match1.group()) # Batman: 'wo' not present (0 occurrences)
print(match2.group()) # Batwoman: 'wo' present (1 occurrence)
Output:
Batman
Batwoman
The (wo)? group matches either zero or one instance of wo.
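One detail worth knowing: when an optional group does not take part in the match, group(1) returns None rather than an empty string. The sketch below shows this, and also applies the same idea to a phone number with an optional area code (an illustrative pattern, not one from the examples above):

```python
import re

pattern = re.compile(r'Bat(wo)?man')
absent = pattern.search('The Adventures of Batman')
present = pattern.search('The Adventures of Batwoman')

print(absent.group(1))   # None
print(present.group(1))  # wo

# Optional area code: the '415-' prefix may or may not be there.
phone = re.compile(r'(\d{3}-)?\d{3}-\d{4}')
short = phone.search('Call 555-4242').group()
full = phone.search('Call 415-555-4242').group()

print(short)  # 555-4242
print(full)   # 415-555-4242
```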
The Star (*): Zero or More
The * matches zero or more occurrences of the preceding group:
import re
pattern = re.compile(r'Bat(wo)*man')
# Zero occurrences of 'wo'
print(pattern.search('The Adventures of Batman').group())
# One occurrence
print(pattern.search('The Adventures of Batwoman').group())
# Multiple occurrences
print(pattern.search('The Adventures of Batwowowowoman').group())
Output:
Batman
Batwoman
Batwowowowoman
The Plus (+): One or More
The + matches one or more occurrences: the group must appear at least once.
import re
pattern = re.compile(r'Bat(wo)+man')
# One occurrence: matches
print(pattern.search('The Adventures of Batwoman').group())
# Zero occurrences: does NOT match
print(pattern.search('The Adventures of Batman') is None)
Output:
Batwoman
True
Quick Reference: Quantifiers
| Quantifier | Meaning | Example | Matches |
|---|---|---|---|
| ? | 0 or 1 | colou?r | color, colour |
| * | 0 or more | (Ha)* | "", Ha, HaHa, ... |
| + | 1 or more | (Ha)+ | Ha, HaHa, HaHaHa, ... |
| {n} | Exactly n | \d{3} | 123, 456 |
| {n,m} | Between n and m | \d{2,4} | 12, 123, 1234 |
| {n,} | n or more | \d{2,} | 12, 123, 12345, ... |
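To compare ?, * and + side by side, the following sketch tests the same three words against each quantifier. The patterns are anchored with ^ and $ so the whole word has to match, and the word list is made up for this example:

```python
import re

words = ['Batman', 'Batwoman', 'Batwowoman']
patterns = {
    '?': re.compile(r'^Bat(wo)?man$'),
    '*': re.compile(r'^Bat(wo)*man$'),
    '+': re.compile(r'^Bat(wo)+man$'),
}

# For each quantifier, keep the words its pattern accepts.
results = {
    symbol: [word for word in words if pattern.search(word)]
    for symbol, pattern in patterns.items()
}

for symbol, matched in results.items():
    print(symbol, matched)
# ? ['Batman', 'Batwoman']
# * ['Batman', 'Batwoman', 'Batwowoman']
# + ['Batwoman', 'Batwowoman']
```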
Practical Example: Finding Phone Numbers
Combine everything to build a function that finds phone numbers written in several common formats and normalizes them to a dash-separated form:
import re
def find_phone_numbers(text):
    """Find all phone numbers in various formats."""
    pattern = re.compile(r'''
        (\d{3}|\(\d{3}\))  # Area code: 415 or (415)
        [\s\-.]?           # Optional separator: space, dash, or dot
        (\d{3})            # First 3 digits
        [\s\-.]?           # Optional separator
        (\d{4})            # Last 4 digits
    ''', re.VERBOSE)
    matches = pattern.findall(text)
    return [f"{area}-{first}-{last}" for area, first, last in matches]

text = """
Call 415-555-4242 or (800) 555.1234.
You can also try 310 555 9876.
"""
numbers = find_phone_numbers(text)
for num in numbers:
    print(f"Found: {num}")
Output:
Found: 415-555-4242
Found: (800)-555-1234
Found: 310-555-9876
re.VERBOSE for readable patterns
The re.VERBOSE flag lets you add whitespace and comments to your regex, making complex patterns much easier to read and maintain.
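Finding numbers inside a longer text is different from checking that an entire string is a phone number. For the latter, one option is the fullmatch() method, which succeeds only if the pattern covers the whole string. This is a minimal sketch that reuses the same pattern shape. The helper name is_phone_number is hypothetical:

```python
import re

PHONE = re.compile(r'''
    (\d{3}|\(\d{3}\))  # Area code: 415 or (415)
    [\s\-.]?           # Optional separator
    \d{3}              # First 3 digits
    [\s\-.]?           # Optional separator
    \d{4}              # Last 4 digits
''', re.VERBOSE)

def is_phone_number(candidate):
    """Return True only if the whole string is a phone number."""
    return PHONE.fullmatch(candidate) is not None

print(is_phone_number('415-555-4242'))       # True
print(is_phone_number('(800) 555.1234'))     # True
print(is_phone_number('Call 415-555-4242'))  # False
print(is_phone_number('415-555-42'))         # False
```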
Conclusion
Pattern matching with regex in Python provides a powerful, flexible way to search, validate, and extract text.
The key building blocks are character classes (\d, \w, \s) for matching types of characters, quantifiers (?, *, +, {n}) for controlling repetition, groups (()) for capturing parts of a match, and the pipe (|) for matching alternatives. Start with simple patterns and build complexity as needed: regex is one of the most valuable tools in any Python developer's toolkit.