How to Search and Replace Text in a File in Python
Searching and replacing text within files is a common task in Python, whether you're updating configuration files, processing log data, cleaning datasets, or performing batch modifications across multiple documents. Python provides several approaches, from simple string replacement to powerful regex-based pattern matching.
In this guide, you'll learn four methods to search and replace text in files, understand when to use each one, and handle edge cases like large files and pattern-based replacements.
Sample File
All examples below use a file called sample.txt with the following content:
Hello World! This is a dummy text file.
It contains some dummy data for testing.
The word dummy appears multiple times.
Method 1: Using open() and replace() (Simplest Approach)
The most straightforward method reads the entire file into memory, performs the replacement, and writes the result back:
search_text = "dummy"
replace_text = "sample"
# Read the file
with open('sample.txt', 'r') as file:
    content = file.read()
# Replace the text
content = content.replace(search_text, replace_text)
# Write the modified content back
with open('sample.txt', 'w') as file:
    file.write(content)
print("Text replaced successfully")
Output:
Text replaced successfully
File content after replacement:
Hello World! This is a sample text file.
It contains some sample data for testing.
The word sample appears multiple times.
How It Works
- open('sample.txt', 'r') opens the file in read mode.
- file.read() loads the entire file content into a string.
- content.replace(search_text, replace_text) replaces all occurrences.
- open('sample.txt', 'w') opens the file in write mode (overwrites existing content).
- file.write(content) saves the modified text.
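The steps above can be wrapped in a small helper that also reports how many replacements were made. This is a minimal sketch: the function name replace_in_file_counted, its encoding parameter, and its return value are assumptions made for illustration, not part of the original example.

```python
def replace_in_file_counted(filepath, search_text, replace_text, encoding="utf-8"):
    """Replace every occurrence of search_text and return how many were found.

    Hypothetical helper for illustration; reads the whole file into memory.
    """
    with open(filepath, "r", encoding=encoding) as file:
        content = file.read()

    # str.count() counts non-overlapping occurrences, the same ones replace() changes
    count = content.count(search_text)

    # Only rewrite the file when there is something to change
    if count:
        with open(filepath, "w", encoding=encoding) as file:
            file.write(content.replace(search_text, replace_text))

    return count
```

Called on the original sample file as replace_in_file_counted('sample.txt', 'dummy', 'sample'), this would return 3, since "dummy" appears once on each line.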
This method loads the entire file into memory. For very large files (hundreds of MB or more), this can consume significant RAM. Use the fileinput method (Method 4) for large files instead.
Method 2: Using pathlib (Modern, Clean Syntax)
Python's built-in pathlib module (available since Python 3.4) provides an object-oriented interface for file operations. It makes the code cleaner and more readable:
from pathlib import Path
def replace_in_file(filepath, search_text, replace_text):
    file = Path(filepath)
    content = file.read_text()
    content = content.replace(search_text, replace_text)
    file.write_text(content)
    return "Text replaced successfully"
result = replace_in_file('sample.txt', 'dummy', 'sample')
print(result)
Output:
Text replaced successfully
pathlib is part of Python's standard library, so no installation is needed. (The third-party pathlib2 package is a backport for older Python versions and is not required on Python 3.4 or later.) Use read_text() and write_text() for simple file operations without explicit open()/close() management.
Specifying File Encoding
When working with files that contain special characters, always specify the encoding:
from pathlib import Path
file = Path('sample.txt')
content = file.read_text(encoding='utf-8')
content = content.replace('dummy', 'sample')
file.write_text(content, encoding='utf-8')
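If a file turns out not to be valid UTF-8, read_text(encoding='utf-8') raises UnicodeDecodeError. One way to handle that is to skip the file rather than risk corrupting it. The sketch below assumes that policy; the helper name replace_text_utf8 and its True/False return value are hypothetical.

```python
from pathlib import Path


def replace_text_utf8(filepath, search_text, replace_text):
    """Replace text in a UTF-8 file; return False if the file is not valid UTF-8.

    Hypothetical helper for illustration.
    """
    file = Path(filepath)
    try:
        content = file.read_text(encoding="utf-8")
    except UnicodeDecodeError:
        # The bytes are not valid UTF-8: leave the file untouched
        return False
    file.write_text(content.replace(search_text, replace_text), encoding="utf-8")
    return True
```

Returning False instead of guessing another encoding keeps the original bytes intact, so the caller can decide what to do with the file.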
Method 3: Using the re Module (Regex Pattern Matching)
The re module enables pattern-based search and replace, which is far more powerful than simple string replacement. Use this method when you need to match variations, patterns, or complex text structures.
Basic Regex Replacement
import re
search_pattern = "dummy"
replace_text = "sample"
with open('sample.txt', 'r') as file:
    content = file.read()
# Replace all occurrences matching the pattern
content = re.sub(search_pattern, replace_text, content)
with open('sample.txt', 'w') as file:
    file.write(content)
print("Text replaced successfully")
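Because re.sub() treats its first argument as a regular expression, characters such as . and * have special meaning. When the search text is meant literally, re.escape() can neutralize them. The following sketch works on an in-memory string with made-up sample text to show the difference:

```python
import re

text = "Price: 3.50 USD (was 3x50)"
search_text = "3.50"

# Unescaped: "." matches any character, so "3x50" is replaced too
unescaped = re.sub(search_text, "4.00", text)

# Escaped: "." is matched literally, so only "3.50" is replaced
escaped = re.sub(re.escape(search_text), "4.00", text)

print(unescaped)  # Price: 4.00 USD (was 4.00)
print(escaped)    # Price: 4.00 USD (was 3x50)
```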
Case-Insensitive Replacement
Replace text regardless of capitalization:
import re
with open('sample.txt', 'r') as file:
    content = file.read()
# Matches "dummy", "Dummy", "DUMMY", etc.
content = re.sub(r'dummy', 'sample', content, flags=re.IGNORECASE)
with open('sample.txt', 'w') as file:
    file.write(content)
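A case-insensitive match replaces "Dummy" and "DUMMY" with the same lowercase "sample". If the capitalization of each match should carry over, re.sub() also accepts a function as the replacement. The sketch below is one possible approach; the match_case helper is hypothetical and only distinguishes all-uppercase, capitalized, and other words.

```python
import re


def match_case(replacement):
    """Return a re.sub() callback that copies the matched word's capitalization."""
    def _replace(match):
        word = match.group(0)
        if word.isupper():
            return replacement.upper()
        if word[0].isupper():
            return replacement.capitalize()
        return replacement.lower()
    return _replace


text = "dummy Dummy DUMMY"
result = re.sub(r"dummy", match_case("sample"), text, flags=re.IGNORECASE)
print(result)  # sample Sample SAMPLE
```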
Replacing Patterns (Not Just Fixed Text)
Regex shines when you need to match patterns rather than exact strings:
import re
with open('sample.txt', 'r') as file:
    content = file.read()
# Replace all email addresses with [REDACTED]
content = re.sub(r'\b[\w.+-]+@[\w-]+\.[\w.]+\b', '[REDACTED]', content)
# Replace all dates in MM/DD/YYYY format with [DATE]
content = re.sub(r'\d{2}/\d{2}/\d{4}', '[DATE]', content)
# Replace multiple spaces with a single space
content = re.sub(r' +', ' ', content)
with open('sample.txt', 'w') as file:
    file.write(content)
print("Patterns replaced successfully")
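Patterns can also reuse parts of the match in the replacement through capture groups. As a sketch, the following rewrites MM/DD/YYYY dates as YYYY-MM-DD on an in-memory string; the invoice text is made up for illustration.

```python
import re

text = "Invoice dated 03/15/2024, due 04/01/2024."

# Three capture groups: month, day, year
date_pattern = r"\b(\d{2})/(\d{2})/(\d{4})\b"

# \3, \1, \2 refer back to the year, month, and day groups
iso_text = re.sub(date_pattern, r"\3-\1-\2", text)

print(iso_text)  # Invoice dated 2024-03-15, due 2024-04-01.
```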
Using r+ Mode (Read and Write in One Open)
You can read and write in a single open() call using r+ mode:
import re
with open('sample.txt', 'r+') as file:
    content = file.read()
    content = re.sub(r'dummy', 'sample', content)
    file.seek(0)  # Move cursor to the beginning
    file.write(content)
    file.truncate()  # Remove any leftover content
print("Text replaced successfully")
Why seek(0) and truncate()? file.seek(0) moves the file cursor back to the beginning before writing. file.truncate() removes any remaining old content. Without it, if the replacement text is shorter than the original, leftover characters from the old content will remain at the end of the file.
Method 4: Using fileinput (Line-by-Line, In-Place Editing)
The fileinput module processes files line by line, making it memory-efficient for large files. It also supports in-place editing with automatic backup creation:
from fileinput import FileInput
search_text = "dummy"
replace_text = "sample"
with FileInput('sample.txt', inplace=True, backup='.bak') as file:
    for line in file:
        print(line.replace(search_text, replace_text), end='')
print("Text replaced successfully")
How It Works
| Parameter | Description |
|---|---|
| inplace=True | Redirects print() output to the file itself instead of the console |
| backup='.bak' | Creates a backup copy (sample.txt.bak) before modifying the original |
| end='' | Prevents print() from adding extra newlines (lines already contain \n) |
The fileinput method suits large files because it processes one line at a time instead of loading the entire file into memory. The trade-off is that a match spanning a line break will be missed. The automatic backup feature also provides a safety net.
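Line-by-line editing is not limited to exact matches: re.sub() can be applied to each line inside the same loop. The helper below is a sketch under that assumption; the name regex_replace_in_place and its parameters are hypothetical.

```python
import re
from fileinput import FileInput


def regex_replace_in_place(filepath, pattern, replacement, backup=".bak"):
    """Apply re.sub() to each line of a file, editing it in place.

    Hypothetical helper for illustration. The pattern is matched one line
    at a time, so it cannot match text that spans a line break.
    """
    compiled = re.compile(pattern)  # compile once, reuse for every line
    with FileInput(filepath, inplace=True, backup=backup) as file:
        for line in file:
            # print() is redirected into the file while inplace=True
            print(compiled.sub(replacement, line), end="")
```

For example, regex_replace_in_place('sample.txt', r'\d+', 'N') would replace every run of digits with N and keep the original content in sample.txt.bak.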
Replacing Text in Multiple Files
To perform search and replace across multiple files, combine any method with glob:
import glob
from pathlib import Path
search_text = "dummy"
replace_text = "sample"
files = glob.glob('data/*.txt')
for filepath in files:
    file = Path(filepath)
    content = file.read_text(encoding='utf-8')
    if search_text in content:
        content = content.replace(search_text, replace_text)
        file.write_text(content, encoding='utf-8')
        print(f"Updated: {filepath}")
    else:
        print(f"No match: {filepath}")
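The glob pattern above only covers files directly inside data/. To include subfolders as well, pathlib's rglob() can walk the directory tree. This is a minimal sketch; the helper name replace_in_tree and its default '*.txt' pattern are assumptions made for illustration.

```python
from pathlib import Path


def replace_in_tree(root, search_text, replace_text, pattern="*.txt"):
    """Replace text in every matching file under root, including subfolders.

    Hypothetical helper for illustration; returns the paths that were changed.
    """
    updated = []
    # rglob() walks root recursively; sorted() makes the order predictable
    for file in sorted(Path(root).rglob(pattern)):
        if not file.is_file():
            continue
        content = file.read_text(encoding="utf-8")
        if search_text in content:
            file.write_text(content.replace(search_text, replace_text), encoding="utf-8")
            updated.append(file)
    return updated
```

Returning the list of changed paths lets the caller log or review them, similar to the "Updated" messages printed in the loop above.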
Common Mistake: Not Using truncate() with r+ Mode
When using r+ mode and the replacement text is shorter than the original, leftover characters remain:
# ❌ Without truncate(): file may have leftover content
with open('sample.txt', 'r+') as file:
    content = file.read()
    content = content.replace("replacement", "new")  # "new" is shorter
    file.seek(0)
    file.write(content)
    # Missing file.truncate(): old characters remain at the end!
Fix: Always call file.truncate() after writing:
# ✅ With truncate(): file is clean
with open('sample.txt', 'r+') as file:
    content = file.read()
    content = content.replace("replacement", "new")
    file.seek(0)
    file.write(content)
    file.truncate()  # Remove any leftover content
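The leftover text can be reproduced on a throwaway file. The following sketch assumes a temporary file created with tempfile.mkstemp() and the made-up content "a replacement b"; neither comes from the examples above, and the helper names are hypothetical.

```python
import os
import tempfile


def rewrite_r_plus(path, truncate):
    """Replace "replacement" with "new" using r+ mode; optionally truncate."""
    with open(path, "r+") as file:
        content = file.read().replace("replacement", "new")
        file.seek(0)
        file.write(content)
        if truncate:
            file.truncate()  # cut the file off at the current position


def demo(truncate):
    # Throwaway file so the demonstration does not touch real data
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w") as file:
            file.write("a replacement b")
        rewrite_r_plus(path, truncate)
        with open(path) as file:
            return file.read()
    finally:
        os.remove(path)


print(repr(demo(truncate=False)))  # 'a new bcement b'
print(repr(demo(truncate=True)))   # 'a new b'
```

Without truncate(), the seven characters of "a new b" overwrite the start of the file and the old tail "cement b" stays behind.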
Comparison of Methods
| Method | Memory Usage | Pattern Matching | Backup Support | Best For |
|---|---|---|---|---|
| open() + replace() | Loads entire file | ❌ (exact match only) | ❌ (manual) | Simple replacements in small files |
| pathlib | Loads entire file | ❌ (exact match only) | ❌ (manual) | Clean, modern code |
| re.sub() | Loads entire file | ✅ (regex patterns) | ❌ (manual) | Complex pattern matching |
| fileinput | Line by line | ❌ (exact match only) | ✅ (automatic) | Large files, batch processing |
Summary
To search and replace text in a file in Python:
- Use open() + replace() for the simplest approach with small files and exact text matches.
- Use pathlib for cleaner, more modern syntax without sacrificing functionality.
- Use re.sub() when you need pattern matching: case-insensitive replacement, regex patterns, or complex text structures.
- Use fileinput for large files that shouldn't be loaded entirely into memory, or when you want automatic backup creation.
Always handle file encoding explicitly with encoding='utf-8', use truncate() when writing with r+ mode, and consider creating backups before modifying important files.
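For the methods without built-in backup support, a copy can be made by hand before the file is rewritten. The sketch below uses shutil.copy2() for that; the helper name replace_with_backup and the '.bak' suffix default are assumptions chosen to mirror the fileinput example.

```python
import shutil
from pathlib import Path


def replace_with_backup(filepath, search_text, replace_text, suffix=".bak"):
    """Copy the file to <name><suffix>, then replace text in the original.

    Hypothetical helper for illustration; returns the backup path.
    """
    file = Path(filepath)
    backup = file.with_name(file.name + suffix)

    # copy2() copies the contents and tries to preserve metadata such as timestamps
    shutil.copy2(file, backup)

    content = file.read_text(encoding="utf-8")
    file.write_text(content.replace(search_text, replace_text), encoding="utf-8")
    return backup
```

Note that an existing backup with the same name is overwritten, so a timestamped suffix may be a better fit when a file is edited repeatedly.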