How to Extract and Filter Digits from a Text File in Python
Data extraction is a fundamental skill in Python scripting. A common scenario involves reading unstructured text from a file and keeping only specific characters, such as digits that meet a certain condition.
This guide demonstrates how to read a text file, iterate through its content, and extract only digits that are strictly greater than 5.
Prerequisites and Setup
Before writing the script, we need a sample text file to process.
Create a file named String.txt with mixed content (letters, symbols, and numbers):
a1b2c6d7e8f9g0h5
- Goal: Extract 6, 7, 8, 9.
- Result: 6789.
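If you would rather create the sample file from Python than by hand, the following is a minimal sketch. The helper name create_sample_file is hypothetical (it does not appear elsewhere in this guide), and the trailing newline is an assumption that does not affect the filtering shown later.

```python
from pathlib import Path

def create_sample_file(path="String.txt"):
    """Write the sample line used in this guide to the given path."""
    # Hypothetical helper: writes one line of mixed letters and digits
    Path(path).write_text("a1b2c6d7e8f9g0h5\n", encoding="utf-8")
    return path

if __name__ == "__main__":
    create_sample_file()
```

Running the script once in your working directory should leave a String.txt file next to it.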
Step 1: Reading the File Safely
To read a file in Python, the best practice is to use the with open(...) context manager. This ensures the file is automatically closed after operations are finished, preventing resource leaks.
# ✅ Correct: Using 'with open' handles closing automatically
try:
    with open("String.txt", "r") as f:
        content = f.read()
except FileNotFoundError:
    print("Error: String.txt not found.")
    content = ""
The mode "r" stands for read-only. If the file does not exist, Python raises a FileNotFoundError.
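To observe both behaviors described above (automatic closing and the FileNotFoundError), here is a small sketch. It works inside a temporary directory, which is an assumption made only so the example does not touch your real files; the file name missing.txt is likewise made up for illustration.

```python
import os
import tempfile

# Work in a throwaway directory so the sketch does not touch real files
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "String.txt")
    with open(path, "w") as f:
        f.write("a1b2c6")

    with open(path, "r") as f:
        content = f.read()
        print(f.closed)   # False: still inside the with block

    print(f.closed)       # True: closed automatically on exit
    print(content)        # a1b2c6

    # Opening a file that does not exist raises FileNotFoundError
    try:
        with open(os.path.join(tmp, "missing.txt"), "r") as g:
            g.read()
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True

print(missing_raised)     # True
```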
Step 2: Filtering and Concatenating Logic
Strings in Python are iterable. We can loop through every character, check if it is a digit, and then compare its numerical value.
Logic Breakdown:
- char.isdigit(): Returns True if the character is a digit, such as '0' through '9' (some non-ASCII digit characters also qualify).
- int(char) > 5: Converts the string character to an integer to perform the numerical comparison.
- Concatenation: Appends the valid character to a result string.
content = "a1b8c6"
numbers = ""
for char in content:
    # Check type AND value
    if char.isdigit() and int(char) > 5:
        numbers += char
print(numbers)  # Output: 86
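One edge case worth knowing: isdigit() also returns True for some non-ASCII characters such as the superscript '²', which int() cannot convert, so the combined check can still raise a ValueError on unusual input. If your files may contain such characters, a possible workaround is to test membership in a literal string instead. This is only a sketch under that assumption, and the function name extract_high_digits_ascii is hypothetical.

```python
def extract_high_digits_ascii(text):
    """Return the ASCII digits greater than 5 found in text, in order."""
    result = ""
    for char in text:
        # Membership in a literal string avoids calling int() at all
        if char in "6789":
            result += char
    return result

print(extract_high_digits_ascii("a1b8c6"))  # 86
print(extract_high_digits_ascii("x²y7"))    # 7
```

The trade-off is that the threshold is now hard-coded in the literal "6789" rather than expressed as a numeric comparison.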
Complete Code Solution
Here is the complete script FindDigits.py. It combines file handling with the filtering logic.
def extract_high_digits(filename):
    """
    Reads a file and returns a string of all digits > 5.
    """
    try:
        # 1. Open and read file
        with open(filename, "r") as f:
            string_data = f.read()
        # 2. Initialize accumulator
        result_numbers = ""
        # 3. Process each character
        for char in string_data:
            # Check if it is a digit first to avoid ValueError in int()
            if char.isdigit():
                if int(char) > 5:
                    result_numbers += char
        return result_numbers
    except FileNotFoundError:
        return "File not found."

if __name__ == "__main__":
    # Execution
    output = extract_high_digits("String.txt")
    print(output)
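Because the function above returns the message "File not found." as an ordinary string, a caller cannot easily tell an error apart from a real result. One possible variation, sketched here with a hypothetical function name extract_digits_above and a hypothetical threshold parameter, returns None for a missing file instead. The temporary directory is used only to keep the example self-contained.

```python
import os
import tempfile

def extract_digits_above(filename, threshold=5):
    """Return digits greater than threshold, or None if the file is missing."""
    try:
        with open(filename, "r") as f:
            data = f.read()
    except FileNotFoundError:
        return None  # distinguishable from any valid result string
    return "".join(ch for ch in data if ch.isdigit() and int(ch) > threshold)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "String.txt")
    with open(path, "w") as f:
        f.write("a1b2c6d7e8f9g0h5")

    found = extract_digits_above(path)
    missing = extract_digits_above(os.path.join(tmp, "nope.txt"))

print(found)    # 6789
print(missing)  # None
```

A caller can then check "if result is None" before using the value, which is not possible when the error is encoded as text.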
Alternative: Pythonic List Comprehension
For more advanced users, the filtering logic can be condensed into a single readable line using a list comprehension and str.join().
with open("String.txt", "r") as f:
    text = f.read()
# Create a list of chars where condition is True, then join them
result = "".join([char for char in text if char.isdigit() and int(char) > 5])
print(result)
String concatenation (+=) in a loop can be slow for very large files because strings are immutable, so each += may have to build a new string. Collecting the matching characters first and joining them once, as the list comprehension above does, generally scales better for large inputs.
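If you want to check the performance claim on your own machine, a rough comparison with the standard timeit module might look like the sketch below. The function names with_concat and with_join and the input size are arbitrary choices for illustration, and the timings will vary by Python version and hardware, so no expected numbers are shown.

```python
import timeit

# Repeat the sample line to get a larger input (size chosen arbitrarily)
text = "a1b2c6d7e8f9g0h5" * 10_000

def with_concat(s):
    """Build the result with += inside a loop."""
    out = ""
    for ch in s:
        if ch.isdigit() and int(ch) > 5:
            out += ch
    return out

def with_join(s):
    """Build the result with a list comprehension and a single join."""
    return "".join([ch for ch in s if ch.isdigit() and int(ch) > 5])

# Both approaches should produce the same string
assert with_concat(text) == with_join(text)

print("concat:", timeit.timeit(lambda: with_concat(text), number=5))
print("join:  ", timeit.timeit(lambda: with_join(text), number=5))
```

On small files the difference is unlikely to matter, so readability is a reasonable tie-breaker.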
Conclusion
Extracting specific data from text files involves three steps:
- File Context: Use with open() to safely access data.
- Iteration: Loop through the string character by character.
- Validation: Combine string methods (isdigit()) with type casting (int()) to filter data precisely.