How to Extract and Filter Digits from a Text File in Python

Data extraction is a fundamental skill in Python scripting. A common scenario involves reading unstructured text from a file and keeping only specific characters, such as digits that meet a certain condition.

This guide demonstrates how to read a text file, iterate through its content, and extract only digits that are strictly greater than 5.

Prerequisites and Setup

Before writing the script, we need a sample text file to process.

Create a file named String.txt with mixed content (letters, symbols, and numbers):

a1b2c6d7e8f9g0h5

Goal: Extract 6, 7, 8, 9.
Result: 6789.
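If you would rather create the sample file from Python than from a text editor, the following is a minimal sketch. It assumes the file should live in the current working directory and be encoded as UTF-8; neither detail is specified above.

```python
from pathlib import Path

# Assumption: the sample file is created in the current working directory.
sample_path = Path("String.txt")

# Write the mixed content used throughout this guide.
sample_path.write_text("a1b2c6d7e8f9g0h5\n", encoding="utf-8")

# Read it back to confirm the file was written.
print(sample_path.read_text(encoding="utf-8").strip())
```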

Step 1: Reading the File Safely

To read a file in Python, the best practice is to use the with open(...) context manager. This ensures the file is automatically closed after operations are finished, preventing resource leaks.

# ✅ Correct: Using 'with open' handles closing automatically
try:
    with open("String.txt", "r") as f:
        content = f.read()
except FileNotFoundError:
    print("Error: String.txt not found.")
    content = ""

Note: The mode "r" stands for read-only. If the file does not exist, Python raises a FileNotFoundError.
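To see both behaviors in action, here is a small sketch that uses a throwaway temporary file, so it can run without String.txt being present. The temporary file and the variable names are illustrative assumptions, not part of the original script.

```python
import os
import tempfile

# Create a throwaway file so the example runs anywhere (illustrative content).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("a1b2c6")
    tmp_path = tmp.name

with open(tmp_path, "r") as f:
    content = f.read()
    closed_inside = f.closed  # False: the file is still open inside the block

closed_after = f.closed  # True: the context manager closed it on exit

# Delete the file, then try to open it again to trigger the error.
os.remove(tmp_path)
try:
    with open(tmp_path, "r") as f:
        f.read()
    missing = False
except FileNotFoundError:
    missing = True

print(closed_inside, closed_after, missing)  # False True True
```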

Step 2: Filtering and Concatenating Logic

Strings in Python are iterable. We can loop through every character, check if it is a digit, and then compare its numerical value.

Logic Breakdown:

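char.isdigit(): Returns True for the characters '0' through '9'. It also returns True for some non-ASCII digit characters (for example the superscript '²'), which int() rejects with a ValueError; this is not a concern for plain ASCII input such as the sample file.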
int(char) > 5: Converts the string character to an integer to perform the mathematical comparison.
Concatenation: Appends the valid character to a result string.

content = "a1b8c6"
numbers = ""

for char in content:
    # Check type AND value
    if char.isdigit() and int(char) > 5:
        numbers += char

print(numbers) # Output: 86
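One edge case is worth knowing about: str.isdigit() accepts a few non-ASCII characters, such as the superscript '²', that int() cannot convert. If your input may contain such characters, a stricter check is one option. The helper name is_ascii_digit below is hypothetical and introduced only for illustration.

```python
def is_ascii_digit(char):
    """Hypothetical helper: True only for a single character '0' through '9'."""
    return len(char) == 1 and "0" <= char <= "9"

# '²' passes isdigit() but is rejected by int().
print("²".isdigit())  # True
try:
    int("²")
    converted = True
except ValueError:
    converted = False
print(converted)  # False

# Filtering with the stricter check skips '²' instead of raising an error.
content = "a1b8c6²"
numbers = ""
for char in content:
    if is_ascii_digit(char) and int(char) > 5:
        numbers += char

print(numbers)  # 86
```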

Complete Code Solution

Here is the complete script FindDigits.py. It combines file handling with the filtering logic.

def extract_high_digits(filename):
    """
    Reads a file and returns a string of all digits > 5.
    """
    try:
        # 1. Open and read file
        with open(filename, "r") as f:
            string_data = f.read()
            
        # 2. Initialize accumulator
        result_numbers = ""

        # 3. Process each character
        for char in string_data:
            # Check isdigit() first so int() is never called on letters or symbols
            if char.isdigit():
                if int(char) > 5:
                    result_numbers += char
        
        return result_numbers

    except FileNotFoundError:
        return "File not found."

if __name__ == "__main__":
    # Execution
    output = extract_high_digits("String.txt")
    print(output)
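One design consideration: because the function returns the message "File not found." as an ordinary string, a caller cannot easily tell it apart from a normal result. The sketch below shows one possible variant that returns None for a missing file instead. The name extract_high_digits_or_none and the temporary directory are assumptions made for this illustration.

```python
import os
import tempfile

def extract_high_digits_or_none(filename):
    """Hypothetical variant: returns the digits > 5, or None if the file is missing."""
    try:
        with open(filename, "r") as f:
            string_data = f.read()
    except FileNotFoundError:
        return None

    result_numbers = ""
    for char in string_data:
        if char.isdigit() and int(char) > 5:
            result_numbers += char
    return result_numbers

# Demonstrate both outcomes inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample = os.path.join(tmp_dir, "String.txt")
    with open(sample, "w") as f:
        f.write("a1b2c6d7e8f9g0h5")

    found = extract_high_digits_or_none(sample)
    missing = extract_high_digits_or_none(os.path.join(tmp_dir, "missing.txt"))

print(found)    # 6789
print(missing)  # None
```

The caller can then test the result with `is None` before using it.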

Alternative: Pythonic List Comprehension

For more advanced users, this entire logic can be condensed into a single readable line using list comprehensions and join.

with open("String.txt", "r") as f:
    text = f.read()

# Create a list of chars where condition is True, then join them
result = "".join([char for char in text if char.isdigit() and int(char) > 5])
print(result)

Tip: String concatenation (+=) in a loop can be slow for very large files because strings are immutable, so each += may build a new string. CPython optimizes some of these cases, but the join-based list comprehension above is generally the more reliable choice for large inputs.
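If you want to check the performance difference on your own machine, the following is a rough sketch using the standard timeit module. The input is synthetic (the sample string repeated many times), and the function names are illustrative. Timings vary by Python version and hardware, so no particular result is assumed here.

```python
import timeit

# Synthetic input: the sample string repeated, not data from a real file.
text = "a1b2c6d7e8f9g0h5" * 10_000

def with_concatenation(data):
    """Build the result with += inside a loop."""
    result = ""
    for char in data:
        if char.isdigit() and int(char) > 5:
            result += char
    return result

def with_join(data):
    """Build the result with a list comprehension and join."""
    return "".join([char for char in data if char.isdigit() and int(char) > 5])

# Both approaches should produce the same string.
same_output = with_concatenation(text) == with_join(text)

concat_time = timeit.timeit(lambda: with_concatenation(text), number=5)
join_time = timeit.timeit(lambda: with_join(text), number=5)

print(f"same output: {same_output}")
print(f"+= loop: {concat_time:.3f}s")
print(f"join:    {join_time:.3f}s")
```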

Conclusion

Extracting specific data from text files involves three steps:

File Context: Use with open() to safely access data.
Iteration: Loop through the string character by character.
Validation: Combine string methods (isdigit()) with type casting (int()) to filter data precisely.
