Python Data Validation: How to Validate Data Before Processing
Data validation is the practice of ensuring that input data is clean, correct, and useful before a program uses it. Failing to validate data is the root cause of many runtime errors, security vulnerabilities (such as injection attacks), and corrupted databases.
This guide explores manual validation techniques using standard Python features and introduces modern approaches for robust data handling.
Method 1: Type Checking (isinstance)
Python is dynamically typed, meaning a variable can be rebound to a value of any type. However, certain operations (like arithmetic) require specific types. Using isinstance() is the standard way to verify data types.
def process_transaction(amount):
    # ✅ Validate that amount is a number (int or float).
    # bool is rejected explicitly because it is a subclass of int.
    if not isinstance(amount, (int, float)) or isinstance(amount, bool):
        raise TypeError(f"Expected number, got {type(amount).__name__}")
    print(f"Processing transaction of ${amount:.2f}")

# ⛔️ Invalid Input
try:
    process_transaction("100")  # String input
except TypeError as e:
    print(f"Error: {e}")

# ✅ Valid Input
process_transaction(150.50)
Output:
Error: Expected number, got str
Processing transaction of $150.50
Avoid comparing types directly with type(x) == int: it rejects instances of subclasses, which breaks inheritance. Use isinstance(x, int) instead.
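The difference matters as soon as subclasses are involved. The sketch below uses a hypothetical `Celsius` subclass of `float`. It shows that an exact type comparison rejects the subclass while `isinstance` accepts it. It also shows a related caveat: `bool` is a subclass of `int`, so a check meant to accept only numbers may need to exclude `bool` explicitly. The helper `is_number` is illustrative, not a standard function.

```python
class Celsius(float):
    """Hypothetical float subclass, used only for illustration."""

temp = Celsius(21.5)

# Exact type comparison ignores inheritance, so the subclass is rejected
print(type(temp) == float)       # False
# isinstance() honors inheritance, so the subclass is accepted
print(isinstance(temp, float))   # True

# Caveat: bool inherits from int, so isinstance() treats True/False as ints
print(isinstance(True, int))     # True

def is_number(value):
    """Accept int and float (including subclasses) but reject bool."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)

print(is_number(3), is_number(2.5), is_number(True))  # True True False
```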
Method 2: Range and Logic Validation
Data might be the correct type (e.g., an integer) but still be invalid for your business logic (e.g., an age of -5). Use conditional if statements to enforce boundaries.
def register_user(age, username):
    # 1. Check Username Length
    if len(username) < 3:
        print("❌ Username too short.")
        return False
    # 2. Check Age Range
    if not (18 <= age <= 120):
        print("❌ Age must be between 18 and 120.")
        return False
    print("✅ User registered successfully.")
    return True

register_user(150, "Al")
register_user(25, "Alice")
Output:
❌ Username too short.
✅ User registered successfully.
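`register_user` stops at the first failed rule. A caller with several bad fields therefore learns about only one of them per attempt. A common variation collects every problem into a list and lets the caller decide what to do with it. The sketch below does this with a hypothetical `validate_registration` helper and applies the same two rules.

```python
def validate_registration(age, username):
    """Return a list of error messages; an empty list means the input is valid."""
    errors = []
    # Rule 1: username must be at least 3 characters
    if len(username) < 3:
        errors.append("Username too short.")
    # Rule 2: age must fall in the inclusive range 18..120
    if not (18 <= age <= 120):
        errors.append("Age must be between 18 and 120.")
    return errors

print(validate_registration(150, "Al"))
# ['Username too short.', 'Age must be between 18 and 120.']
print(validate_registration(25, "Alice"))
# []
```

Returning all errors at once suits forms and batch imports, where reporting every problem in one pass saves round trips.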
Method 3: Pattern Matching with Regex
For strings that follow a specific structure (emails, phone numbers, zip codes), the built-in re module allows you to validate patterns.
import re

def validate_email(email):
    # Simple format check only; it does not cover every address allowed by RFC 5322
    pattern = r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+"
    # fullmatch() requires the entire string to match the pattern
    if re.fullmatch(pattern, email):
        print(f"✅ '{email}' is valid.")
    else:
        print(f"⛔️ '{email}' is invalid.")

validate_email("user@example.com")
validate_email("user.com")
Output:
✅ 'user@example.com' is valid.
⛔️ 'user.com' is invalid.
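The same approach works for the other structured strings mentioned above. The sketch below validates US ZIP codes. It assumes the format is five digits, optionally followed by a hyphen and four more digits; adjust the pattern to your needs. It precompiles the pattern with `re.compile` and uses `fullmatch`, which succeeds only when the whole string matches. That distinction matters: `$` also matches just before a trailing newline, so `re.match` with a `^...$` pattern accepts `"12345\n"`. The helper name `is_valid_zip` is hypothetical.

```python
import re

# Assumed format: US ZIP ("12345") or ZIP+4 ("12345-6789").
# [0-9] is used instead of \d, which also matches non-ASCII digits.
ZIP_PATTERN = re.compile(r"[0-9]{5}(?:-[0-9]{4})?")

def is_valid_zip(zip_code):
    # fullmatch() succeeds only if the entire string matches the pattern
    return ZIP_PATTERN.fullmatch(zip_code) is not None

print(is_valid_zip("12345"))       # True
print(is_valid_zip("12345-6789"))  # True
print(is_valid_zip("1234"))        # False
print(is_valid_zip("12345\n"))     # False

# By contrast, re.match() with ^...$ accepts the trailing newline
print(bool(re.match(r"^[0-9]{5}$", "12345\n")))  # True
```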
Method 4: Modern Validation with Pydantic
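In modern Python development (especially data science and web APIs), writing a manual if statement for every field quickly becomes tedious. Pydantic is a library that validates data against type hints at runtime. It coerces compatible values (for example, the string "42" to the integer 42) and reports every invalid field at once.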
Installation: pip install pydantic (the example below uses the Pydantic v2 API, including field_validator).
from pydantic import BaseModel, ValidationError, field_validator

class User(BaseModel):
    id: int
    name: str
    email: str

    # Custom validator for name length
    @field_validator('name')
    @classmethod
    def name_must_be_long_enough(cls, v):
        if len(v) < 3:
            raise ValueError('Name must be at least 3 characters')
        return v

try:
    # ⛔️ Invalid data: "abc" cannot be coerced to int (a numeric string like "42" would be), and the name is too short
    user = User(id="abc", name="Ed", email="ed@test.com")
except ValidationError as e:
    print(e.json())
Output (JSON, reformatted for readability):
[
  {
    "type":"int_parsing",
    "loc":["id"],
    "msg":"Input should be a valid integer, unable to parse string as an integer",
    "input":"abc",
    "url":"https://errors.pydantic.dev/2.10/v/int_parsing"
  },
  {
    "type":"value_error",
    "loc":["name"],
    "msg":"Value error, Name must be at least 3 characters",
    "input":"Ed",
    "ctx":{
      "error":"Name must be at least 3 characters"
    },
    "url":"https://errors.pydantic.dev/2.10/v/value_error"
  }
]
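For comparison, here is a rough stdlib-only sketch of similar checks without Pydantic. It uses a `dataclass` with a `__post_init__` hook. Unlike Pydantic, the dataclass does not enforce its type hints, does not coerce `"42"` to `42`, and stops at the first failing rule. Every check has to be written and raised by hand, which is the boilerplate Pydantic removes. The class name `PlainUser` is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PlainUser:
    id: int
    name: str
    email: str

    def __post_init__(self):
        # Type hints are not enforced at runtime, so each check is manual.
        # bool is excluded because it is a subclass of int.
        if not isinstance(self.id, int) or isinstance(self.id, bool):
            raise TypeError(f"id must be an int, got {type(self.id).__name__}")
        if not isinstance(self.name, str):
            raise TypeError("name must be a str")
        if len(self.name) < 3:
            raise ValueError("Name must be at least 3 characters")
        if not isinstance(self.email, str):
            raise TypeError("email must be a str")

try:
    PlainUser(id="abc", name="Ed", email="ed@test.com")
except (TypeError, ValueError) as e:
    # Only the first failing rule is reported; the short name goes unmentioned
    print(f"Error: {e}")
# Error: id must be an int, got str
```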
Handling Validation Errors
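Python favors the EAFP principle (Easier to Ask for Forgiveness than Permission): attempt an operation and handle the exception if it fails. When you do check inputs up front, signal failures by raising explicit exceptions rather than printing messages.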
Using Custom Exceptions
Instead of printing errors, raise them so the calling code can decide how to handle them (retry, log, or abort).
class ValidationException(Exception):
    """Custom error for data validation failures."""
    pass

def divide(a, b):
    if b == 0:
        raise ValidationException("Division by zero is not allowed.")
    return a / b

try:
    result = divide(10, 0)
except ValidationException as e:
    print(f"Operation failed: {e}")
Output:
Operation failed: Division by zero is not allowed.
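`divide` checks `b` before acting, which is the look-before-you-leap style. An EAFP version instead attempts the operation and translates the low-level exception into the domain-specific one. The sketch below uses a hypothetical `parse_quantity` helper and repeats the `ValidationException` class so it runs on its own. Chaining with `raise ... from err` keeps the original `ValueError` available as `__cause__` for debugging.

```python
class ValidationException(Exception):
    """Custom error for data validation failures (repeated so this snippet runs standalone)."""

def parse_quantity(raw):
    # EAFP: attempt the conversion and handle failure, instead of pre-checking the string
    try:
        quantity = int(raw)
    except (TypeError, ValueError) as err:
        # Re-raise as the domain error; "from err" preserves the original cause
        raise ValidationException(f"Quantity must be an integer, got {raw!r}") from err
    # A business rule still needs an explicit check after conversion
    if quantity <= 0:
        raise ValidationException("Quantity must be positive.")
    return quantity

try:
    parse_quantity("ten")
except ValidationException as e:
    print(f"Operation failed: {e}")
    print(f"Caused by: {type(e.__cause__).__name__}")
# Operation failed: Quantity must be an integer, got 'ten'
# Caused by: ValueError
```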
Conclusion
To validate data effectively in Python:
- Use isinstance for basic type safety.
- Use re (regex) for string patterns like emails or phone numbers.
- Raise exceptions rather than returning False for critical data failures.
- Use Pydantic for complex data structures, APIs, or large configurations to save time and reduce boilerplate code.