Python Data Validation: How to Validate Data Before Processing

Data validation is the practice of checking that input data is clean, correct, and usable before a program acts on it. Unvalidated data is a common root cause of runtime errors, security vulnerabilities (such as injection attacks), and corrupted databases.

This guide explores manual validation techniques using standard Python features and introduces modern approaches for robust data handling.

Method 1: Type Checking (isinstance)

Python is dynamically typed: a variable can be rebound to a value of any type, and nothing checks argument types before a function runs. Certain operations (like arithmetic) still require specific types. Using isinstance() is the standard way to verify a value's type at runtime.

def process_transaction(amount):
    # ✅ Validate that amount is a number (int or float)
    if not isinstance(amount, (int, float)):
        raise TypeError(f"Expected number, got {type(amount).__name__}")

    print(f"Processing transaction of ${amount:.2f}")

# ⛔️ Invalid Input
try:
    process_transaction("100")  # String input
except TypeError as e:
    print(f"Error: {e}")

# ✅ Valid Input
process_transaction(150.50)

Output:

Error: Expected number, got str
Processing transaction of $150.50

Note: Avoid using type(x) == int. An exact type comparison rejects subclasses of int, which breaks inheritance. Prefer isinstance(x, int).
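
One consequence of isinstance() honoring inheritance is worth knowing: bool is a subclass of int, so True and False pass an isinstance(x, (int, float)) check. The sketch below uses a hypothetical helper named is_real_number (not part of the original example) to show one way to exclude booleans when that matters.

```python
def is_real_number(value):
    """Return True for int or float values, but not for bool.

    Hypothetical helper for illustration only.
    """
    # bool is a subclass of int, so it must be ruled out explicitly.
    if isinstance(value, bool):
        return False
    return isinstance(value, (int, float))


print(isinstance(True, int))   # True: bool inherits from int
print(type(True) == int)       # False: exact comparison ignores inheritance
print(is_real_number(True))    # False
print(is_real_number(150.50))  # True
```

Whether booleans should count as numbers depends on the application, so treat the extra check as optional.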

Method 2: Range and Logic Validation

Data might be the correct type (e.g., an integer) but still be invalid for your business logic (e.g., an age of -5). Use conditional if statements to enforce boundaries.

def register_user(age, username):
    # 1. Check Username Length
    if len(username) < 3:
        print("❌ Username too short.")
        return False

    # 2. Check Age Range
    if not (18 <= age <= 120):
        print("❌ Age must be between 18 and 120.")
        return False

    print("✅ User registered successfully.")
    return True

register_user(150, "Al")
register_user(25, "Alice")

Output:

❌ Username too short.
✅ User registered successfully.
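
Because register_user returns False for both failures, the caller cannot tell which rule was broken without reading the printed text. A minimal sketch of the same checks raising ValueError instead might look like this; validate_registration is a hypothetical name, and the 3-character and 18 to 120 limits are copied from the example above.

```python
def validate_registration(age, username):
    """Raise ValueError describing the first rule that fails."""
    if len(username) < 3:
        raise ValueError("Username must be at least 3 characters.")
    if not (18 <= age <= 120):
        raise ValueError("Age must be between 18 and 120.")
    return True


for age, username in [(150, "Al"), (150, "Alice"), (25, "Alice")]:
    try:
        validate_registration(age, username)
        print(f"{username!r}, {age}: ok")
    except ValueError as e:
        # The message says which rule failed, so the caller can react to it.
        print(f"{username!r}, {age}: {e}")
```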

Method 3: Pattern Matching with Regex

For strings that follow a specific structure (emails, phone numbers, zip codes), the built-in re module allows you to validate patterns.

import re

def validate_email(email):
    # Simple Regex for Email format
    pattern = r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$"

    if re.match(pattern, email):
        print(f"✅ '{email}' is valid.")
    else:
        print(f"⛔️ '{email}' is invalid.")

validate_email("user@example.com")
validate_email("user.com")

Output:

✅ 'user@example.com' is valid.
⛔️ 'user.com' is invalid.
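
The same approach extends to other fixed formats. The sketch below assumes US-style ZIP codes (five digits, optionally followed by a hyphen and four more), a format chosen only for illustration, and is_valid_zip is a hypothetical helper. It uses re.fullmatch(), which requires the whole string to match, so the pattern needs no ^ and $ anchors.

```python
import re

# Assumed format: 5 digits, optionally followed by "-" and 4 digits.
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")


def is_valid_zip(code):
    """Return True if the entire string is a ZIP or ZIP+4 code."""
    return ZIP_PATTERN.fullmatch(code) is not None


print(is_valid_zip("90210"))       # True
print(is_valid_zip("90210-1234"))  # True
print(is_valid_zip("9021"))        # False
print(is_valid_zip("90210\n"))     # False: fullmatch rejects the trailing newline
```

With re.match() and a $ anchor, the last input would be accepted, because $ also matches just before a trailing newline.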

Method 4: Modern Validation with Pydantic

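In modern Python development (especially data science and web APIs), writing manual if statements for every field is tedious. Pydantic is a third-party library that validates data against type hints at runtime, when a model instance is created. The example below uses the Pydantic v2 API (field_validator).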

Installation: pip install pydantic

from pydantic import BaseModel, ValidationError, field_validator

class User(BaseModel):
    id: int
    name: str
    email: str

    # Custom validator for name length
    @field_validator('name')
    @classmethod
    def name_must_be_long_enough(cls, v):
        if len(v) < 3:
            raise ValueError('Name must be at least 3 characters')
        return v

try:
    # ⛔️ Invalid data: id is a non-numeric string (a numeric string such as "123"
    # would be converted to int), and name is too short
    user = User(id="abc", name="Ed", email="ed@test.com")
except ValidationError as e:
    print(e.json())

Output (Formatted JSON):

[
  {
    "type": "int_parsing",
    "loc": ["id"],
    "msg": "Input should be a valid integer, unable to parse string as an integer",
    "input": "abc",
    "url": "https://errors.pydantic.dev/2.10/v/int_parsing"
  },
  {
    "type": "value_error",
    "loc": ["name"],
    "msg": "Value error, Name must be at least 3 characters",
    "input": "Ed",
    "ctx": {
      "error": "Name must be at least 3 characters"
    },
    "url": "https://errors.pydantic.dev/2.10/v/value_error"
  }
]

Handling Validation Errors

When writing validation logic, either follow the EAFP principle (Easier to Ask for Forgiveness than Permission), where you attempt the operation and catch the failure, or check the input up front and raise an explicit exception.
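
As a rough illustration of EAFP, the sketch below tries the conversion and handles the failure, rather than inspecting the string first. parse_quantity is a hypothetical helper, and the rule that quantities must be positive is assumed for the example.

```python
def parse_quantity(raw):
    """Convert raw input to a positive int, or raise ValueError."""
    try:
        # EAFP: attempt the conversion instead of pre-checking the text.
        quantity = int(raw)
    except (TypeError, ValueError):
        raise ValueError(f"Quantity must be a whole number, got {raw!r}") from None
    if quantity <= 0:
        raise ValueError(f"Quantity must be positive, got {quantity}")
    return quantity


for raw in ["3", " 42 ", "abc", "-1", None]:
    try:
        print(parse_quantity(raw))
    except ValueError as e:
        print(f"Error: {e}")
```

Letting int() decide what counts as a number avoids duplicating its parsing rules in a hand-written check.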

Using Custom Exceptions

Instead of printing errors, raise them so the calling code can decide how to handle them (retry, log, or abort).

class ValidationException(Exception):
    """Custom error for data validation failures."""

def divide(a, b):
    if b == 0:
        raise ValidationException("Division by zero is not allowed.")
    return a / b

try:
    result = divide(10, 0)
except ValidationException as e:
    print(f"Operation failed: {e}")

Output:

Operation failed: Division by zero is not allowed.
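
To show the caller choosing how to react, the following sketch validates a batch of records and skips the bad ones instead of aborting. FieldValidationError, validate_record, and the record layout are all hypothetical and exist only for this example.

```python
class FieldValidationError(Exception):
    """Validation failure that remembers which field was invalid."""

    def __init__(self, field, message):
        super().__init__(f"{field}: {message}")
        self.field = field


def validate_record(record):
    """Raise FieldValidationError if the record breaks a rule."""
    if not isinstance(record.get("name"), str) or len(record["name"]) < 3:
        raise FieldValidationError("name", "must be a string of at least 3 characters")
    if not isinstance(record.get("age"), int) or not (18 <= record["age"] <= 120):
        raise FieldValidationError("age", "must be an integer between 18 and 120")
    return record


records = [
    {"name": "Alice", "age": 25},
    {"name": "Al", "age": 30},
    {"name": "Bob", "age": 150},
]

valid, rejected = [], []
for record in records:
    try:
        valid.append(validate_record(record))
    except FieldValidationError as e:
        # The caller chooses the policy: here, note the failure and continue.
        rejected.append((e.field, str(e)))

print(valid)
print(rejected)
```

A different caller could catch the same exception and abort on the first failure; the validation function does not need to change.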

Conclusion

To validate data effectively in Python:

  1. Use isinstance for basic type safety.
  2. Use re (Regex) for string patterns like emails or phone numbers.
  3. Raise exceptions rather than returning False for critical data failures.
  4. Use Pydantic for complex data structures, APIs, or large configurations to save time and reduce boilerplate code.