How to Validate String Input Safely in Python
User input is the primary attack vector for many security vulnerabilities, including SQL Injection and Cross-Site Scripting (XSS). Even in non-web applications, unvalidated strings can lead to data corruption or runtime crashes. Safely handling string input involves a combination of validation (checking if data is correct) and sanitization (cleaning data to make it safe).
This guide explores how to validate string length, enforce character patterns using Regular Expressions, and sanitize inputs to prevent injection attacks.
Basic Validation (Length and Type)
Before performing complex checks, ensure the input meets basic constraints. The most fundamental checks are that the string is not empty and that its length falls within a safe range, so that oversized payloads cannot exhaust memory or overflow fixed-size fields in downstream systems.
def validate_basic(text, min_len=1, max_len=100):
    # 1. Strip whitespace to prevent empty inputs like " "
    clean_text = text.strip()
    if not clean_text:
        return False, "Input cannot be empty."
    # 2. Enforce the allowed length range
    if len(clean_text) < min_len or len(clean_text) > max_len:
        return False, f"Input must be between {min_len} and {max_len} characters."
    return True, "Valid"
# ✅ Test Valid Input
is_valid, msg = validate_basic("Hello World")
print(f"Status: {msg}")
# ⛔️ Test Invalid Input
is_valid, msg = validate_basic(" ")
print(f"Status: {msg}")
Output:
Status: Valid
Status: Input cannot be empty.
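The function above assumes that text is already a str; passing None would raise an AttributeError on .strip(). A minimal sketch of a type-checking variant follows. The name validate_basic_typed is hypothetical and only used for illustration.

```python
def validate_basic_typed(text, min_len=1, max_len=100):
    """Hypothetical variant of validate_basic that also rejects non-string input."""
    # Type check first: None, bytes, or numbers are rejected before any string work
    if not isinstance(text, str):
        return False, "Input must be a string."

    clean_text = text.strip()
    if not clean_text:
        return False, "Input cannot be empty."
    if not (min_len <= len(clean_text) <= max_len):
        return False, f"Input must be between {min_len} and {max_len} characters."
    return True, "Valid"


print(validate_basic_typed(None))
print(validate_basic_typed("Hello World"))
print(validate_basic_typed("x" * 101))
```

Checking the type before anything else keeps the later checks simple, because they can rely on string methods being available.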
Method 1: Whitelisting Characters with Regex
Whitelisting is safer than blacklisting. Instead of trying to block every dangerous character (like <, >, ;, '), define exactly what is allowed (e.g., alphanumeric characters only) and reject everything else.
We use the re.fullmatch() function to ensure the entire string adheres to the pattern.
import re
def validate_username(username):
    # Pattern: ^[a-zA-Z0-9_]+$
    # Allows only letters, numbers, and underscores
    pattern = r"^[a-zA-Z0-9_]+$"
    if re.fullmatch(pattern, username):
        print(f"✅ '{username}' is a valid username.")
    else:
        print(f"⛔️ '{username}' contains invalid characters.")
validate_username("admin_01")
validate_username("admin; DROP TABLE users;") # SQL Injection attempt
Output:
✅ 'admin_01' is a valid username.
⛔️ 'admin; DROP TABLE users;' contains invalid characters.
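Always prefer re.fullmatch() over re.match() for validation. re.match() only anchors the pattern at the start of the string, so an unanchored pattern accepts input with malicious text appended at the end. Even with a trailing $, re.match() accepts a string that ends in a newline, because $ also matches just before a final newline.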
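The difference between the two functions is easiest to see side by side. The sketch below reuses the username pattern from this section; the variable names are illustrative only.

```python
import re

ANCHORED = r"^[a-zA-Z0-9_]+$"
UNANCHORED = r"[a-zA-Z0-9_]+"

attack = "admin; DROP TABLE users;"
trailing_newline = "admin_01\n"

# re.match only anchors at the start, so the unanchored pattern matches "admin"
match_unanchored = re.match(UNANCHORED, attack) is not None

# With ^...$, re.match still accepts a trailing newline,
# because $ also matches just before a final "\n"
match_newline = re.match(ANCHORED, trailing_newline) is not None

# re.fullmatch requires the whole string to match, so both inputs are rejected
fullmatch_attack = re.fullmatch(UNANCHORED, attack) is not None
fullmatch_newline = re.fullmatch(ANCHORED, trailing_newline) is not None

print(match_unanchored, match_newline)       # True True
print(fullmatch_attack, fullmatch_newline)   # False False
```

Because re.fullmatch() already requires the whole string to match, the ^ and $ anchors in the pattern are harmless but not strictly needed.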
Method 2: Sanitization (Removing Dangerous Characters)
Sometimes you cannot reject input; you must clean it. Sanitization involves removing or encoding characters that could be interpreted as code.
Escaping HTML/Script Tags
If accepting text that might be displayed in a browser, use html.escape() to convert special characters into safe HTML entities.
import html
raw_input = "<script>alert('Hacked')</script>"
# ✅ Encode special characters
safe_input = html.escape(raw_input)
print(f"Original: {raw_input}")
print(f"Sanitized: {safe_input}")
Output:
Original: <script>alert('Hacked')</script>
Sanitized: &lt;script&gt;alert(&#x27;Hacked&#x27;)&lt;/script&gt;
Stripping Special Characters via Regex
You can also use regex to aggressively strip out anything outside a small allowed set. The example below keeps only letters, digits, dots, and dashes.
import re
def sanitize_filename(filename):
    # Remove anything that isn't a letter, number, dot, or dash
    return re.sub(r'[^a-zA-Z0-9.-]', '', filename)
unsafe_file = "../../etc/passwd"
safe_file = sanitize_filename(unsafe_file)
print(f"Safe Filename: {safe_file}")
Output:
Safe Filename: ....etcpasswd
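Stripping characters removes the path separators, but the result still starts with dots (a hidden file on Unix-like systems), and an input such as "../.." is reduced to dots alone. A possible hardening step, sketched under the assumption that a fallback name is acceptable, is shown below. The function name sanitize_filename_strict and the default value "unnamed" are hypothetical.

```python
import re


def sanitize_filename_strict(filename, default="unnamed"):
    """Hypothetical stricter variant: strip, drop leading dots, fall back if empty."""
    # Same whitelist as before: letters, digits, dots, and dashes
    cleaned = re.sub(r"[^a-zA-Z0-9.-]", "", filename)
    # Drop leading dots so the result is never a dotfile or a string of dots
    cleaned = cleaned.lstrip(".")
    # An empty result is not a usable filename, so substitute the default
    return cleaned or default


print(sanitize_filename_strict("../../etc/passwd"))  # etcpasswd
print(sanitize_filename_strict("../.."))             # unnamed
print(sanitize_filename_strict("report-2024.pdf"))   # report-2024.pdf
```

Whether to fall back to a default or to reject the input outright depends on the application; rejecting is usually the safer choice when the user can simply retry.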
Method 3: The "Retry Loop" Pattern
In interactive CLI applications, validation should be combined with a loop that prompts the user until valid input is received.
def get_safe_integer(prompt):
    while True:
        user_input = input(prompt).strip()
        # 1. Validate: ASCII digits only? (isdigit() alone also accepts
        #    characters such as "²", which int() cannot convert)
        if not (user_input.isascii() and user_input.isdigit()):
            print("Error: Please enter a valid number (digits only).")
            continue
        # 2. Convert and Range Check
        value = int(user_input)
        if 18 <= value <= 120:
            return value
        else:
            print("Error: Age must be between 18 and 120.")
age = get_safe_integer("Enter your age: ")
print(f"Accepted Age: {age}")
Example session (the user enters "abc", then "10", then "25"):
Enter your age: abc
Error: Please enter a valid number (digits only).
Enter your age: 10
Error: Age must be between 18 and 120.
Enter your age: 25
Accepted Age: 25
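A loop that calls input() directly is hard to test. One possible refinement, sketched here with hypothetical names (parse_age, prompt_for_age, and the read and max_attempts parameters are not part of the original example), separates the validation logic from the prompting and caps the number of retries.

```python
def parse_age(raw, low=18, high=120):
    """Return (value, None) on success or (None, error_message) on failure."""
    text = raw.strip()
    # ASCII digits only; this also rejects empty strings and signs
    if not (text.isascii() and text.isdigit()):
        return None, "Error: Please enter a valid number (digits only)."
    value = int(text)
    if not (low <= value <= high):
        return None, f"Error: Age must be between {low} and {high}."
    return value, None


def prompt_for_age(prompt, read=input, max_attempts=3):
    """Ask up to max_attempts times; `read` is injectable so tests can script replies."""
    for _ in range(max_attempts):
        value, error = parse_age(read(prompt))
        if error is None:
            return value
        print(error)
    raise ValueError("Too many invalid attempts.")


# Scripted replies stand in for a real user: "abc", then "10", then "25"
replies = iter(["abc", "10", "25"])
age = prompt_for_age("Enter your age: ", read=lambda _prompt: next(replies))
print(f"Accepted Age: {age}")
```

Keeping parse_age free of input() and print() calls means the same validation can be reused outside a CLI, for example in a web form handler.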
Never trust input directly in SQL queries. Even if you validate strings in Python, always use parameterized queries (e.g., cursor.execute("SELECT * FROM users WHERE name = ?", (user_input,))) provided by your database driver to prevent SQL injection.
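As an illustration of the parameterized-query advice, here is a small self-contained sketch using the standard-library sqlite3 module with an in-memory database. The table layout and the find_user helper are assumptions made for the example.

```python
import sqlite3

# In-memory database with a single illustrative table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))


def find_user(connection, name):
    """Look up a user by name; the driver binds `name` as data, not as SQL."""
    cursor = connection.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    )
    return cursor.fetchall()


rows_ok = find_user(conn, "alice")
rows_attack = find_user(conn, "alice' OR '1'='1")

print(rows_ok)      # [('alice',)]
print(rows_attack)  # []
```

The injection attempt returns no rows because the whole string, quotes included, is compared against the name column as a literal value.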
Conclusion
To validate string input safely:
- Whitelist, don't blacklist: Define exactly what characters are allowed (using Regex) rather than trying to guess what is malicious.
- Sanitize output: Use html.escape() if the string will be rendered in a web context.
- Validate boundaries: Always check len() to prevent empty strings or massive payloads.
- Use Loops: In CLI apps, force the user to retry until the input meets your criteria.