Skip to main content

How to Validate File and Folder Names in Python

Validating file and folder names is a critical step in building robust applications. A name that works perfectly on Linux might crash your program on Windows due to reserved characters or keywords. Furthermore, allowing unchecked filenames can lead to security vulnerabilities (like path traversal) or data loss.

This guide explores how to validate filenames using Python's standard libraries, covering cross-platform rules, regular expressions, and sanitization techniques.

Understanding Naming Constraints

Different operating systems impose different restrictions. To write cross-platform Python code, you should generally adhere to the strictest common set of rules (usually Windows).

  • Forbidden Characters: < > : " / \ | ? *
  • Reserved Names (Windows): CON, PRN, AUX, NUL, COM1-COM9, LPT1-LPT9.
  • Length: Generally limited to 255 characters.
  • Whitespace: Leading or trailing spaces are often stripped by the OS or cause confusion.

Method 1: Validation Using Regular Expressions

The most efficient way to check for forbidden characters is using the re module. You can define a pattern of invalid characters and reject the filename if a match is found.

import re

def validate_filename_regex(filename):
# 1. Define invalid characters (Windows standard + Unix slashes)
# Characters: / \ : * ? " < > |
invalid_chars_pattern = r'[<>:"/\\|?*]'

# 2. Check for invalid characters
if re.search(invalid_chars_pattern, filename):
print(f"Invalid: '{filename}' contains forbidden characters.")
return False

# 3. Check for control characters (unprintable)
# \0 is the null byte, common in exploit attempts
if re.search(r'[\0]', filename):
print(f"Invalid: '{filename}' contains null bytes.")
return False

print(f"Valid: '{filename}' structure is okay.")
return True

# ✅ Correct Usage
validate_filename_regex("report_2023.txt")

# ⛔️ Error Case
validate_filename_regex("image/new.png")
validate_filename_regex("data?.json")

Output:

Valid: 'report_2023.txt' structure is okay.
Invalid: 'image/new.png' contains forbidden characters.
Invalid: 'data?.json' contains forbidden characters.
tip

Always check for the Null Byte (\0). While rarely typed by users, it is a common vector for security exploits in file handling.

Method 2: Checking Reserved Keywords

On Windows, creating a file named CON.txt or nul.json can cause system-level errors because these names map to hardware devices. A robust validator must explicitly check against this list.

def validate_reserved_names(filename):
# List of reserved names (case-insensitive)
reserved_names = {
"CON", "PRN", "AUX", "NUL",
"COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
"LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9"
}

# Extract the name without extension for checking
# e.g., "CON.txt" -> "CON"
base_name = filename.split('.')[0].upper()

if base_name in reserved_names:
return False

return True

# ✅ Correct: Valid Name
print(f"Is 'data.txt' valid? {validate_reserved_names('data.txt')}")

# ⛔️ Incorrect: Reserved Name
print(f"Is 'CON.txt' valid? {validate_reserved_names('CON.txt')}")

Output:

Is 'data.txt' valid? True
Is 'CON.txt' valid? False

Method 3: Sanitizing Filenames (Auto-Fixing)

Sometimes, instead of rejecting user input, you want to "fix" it. This involves replacing invalid characters with a safe substitute (like an underscore).

import re

def sanitize_filename(filename):
# 1. Define invalid characters
invalid_chars_pattern = r'[<>:"/\\|?*]'

# 2. Replace invalid characters with underscore
clean_name = re.sub(invalid_chars_pattern, '_', filename)

# 3. Strip leading/trailing spaces and dots
clean_name = clean_name.strip(" .")

# 4. Handle Empty resulting string
if not clean_name:
clean_name = "unnamed_file"

return clean_name

raw_input = "User:Input/File?.txt"

# ✅ Correct: Sanitizing the input
safe_name = sanitize_filename(raw_input)

print(f"Original: {raw_input}")
print(f"Sanitized: {safe_name}")

Output:

Original: User:Input/File?.txt
Sanitized: User_Input_File_.txt
note

If the sanitized name overwrites an existing file (e.g., "file?.txt" and "file!.txt" both become "file_.txt"), you may need logic to append a counter (e.g., "file_(1).txt") to prevent data loss.

Conclusion

To handle file and folder names securely in Python:

  1. Use Regex to block characters that are illegal on the target filesystem (< > : " / \ | ? *).
  2. Block Reserved Words like CON and NUL to ensure Windows compatibility.
  3. Sanitize Inputs using re.sub if you want to accept user input and convert it to a safe format automatically.
  4. Use pathlib for managing paths, but rely on string validation for the filename logic itself.