Skip to main content

How to Validate File Naming Rules in Python

Validating file names is a critical step in Python applications that generate files or accept user uploads. Operating systems (Windows, Linux, macOS) have different rules regarding forbidden characters, length limits, and reserved keywords. Failing to validate these names can lead to OSError, data overwrites, or security vulnerabilities like directory traversal attacks.

This guide covers how to validate file names using standard Regular Expressions and the robust third-party library pathvalidate.

Understanding OS Naming Constraints

Before writing code, it is essential to know what makes a filename "invalid."

  • Windows: Prohibits < > : " / \ | ? *. Also reserves names like CON, PRN, AUX, NUL, COM1-COM9, LPT1-LPT9.
  • Linux/macOS: Generally allows almost anything except the forward slash / (path separator) and the null byte \0.
  • Length: Most file systems limit filenames to 255 bytes.
warning

Security Risk: Never trust user input directly. A filename like ../../etc/passwd is valid syntactically but is a Directory Traversal attack intended to access sensitive files outside the intended directory.

Method 1: Using Regular Expressions (Custom Rules)

If you need to enforce a specific naming convention (e.g., "only alphanumeric characters and underscores"), Regular Expressions (regex) are the standard tool.

Allow-list Approach (Whitelist)

This approach is safer than trying to ban specific characters. We define exactly what is allowed.

import re

def is_valid_filename(filename):
# ✅ Correct: Allow only alphanumeric, underscores, hyphens, and dots
# The regex ensures the name contains at least one valid char
pattern = r'^[a-zA-Z0-9_\-\.]+$'

if re.match(pattern, filename) and ".." not in filename:
return True
return False

# Test Cases
filenames = ["report_2023.txt", "image/png", "user?name", "../hack"]

for fname in filenames:
if is_valid_filename(fname):
print(f"✅ '{fname}' is valid.")
else:
print(f"⛔️ '{fname}' is invalid.")

Output:

✅ 'report_2023.txt' is valid.
⛔️ 'image/png' is invalid.
⛔️ 'user?name' is invalid.
⛔️ '../hack' is invalid.

Sanitizing Filenames

Instead of rejecting the input, you might want to remove invalid characters.

import re

def sanitize_filename_regex(filename):
# Replace anything that is NOT alphanumeric, _, -, or . with an underscore
cleaned = re.sub(r'[^a-zA-Z0-9_\-\.]', '_', filename)
return cleaned

raw_name = "My<Cool>File:v1.txt"
clean_name = sanitize_filename_regex(raw_name)

print(f"Original: {raw_name}")
print(f"Cleaned: {clean_name}")

Output:

Original: My<Cool>File:v1.txt
Cleaned: My_Cool_File_v1.txt

Handling every edge case (like Windows reserved names or cross-platform rules) manually is difficult. The pathvalidate library is the industry standard for this task.

Installation:

pip install pathvalidate

Validating and Sanitizing

from pathvalidate import is_valid_filename, sanitize_filename

filename_bad = "config:v1/final?.json"
filename_good = "config_v1_final.json"

# ✅ Check Validity
print(f"Is '{filename_bad}' valid? {is_valid_filename(filename_bad)}")
print(f"Is '{filename_good}' valid? {is_valid_filename(filename_good)}")

# ✅ Sanitize (Auto-replace bad chars)
sanitized = sanitize_filename(filename_bad)
print(f"Sanitized: {sanitized}")

Output:

Is 'config:v1/final?.json' valid? False
Is 'config_v1_final.json' valid? True
Sanitized: configv1final.json
note

pathvalidate automatically handles OS-specific rules, such as removing trailing spaces or dots which are problematic on Windows.

Method 3: Checking for Reserved Filenames

If you are not using pathvalidate and plan to support Windows users, you must explicitly check for reserved device names. Windows will not allow files named CON, PRN, etc., even with an extension (e.g., CON.txt is invalid).

import os

def is_reserved_name(filename):
# Windows reserved names
reserved_names = {
"CON", "PRN", "AUX", "NUL",
"COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
"LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9"
}

# Extract the base name without extension and normalize case
base_name = os.path.splitext(filename)[0].upper()

return base_name in reserved_names

# Test
files = ["data.txt", "con.txt", "aux"]

for f in files:
if is_reserved_name(f):
print(f"⛔️ '{f}' is a reserved system name.")
else:
print(f"✅ '{f}' is okay to use.")

Output:

✅ 'data.txt' is okay to use.
⛔️ 'con.txt' is a reserved system name.
⛔️ 'aux' is a reserved system name.

Conclusion

To ensure your file names are valid and safe:

  1. Use pathvalidate for production applications. It covers cross-platform edge cases that manual regex often misses.
  2. Use Whitelist Regex (r'^[a-zA-Z0-9_\-\.]+$') if you need to enforce a strict naming convention within your app.
  3. Sanitize, don't just validate. It is usually better to auto-correct a bad filename (by replacing bad characters) than to crash the program.
  4. Block Directory Traversal. Always ensure filenames do not contain .. or path separators (/, \) if they are supposed to be simple filenames.