How to Find the MIME Type of a File in Python
A MIME type (the name comes from Multipurpose Internet Mail Extensions) is a label such as image/png that tells an application what kind of data a file holds, and therefore how to handle it. Whether you're building a web server that sets Content-Type headers, validating file uploads for security, or processing email attachments, accurate MIME type detection is essential. Python offers multiple approaches, ranging from fast extension-based guessing to robust content-based analysis that examines the actual file bytes.
Using the Built-in mimetypes Module
The mimetypes module provides the fastest approach by matching file extensions against a database of known types:
import mimetypes

def get_mime_by_extension(file_path):
    """Guess MIME type based on file extension."""
    # guess_type returns (type, encoding); only the type is needed here
    mime_type, encoding = mimetypes.guess_type(file_path)
    return mime_type

# Examples
print(get_mime_by_extension("document.pdf"))    # application/pdf
print(get_mime_by_extension("photo.jpg"))       # image/jpeg
print(get_mime_by_extension("data.json"))       # application/json
print(get_mime_by_extension("archive.tar.gz"))  # application/x-tar (gzip is reported as the encoding)
print(get_mime_by_extension("unknown.xyz"))     # None (unless the system MIME database defines .xyz)
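The helper above discards the second value returned by guess_type. The sketch below shows what that value holds, and how an unrecognized extension can be registered with mimetypes.add_type. The extension .exmpl and the type application/x-example are made-up names used only for illustration.

```python
import mimetypes

# guess_type returns a (type, encoding) pair. The encoding names a
# compression wrapper such as gzip, not the format of the payload.
mime_type, encoding = mimetypes.guess_type("archive.tar.gz")
print(mime_type, encoding)  # application/x-tar gzip

# Register a hypothetical extension so lookups stop returning None.
# Both the type and the extension here are illustrative assumptions.
mimetypes.add_type("application/x-example", ".exmpl")
registered = mimetypes.guess_type("report.exmpl")[0]
print(registered)  # application/x-example
```

A web server could map the encoding to a Content-Encoding header while sending the type as Content-Type.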
You can also find extensions for a given MIME type:
import mimetypes

# Get the preferred extension for a MIME type
ext = mimetypes.guess_extension("image/png")
print(ext)  # .png

# Get all extensions for a type (the module initializes itself on first use)
print(mimetypes.guess_all_extensions("image/jpeg"))
# Example output; contents and order vary by Python version and platform:
# ['.jpg', '.jpe', '.jpeg']
The mimetypes module only examines file names, not contents. A malicious file renamed from virus.exe to document.pdf would incorrectly return application/pdf. Never rely solely on extensions for security-critical validation.
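To make the limitation concrete, here is a minimal sketch that writes a file starting with MZ (the signature that opens Windows executables) under the name document.pdf in a temporary directory. The file name and padding bytes are illustrative assumptions, not a real malware sample.

```python
import mimetypes
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    disguised = Path(tmp_dir) / "document.pdf"
    # "MZ" opens Windows executables; the zero bytes are just padding.
    disguised.write_bytes(b"MZ" + b"\x00" * 62)

    # The guess is based on the name alone; the bytes are never read.
    guessed_type, _ = mimetypes.guess_type(str(disguised))
    first_bytes = disguised.read_bytes()[:4]

print(guessed_type)                     # application/pdf
print(first_bytes.startswith(b"%PDF"))  # False
```

The guess says PDF even though the content does not begin with a PDF signature, which is why content-based checks are needed for untrusted input.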
Robust Detection with python-magic
For security-sensitive applications, python-magic examines the actual file content (magic bytes) to determine the true type:
# Installation (may require system libmagic)
# macOS: brew install libmagic
# Ubuntu: apt-get install libmagic1
pip install python-magic
import magic

def get_true_mime_type(file_path):
    """Detect MIME type by reading file content."""
    return magic.from_file(file_path, mime=True)

def get_mime_from_bytes(data):
    """Detect MIME type from bytes data."""
    return magic.from_buffer(data, mime=True)

# Detects actual content, not extension
print(get_true_mime_type("renamed_image.txt"))  # image/jpeg (if actually a JPEG)
print(get_true_mime_type("document.pdf"))       # application/pdf

# Works with raw bytes
with open("photo.jpg", "rb") as f:
    # python-magic recommends at least the first 2048 bytes;
    # shorter buffers can be misidentified
    content = f.read(2048)
print(get_mime_from_bytes(content))  # image/jpeg
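Many file formats contain signature bytes ("magic numbers") at fixed offsets, usually at the very start of the file, that identify the format. For example, PDFs start with %PDF, PNGs start with \x89PNG, and ZIPs start with PK. The python-magic library wraps libmagic, which matches these signatures against a database of known formats for reliable detection.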
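The idea can be sketched in a few lines of plain Python using only the three signatures mentioned above. This is a simplified illustration of signature matching, not how libmagic is implemented; the names SIGNATURES and sniff_signature are hypothetical.

```python
# Leading bytes mapped to the MIME type they indicate (illustrative subset).
SIGNATURES = [
    (b"%PDF", "application/pdf"),
    (b"\x89PNG", "image/png"),
    (b"PK", "application/zip"),
]

def sniff_signature(data):
    """Return the MIME type whose signature opens `data`, or None."""
    for prefix, mime_type in SIGNATURES:
        if data.startswith(prefix):
            return mime_type
    return None

print(sniff_signature(b"%PDF-1.7\n"))          # application/pdf
print(sniff_signature(b"\x89PNG\r\n\x1a\n"))   # image/png
print(sniff_signature(b"plain text"))          # None
```

A real detector checks far more signatures, some at non-zero offsets, and must look deeper to tell apart formats that share a container. DOCX files, for instance, are ZIP archives and also begin with PK.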
Pure Python Alternative: puremagic
When system dependencies are problematic, puremagic provides content-based detection without requiring libmagic:
pip install puremagic
import puremagic

def get_mime_pure(file_path):
    """Detect MIME type using pure Python implementation."""
    # magic_file returns a list of matches sorted by confidence;
    # from_file would return a single string instead
    matches = puremagic.magic_file(file_path)
    if matches:
        return matches[0].mime_type
    return None

def get_mime_from_stream(data):
    """Detect MIME type from bytes."""
    # magic_string is the in-memory counterpart of magic_file
    matches = puremagic.magic_string(data)
    return matches[0].mime_type if matches else None

# Usage
print(get_mime_pure("document.docx"))
# application/vnd.openxmlformats-officedocument.wordprocessingml.document

# Get detailed match information
matches = puremagic.magic_file("image.png")
for match in matches:
    print(f"Type: {match.mime_type}, Name: {match.name}, Confidence: {match.confidence}")
Combining Approaches
For production applications, combine methods for both speed and security:
import mimetypes
from pathlib import Path

def detect_mime_type(file_path, verify_content=False):
    """
    Detect MIME type with optional content verification.

    Args:
        file_path: Path to the file
        verify_content: If True, verify using file content (slower but secure)

    Returns:
        Detected MIME type or None
    """
    path = Path(file_path)

    # Quick extension-based guess
    mime_type, _ = mimetypes.guess_type(str(path))

    if verify_content and path.exists():
        try:
            import magic
            actual_type = magic.from_file(str(path), mime=True)

            # Warn if extension doesn't match content
            if mime_type and actual_type != mime_type:
                print(f"Warning: Extension suggests {mime_type}, "
                      f"but content is {actual_type}")
            return actual_type
        except ImportError:
            pass  # Fall back to extension-based detection

    return mime_type

# Usage examples
print(detect_mime_type("photo.jpg"))                        # Fast, extension-based
print(detect_mime_type("upload.pdf", verify_content=True))  # Secure, content-based
File Upload Validation Example
A practical implementation for validating uploaded files:
import magic
from pathlib import Path

ALLOWED_TYPES = {
    "image/jpeg",
    "image/png",
    "image/gif",
    "application/pdf",
}
MAX_SIZE = 10 * 1024 * 1024  # 10 MB

def validate_upload(file_path):
    """
    Validate an uploaded file for type and size.

    Returns:
        tuple: (is_valid, mime_type, error_message)
    """
    path = Path(file_path)

    # Check file exists
    if not path.exists():
        return False, None, "File not found"

    # Check size
    size = path.stat().st_size
    if size > MAX_SIZE:
        return False, None, f"File too large: {size / 1024 / 1024:.1f}MB"

    # Verify actual content type
    mime_type = magic.from_file(str(path), mime=True)
    if mime_type not in ALLOWED_TYPES:
        return False, mime_type, f"Type not allowed: {mime_type}"

    return True, mime_type, None

# Usage
is_valid, mime_type, error = validate_upload("user_upload.jpg")
if is_valid:
    print(f"Valid {mime_type} file")
else:
    print(f"Rejected: {error}")
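A validator might also reject files whose name disagrees with their detected content. The sketch below is one possible way to do that with the standard library alone: it takes the MIME type already detected from content (for example by magic.from_file) and checks whether the file's extension is one that mimetypes associates with that type. The helper name extension_matches is hypothetical.

```python
import mimetypes
from pathlib import Path

def extension_matches(filename, detected_mime):
    """Return True if the filename's extension is registered for detected_mime."""
    suffix = Path(filename).suffix.lower()
    # guess_all_extensions lists every known extension for the type,
    # so aliases such as .jpg and .jpeg are both accepted.
    return suffix in mimetypes.guess_all_extensions(detected_mime)

print(extension_matches("photo.JPG", "image/jpeg"))  # True
print(extension_matches("photo.jpg", "image/png"))   # False
```

Checking membership in the extension list avoids false alarms that a direct string comparison of two MIME types can raise when a format has several common extensions.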
Method Comparison
| Method | Dependency | Speed | Reliability | Use Case |
|---|---|---|---|---|
| mimetypes | Built-in | Fastest | Low | UI display, trusted files |
| python-magic | System library | Fast | High | Security validation |
| puremagic | Pure Python | Good | High | Cross-platform deployment |
- Use mimetypes for displaying file icons or processing trusted internal files
- Use python-magic for validating user uploads or security-critical applications
- Use puremagic when you need content detection without system dependencies
By selecting the appropriate detection method based on your security requirements and deployment constraints, you ensure reliable file handling throughout your application.