How to Get the File Name Without the Extension in Python
A common task in file processing is to extract a file's name from its full path and remove its extension (e.g., turning /path/to/document.txt into document). Python provides robust, cross-platform tools for this in its standard library, with the modern pathlib module being the recommended approach, and the classic os.path module serving as a reliable alternative.
This guide will demonstrate the best ways to achieve this, including the modern pathlib approach, the traditional os.path method, and how to handle special cases like files with multiple extensions (e.g., .tar.gz).
Method 1: Using pathlib.Path.stem (Modern & Recommended)
Introduced in Python 3.4, the pathlib module provides an object-oriented interface for filesystem paths. The Path object has a .stem attribute that directly provides the final path component without its suffix. This is the cleanest and most recommended method.
Solution:
from pathlib import Path
file_path = "/path/to/some/document.txt"
# Create a Path object
p = Path(file_path)
# The .stem attribute gives the file name without the final extension
file_stem = p.stem
print(f"The file stem is: '{file_stem}'")
print(f"The full name is: '{p.name}'")
print(f"The extension is: '{p.suffix}'")
Output:
The file stem is: 'document'
The full name is: 'document.txt'
The extension is: '.txt'
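The behavior of .stem on less obvious names can be checked with a short sketch. The file names below are made up for illustration. The results follow from pathlib's rule that only the last suffix is removed and that a leading dot is not treated as the start of an extension.

```python
from pathlib import Path

# Hypothetical file names chosen to illustrate edge cases
examples = ["README", ".bashrc", "archive.v1.zip", "/path/to/some/document.txt"]

# Map each name to the stem that pathlib reports for it
stems = {name: Path(name).stem for name in examples}

for name, stem in stems.items():
    print(f"{name!r} -> {stem!r}")

# 'README' -> 'README'                         (no extension to remove)
# '.bashrc' -> '.bashrc'                       (leading dot is part of the name)
# 'archive.v1.zip' -> 'archive.v1'             (only the last suffix is removed)
# '/path/to/some/document.txt' -> 'document'
```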
Method 2: Using os.path.splitext() (Classic Method)
The os.path module is the traditional way to manipulate paths in Python. This approach involves two steps: first getting the base filename from the path, and then splitting the extension from the name.
Solution:
import os
file_path = "/path/to/some/document.txt"
# Step 1: Get the base name of the file from the path
base_name = os.path.basename(file_path)
print(f"Base name: {base_name}")
# Step 2: Split the base name into the stem and the extension
# os.path.splitext() returns a tuple: (stem, extension)
file_stem, extension = os.path.splitext(base_name)
print(f"The file stem is: '{file_stem}'")
print(f"The extension is: '{extension}'")
Output:
Base name: document.txt
The file stem is: 'document'
The extension is: '.txt'
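If this pattern is needed in several places, the two steps can be wrapped in a small helper. The sketch below is one possible way to do so; the function name get_stem is hypothetical and not part of os.path.

```python
import os


def get_stem(file_path):
    """Return the final path component without its last extension.

    get_stem is an illustrative name, not a standard library function.
    """
    # Step 1: drop the directory part
    base_name = os.path.basename(file_path)
    # Step 2: drop the last extension (if any)
    stem, _extension = os.path.splitext(base_name)
    return stem


print(get_stem("/path/to/some/document.txt"))  # document
print(get_stem("notes"))                        # notes
print(get_stem(".bashrc"))                      # .bashrc
```

As with pathlib, os.path.splitext() does not treat a leading dot as the start of an extension, so hidden files such as .bashrc keep their full name.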
Method 3: Using str.split() (Manual & Brittle)
While it's possible to use string manipulation methods like .split(), this approach is not recommended for parsing file paths. It is not cross-platform (it assumes a specific path separator like /) and can fail unexpectedly with file names that contain dots.
Example of the brittle approach:
file_path = "/path/to/some/document.txt"
# This assumes a '/' separator and fails on Windows-style paths that use '\'
file_name = file_path.split('/')[-1]
# This returns the wrong stem if the filename contains extra dots
# (e.g., 'archive.v1.zip' becomes 'archive' instead of 'archive.v1')
file_stem = file_name.split('.')[0]
print(f"The file stem is: '{file_stem}'")
Output:
The file stem is: 'document'
Avoid using str.split() to parse file paths. It is not robust. Use pathlib or os.path instead, as they are designed to handle different operating systems and edge cases correctly.
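To make the cross-platform point concrete, the sketch below compares the manual split with pathlib's PureWindowsPath, which parses Windows-style paths on any operating system. The path itself is a made-up example.

```python
from pathlib import PureWindowsPath

# Hypothetical Windows-style path with an extra dot in the file name
windows_path = r"C:\Users\me\archive.v1.zip"

# Manual approach: no '/' is found, and the split on '.' cuts too early
manual_stem = windows_path.split('/')[-1].split('.')[0]

# pathlib approach: understands '\' separators and removes only the last suffix
pathlib_stem = PureWindowsPath(windows_path).stem

print(manual_stem)   # C:\Users\me\archive
print(pathlib_stem)  # archive.v1
```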
Handling Multi-Part Extensions (e.g., .tar.gz)
Compressed archives often have multi-part extensions, like .tar.gz or .tar.bz2. Both pathlib.Path.stem and os.path.splitext remove only the final suffix, so part of the extension is left behind.
Example of the problem:
from pathlib import Path
import os
archive_path = "my_archive.tar.gz"
# pathlib's .stem only removes the last suffix
pathlib_stem = Path(archive_path).stem
print(f"pathlib.Path.stem result: '{pathlib_stem}'")
# os.path.splitext also only removes the last suffix
os_path_stem = os.path.splitext(archive_path)[0]
print(f"os.path.splitext result: '{os_path_stem}'")
Output:
pathlib.Path.stem result: 'my_archive.tar'
os.path.splitext result: 'my_archive.tar'
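To get only the base name (my_archive), combine one of the robust methods with a manual string split.
Solution: get the full filename using a reliable method (.name from pathlib), then split it on the first dot. This assumes the base name itself contains no dots and does not start with one: archive.v1.tar.gz would yield archive, and a hidden file such as .bashrc would yield an empty string.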
from pathlib import Path
archive_path = "my_archive.tar.gz"
# Get the full filename safely
full_name = Path(archive_path).name
# Split the full name at the first dot and take the first part
base_name = full_name.split('.', 1)[0]
print(f"The base name is: '{base_name}'")
Output:
The base name is: 'my_archive'
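When base names may themselves contain dots, an alternative is to strip only suffixes that are known to belong to compound extensions. The sketch below is one possible approach, not a standard library feature: the function name stem_without_compound_suffix and the default set of inner suffixes (just .tar) are assumptions made for illustration. It also shows pathlib's .suffixes attribute, which lists every suffix of a name.

```python
from pathlib import Path


def stem_without_compound_suffix(path, inner_suffixes=(".tar",)):
    """Remove the last suffix, then any trailing suffixes listed in inner_suffixes.

    Both the function name and the default inner_suffixes are hypothetical;
    adjust the tuple to the compound extensions you expect.
    """
    # Always drop the final suffix (e.g. '.gz')
    stem = Path(path).stem
    # Keep dropping suffixes only while they are known inner suffixes (e.g. '.tar')
    while Path(stem).suffix.lower() in inner_suffixes:
        stem = Path(stem).stem
    return stem


print(Path("my_archive.tar.gz").suffixes)                  # ['.tar', '.gz']
print(stem_without_compound_suffix("my_archive.tar.gz"))   # my_archive
print(stem_without_compound_suffix("archive.v1.tar.gz"))   # archive.v1
print(stem_without_compound_suffix("report.txt"))          # report
```

The trade-off is that the list of inner suffixes has to be maintained by hand, whereas splitting on the first dot needs no configuration but discards everything after that dot.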
Conclusion
| Method | Best For | Example |
|---|---|---|
| pathlib.Path.stem | Modern Python (3.4+). Cleanest, most readable, and recommended approach. | Path(my_path).stem |
| os.path.splitext() | Legacy code or projects needing backward compatibility with Python < 3.4. | os.path.splitext(os.path.basename(my_path))[0] |
| Hybrid Approach | Handling multi-part extensions like .tar.gz. | Path(my_path).name.split('.', 1)[0] |
For new projects, prefer the pathlib module for its clear, object-oriented API. For multi-part extensions, take the filename from pathlib and split it on the first dot, provided the base name itself contains no dots.