How to Find the Largest File in a Directory Using Python
When managing files on your system - whether for cleanup, disk usage analysis, or monitoring storage - it's often useful to identify the largest file within a directory and all its subdirectories. Python's built-in os module makes this straightforward without requiring any third-party packages.
This guide demonstrates multiple approaches to find the largest file in a directory, from a simple script to more robust and reusable solutions.
Understanding the Key Functions
Before diving into the code, let's understand the core os module functions used:
- `os.walk(path)` - recursively traverses a directory tree, yielding a tuple of `(folder_path, subdirectories, files)` for each directory.
- `os.stat(file_path)` - returns file status information, including the file size via the `st_size` attribute (in bytes).
- `os.path.join()` - constructs full file paths by joining directory and file names in a platform-independent way.
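To see how these three functions fit together before the full solution, here's a minimal sketch. It builds a throwaway directory with `tempfile` (the file names and contents are made up purely for illustration):

```python
import os
import tempfile

# Hypothetical throwaway layout, just to show what os.walk() yields
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "sub"))
with open(os.path.join(root, "a.txt"), "w") as f:
    f.write("hello")
with open(os.path.join(root, "sub", "b.txt"), "w") as f:
    f.write("hi")

for folder, subdirs, files in os.walk(root):
    for name in files:
        path = os.path.join(folder, name)   # full, platform-independent path
        print(name, os.stat(path).st_size)  # size in bytes via st_size
# prints:
# a.txt 5
# b.txt 2
```

Because `os.walk()` is top-down by default, the root directory's files are listed before those in `sub`.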
Basic Approach Using os.walk()
This method walks through the entire directory tree, checks the size of every file, and keeps track of the largest one found:
```python
import os

def find_largest_file(directory):
    max_size = 0
    max_file = ""
    for folder, subfolders, files in os.walk(directory):
        for file in files:
            file_path = os.path.join(folder, file)
            try:
                size = os.stat(file_path).st_size
                if size > max_size:
                    max_size = size
                    max_file = file_path
            except (OSError, PermissionError):
                # Skip files that can't be accessed
                continue
    return max_file, max_size

# Example usage
directory = "/Users/username/Downloads"
largest_file, size = find_largest_file(directory)
print(f"The largest file is: {largest_file}")
print(f"Size: {size} bytes")
```
Output:

```
The largest file is: /Users/username/Downloads/project/video_recording.mp4
Size: 64061656 bytes
```
How It Works
- `os.walk(directory)` recursively yields every folder, its subdirectories, and its files.
- For each file, `os.stat()` retrieves its size in bytes via the `st_size` attribute.
- The code compares each file's size against the current maximum and updates accordingly.
- A `try`/`except` block handles files that may be inaccessible due to permissions or broken symbolic links.
Without error handling, your script will crash if it encounters a file it can't read (e.g., system files, broken symlinks, or permission-restricted files):
```python
# This will crash on inaccessible files
for folder, subfolders, files in os.walk(directory):
    for file in files:
        file_path = os.path.join(folder, file)
        size = os.stat(file_path).st_size  # May raise PermissionError or OSError
```
Always wrap file operations in a try/except block to ensure robustness.
Using pathlib for a Modern Approach
Python's pathlib module (available since Python 3.4) offers a more object-oriented and readable way to work with file paths:
```python
from pathlib import Path

def find_largest_file(directory):
    max_size = 0
    max_file = None
    for file_path in Path(directory).rglob("*"):
        if file_path.is_file():
            try:
                size = file_path.stat().st_size
                if size > max_size:
                    max_size = size
                    max_file = file_path
            except (OSError, PermissionError):
                continue
    return max_file, max_size

directory = "/Users/username/Documents"
largest_file, size = find_largest_file(directory)
print(f"The largest file is: {largest_file}")
print(f"Size: {size:,} bytes")
```
Output:

```
The largest file is: /Users/username/Documents/backups/database_dump.sql
Size: 1,792,316 bytes
```
- `Path(directory).rglob("*")` recursively matches all files and directories (similar to `os.walk()` but more concise).
- `.is_file()` filters out directories, so only actual files are measured.
- `.stat().st_size` returns the file size, just like `os.stat()`.
You can filter for specific file types by changing the glob pattern. For example, to find the largest `.log` file:

```python
for file_path in Path(directory).rglob("*.log"):
    ...
```
Displaying Human-Readable File Sizes
File sizes in bytes aren't always easy to interpret. Here's a helper function that converts bytes into a human-readable format:
```python
from pathlib import Path

def format_size(size_bytes):
    """Convert bytes to a human-readable string."""
    for unit in ["B", "KB", "MB", "GB", "TB"]:
        if size_bytes < 1024:
            return f"{size_bytes:.2f} {unit}"
        size_bytes /= 1024
    return f"{size_bytes:.2f} PB"

def find_largest_file(directory):
    max_size = 0
    max_file = None
    for file_path in Path(directory).rglob("*"):
        if file_path.is_file():
            try:
                size = file_path.stat().st_size
                if size > max_size:
                    max_size = size
                    max_file = file_path
            except (OSError, PermissionError):
                continue
    return max_file, max_size

directory = "/Users/username/Downloads"
largest_file, size = find_largest_file(directory)
if largest_file:
    print(f"The largest file is: {largest_file}")
    print(f"Size: {format_size(size)}")
else:
    print("No files found in the directory.")
```
Output:

```
The largest file is: /Users/username/Downloads/project/video_recording.mp4
Size: 61.09 MB
```
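A quick sanity check of the helper at a few values shows how the unit loop behaves at the kilobyte boundary and for the 64,061,656-byte file from the earlier example (the function is repeated here so the snippet runs on its own):

```python
def format_size(size_bytes):
    """Convert bytes to a human-readable string (same helper as above)."""
    for unit in ["B", "KB", "MB", "GB", "TB"]:
        if size_bytes < 1024:
            return f"{size_bytes:.2f} {unit}"
        size_bytes /= 1024
    return f"{size_bytes:.2f} PB"

print(format_size(512))       # 512.00 B
print(format_size(1536))      # 1.50 KB
print(format_size(64061656))  # 61.09 MB
```

Note that the conversion divides by 1024 (binary units), so 64,061,656 bytes comes out as 61.09 MB rather than 64.06 MB.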
Finding the Top N Largest Files
Sometimes you need more than just the single largest file. Here's how to find the top N largest files using a sorted approach:
```python
from pathlib import Path

def format_size(size_bytes):
    for unit in ["B", "KB", "MB", "GB", "TB"]:
        if size_bytes < 1024:
            return f"{size_bytes:.2f} {unit}"
        size_bytes /= 1024
    return f"{size_bytes:.2f} PB"

def find_top_n_largest(directory, n=5):
    files_with_sizes = []
    for file_path in Path(directory).rglob("*"):
        if file_path.is_file():
            try:
                size = file_path.stat().st_size
                files_with_sizes.append((file_path, size))
            except (OSError, PermissionError):
                continue
    # Sort by size in descending order and return top N
    files_with_sizes.sort(key=lambda x: x[1], reverse=True)
    return files_with_sizes[:n]

directory = "/Users/username/Downloads"
top_files = find_top_n_largest(directory, n=5)
print(f"Top {len(top_files)} largest files:\n")
for i, (file_path, size) in enumerate(top_files, 1):
    print(f"  {i}. {file_path}")
    print(f"     Size: {format_size(size)}\n")
```
Output:

```
Top 5 largest files:

  1. /Users/username/Downloads/video_recording.mp4
     Size: 61.09 MB

  2. /Users/username/Downloads/dataset.csv
     Size: 24.35 MB

  3. /Users/username/Downloads/presentation.pptx
     Size: 12.78 MB

  4. /Users/username/Downloads/archive.zip
     Size: 8.45 MB

  5. /Users/username/Downloads/report.pdf
     Size: 3.21 MB
```
Comparison of Approaches
| Approach | Best For | Python Version | Readability |
|---|---|---|---|
| `os.walk()` + `os.stat()` | Compatibility with older Python | 2.7+ | ⭐⭐⭐ |
| `pathlib.Path.rglob()` | Modern, clean code | 3.4+ | ⭐⭐⭐⭐⭐ |
| Top N with sorting | Finding multiple large files | 3.4+ | ⭐⭐⭐⭐ |
Conclusion
Finding the largest file in a directory is a practical task that Python handles efficiently with its built-in modules.
- For modern Python projects, the `pathlib` approach is recommended for its readability and clean API.
- The traditional `os.walk()` method remains a solid choice when working with older Python versions or when you need maximum compatibility.
Regardless of the approach, always include error handling to gracefully skip inaccessible files and ensure your script runs reliably across different environments.