How to Find the Largest File in a Directory Using Python

When managing files on your system - whether for cleanup, disk usage analysis, or monitoring storage - it's often useful to identify the largest file within a directory and all its subdirectories. Python's built-in os module makes this straightforward without requiring any third-party packages.

This guide demonstrates multiple approaches to find the largest file in a directory, from a simple script to more robust and reusable solutions.

Understanding the Key Functions

Before diving into the code, let's understand the core os module functions used:

  • os.walk(path) - recursively traverses a directory tree, yielding a tuple of (folder_path, subdirectories, files) for each directory.
  • os.stat(file_path) - returns file status information, including the file size via the st_size attribute (in bytes).
  • os.path.join() - constructs full file paths by joining directory and file names in a platform-independent way.
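
To make the shape of os.walk()'s output concrete, here is a small sketch that builds a throwaway directory tree (the file and folder names are purely illustrative) and prints each tuple it yields:

```python
import os
import tempfile

# Build a tiny throwaway tree: root/a.txt and root/sub/b.txt
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "sub"))
for rel in ["a.txt", os.path.join("sub", "b.txt")]:
    with open(os.path.join(root, rel), "w") as f:
        f.write("hello")

# os.walk() yields one (folder, subdirectories, files) tuple per directory,
# starting at the root and then descending into each subdirectory.
for folder, subdirs, files in os.walk(root):
    print(folder, subdirs, files)
```

The first tuple covers the root (listing `sub` and `a.txt`); the second covers the `sub` directory itself.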

Basic Approach Using os.walk()

This method walks through the entire directory tree, checks the size of every file, and keeps track of the largest one found:

import os

def find_largest_file(directory):
    max_size = 0
    max_file = ""

    for folder, subfolders, files in os.walk(directory):
        for file in files:
            file_path = os.path.join(folder, file)
            try:
                size = os.stat(file_path).st_size
                if size > max_size:
                    max_size = size
                    max_file = file_path
            except (OSError, PermissionError):
                # Skip files that can't be accessed
                continue

    return max_file, max_size

# Example usage
directory = "/Users/username/Downloads"
largest_file, size = find_largest_file(directory)

print(f"The largest file is: {largest_file}")
print(f"Size: {size} bytes")

Output:

The largest file is: /Users/username/Downloads/project/video_recording.mp4
Size: 64061656 bytes

How It Works

  1. os.walk(directory) recursively yields every folder, its subdirectories, and its files.
  2. For each file, os.stat() retrieves its size in bytes via the st_size attribute.
  3. The code compares each file's size against the current maximum and updates accordingly.
  4. A try/except block handles files that may be inaccessible due to permissions or broken symbolic links.
Always handle file access errors

Without error handling, your script will crash if it encounters a file it can't read (e.g., system files, broken symlinks, or permission-restricted files):

# This will crash on inaccessible files
for folder, subfolders, files in os.walk(directory):
    for file in files:
        file_path = os.path.join(folder, file)
        size = os.stat(file_path).st_size  # May raise PermissionError or OSError

Always wrap file operations in a try/except block to ensure robustness.
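
If you prefer to avoid an explicit try/except, contextlib.suppress from the standard library offers a compact equivalent. This sketch mirrors the function above, silently skipping any stat() call that fails:

```python
import contextlib
import os

def find_largest_file(directory):
    max_size = 0
    max_file = ""
    for folder, subfolders, files in os.walk(directory):
        for file in files:
            file_path = os.path.join(folder, file)
            # suppress(OSError) also covers PermissionError,
            # which is a subclass of OSError
            with contextlib.suppress(OSError):
                size = os.stat(file_path).st_size
                if size > max_size:
                    max_size = size
                    max_file = file_path
    return max_file, max_size
```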

Using pathlib for a Modern Approach

Python's pathlib module (available since Python 3.4) offers a more object-oriented and readable way to work with file paths:

from pathlib import Path

def find_largest_file(directory):
    max_size = 0
    max_file = None

    for file_path in Path(directory).rglob("*"):
        if file_path.is_file():
            try:
                size = file_path.stat().st_size
                if size > max_size:
                    max_size = size
                    max_file = file_path
            except (OSError, PermissionError):
                continue

    return max_file, max_size

directory = "/Users/username/Documents"
largest_file, size = find_largest_file(directory)

print(f"The largest file is: {largest_file}")
print(f"Size: {size:,} bytes")

Output:

The largest file is: /Users/username/Documents/backups/database_dump.sql
Size: 1,792,316 bytes
  • Path(directory).rglob("*") recursively matches all files and directories (similar to os.walk() but more concise).
  • .is_file() filters out directories, so only actual files are measured.
  • .stat().st_size returns the file size, just like os.stat().
tip

You can filter for specific file types by changing the glob pattern. For example, to find the largest .log file:

for file_path in Path(directory).rglob("*.log"):
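
Building on that idea, here is a hedged sketch (the helper name and default suffix list are illustrative, not from this article) that returns the largest file matching any of several extensions:

```python
from pathlib import Path

# Hypothetical helper; the default suffixes are just an example.
def find_largest_by_suffix(directory, suffixes=(".log", ".txt")):
    candidates = [
        p for p in Path(directory).rglob("*")
        if p.is_file() and p.suffix.lower() in suffixes
    ]
    # max() with default=None avoids a ValueError when nothing matches
    return max(candidates, key=lambda p: p.stat().st_size, default=None)
```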

Displaying Human-Readable File Sizes

File sizes in bytes aren't always easy to interpret. Here's a helper function that converts bytes into a human-readable format:

from pathlib import Path

def format_size(size_bytes):
    """Convert bytes to a human-readable string."""
    for unit in ["B", "KB", "MB", "GB", "TB"]:
        if size_bytes < 1024:
            return f"{size_bytes:.2f} {unit}"
        size_bytes /= 1024
    return f"{size_bytes:.2f} PB"

def find_largest_file(directory):
    max_size = 0
    max_file = None

    for file_path in Path(directory).rglob("*"):
        if file_path.is_file():
            try:
                size = file_path.stat().st_size
                if size > max_size:
                    max_size = size
                    max_file = file_path
            except (OSError, PermissionError):
                continue

    return max_file, max_size

directory = "/Users/username/Downloads"
largest_file, size = find_largest_file(directory)

if largest_file:
    print(f"The largest file is: {largest_file}")
    print(f"Size: {format_size(size)}")
else:
    print("No files found in the directory.")

Output:

The largest file is: /Users/username/Downloads/project/video_recording.mp4
Size: 61.09 MB

Finding the Top N Largest Files

Sometimes you need more than just the single largest file. Here's how to find the top N largest files using a sorted approach:

from pathlib import Path

def format_size(size_bytes):
    for unit in ["B", "KB", "MB", "GB", "TB"]:
        if size_bytes < 1024:
            return f"{size_bytes:.2f} {unit}"
        size_bytes /= 1024
    return f"{size_bytes:.2f} PB"

def find_top_n_largest(directory, n=5):
    files_with_sizes = []

    for file_path in Path(directory).rglob("*"):
        if file_path.is_file():
            try:
                size = file_path.stat().st_size
                files_with_sizes.append((file_path, size))
            except (OSError, PermissionError):
                continue

    # Sort by size in descending order and return top N
    files_with_sizes.sort(key=lambda x: x[1], reverse=True)
    return files_with_sizes[:n]

directory = "/Users/username/Downloads"
top_files = find_top_n_largest(directory, n=5)

print(f"Top {len(top_files)} largest files:\n")
for i, (file_path, size) in enumerate(top_files, 1):
    print(f"  {i}. {file_path}")
    print(f"     Size: {format_size(size)}\n")

Output:

Top 5 largest files:

1. /Users/username/Downloads/video_recording.mp4
Size: 61.09 MB

2. /Users/username/Downloads/dataset.csv
Size: 24.35 MB

3. /Users/username/Downloads/presentation.pptx
Size: 12.78 MB

4. /Users/username/Downloads/archive.zip
Size: 8.45 MB

5. /Users/username/Downloads/report.pdf
Size: 3.21 MB
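
When a directory tree contains a very large number of files, sorting the entire list is unnecessary work: heapq.nlargest keeps at most n entries in memory. Here is a sketch of that variant (the function name is illustrative; it returns the same (path, size) pairs as find_top_n_largest above):

```python
import heapq
from pathlib import Path

def find_top_n_largest_heap(directory, n=5):
    def sized_files():
        for p in Path(directory).rglob("*"):
            if p.is_file():
                try:
                    # Yield (size, path) so tuples sort by size first
                    yield (p.stat().st_size, p)
                except OSError:
                    continue
    # nlargest maintains a heap of at most n items instead of sorting everything
    top = heapq.nlargest(n, sized_files())
    return [(path, size) for size, path in top]
```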

Comparison of Approaches

| Approach               | Best For                        | Python Version | Readability |
|------------------------|---------------------------------|----------------|-------------|
| os.walk() + os.stat()  | Compatibility with older Python | 2.7+           | ⭐⭐⭐      |
| pathlib.Path.rglob()   | Modern, clean code              | 3.4+           | ⭐⭐⭐⭐⭐  |
| Top N with sorting     | Finding multiple large files    | 3.4+           | ⭐⭐⭐⭐    |

Conclusion

Finding the largest file in a directory is a practical task that Python handles efficiently with its built-in modules.

  • For modern Python projects, the pathlib approach is recommended for its readability and clean API.
  • The traditional os.walk() method remains a solid choice when working with older Python versions or when you need maximum compatibility.

Regardless of the approach, always include error handling to gracefully skip inaccessible files and ensure your script runs reliably across different environments.