How to Download Large Files in Python with Requests
Downloading large files, such as datasets, ISO images, or media files, requires careful handling in Python. Simply reading the entire file into memory before saving it can crash your program or consume excessive resources. The requests library supports streaming downloads that process data in small chunks, keeping memory usage low regardless of file size.
In this guide, you'll learn multiple methods to download large files efficiently in Python, including progress bars, resume support, and error handling.
Basic Streaming Download with Chunks
The most important technique for downloading large files is using stream=True with iter_content(). This reads the file in small chunks instead of loading it entirely into memory:
```python
import requests

def download_file(url, destination):
    """Download a large file in chunks to avoid memory issues."""
    with requests.get(url, stream=True) as response:
        response.raise_for_status()
        with open(destination, 'wb') as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)
    print(f"Downloaded: {destination}")

# Usage
url = 'https://example.com/largefile.zip'
download_file(url, 'largefile.zip')
```
How it works:
- `stream=True` tells requests not to download the entire response body immediately.
- `iter_content(chunk_size=8192)` yields the response data in 8 KB chunks.
- Each chunk is written to disk immediately and then discarded from memory.

Choosing a chunk size:
- 8192 bytes (8 KB): a good default for most situations.
- 65536 bytes (64 KB) or 1048576 bytes (1 MB): better for fast connections and large files, with fewer write operations.
- Larger chunks mean fewer I/O operations but slightly higher peak memory usage.
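The same chunk-at-a-time pattern works on any file-like object, which makes it easy to see in isolation. A minimal stdlib sketch, with an in-memory stream standing in for the network response (`copy_in_chunks` is an illustrative name, not a requests API):

```python
import io

def copy_in_chunks(src, dst, chunk_size=8192):
    """Copy a stream chunk by chunk, never holding the whole payload in memory."""
    total = 0
    # iter(callable, sentinel) keeps calling read() until it returns b''
    for chunk in iter(lambda: src.read(chunk_size), b''):
        dst.write(chunk)
        total += len(chunk)
    return total

# Simulate a 100 KB "download" with an in-memory stream
src = io.BytesIO(b'x' * 100_000)
dst = io.BytesIO()
print(copy_in_chunks(src, dst))  # 100000
```

This is essentially what `iter_content()` does for you on top of the HTTP response stream.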
Adding a Progress Bar
For large downloads, a progress bar provides essential feedback. Use the tqdm library:
```
pip install tqdm
```
```python
import requests
from tqdm import tqdm

def download_with_progress(url, destination):
    """Download a file with a progress bar."""
    response = requests.get(url, stream=True)
    response.raise_for_status()

    # Get total file size from headers
    total_size = int(response.headers.get('content-length', 0))
    chunk_size = 1024 * 1024  # 1 MB chunks

    with open(destination, 'wb') as f, tqdm(
        desc=destination,
        total=total_size,
        unit='B',
        unit_scale=True,
        unit_divisor=1024,
    ) as progress_bar:
        for chunk in response.iter_content(chunk_size=chunk_size):
            size = f.write(chunk)
            progress_bar.update(size)

    print(f"\nDownload complete: {destination}")

# Usage
url = 'https://example.com/largefile.zip'
download_with_progress(url, 'largefile.zip')
```
Output:
```
largefile.zip:  45%|████████▌   | 450M/1.00G [00:30<00:37, 15.0MB/s]
```
The progress bar shows the filename, percentage complete, downloaded/total size, elapsed time, estimated time remaining, and download speed.
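If you would rather avoid a dependency, a rough stdlib-only progress line can be assembled from a small byte-formatting helper. A sketch, where `format_bytes` and `progress_line` are hypothetical names and only loosely imitate tqdm's output:

```python
def format_bytes(n):
    """Format a byte count using binary units (KiB, MiB, ...)."""
    for unit in ('B', 'KiB', 'MiB', 'GiB', 'TiB'):
        if n < 1024 or unit == 'TiB':
            return f'{n:.0f}B' if unit == 'B' else f'{n:.1f}{unit}'
        n /= 1024

def progress_line(downloaded, total):
    """Render a one-line progress string, e.g. for print(..., end='\r')."""
    pct = downloaded / total * 100 if total else 0
    return f'{pct:5.1f}% ({format_bytes(downloaded)} / {format_bytes(total)})'

print(progress_line(450 * 1024**2, 1024**3))  # ' 43.9% (450.0MiB / 1.0GiB)'
```

Printing this with `end='\r'` inside the download loop gives a self-updating status line, at the cost of tqdm's speed and ETA estimates.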
Resumable Downloads
If a download is interrupted (network failure, timeout), you don't want to start over. Range requests let you resume from where you left off:
```python
import requests
import os

def download_resumable(url, destination):
    """Download a file with resume support."""
    # Check if a partial download exists
    existing_size = 0
    if os.path.exists(destination):
        existing_size = os.path.getsize(destination)

    # Set the Range header to resume from where we left off
    headers = {}
    if existing_size > 0:
        headers['Range'] = f'bytes={existing_size}-'
        print(f"Resuming download from {existing_size / (1024*1024):.1f} MB")

    response = requests.get(url, headers=headers, stream=True)

    # 206 = Partial Content (resume successful)
    # 200 = OK (server doesn't support resume, starting over)
    if response.status_code == 200:
        existing_size = 0  # Server doesn't support resume
        mode = 'wb'
    elif response.status_code == 206:
        mode = 'ab'  # Append to existing file
    else:
        response.raise_for_status()
        return

    total_size = int(response.headers.get('content-length', 0)) + existing_size

    with open(destination, mode) as f:
        downloaded = existing_size
        for chunk in response.iter_content(chunk_size=1024 * 1024):
            f.write(chunk)
            downloaded += len(chunk)
            progress = (downloaded / total_size * 100) if total_size > 0 else 0
            print(f"\rProgress: {progress:.1f}% ({downloaded / (1024*1024):.1f} MB)", end='')

    print(f"\nDownload complete: {destination}")

# Usage
download_resumable('https://example.com/largefile.zip', 'largefile.zip')
```
Not all servers support range requests, so the function checks the response status code: 206 means the resume was honored, while 200 means the server is sending the full file from the beginning.
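You can also probe for range support up front by looking for the Accept-Ranges header on a HEAD request. A sketch, where `server_supports_resume` is my name for the helper, not a requests API:

```python
def server_supports_resume(headers):
    """True if the server advertises byte-range support via Accept-Ranges."""
    return headers.get('Accept-Ranges', '').lower() == 'bytes'

# Hypothetical usage with requests (network call not shown here):
#   resp = requests.head(url, allow_redirects=True)
#   if server_supports_resume(resp.headers): ...
print(server_supports_resume({'Accept-Ranges': 'bytes'}))  # True
print(server_supports_resume({}))                          # False
```

The header is only advisory: some servers honor Range requests without advertising Accept-Ranges, so the 206-versus-200 status check remains the reliable signal.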
Using shutil.copyfileobj() for Simplicity
For a simpler approach, shutil.copyfileobj() handles the chunked reading and writing automatically:
```python
import requests
import shutil

def download_with_shutil(url, destination):
    """Download using shutil for automatic chunked copying."""
    response = requests.get(url, stream=True)
    response.raise_for_status()
    with open(destination, 'wb') as f:
        shutil.copyfileobj(response.raw, f)
    print(f"Downloaded: {destination}")

# Usage
download_with_shutil('https://example.com/largefile.zip', 'largefile.zip')
```
When using response.raw with shutil.copyfileobj(), the data may not be automatically decompressed if the server uses gzip or deflate encoding. For most binary file downloads this isn't an issue, but for compressed text responses, prefer iter_content().
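If you do need decoded content while keeping `copyfileobj()`, the underlying urllib3 stream can be told to honor Content-Encoding. A sketch, where `copy_decoded` is a hypothetical helper and `decode_content` is the urllib3 attribute that requests exposes via `response.raw`:

```python
import shutil

def copy_decoded(response, destination):
    """Copy a streamed requests response to disk, decompressing per Content-Encoding."""
    # Ask the underlying urllib3 stream to decode gzip/deflate as it is read
    response.raw.decode_content = True
    with open(destination, 'wb') as f:
        shutil.copyfileobj(response.raw, f)
```

Pass a response obtained with `stream=True`; for already-uncompressed binary files this behaves the same as the plain `copyfileobj()` version.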
Comprehensive Download Function
Here's a production-ready function that combines all best practices:
```python
import requests
import os
from tqdm import tqdm

def download_large_file(url, destination, chunk_size=1024 * 1024, resume=True):
    """
    Download a large file with progress bar and resume support.

    Args:
        url: URL of the file to download.
        destination: Local path to save the file.
        chunk_size: Size of each download chunk in bytes (default: 1 MB).
        resume: Whether to resume partial downloads (default: True).
    """
    headers = {}
    existing_size = 0

    # Resume support
    if resume and os.path.exists(destination):
        existing_size = os.path.getsize(destination)
        headers['Range'] = f'bytes={existing_size}-'

    try:
        response = requests.get(url, headers=headers, stream=True, timeout=30)

        # 416 = Requested Range Not Satisfiable: nothing left to fetch
        if response.status_code == 416:
            print("File already fully downloaded.")
            return
        response.raise_for_status()

        # Determine write mode and total size
        if response.status_code == 206:
            mode = 'ab'
            total = int(response.headers.get('content-length', 0)) + existing_size
        else:
            mode = 'wb'
            existing_size = 0
            total = int(response.headers.get('content-length', 0))

        # Download with progress bar
        with open(destination, mode) as f, tqdm(
            desc=os.path.basename(destination),
            initial=existing_size,
            total=total,
            unit='B',
            unit_scale=True,
            unit_divisor=1024,
        ) as bar:
            for chunk in response.iter_content(chunk_size=chunk_size):
                size = f.write(chunk)
                bar.update(size)

        print(f"Download complete: {destination}")

    except requests.exceptions.ConnectionError:
        # The file may not exist if the connection failed before the first write
        done = os.path.getsize(destination) if os.path.exists(destination) else 0
        print(f"Connection lost. Run again to resume from {done} bytes.")
    except requests.exceptions.Timeout:
        print("Download timed out. Run again to resume.")
    except requests.exceptions.RequestException as e:
        print(f"Download failed: {e}")

# Usage
download_large_file(
    url='https://example.com/largefile.zip',
    destination='largefile.zip'
)
```
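On flaky links you can go a step further and retry the whole call automatically. A minimal sketch with exponential backoff; `with_retries` is a hypothetical helper, written against OSError so it stays library-agnostic (requests' ConnectionError subclasses OSError):

```python
import time

def with_retries(fn, attempts=3, delay=1.0, backoff=2.0, exceptions=(OSError,)):
    """Call fn(), retrying with exponential backoff on the given exceptions."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts, propagate the last error
            time.sleep(delay)
            delay *= backoff

# Hypothetical usage with the resumable function above:
#   with_retries(lambda: download_large_file(url, 'largefile.zip'))
```

Combined with resume support, each retry picks up from the last byte written rather than starting over.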
Verifying Downloaded Files
After downloading, verify the file integrity using checksums:
```python
import hashlib

def verify_checksum(filepath, expected_hash, algorithm='sha256'):
    """Verify a file's checksum after download."""
    hasher = hashlib.new(algorithm)
    with open(filepath, 'rb') as f:
        for chunk in iter(lambda: f.read(8192), b''):
            hasher.update(chunk)

    actual_hash = hasher.hexdigest()
    is_valid = actual_hash == expected_hash
    if is_valid:
        print(f"✅ Checksum verified: {actual_hash[:16]}...")
    else:
        print("❌ Checksum mismatch!")
        print(f"   Expected: {expected_hash[:16]}...")
        print(f"   Got:      {actual_hash[:16]}...")
    return is_valid

# Usage after download
verify_checksum('largefile.zip', 'expected_sha256_hash_here')
```
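For very large files you can also compute the digest while the chunks arrive, avoiding a second pass over the file. A sketch; `hash_chunks` is my name for the helper, and in a real download the iterable would be `response.iter_content(...)`:

```python
import hashlib

def hash_chunks(chunks, algorithm='sha256'):
    """Feed an iterable of byte chunks into a hash incrementally."""
    hasher = hashlib.new(algorithm)
    for chunk in chunks:
        hasher.update(chunk)
    return hasher.hexdigest()

# Shown here with in-memory chunks instead of a network stream
digest = hash_chunks([b'large ', b'file ', b'contents'])
print(digest == hashlib.sha256(b'large file contents').hexdigest())  # True
```

Inside a download loop you would call `hasher.update(chunk)` right next to `f.write(chunk)` and compare the final digest to the published checksum.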
Quick Comparison of Methods
| Method | Memory Efficient | Progress | Resume | Simplicity |
|---|---|---|---|---|
| `iter_content()` with chunks | ✅ | ❌ (manual) | ❌ (manual) | ✅ Simple |
| `iter_content()` + tqdm | ✅ | ✅ | ❌ (manual) | ✅ Good |
| `iter_content()` + Range header | ✅ | ✅ (manual) | ✅ | 🔶 Moderate |
| `shutil.copyfileobj()` | ✅ | ❌ | ❌ | ✅ Simplest |
| Comprehensive function | ✅ | ✅ | ✅ | 🔶 Moderate |
Conclusion
Downloading large files in Python requires streaming to avoid loading the entire file into memory. Here's which approach to use:
- For simple downloads, use `requests.get(url, stream=True)` with `iter_content()`. It's clean, efficient, and handles files of any size.
- For user-facing downloads, add a tqdm progress bar so users can see download progress and estimated time remaining.
- For unreliable networks, implement resume support using HTTP Range headers to avoid restarting failed downloads from scratch.
- For maximum simplicity, use `shutil.copyfileobj(response.raw, file)`. One line handles all the chunking automatically.
Always use stream=True, set an appropriate chunk_size, include error handling with try/except, and verify downloads with checksums when integrity matters.