How to Download Large Files in Python with Requests
Downloading large files, such as datasets, ISO images, or media files, requires careful handling in Python. Simply reading the entire file into memory before saving it can crash your program or consume excessive resources. The requests library supports streaming downloads that process data in small chunks, keeping memory usage low regardless of file size.
In this guide, you'll learn multiple methods to download large files efficiently in Python, including progress bars, resume support, and error handling.
Basic Streaming Download with Chunks
The most important technique for downloading large files is using stream=True with iter_content(). This reads the file in small chunks instead of loading it entirely into memory:
```python
import requests

def download_file(url, destination):
    """Download a large file in chunks to avoid memory issues."""
    with requests.get(url, stream=True) as response:
        response.raise_for_status()
        with open(destination, 'wb') as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)
    print(f"Downloaded: {destination}")

# Usage
url = 'https://example.com/largefile.zip'
download_file(url, 'largefile.zip')
```
How it works:
- `stream=True` tells requests not to download the entire response body immediately.
- `iter_content(chunk_size=8192)` yields the response data in 8 KB chunks.
- Each chunk is written to disk immediately and then discarded from memory.

Choosing a chunk size:
- 8192 bytes (8 KB): a good default for most situations.
- 65536 bytes (64 KB) or 1048576 bytes (1 MB): better for fast connections and large files, with fewer write operations.
- Larger chunks mean fewer I/O operations but slightly higher peak memory usage.
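The same chunk-at-a-time pattern works on any file-like object, which makes it easy to see in isolation. A minimal stdlib sketch, with an in-memory stream standing in for the network response (`copy_in_chunks` is an illustrative name, not a requests API):

```python
import io

def copy_in_chunks(src, dst, chunk_size=8192):
    """Copy a stream chunk by chunk, never holding the whole payload in memory."""
    total = 0
    # iter(callable, sentinel) keeps calling read() until it returns b''
    for chunk in iter(lambda: src.read(chunk_size), b''):
        dst.write(chunk)
        total += len(chunk)
    return total

# Simulate a 100 KB "download" with an in-memory stream
src = io.BytesIO(b'x' * 100_000)
dst = io.BytesIO()
print(copy_in_chunks(src, dst))  # 100000
```

This is essentially what `iter_content()` does for you on top of the HTTP response stream.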
Adding a Progress Bar
For large downloads, a progress bar provides essential feedback. Use the tqdm library:
```
pip install tqdm
```
```python
import requests
from tqdm import tqdm

def download_with_progress(url, destination):
    """Download a file with a progress bar."""
    response = requests.get(url, stream=True)
    response.raise_for_status()

    # Get total file size from headers
    total_size = int(response.headers.get('content-length', 0))
    chunk_size = 1024 * 1024  # 1 MB chunks

    with open(destination, 'wb') as f, tqdm(
        desc=destination,
        total=total_size,
        unit='B',
        unit_scale=True,
        unit_divisor=1024,
    ) as progress_bar:
        for chunk in response.iter_content(chunk_size=chunk_size):
            size = f.write(chunk)
            progress_bar.update(size)

    print(f"\nDownload complete: {destination}")

# Usage
url = 'https://example.com/largefile.zip'
download_with_progress(url, 'largefile.zip')
```
Output:
```
largefile.zip:  45%|████████▌   | 450M/1.00G [00:30<00:37, 15.0MB/s]
```
The progress bar shows the filename, percentage complete, downloaded/total size, elapsed time, estimated time remaining, and download speed.
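If you would rather avoid a dependency, a rough stdlib-only progress line can be assembled from a small byte-formatting helper. A sketch, where `format_bytes` and `progress_line` are hypothetical names and only loosely imitate tqdm's output:

```python
def format_bytes(n):
    """Format a byte count using binary units (KiB, MiB, ...)."""
    for unit in ('B', 'KiB', 'MiB', 'GiB', 'TiB'):
        if n < 1024 or unit == 'TiB':
            return f'{n:.0f}B' if unit == 'B' else f'{n:.1f}{unit}'
        n /= 1024

def progress_line(downloaded, total):
    """Render a one-line progress string, e.g. for print(..., end='\r')."""
    pct = downloaded / total * 100 if total else 0
    return f'{pct:5.1f}% ({format_bytes(downloaded)} / {format_bytes(total)})'

print(progress_line(450 * 1024**2, 1024**3))  # ' 43.9% (450.0MiB / 1.0GiB)'
```

Printing this with `end='\r'` inside the download loop gives a self-updating status line, at the cost of tqdm's speed and ETA estimates.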
Resumable Downloads
If a download is interrupted (network failure, timeout), you don't want to start over. Range requests let you resume from where you left off:
```python
import requests
import os

def download_resumable(url, destination):
    """Download a file with resume support."""
    # Check if a partial download exists
    existing_size = 0
    if os.path.exists(destination):
        existing_size = os.path.getsize(destination)

    # Set the Range header to resume from where we left off
    headers = {}
    if existing_size > 0:
        headers['Range'] = f'bytes={existing_size}-'
        print(f"Resuming download from {existing_size / (1024*1024):.1f} MB")

    response = requests.get(url, headers=headers, stream=True)

    # 206 = Partial Content (resume successful)
    # 200 = OK (server doesn't support resume, starting over)
    if response.status_code == 200:
        existing_size = 0  # Server doesn't support resume
        mode = 'wb'
    elif response.status_code == 206:
        mode = 'ab'  # Append to existing file
    else:
        response.raise_for_status()
        return

    total_size = int(response.headers.get('content-length', 0)) + existing_size

    with open(destination, mode) as f:
        downloaded = existing_size
        for chunk in response.iter_content(chunk_size=1024 * 1024):
            f.write(chunk)
            downloaded += len(chunk)
            progress = (downloaded / total_size * 100) if total_size > 0 else 0
            print(f"\rProgress: {progress:.1f}% ({downloaded / (1024*1024):.1f} MB)", end='')

    print(f"\nDownload complete: {destination}")

# Usage
download_resumable('https://example.com/largefile.zip', 'largefile.zip')
```
Not all servers support range requests, so the function checks the response status code: 206 means the resume was honored, while 200 means the server is sending the full file from the beginning.
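You can also probe for range support up front by looking for the Accept-Ranges header on a HEAD request. A sketch, where `server_supports_resume` is my name for the helper, not a requests API:

```python
def server_supports_resume(headers):
    """True if the server advertises byte-range support via Accept-Ranges."""
    return headers.get('Accept-Ranges', '').lower() == 'bytes'

# Hypothetical usage with requests (network call not shown here):
#   resp = requests.head(url, allow_redirects=True)
#   if server_supports_resume(resp.headers): ...
print(server_supports_resume({'Accept-Ranges': 'bytes'}))  # True
print(server_supports_resume({}))                          # False
```

The header is only advisory: some servers honor Range requests without advertising Accept-Ranges, so the 206-versus-200 status check remains the reliable signal.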
Using shutil.copyfileobj() for Simplicity
For a simpler approach, shutil.copyfileobj() handles the chunked reading and writing automatically:
```python
import requests
import shutil

def download_with_shutil(url, destination):
    """Download using shutil for automatic chunked copying."""
    response = requests.get(url, stream=True)
    response.raise_for_status()
    with open(destination, 'wb') as f:
        shutil.copyfileobj(response.raw, f)
    print(f"Downloaded: {destination}")

# Usage
download_with_shutil('https://example.com/largefile.zip', 'largefile.zip')
```
When using response.raw with shutil.copyfileobj(), the data may not be automatically decompressed if the server uses gzip or deflate encoding. For most binary file downloads this isn't an issue, but for compressed text responses, prefer iter_content().
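If you do need decoded content while keeping `copyfileobj()`, the underlying urllib3 stream can be told to honor Content-Encoding. A sketch, where `copy_decoded` is a hypothetical helper and `decode_content` is the urllib3 attribute that requests exposes via `response.raw`:

```python
import shutil

def copy_decoded(response, destination):
    """Copy a streamed requests response to disk, decompressing per Content-Encoding."""
    # Ask the underlying urllib3 stream to decode gzip/deflate as it is read
    response.raw.decode_content = True
    with open(destination, 'wb') as f:
        shutil.copyfileobj(response.raw, f)
```

Pass a response obtained with `stream=True`; for already-uncompressed binary files this behaves the same as the plain `copyfileobj()` version.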
Comprehensive Download Function
Here's a production-ready function that combines all best practices:
```python
import requests
import os
from tqdm import tqdm

def download_large_file(url, destination, chunk_size=1024 * 1024, resume=True):
    """
    Download a large file with progress bar and resume support.

    Args:
        url: URL of the file to download.
        destination: Local path to save the file.
        chunk_size: Size of each download chunk in bytes (default: 1 MB).
        resume: Whether to resume partial downloads (default: True).
    """
    headers = {}
    existing_size = 0

    # Resume support
    if resume and os.path.exists(destination):
        existing_size = os.path.getsize(destination)
        headers['Range'] = f'bytes={existing_size}-'

    try:
        response = requests.get(url, headers=headers, stream=True, timeout=30)

        # 416 = Requested Range Not Satisfiable: nothing left to fetch
        if response.status_code == 416:
            print("File already fully downloaded.")
            return
        response.raise_for_status()

        # Determine write mode and total size
        if response.status_code == 206:
            mode = 'ab'
            total = int(response.headers.get('content-length', 0)) + existing_size
        else:
            mode = 'wb'
            existing_size = 0
            total = int(response.headers.get('content-length', 0))

        # Download with progress bar
        with open(destination, mode) as f, tqdm(
            desc=os.path.basename(destination),
            initial=existing_size,
            total=total,
            unit='B',
            unit_scale=True,
            unit_divisor=1024,
        ) as bar:
            for chunk in response.iter_content(chunk_size=chunk_size):
                size = f.write(chunk)
                bar.update(size)

        print(f"Download complete: {destination}")

    except requests.exceptions.ConnectionError:
        # The file may not exist if the connection failed before the first write
        done = os.path.getsize(destination) if os.path.exists(destination) else 0
        print(f"Connection lost. Run again to resume from {done} bytes.")
    except requests.exceptions.Timeout:
        print("Download timed out. Run again to resume.")
    except requests.exceptions.RequestException as e:
        print(f"Download failed: {e}")

# Usage
download_large_file(
    url='https://example.com/largefile.zip',
    destination='largefile.zip'
)
```
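On flaky links you can go a step further and retry the whole call automatically. A minimal sketch with exponential backoff; `with_retries` is a hypothetical helper, written against OSError so it stays library-agnostic (requests' ConnectionError subclasses OSError):

```python
import time

def with_retries(fn, attempts=3, delay=1.0, backoff=2.0, exceptions=(OSError,)):
    """Call fn(), retrying with exponential backoff on the given exceptions."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts, propagate the last error
            time.sleep(delay)
            delay *= backoff

# Hypothetical usage with the resumable function above:
#   with_retries(lambda: download_large_file(url, 'largefile.zip'))
```

Combined with resume support, each retry picks up from the last byte written rather than starting over.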
Verifying Downloaded Files
After downloading, verify the file integrity using checksums:
```python
import hashlib

def verify_checksum(filepath, expected_hash, algorithm='sha256'):
    """Verify a file's checksum after download."""
    hasher = hashlib.new(algorithm)
    with open(filepath, 'rb') as f:
        for chunk in iter(lambda: f.read(8192), b''):
            hasher.update(chunk)

    actual_hash = hasher.hexdigest()
    is_valid = actual_hash == expected_hash
    if is_valid:
        print(f"✅ Checksum verified: {actual_hash[:16]}...")
    else:
        print("❌ Checksum mismatch!")
        print(f"   Expected: {expected_hash[:16]}...")
        print(f"   Got:      {actual_hash[:16]}...")
    return is_valid

# Usage after download
verify_checksum('largefile.zip', 'expected_sha256_hash_here')
```
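For very large files you can also compute the digest while the chunks arrive, avoiding a second pass over the file. A sketch; `hash_chunks` is my name for the helper, and in a real download the iterable would be `response.iter_content(...)`:

```python
import hashlib

def hash_chunks(chunks, algorithm='sha256'):
    """Feed an iterable of byte chunks into a hash incrementally."""
    hasher = hashlib.new(algorithm)
    for chunk in chunks:
        hasher.update(chunk)
    return hasher.hexdigest()

# Shown here with in-memory chunks instead of a network stream
digest = hash_chunks([b'large ', b'file ', b'contents'])
print(digest == hashlib.sha256(b'large file contents').hexdigest())  # True
```

Inside a download loop you would call `hasher.update(chunk)` right next to `f.write(chunk)` and compare the final digest to the published checksum.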
Quick Comparison of Methods
| Method | Memory Efficient | Progress | Resume | Simplicity |
|---|---|---|---|---|
| `iter_content()` with chunks | ✅ | ❌ (manual) | ❌ (manual) | ✅ Simple |
| `iter_content()` + tqdm | ✅ | ✅ | ❌ (manual) | ✅ Good |
| `iter_content()` + Range header | ✅ | ✅ (manual) | ✅ | 🔶 Moderate |
| `shutil.copyfileobj()` | ✅ | ❌ | ❌ | ✅ Simplest |
| Comprehensive function | ✅ | ✅ | ✅ | 🔶 Moderate |
Conclusion
Downloading large files in Python requires streaming to avoid loading the entire file into memory. Here's which approach to use:
- For simple downloads, use `requests.get(url, stream=True)` with `iter_content()`. It's clean, efficient, and handles files of any size.
- For user-facing downloads, add a tqdm progress bar so users can see download progress and estimated time remaining.
- For unreliable networks, implement resume support using HTTP Range headers to avoid restarting failed downloads from scratch.
- For maximum simplicity, use `shutil.copyfileobj(response.raw, file)`. One line handles all the chunking automatically.
Always use stream=True, set an appropriate chunk_size, include error handling with try/except, and verify downloads with checksums when integrity matters.