How to Download Files via HTTP in Python

The requests library provides a simple and reliable interface for downloading files over HTTP. Whether you are fetching a small JSON response or downloading a multi-gigabyte dataset, choosing the right approach depends on file size, memory constraints, and how robust your error handling needs to be.

In this guide, you will learn how to download files using both simple and streaming methods, add progress bars for user-facing applications, implement retry logic and resumable downloads, and handle authenticated endpoints.

Installation

pip install requests

Simple Download for Small Files

For files that fit comfortably in memory, such as images, documents, or JSON responses, load the entire content at once with response.content:

import requests

url = "https://example.com/logo.png"
response = requests.get(url)

if response.status_code == 200:
    with open("logo.png", "wb") as f:
        f.write(response.content)
    print(f"Download complete ({len(response.content):,} bytes)")
else:
    print(f"Failed with status code: {response.status_code}")

Example output:

Download complete (24,576 bytes)

This approach is concise and works well for files under roughly 50 MB. The entire response body is stored in memory as a bytes object before being written to disk.
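The 50 MB cutoff is a judgment call, not a hard limit. One way to decide between the two approaches up front is to read the Content-Length from a HEAD request before downloading. The helper below is a sketch (the name pick_strategy and the threshold default are illustrative, not part of any library):

```python
def pick_strategy(content_length, threshold=50 * 1024 * 1024):
    """Return 'simple' or 'stream' given a file size in bytes.

    content_length would typically come from something like
    int(requests.head(url).headers.get("content-length", 0)).
    An unknown or zero size is treated as potentially large.
    """
    if content_length is None or content_length <= 0:
        return "stream"  # size unknown: assume large and stream
    return "simple" if content_length <= threshold else "stream"
```

Not every server answers HEAD requests or reports a size, which is why the unknown case defaults to streaming.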

Streaming Download for Large Files

For large files, streaming prevents loading the entire content into RAM. The stream=True parameter tells requests to download the content in chunks rather than all at once:

import requests

url = "https://example.com/large_dataset.zip"

with requests.get(url, stream=True) as response:
    response.raise_for_status()

    with open("dataset.zip", "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)

print("Download complete")

The iter_content() method yields chunks of the specified size (8,192 bytes in this example), and each chunk is written to disk immediately. Only one chunk is held in memory at any given time.

Warning

Without stream=True, the entire file is loaded into memory before you can access it. Downloading a 4 GB file would consume over 4 GB of RAM, potentially crashing your application. Always use streaming for files of unknown or large size.

Adding a Progress Bar

For user-facing applications, displaying download progress provides a much better experience. The tqdm library creates a progress bar using the Content-Length header from the server:

pip install tqdm

import requests
from tqdm import tqdm

url = "https://example.com/video.mp4"

response = requests.get(url, stream=True)
response.raise_for_status()

total_size = int(response.headers.get("content-length", 0))

with open("video.mp4", "wb") as f:
    with tqdm(total=total_size, unit="B", unit_scale=True, desc="Downloading") as progress:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
            progress.update(len(chunk))

Example output:

Downloading: 100%|██████████| 150M/150M [00:45<00:00, 3.33MB/s]

Note

Not all servers include a Content-Length header, especially when using chunked transfer encoding. When the header is missing, total_size will be 0 and tqdm will display progress without a percentage or ETA. The download still works correctly.
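One way to handle the missing header cleanly is to convert it to None, which makes tqdm fall back to a plain running counter. The helper name below is hypothetical, for illustration only:

```python
def tqdm_total(content_length_header):
    """Convert a raw Content-Length header value into a tqdm total.

    Returns an int when the size is known, or None so tqdm shows a
    running byte count without a percentage or ETA.
    """
    size = int(content_length_header or 0)
    return size if size > 0 else None
```

It would be used as `tqdm(total=tqdm_total(response.headers.get("content-length")), ...)` in place of the `total_size` computation above.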

Handling Errors and Retries

Production code should handle network failures gracefully. The requests library supports automatic retries through the HTTPAdapter and Retry classes:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def download_with_retry(url, filepath, max_retries=3):
    """Download a file with automatic retry on failure."""
    session = requests.Session()

    retry_strategy = Retry(
        total=max_retries,
        backoff_factor=1,  # Wait 1s, 2s, 4s between retries
        status_forcelist=[429, 500, 502, 503, 504]
    )

    adapter = HTTPAdapter(max_retries=retry_strategy)
    session.mount("http://", adapter)
    session.mount("https://", adapter)

    try:
        with session.get(url, stream=True, timeout=30) as response:
            response.raise_for_status()

            with open(filepath, "wb") as f:
                for chunk in response.iter_content(chunk_size=8192):
                    f.write(chunk)

            print(f"Downloaded: {filepath}")
            return True

    except requests.exceptions.RequestException as e:
        print(f"Download failed: {e}")
        return False

success = download_with_retry("https://example.com/data.zip", "data.zip")

The backoff_factor=1 means the library waits 1 second after the first failure, 2 seconds after the second, and 4 seconds after the third. The status_forcelist specifies which HTTP status codes should trigger a retry.
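The delays follow an exponential formula, roughly backoff_factor * 2 ** (n - 1) for the n-th retry. The sketch below computes the schedule for illustration; note that exact behavior varies slightly across urllib3 versions (some skip the sleep before the very first retry):

```python
def backoff_delays(backoff_factor, total_retries):
    """Approximate seconds slept before each retry under urllib3's
    exponential backoff: backoff_factor * 2 ** (n - 1) for retry n."""
    return [backoff_factor * 2 ** (n - 1) for n in range(1, total_retries + 1)]
```

With `backoff_factor=1` and three retries this yields the 1 s, 2 s, 4 s schedule described above; a factor of 0.5 would halve each wait.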

Resumable Downloads

Some servers support resuming interrupted downloads via the HTTP Range header. This is especially valuable for large files over unreliable connections:

import requests
import os

def resume_download(url, filepath):
    """Resume a download from where it was interrupted."""
    headers = {}
    initial_pos = 0

    if os.path.exists(filepath):
        initial_pos = os.path.getsize(filepath)
        headers["Range"] = f"bytes={initial_pos}-"
        print(f"Resuming from byte {initial_pos:,}")

    response = requests.get(url, stream=True, headers=headers)

    # 206 = Partial Content (resume supported)
    # 200 = Full content (server does not support resume, start over)
    if response.status_code == 200:
        initial_pos = 0

    mode = "ab" if initial_pos > 0 else "wb"

    with open(filepath, mode) as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)

    print("Download complete")

resume_download("https://example.com/large_file.iso", "file.iso")

Example output (resumed):

Resuming from byte 52,428,800
Download complete

Tip

Before implementing resume logic, check whether the server supports range requests by looking for Accept-Ranges: bytes in the response headers. Not all servers support this feature.
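A quick probe might look like the sketch below. The header check itself is pure logic; the commented-out live request uses a placeholder URL:

```python
def supports_resume(headers):
    """Return True if response headers advertise byte-range support."""
    return headers.get("Accept-Ranges", "").lower() == "bytes"

# Against a live server one might check (requests' header mapping is
# case-insensitive, so the lookup above works on it directly):
#   resp = requests.head("https://example.com/large_file.iso")
#   print(supports_resume(resp.headers))
```

Servers that do not support ranges either omit the header or send Accept-Ranges: none.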

Downloading with Authentication

Many APIs and private servers require authentication before allowing downloads:

import requests

# Basic authentication (username and password)
response = requests.get(
    "https://api.example.com/report.pdf",
    auth=("username", "password"),
    stream=True
)

# Bearer token authentication
headers = {"Authorization": "Bearer YOUR_ACCESS_TOKEN"}
response = requests.get(
    "https://api.example.com/report.pdf",
    headers=headers,
    stream=True
)

# Download the file if authentication succeeded
if response.status_code == 200:
    with open("report.pdf", "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
    print("Download complete")
elif response.status_code == 401:
    print("Authentication failed: check your credentials")
else:
    print(f"Request failed with status: {response.status_code}")

Using urllib from the Standard Library

If you cannot install third-party packages, Python's built-in urllib module handles basic downloads without any external dependencies:

from urllib.request import urlretrieve

url = "https://example.com/data.csv"
filepath, headers = urlretrieve(url, "data.csv")

print(f"Downloaded to: {filepath}")

For more control over the download process:

from urllib.request import urlopen

url = "https://example.com/data.csv"

with urlopen(url) as response:
    with open("data.csv", "wb") as f:
        while True:
            chunk = response.read(8192)
            if not chunk:
                break
            f.write(chunk)

print("Download complete")

The requests library is preferred for most use cases because it provides a cleaner API, better error handling, and features like automatic retries and session management.

Method Comparison

Method                          | Memory Usage                 | Best For
response.content                | Entire file in RAM           | Small files (under 50 MB)
iter_content() with stream=True | One chunk at a time          | Large files, limited memory
With tqdm progress bar          | One chunk + display overhead | User-facing downloads
With retry logic                | One chunk at a time          | Unreliable networks
urllib.request                  | Depends on usage             | No external dependencies

Conclusion

  • Use response.content for small files where simplicity matters and memory is not a concern.
  • For production scripts handling files of unknown or large size, always use stream=True with iter_content() to keep memory usage constant.
  • Add retry logic with HTTPAdapter and Retry for reliability over unreliable networks, and include a progress bar with tqdm for better user experience in interactive applications.
  • For interrupted downloads of very large files, implement resumable downloads using the Range header when the server supports it.