
How to Download Files from URLs in Python

Downloading files from the internet is one of the most common tasks in Python, whether you're building a web scraper, automating data pipelines, pulling datasets for machine learning, or simply fetching documents programmatically.

Python offers several approaches to download files from URLs, ranging from built-in standard library modules to powerful third-party packages. This guide covers the most popular methods with complete examples, error handling, and best practices to help you choose the right tool for your use case.

Method 1: Using the requests Library

The requests library is the most popular and Pythonic way to make HTTP requests. It provides a clean API, excellent error handling, and support for streaming large files.

Installation

pip install requests

Basic File Download

import requests

url = "https://example.com/sample-file.pdf"
output_file = "downloaded_file.pdf"

response = requests.get(url)

if response.status_code == 200:
    with open(output_file, "wb") as file:
        file.write(response.content)
    print(f"File downloaded successfully: {output_file}")
else:
    print(f"Download failed. Status code: {response.status_code}")

Output:

File downloaded successfully: downloaded_file.pdf

Downloading Large Files with Streaming

For large files, loading the entire response into memory is inefficient. Use stream=True to download in chunks:

import requests

url = "https://example.com/large-dataset.zip"
output_file = "dataset.zip"

response = requests.get(url, stream=True, timeout=30)
response.raise_for_status()

with open(output_file, "wb") as file:
    for chunk in response.iter_content(chunk_size=8192):
        file.write(chunk)

print(f"File downloaded successfully: {output_file}")
  • stream=True: prevents the entire file from being loaded into memory at once.
  • iter_content(chunk_size=8192): reads the response in 8 KB chunks.
  • raise_for_status(): raises an HTTPError for 4xx/5xx responses.
tip

Always use streaming for files larger than a few megabytes. This keeps memory usage constant regardless of file size.

Adding a Progress Indicator

When downloading large files, it's helpful to show download progress:

import requests

url = "https://example.com/large-file.zip"
output_file = "large_file.zip"

response = requests.get(url, stream=True, timeout=30)
response.raise_for_status()

total_size = int(response.headers.get("content-length", 0))
downloaded = 0

with open(output_file, "wb") as file:
    for chunk in response.iter_content(chunk_size=8192):
        file.write(chunk)
        downloaded += len(chunk)
        if total_size > 0:
            percent = (downloaded / total_size) * 100
            print(f"\rDownloading: {percent:.1f}%", end="")

print(f"\nDownload complete: {output_file}")

Sample output:

Downloading: 100.0%
Download complete: large_file.zip

Method 2: Using urllib.request (Standard Library)

If you want to avoid third-party dependencies, Python's built-in urllib.request module can handle file downloads without any installation.

Using urlretrieve()

The simplest approach is urlretrieve(), which downloads and saves a file in a single call. (The Python documentation marks urlretrieve() as a legacy interface that may become deprecated in a future release, but it remains widely used for quick scripts.)

import urllib.request

url = "https://example.com/sample-file.pdf"
output_file = "downloaded_file.pdf"

filepath, headers = urllib.request.urlretrieve(url, output_file)

print(f"File saved to: {filepath}")
print(f"Content-Type: {headers['Content-Type']}")

Output:

File saved to: downloaded_file.pdf
Content-Type: application/pdf
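urlretrieve() also accepts a reporthook callback, which it invokes after each block is transferred, so you can display progress without a third-party library. A minimal sketch; the URL is a placeholder, and the helper names are ours:

```python
import urllib.request

def progress_percent(block_num, block_size, total_size):
    """Return download progress as a percentage, capped at 100."""
    if total_size <= 0:
        return 0.0
    return min(block_num * block_size / total_size * 100, 100.0)

def report(block_num, block_size, total_size):
    # urlretrieve calls this hook after each block is written
    pct = progress_percent(block_num, block_size, total_size)
    print(f"\rDownloading: {pct:.1f}%", end="")

if __name__ == "__main__":
    url = "https://example.com/sample-file.pdf"  # placeholder URL
    urllib.request.urlretrieve(url, "downloaded_file.pdf", reporthook=report)
    print("\nDone.")
```

Note that total_size comes from the Content-Length header and can be -1 when the server does not send one, which is why the helper guards against non-positive values.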

Using urlopen() for More Control

For greater control over the download process, use urlopen() and write data manually:

import urllib.request

url = "https://example.com/sample-file.pdf"
output_file = "downloaded_file.pdf"

with urllib.request.urlopen(url, timeout=30) as response:
    with open(output_file, "wb") as file:
        # Read and write in chunks
        while True:
            chunk = response.read(8192)
            if not chunk:
                break
            file.write(chunk)

print(f"File downloaded: {output_file}")
caution

urllib.request handles errors less conveniently than requests. urlopen() and urlretrieve() raise urllib.error.HTTPError automatically for 4xx/5xx responses, so instead of checking a status code (or calling something like raise_for_status()), you must catch the exceptions explicitly:

import urllib.request
import urllib.error

try:
    urllib.request.urlretrieve("https://example.com/nonexistent.pdf", "file.pdf")
except urllib.error.HTTPError as e:
    print(f"HTTP Error: {e.code} - {e.reason}")
except urllib.error.URLError as e:
    print(f"URL Error: {e.reason}")

Method 3: Using the wget Module

The wget module provides the simplest possible interface, mimicking the behavior of the Linux wget command in a single function call.

Installation

pip install wget

Basic Usage

import wget

url = "https://example.com/sample-file.pdf"
output_file = "downloaded_file.pdf"

downloaded_path = wget.download(url, output_file)

print(f"\nFile saved to: {downloaded_path}")

Output:

100% [..........................................................................] 12345 / 12345
File saved to: downloaded_file.pdf

wget.download() automatically displays a progress bar in the terminal, making it convenient for interactive scripts.

Automatic Filename Detection

If you omit the output filename, wget extracts it from the URL:

import wget

url = "https://example.com/report-2025.pdf"
downloaded_path = wget.download(url)

print(f"\nSaved as: {downloaded_path}")

Output:

100% [..........................................................................] 54321 / 54321
Saved as: report-2025.pdf
note

The wget module is lightweight and great for quick scripts, but it offers limited configuration compared to requests (no custom headers, authentication, or session management).
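For comparison, here is a sketch of those missing features in requests; the URL, header values, and credentials are all placeholders, not a real endpoint:

```python
import requests

# Placeholder values -- substitute a real endpoint and credentials
URL = "https://example.com/protected-file.zip"
HEADERS = {"User-Agent": "my-downloader/1.0", "Accept": "application/zip"}

def download_with_session(url, output_file):
    """Download using a Session, which reuses the TCP connection and
    applies the same headers and auth to every request it makes."""
    with requests.Session() as session:
        session.headers.update(HEADERS)
        session.auth = ("username", "password")  # HTTP basic auth (placeholder)
        response = session.get(url, stream=True, timeout=30)
        response.raise_for_status()
        with open(output_file, "wb") as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)

if __name__ == "__main__":
    download_with_session(URL, "protected-file.zip")
```

A Session is especially useful when downloading many files from the same host, since it avoids re-establishing a connection for each request.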

Adding Robust Error Handling

Production code should handle network failures, invalid URLs, timeouts, and HTTP errors gracefully. Here's a reusable download function using requests:

import requests
from pathlib import Path

def download_file(url, output_path, timeout=30, chunk_size=8192):
    """Download a file from a URL with error handling."""
    try:
        response = requests.get(url, stream=True, timeout=timeout)
        response.raise_for_status()

        output = Path(output_path)
        output.parent.mkdir(parents=True, exist_ok=True)

        with open(output, "wb") as file:
            for chunk in response.iter_content(chunk_size=chunk_size):
                file.write(chunk)

        print(f"Downloaded: {output} ({output.stat().st_size:,} bytes)")
        return True

    except requests.exceptions.MissingSchema:
        print(f"Invalid URL (missing http/https): {url}")
    except requests.exceptions.ConnectionError:
        print(f"Connection failed: {url}")
    except requests.exceptions.Timeout:
        print(f"Request timed out after {timeout}s: {url}")
    except requests.exceptions.HTTPError as e:
        print(f"HTTP error {e.response.status_code}: {url}")

    return False


# Usage
download_file("https://example.com/data.csv", "downloads/data.csv")
download_file("https://invalid-domain.xyz/file.txt", "fail.txt")

Output:

Downloaded: downloads/data.csv (1,234 bytes)
Connection failed: https://invalid-domain.xyz/file.txt
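Transient failures (flaky networks, 429s, temporary 5xx errors) can also be retried automatically by mounting an HTTPAdapter with a urllib3 Retry policy onto a Session. A sketch, with the retry count and backoff values chosen arbitrarily:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_retrying_session(total_retries=3, backoff=0.5):
    """Build a Session that retries failed requests with exponential backoff."""
    retry = Retry(
        total=total_retries,
        backoff_factor=backoff,  # waits ~0.5s, 1s, 2s between attempts
        status_forcelist=[429, 500, 502, 503, 504],  # retry these statuses
        allowed_methods=["GET"],
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session

if __name__ == "__main__":
    session = make_retrying_session()
    response = session.get("https://example.com/data.csv", timeout=15)  # placeholder URL
    response.raise_for_status()
```

This keeps the retry logic out of your download function entirely: any request made through the session retries on the listed status codes before raising.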

Common Mistake: Not Checking the Response Before Saving

A frequent error is writing the response to disk without verifying the HTTP status code. If the server returns a 404 or 500 error, you'll end up saving an HTML error page instead of the actual file:

import requests

# ❌ Wrong: saves error page HTML as "report.pdf" if URL is invalid
response = requests.get("https://example.com/nonexistent-file.pdf")
with open("report.pdf", "wb") as f:
    f.write(response.content)

The correct approach is to always check the status before writing:

import requests

response = requests.get("https://example.com/nonexistent-file.pdf")

# ✅ Correct: check status first
if response.status_code == 200:
    with open("report.pdf", "wb") as f:
        f.write(response.content)
    print("Download successful.")
else:
    print(f"Download failed: HTTP {response.status_code}")

Output:

Download failed: HTTP 404
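Checking the Content-Type header is another cheap safeguard: some servers return a 200 with an HTML error page, so if you expected a PDF and got text/html back, you should refuse to save it. A minimal sketch; the URL is a placeholder, and the expected type is an assumption about the server:

```python
import requests

def matches_content_type(content_type_header, expected):
    """Compare only the media type, ignoring parameters like charset."""
    media_type = content_type_header.split(";")[0].strip().lower()
    return media_type == expected.lower()

if __name__ == "__main__":
    response = requests.get("https://example.com/report.pdf", timeout=15)  # placeholder URL
    response.raise_for_status()
    if matches_content_type(response.headers.get("Content-Type", ""), "application/pdf"):
        with open("report.pdf", "wb") as f:
            f.write(response.content)
    else:
        print("Server did not return a PDF; refusing to save.")
```

The helper strips parameters such as "; charset=utf-8" before comparing, since servers commonly append them to the media type.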

Downloading Multiple Files

When you need to download a batch of files, loop through the URLs and save each one:

import requests
from pathlib import Path

urls = [
"https://example.com/file1.csv",
"https://example.com/file2.csv",
"https://example.com/file3.csv",
]

output_dir = Path("downloads")
output_dir.mkdir(exist_ok=True)

for url in urls:
    filename = url.split("/")[-1]
    filepath = output_dir / filename

    try:
        response = requests.get(url, timeout=15)
        response.raise_for_status()
        filepath.write_bytes(response.content)
        print(f"✓ {filename}")
    except requests.exceptions.RequestException as e:
        print(f"✗ {filename}: {e}")
tip

For downloading many files, use concurrent.futures.ThreadPoolExecutor to download multiple files in parallel, drastically reducing total time:

from concurrent.futures import ThreadPoolExecutor
import requests

def download(url):
    filename = url.split("/")[-1]
    response = requests.get(url, timeout=15)
    response.raise_for_status()
    with open(filename, "wb") as f:
        f.write(response.content)
    return filename

urls = ["https://example.com/file1.csv", "https://example.com/file2.csv"]

with ThreadPoolExecutor(max_workers=4) as executor:
    results = executor.map(download, urls)
    for name in results:
        print(f"Downloaded: {name}")

Comparison of Methods

Method                           External Dependency   Streaming   Progress Bar   Error Handling   Best For
requests                         requests              Yes         Manual         Excellent        Most use cases
urllib.request.urlretrieve()     None                  Yes         Hook-based     Basic            Quick, no-dependency scripts
urllib.request.urlopen()         None                  Manual      Manual         Basic            Standard library with control
wget                             wget                  Yes         Built-in       Minimal          Simple interactive downloads

Conclusion

Python provides multiple reliable ways to download files from URLs:

  • requests is the recommended choice for most projects: it offers a clean API, streaming support, and excellent error handling.
  • urllib.request is ideal when you need to avoid third-party dependencies and are working within the standard library.
  • wget is the simplest option with a built-in progress bar, perfect for quick scripts and interactive use.

Regardless of which method you choose, always check the HTTP status code before saving, use streaming for large files, add timeouts to prevent hanging, and implement proper error handling for production-ready code.