How to Download Files from URLs in Python
Downloading files from the internet is one of the most common tasks in Python, whether you're building a web scraper, automating data pipelines, pulling datasets for machine learning, or simply fetching documents programmatically.
Python offers several approaches to download files from URLs, ranging from built-in standard library modules to powerful third-party packages. This guide covers the most popular methods with complete examples, error handling, and best practices to help you choose the right tool for your use case.
Method 1: Using the requests Library
The requests library is the most popular and Pythonic way to make HTTP requests. It provides a clean API, excellent error handling, and support for streaming large files.
Installation
pip install requests
Basic File Download
import requests
url = "https://example.com/sample-file.pdf"
output_file = "downloaded_file.pdf"
response = requests.get(url, timeout=30)
if response.status_code == 200:
    with open(output_file, "wb") as file:
        file.write(response.content)
    print(f"File downloaded successfully: {output_file}")
else:
    print(f"Download failed. Status code: {response.status_code}")
Output:
File downloaded successfully: downloaded_file.pdf
Downloading Large Files with Streaming
For large files, loading the entire response into memory is inefficient. Use stream=True to download in chunks:
import requests
url = "https://example.com/large-dataset.zip"
output_file = "dataset.zip"
response = requests.get(url, stream=True, timeout=30)
response.raise_for_status()
with open(output_file, "wb") as file:
    for chunk in response.iter_content(chunk_size=8192):
        file.write(chunk)
print(f"File downloaded successfully: {output_file}")
- stream=True prevents the entire file from being loaded into memory at once.
- iter_content(chunk_size=8192) reads the response in 8 KB chunks.
- raise_for_status() raises an HTTPError for 4xx/5xx responses.
Always use streaming for files larger than a few megabytes. This keeps memory usage constant regardless of file size.
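As a variation on the chunk loop above, the standard library's shutil.copyfileobj can copy a readable stream to a file in constant memory. With requests this would be applied to response.raw after requests.get(url, stream=True); note that response.raw bypasses content decoding (e.g. gzip), so iter_content is usually the safer default. A minimal sketch, using an in-memory BytesIO as a stand-in for the network stream so the example runs offline:

```python
import io
import shutil

def save_stream(stream, path, chunk_size=8192):
    """Copy a readable binary stream to a file in fixed-size chunks."""
    with open(path, "wb") as f:
        shutil.copyfileobj(stream, f, chunk_size)

# With requests: save_stream(response.raw, "dataset.zip")
# Here a BytesIO stands in for the network stream:
save_stream(io.BytesIO(b"x" * 100_000), "dataset.bin")
```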
Adding a Progress Indicator
When downloading large files, it's helpful to show download progress:
import requests
url = "https://example.com/large-file.zip"
output_file = "large_file.zip"
response = requests.get(url, stream=True, timeout=30)
response.raise_for_status()
total_size = int(response.headers.get("content-length", 0))
downloaded = 0
with open(output_file, "wb") as file:
    for chunk in response.iter_content(chunk_size=8192):
        file.write(chunk)
        downloaded += len(chunk)
        if total_size > 0:
            percent = (downloaded / total_size) * 100
            print(f"\rDownloading: {percent:.1f}%", end="")
print(f"\nDownload complete: {output_file}")
Sample output:
Downloading: 100.0%
Download complete: large_file.zip
Method 2: Using urllib.request (Standard Library)
If you want to avoid third-party dependencies, Python's built-in urllib.request module can handle file downloads without any installation.
Using urlretrieve()
The simplest approach is urlretrieve(), which downloads and saves a file in a single call:
import urllib.request
url = "https://example.com/sample-file.pdf"
output_file = "downloaded_file.pdf"
filepath, headers = urllib.request.urlretrieve(url, output_file)
print(f"File saved to: {filepath}")
print(f"Content-Type: {headers['Content-Type']}")
Output:
File saved to: downloaded_file.pdf
Content-Type: application/pdf
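urlretrieve() also accepts a reporthook callback, called after each transferred block with the block number, block size, and total size, which is how progress reporting is done without requests. A sketch (the hook's return value is ignored by urlretrieve; it is returned here only to make the percentage easy to check, and the URL is the placeholder used throughout this guide):

```python
import urllib.request

def progress_hook(block_num, block_size, total_size):
    """Called by urlretrieve after each transferred block."""
    if total_size <= 0:  # server did not send Content-Length
        return None
    percent = min(block_num * block_size / total_size * 100, 100)
    print(f"\rDownloading: {percent:.1f}%", end="")
    return percent

if __name__ == "__main__":
    urllib.request.urlretrieve(
        "https://example.com/sample-file.pdf",
        "downloaded_file.pdf",
        reporthook=progress_hook,
    )
    print("\nDone.")
```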
Using urlopen() for More Control
For greater control over the download process, use urlopen() and write data manually:
import urllib.request
url = "https://example.com/sample-file.pdf"
output_file = "downloaded_file.pdf"
with urllib.request.urlopen(url, timeout=30) as response:
    with open(output_file, "wb") as file:
        # Read and write in chunks
        while True:
            chunk = response.read(8192)
            if not chunk:
                break
            file.write(chunk)
print(f"File downloaded: {output_file}")
urllib.request provides less intuitive error handling compared to requests. HTTP errors must be caught with urllib.error.HTTPError, and there's no built-in equivalent to raise_for_status().
import urllib.request
import urllib.error
try:
    urllib.request.urlretrieve("https://example.com/nonexistent.pdf", "file.pdf")
except urllib.error.HTTPError as e:
    print(f"HTTP Error: {e.code} - {e.reason}")
except urllib.error.URLError as e:
    print(f"URL Error: {e.reason}")
Method 3: Using the wget Module
The wget module provides the simplest possible interface, mimicking the behavior of the Linux wget command in a single function call.
Installation
pip install wget
Basic Usage
import wget
url = "https://example.com/sample-file.pdf"
output_file = "downloaded_file.pdf"
downloaded_path = wget.download(url, output_file)
print(f"\nFile saved to: {downloaded_path}")
Output:
100% [..........................................................................] 12345 / 12345
File saved to: downloaded_file.pdf
wget.download() automatically displays a progress bar in the terminal, making it convenient for interactive scripts.
Automatic Filename Detection
If you omit the output filename, wget extracts it from the URL:
import wget
url = "https://example.com/report-2025.pdf"
downloaded_path = wget.download(url)
print(f"\nSaved as: {downloaded_path}")
Output:
100% [..........................................................................] 54321 / 54321
Saved as: report-2025.pdf
The wget module is lightweight and great for quick scripts, but it offers limited configuration compared to requests (no custom headers, authentication, or session management).
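To make the contrast concrete: when you do need custom headers or credentials, a requests.Session carries them across every download. A sketch with placeholder values (the User-Agent string and credentials here are illustrative, not real):

```python
import requests

# Hypothetical values for illustration; substitute your own.
session = requests.Session()
session.headers.update({"User-Agent": "my-downloader/1.0"})
session.auth = ("username", "api-token")

# Every download through this session carries the headers and credentials:
# response = session.get("https://example.com/protected/report.pdf", stream=True)
```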
Adding Robust Error Handling
Production code should handle network failures, invalid URLs, timeouts, and HTTP errors gracefully. Here's a reusable download function using requests:
import requests
from pathlib import Path
def download_file(url, output_path, timeout=30, chunk_size=8192):
    """Download a file from a URL with error handling."""
    try:
        response = requests.get(url, stream=True, timeout=timeout)
        response.raise_for_status()
        output = Path(output_path)
        output.parent.mkdir(parents=True, exist_ok=True)
        with open(output, "wb") as file:
            for chunk in response.iter_content(chunk_size=chunk_size):
                file.write(chunk)
        print(f"Downloaded: {output} ({output.stat().st_size:,} bytes)")
        return True
    except requests.exceptions.MissingSchema:
        print(f"Invalid URL (missing http/https): {url}")
    except requests.exceptions.ConnectionError:
        print(f"Connection failed: {url}")
    except requests.exceptions.Timeout:
        print(f"Request timed out after {timeout}s: {url}")
    except requests.exceptions.HTTPError as e:
        print(f"HTTP error {e.response.status_code}: {url}")
    return False
# Usage
download_file("https://example.com/data.csv", "downloads/data.csv")
download_file("https://invalid-domain.xyz/file.txt", "fail.txt")
Output:
Downloaded: downloads/data.csv (1,234 bytes)
Connection failed: https://invalid-domain.xyz/file.txt
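Transient failures (a dropped connection, a brief server outage) often succeed on a second attempt, so production download code frequently wraps the request in a retry loop. A minimal sketch under my own naming and backoff policy (for more complete behavior, urllib3's Retry class can be mounted on a requests session):

```python
import time
import requests

def download_with_retry(url, output_path, retries=3, backoff=2.0, timeout=30):
    """Try a download several times, sleeping between failed attempts."""
    for attempt in range(1, retries + 1):
        try:
            response = requests.get(url, stream=True, timeout=timeout)
            response.raise_for_status()
            with open(output_path, "wb") as f:
                for chunk in response.iter_content(chunk_size=8192):
                    f.write(chunk)
            return True
        except requests.exceptions.RequestException as e:
            print(f"Attempt {attempt}/{retries} failed: {e}")
            if attempt < retries:
                time.sleep(backoff * attempt)  # simple linear backoff
    return False
```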
Common Mistake: Not Checking the Response Before Saving
A frequent error is writing the response to disk without verifying the HTTP status code. If the server returns a 404 or 500 error, you'll end up saving an HTML error page instead of the actual file:
import requests
# ❌ Wrong: saves error page HTML as "report.pdf" if URL is invalid
response = requests.get("https://example.com/nonexistent-file.pdf")
with open("report.pdf", "wb") as f:
    f.write(response.content)
The correct approach is to always check the status before writing:
import requests
response = requests.get("https://example.com/nonexistent-file.pdf")
# ✅ Correct: check status first
if response.status_code == 200:
    with open("report.pdf", "wb") as f:
        f.write(response.content)
    print("Download successful.")
else:
    print(f"Download failed: HTTP {response.status_code}")
Output:
Download failed: HTTP 404
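Some misconfigured servers return an error page with status 200, so the Content-Type header is a useful second check before trusting the body. A sketch with a hypothetical helper (plain dicts stand in for response.headers here so the example runs offline; in requests, response.headers is case-insensitive):

```python
def looks_like_error_page(response_headers, expected="application/pdf"):
    """Heuristic: flag responses whose Content-Type doesn't match expectations."""
    content_type = response_headers.get("Content-Type", "")
    return expected not in content_type

# An error handler typically returns text/html:
print(looks_like_error_page({"Content-Type": "text/html; charset=utf-8"}))  # True
print(looks_like_error_page({"Content-Type": "application/pdf"}))           # False
```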
Downloading Multiple Files
When you need to download a batch of files, loop through the URLs and save each one:
import requests
from pathlib import Path
urls = [
    "https://example.com/file1.csv",
    "https://example.com/file2.csv",
    "https://example.com/file3.csv",
]
output_dir = Path("downloads")
output_dir.mkdir(exist_ok=True)
for url in urls:
    filename = url.split("/")[-1]
    filepath = output_dir / filename
    try:
        response = requests.get(url, timeout=15)
        response.raise_for_status()
        filepath.write_bytes(response.content)
        print(f"✓ {filename}")
    except requests.exceptions.RequestException as e:
        print(f"✗ {filename}: {e}")
For downloading many files, use concurrent.futures.ThreadPoolExecutor to download multiple files in parallel, drastically reducing total time:
from concurrent.futures import ThreadPoolExecutor
import requests
def download(url):
    filename = url.split("/")[-1]
    response = requests.get(url, timeout=15)
    response.raise_for_status()
    with open(filename, "wb") as f:
        f.write(response.content)
    return filename

urls = ["https://example.com/file1.csv", "https://example.com/file2.csv"]

with ThreadPoolExecutor(max_workers=4) as executor:
    results = executor.map(download, urls)

for name in results:
    print(f"Downloaded: {name}")
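One caveat with executor.map: iterating its results re-raises the first exception, so one failed download can hide the outcome of the rest of the batch. A sketch using as_completed to collect successes and failures separately (download_all and fake_fetch are illustrative names, and a stand-in fetch function is used so the example runs offline; pass a real download function in practice):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def download_all(urls, fetch, max_workers=4):
    """Run fetch(url) for each URL in parallel; collect failures instead of aborting."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as e:
                failures[url] = e
    return results, failures

# Demo with a stand-in fetch function instead of a real network call:
def fake_fetch(url):
    if "bad" in url:
        raise ValueError("simulated failure")
    return url.split("/")[-1]

ok, failed = download_all(["https://x/file1.csv", "https://x/bad.csv"], fake_fetch)
```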
Comparison of Methods
| Method | External Dependency | Streaming | Progress Bar | Error Handling | Best For |
|---|---|---|---|---|---|
| requests | requests | ✅ | Manual | Excellent | Most use cases |
| urllib.request.urlretrieve() | None | ❌ | Hook-based | Basic | Quick, no-dependency scripts |
| urllib.request.urlopen() | None | Manual | Manual | Basic | Standard library with control |
| wget | wget | ✅ | Built-in | Minimal | Simple interactive downloads |
Conclusion
Python provides multiple reliable ways to download files from URLs:
- requests is the recommended choice for most projects: it offers a clean API, streaming support, and excellent error handling.
- urllib.request is ideal when you need to avoid third-party dependencies and are working within the standard library.
- wget is the simplest option with a built-in progress bar, perfect for quick scripts and interactive use.
Regardless of which method you choose, always check the HTTP status code before saving, use streaming for large files, add timeouts to prevent hanging, and implement proper error handling for production-ready code.