
How to Download Files from URLs in Python

Downloading files from the internet is one of the most common tasks in Python, whether you're building a web scraper, automating data pipelines, pulling datasets for machine learning, or simply fetching documents programmatically.

Python offers several approaches to download files from URLs, ranging from built-in standard library modules to powerful third-party packages. This guide covers the most popular methods with complete examples, error handling, and best practices to help you choose the right tool for your use case.

Method 1: Using the requests Library

The requests library is the most popular and Pythonic way to make HTTP requests. It provides a clean API, excellent error handling, and support for streaming large files.

Installation

pip install requests

Basic File Download

import requests

url = "https://example.com/sample-file.pdf"
output_file = "downloaded_file.pdf"

response = requests.get(url)

if response.status_code == 200:
    with open(output_file, "wb") as file:
        file.write(response.content)
    print(f"File downloaded successfully: {output_file}")
else:
    print(f"Download failed. Status code: {response.status_code}")

Output:

File downloaded successfully: downloaded_file.pdf

Downloading Large Files with Streaming

For large files, loading the entire response into memory is inefficient. Use stream=True to download in chunks:

import requests

url = "https://example.com/large-dataset.zip"
output_file = "dataset.zip"

response = requests.get(url, stream=True, timeout=30)
response.raise_for_status()

with open(output_file, "wb") as file:
    for chunk in response.iter_content(chunk_size=8192):
        file.write(chunk)

print(f"File downloaded successfully: {output_file}")
  • stream=True: prevents the entire file from being loaded into memory at once.
  • iter_content(chunk_size=8192): reads the response in 8 KB chunks.
  • raise_for_status(): raises an HTTPError for 4xx/5xx responses.
tip

Always use streaming for files larger than a few megabytes. This keeps memory usage constant regardless of file size.

Adding a Progress Indicator

When downloading large files, it's helpful to show download progress:

import requests

url = "https://example.com/large-file.zip"
output_file = "large_file.zip"

response = requests.get(url, stream=True, timeout=30)
response.raise_for_status()

total_size = int(response.headers.get("content-length", 0))
downloaded = 0

with open(output_file, "wb") as file:
    for chunk in response.iter_content(chunk_size=8192):
        file.write(chunk)
        downloaded += len(chunk)
        if total_size > 0:
            percent = (downloaded / total_size) * 100
            print(f"\rDownloading: {percent:.1f}%", end="")

print(f"\nDownload complete: {output_file}")

Sample output:

Downloading: 100.0%
Download complete: large_file.zip

Method 2: Using urllib.request (Standard Library)

If you want to avoid third-party dependencies, Python's built-in urllib.request module can handle file downloads without any installation.

Using urlretrieve()

The simplest approach is urlretrieve(), which downloads and saves a file in a single call. (The Python documentation marks urlretrieve() as a legacy interface that may become deprecated in a future release, but it remains widely used for quick scripts.)

import urllib.request

url = "https://example.com/sample-file.pdf"
output_file = "downloaded_file.pdf"

filepath, headers = urllib.request.urlretrieve(url, output_file)

print(f"File saved to: {filepath}")
print(f"Content-Type: {headers['Content-Type']}")

Output:

File saved to: downloaded_file.pdf
Content-Type: application/pdf
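urlretrieve() also accepts a reporthook callback, which it invokes after each block is transferred, so you can display progress without a third-party library. A minimal sketch; the URL is a placeholder, and the helper names are ours:

```python
import urllib.request

def progress_percent(block_num, block_size, total_size):
    """Return download progress as a percentage, capped at 100."""
    if total_size <= 0:
        return 0.0
    return min(block_num * block_size / total_size * 100, 100.0)

def report(block_num, block_size, total_size):
    # urlretrieve calls this hook after each block is written
    pct = progress_percent(block_num, block_size, total_size)
    print(f"\rDownloading: {pct:.1f}%", end="")

if __name__ == "__main__":
    url = "https://example.com/sample-file.pdf"  # placeholder URL
    urllib.request.urlretrieve(url, "downloaded_file.pdf", reporthook=report)
    print("\nDone.")
```

Note that total_size comes from the Content-Length header and can be -1 when the server does not send one, which is why the helper guards against non-positive values.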

Using urlopen() for More Control

For greater control over the download process, use urlopen() and write data manually:

import urllib.request

url = "https://example.com/sample-file.pdf"
output_file = "downloaded_file.pdf"

with urllib.request.urlopen(url, timeout=30) as response:
    with open(output_file, "wb") as file:
        # Read and write in chunks
        while True:
            chunk = response.read(8192)
            if not chunk:
                break
            file.write(chunk)

print(f"File downloaded: {output_file}")
caution

urllib.request handles errors less conveniently than requests. urlopen() and urlretrieve() raise urllib.error.HTTPError automatically for 4xx/5xx responses, so instead of checking a status code (or calling something like raise_for_status()), you must catch the exceptions explicitly:

import urllib.request
import urllib.error

try:
    urllib.request.urlretrieve("https://example.com/nonexistent.pdf", "file.pdf")
except urllib.error.HTTPError as e:
    print(f"HTTP Error: {e.code} - {e.reason}")
except urllib.error.URLError as e:
    print(f"URL Error: {e.reason}")

Method 3: Using the wget Module

The wget module provides the simplest possible interface, mimicking the behavior of the Linux wget command in a single function call.

Installation

pip install wget

Basic Usage

import wget

url = "https://example.com/sample-file.pdf"
output_file = "downloaded_file.pdf"

downloaded_path = wget.download(url, output_file)

print(f"\nFile saved to: {downloaded_path}")

Output:

100% [..........................................................................] 12345 / 12345
File saved to: downloaded_file.pdf

wget.download() automatically displays a progress bar in the terminal, making it convenient for interactive scripts.

Automatic Filename Detection

If you omit the output filename, wget extracts it from the URL:

import wget

url = "https://example.com/report-2025.pdf"
downloaded_path = wget.download(url)

print(f"\nSaved as: {downloaded_path}")

Output:

100% [..........................................................................] 54321 / 54321
Saved as: report-2025.pdf
note

The wget module is lightweight and great for quick scripts, but it offers limited configuration compared to requests (no custom headers, authentication, or session management).
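For comparison, here is a sketch of those missing features in requests; the URL, header values, and credentials are all placeholders, not a real endpoint:

```python
import requests

# Placeholder values -- substitute a real endpoint and credentials
URL = "https://example.com/protected-file.zip"
HEADERS = {"User-Agent": "my-downloader/1.0", "Accept": "application/zip"}

def download_with_session(url, output_file):
    """Download using a Session, which reuses the TCP connection and
    applies the same headers and auth to every request it makes."""
    with requests.Session() as session:
        session.headers.update(HEADERS)
        session.auth = ("username", "password")  # HTTP basic auth (placeholder)
        response = session.get(url, stream=True, timeout=30)
        response.raise_for_status()
        with open(output_file, "wb") as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)

if __name__ == "__main__":
    download_with_session(URL, "protected-file.zip")
```

A Session is especially useful when downloading many files from the same host, since it avoids re-establishing a connection for each request.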

Adding Robust Error Handling

Production code should handle network failures, invalid URLs, timeouts, and HTTP errors gracefully. Here's a reusable download function using requests:

import requests
from pathlib import Path

def download_file(url, output_path, timeout=30, chunk_size=8192):
    """Download a file from a URL with error handling."""
    try:
        response = requests.get(url, stream=True, timeout=timeout)
        response.raise_for_status()

        output = Path(output_path)
        output.parent.mkdir(parents=True, exist_ok=True)

        with open(output, "wb") as file:
            for chunk in response.iter_content(chunk_size=chunk_size):
                file.write(chunk)

        print(f"Downloaded: {output} ({output.stat().st_size:,} bytes)")
        return True

    except requests.exceptions.MissingSchema:
        print(f"Invalid URL (missing http/https): {url}")
    except requests.exceptions.ConnectionError:
        print(f"Connection failed: {url}")
    except requests.exceptions.Timeout:
        print(f"Request timed out after {timeout}s: {url}")
    except requests.exceptions.HTTPError as e:
        print(f"HTTP error {e.response.status_code}: {url}")

    return False


# Usage
download_file("https://example.com/data.csv", "downloads/data.csv")
download_file("https://invalid-domain.xyz/file.txt", "fail.txt")

Output:

Downloaded: downloads/data.csv (1,234 bytes)
Connection failed: https://invalid-domain.xyz/file.txt
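Transient failures (flaky networks, 429s, temporary 5xx errors) can also be retried automatically by mounting an HTTPAdapter with a urllib3 Retry policy onto a Session. A sketch, with the retry count and backoff values chosen arbitrarily:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_retrying_session(total_retries=3, backoff=0.5):
    """Build a Session that retries failed requests with exponential backoff."""
    retry = Retry(
        total=total_retries,
        backoff_factor=backoff,  # waits ~0.5s, 1s, 2s between attempts
        status_forcelist=[429, 500, 502, 503, 504],  # retry these statuses
        allowed_methods=["GET"],
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session

if __name__ == "__main__":
    session = make_retrying_session()
    response = session.get("https://example.com/data.csv", timeout=15)  # placeholder URL
    response.raise_for_status()
```

This keeps the retry logic out of your download function entirely: any request made through the session retries on the listed status codes before raising.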

Common Mistake: Not Checking the Response Before Saving

A frequent error is writing the response to disk without verifying the HTTP status code. If the server returns a 404 or 500 error, you'll end up saving an HTML error page instead of the actual file:

import requests

# ❌ Wrong: saves error page HTML as "report.pdf" if URL is invalid
response = requests.get("https://example.com/nonexistent-file.pdf")
with open("report.pdf", "wb") as f:
    f.write(response.content)

The correct approach is to always check the status before writing:

import requests

response = requests.get("https://example.com/nonexistent-file.pdf")

# ✅ Correct: check status first
if response.status_code == 200:
    with open("report.pdf", "wb") as f:
        f.write(response.content)
    print("Download successful.")
else:
    print(f"Download failed: HTTP {response.status_code}")

Output:

Download failed: HTTP 404
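Checking the Content-Type header is another cheap safeguard: some servers return a 200 with an HTML error page, so if you expected a PDF and got text/html back, you should refuse to save it. A minimal sketch; the URL is a placeholder, and the expected type is an assumption about the server:

```python
import requests

def matches_content_type(content_type_header, expected):
    """Compare only the media type, ignoring parameters like charset."""
    media_type = content_type_header.split(";")[0].strip().lower()
    return media_type == expected.lower()

if __name__ == "__main__":
    response = requests.get("https://example.com/report.pdf", timeout=15)  # placeholder URL
    response.raise_for_status()
    if matches_content_type(response.headers.get("Content-Type", ""), "application/pdf"):
        with open("report.pdf", "wb") as f:
            f.write(response.content)
    else:
        print("Server did not return a PDF; refusing to save.")
```

The helper strips parameters such as "; charset=utf-8" before comparing, since servers commonly append them to the media type.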

Downloading Multiple Files

When you need to download a batch of files, loop through the URLs and save each one:

import requests
from pathlib import Path

urls = [
"https://example.com/file1.csv",
"https://example.com/file2.csv",
"https://example.com/file3.csv",
]

output_dir = Path("downloads")
output_dir.mkdir(exist_ok=True)

for url in urls:
    filename = url.split("/")[-1]
    filepath = output_dir / filename

    try:
        response = requests.get(url, timeout=15)
        response.raise_for_status()
        filepath.write_bytes(response.content)
        print(f"✓ {filename}")
    except requests.exceptions.RequestException as e:
        print(f"✗ {filename}: {e}")
tip

For downloading many files, use concurrent.futures.ThreadPoolExecutor to download multiple files in parallel, drastically reducing total time:

from concurrent.futures import ThreadPoolExecutor
import requests

def download(url):
    filename = url.split("/")[-1]
    response = requests.get(url, timeout=15)
    response.raise_for_status()
    with open(filename, "wb") as f:
        f.write(response.content)
    return filename

urls = ["https://example.com/file1.csv", "https://example.com/file2.csv"]

with ThreadPoolExecutor(max_workers=4) as executor:
    results = executor.map(download, urls)
    for name in results:
        print(f"Downloaded: {name}")

Comparison of Methods

Method                           External Dependency   Streaming   Progress Bar   Error Handling   Best For
requests                         requests              Yes         Manual         Excellent        Most use cases
urllib.request.urlretrieve()     None                  Yes         Hook-based     Basic            Quick, no-dependency scripts
urllib.request.urlopen()         None                  Manual      Manual         Basic            Standard library with control
wget                             wget                  Yes         Built-in       Minimal          Simple interactive downloads

Conclusion

Python provides multiple reliable ways to download files from URLs:

  • requests is the recommended choice for most projects: it offers a clean API, streaming support, and excellent error handling.
  • urllib.request is ideal when you need to avoid third-party dependencies and are working within the standard library.
  • wget is the simplest option with a built-in progress bar, perfect for quick scripts and interactive use.

Regardless of which method you choose, always check the HTTP status code before saving, use streaming for large files, add timeouts to prevent hanging, and implement proper error handling for production-ready code.