How to Decode URL Parameters in Python
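URL encoding (also known as percent-encoding) replaces characters that are reserved or unsafe in a URL with a % followed by two hexadecimal digits. Non-ASCII characters are first encoded to bytes (typically UTF-8), and each byte is escaped the same way.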
This guide explains how to decode URL parameters in Python, effectively reversing this process. We'll focus on the urllib.parse.unquote() and urllib.parse.unquote_plus() functions, handle double-encoding, and briefly touch on using the requests library.
Decoding URL Parameters with urllib.parse.unquote()
The urllib.parse.unquote() function is the standard and recommended way to decode URL-encoded strings in Python:
from urllib.parse import unquote
url = 'https://tutorialreference.com/doc%3Fpage%3D1%26offset%3D10'
decoded_url = unquote(url)
print(decoded_url) # Output: https://tutorialreference.com/doc?page=1&offset=10
- unquote(url) replaces %xx escapes with their single-character equivalent. For example, %3F becomes ?, %3D becomes =, and %26 becomes &.
- The unquote() function decodes the escaped bytes as UTF-8 by default.
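The UTF-8 behavior mentioned above can be illustrated with a short sketch. The sample strings are made up for illustration; encoding and errors are optional parameters of unquote(), with 'utf-8' and 'replace' as their defaults.

```python
from urllib.parse import unquote

# UTF-8 is the default, so a multi-byte sequence decodes directly.
utf8_text = unquote('caf%C3%A9')
print(utf8_text)  # café

# The same character escaped as a single Latin-1 byte needs an explicit encoding.
latin1_text = unquote('caf%E9', encoding='latin-1')
print(latin1_text)  # café

# With the default errors='replace', a byte that is not valid UTF-8
# becomes the replacement character U+FFFD instead of raising an error.
replaced_text = unquote('caf%E9')
print(replaced_text == 'caf\ufffd')  # True
```

Passing errors='strict' instead raises UnicodeDecodeError for invalid sequences, which may be preferable when silently altered input would be a problem.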
Handling Plus Signs (+) as Spaces with unquote_plus()
In HTML form encoding (application/x-www-form-urlencoded), spaces are represented by plus signs (+). urllib.parse.unquote() leaves + characters untouched. To convert them to spaces, use urllib.parse.unquote_plus():
from urllib.parse import unquote_plus
url = 'https://tutorialreference.com/doc%3Fpage%3D1+%26+offset%3D10' # + instead of space
result = unquote_plus(url, encoding='utf-8')
print(result) # Output: https://tutorialreference.com/doc?page=1 & offset=10
- unquote_plus() behaves like unquote(), but also replaces plus signs with spaces. This is crucial for correctly decoding form data.
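To make the difference concrete, here is a minimal sketch that runs both functions on the same query string, then parses it with urllib.parse.parse_qs(). The query string is an invented example, not taken from a real request.

```python
from urllib.parse import parse_qs, unquote, unquote_plus

# Hypothetical form-encoded query string: '+' stands for a space,
# while '%2B' is a literal plus sign.
raw = 'q=python+url%20decoding&expr=1%2B1'

plain = unquote(raw)
print(plain)  # q=python+url decoding&expr=1+1

form = unquote_plus(raw)
print(form)  # q=python url decoding&expr=1+1

# parse_qs splits on '&' and '=' first, then decodes each name and value
# (including '+' as a space).
params = parse_qs(raw)
print(params)  # {'q': ['python url decoding'], 'expr': ['1+1']}
```

When the goal is to read individual parameters, splitting before decoding (as parse_qs does) is usually safer than decoding the whole string, because an encoded & or = inside a value would otherwise be mistaken for a separator.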
Decoding Double-Encoded Parameters
Sometimes parameters are encoded twice, so the % of each escape is itself encoded as %25 (for example, %3F becomes %253F). In these cases, you need to call unquote() (or unquote_plus()) twice:
from urllib.parse import unquote
url = 'https://tutorialreference.com/doc%253Fpage%253D1%2526offset%253D10'
result = unquote(unquote(url)) # Call unquote() twice
print(result) # Output: https://tutorialreference.com/doc?page=1&offset=10
- Each call to unquote() decodes one level of encoding.
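When the number of encoding levels is not known in advance, one option is to decode until the string stops changing. The sketch below assumes that approach; unquote_repeatedly and its max_rounds parameter are hypothetical names introduced here, not part of urllib.

```python
from urllib.parse import unquote

def unquote_repeatedly(value, max_rounds=5):
    """Hypothetical helper: decode until the string no longer changes."""
    for _ in range(max_rounds):
        decoded = unquote(value)
        if decoded == value:
            # Nothing left to decode.
            break
        value = decoded
    return value

url = 'https://tutorialreference.com/doc%253Fpage%253D1%2526offset%253D10'
result = unquote_repeatedly(url)
print(result)  # https://tutorialreference.com/doc?page=1&offset=10
```

This can over-decode data that is meant to contain a literal percent sequence, so if the number of encoding levels is known, calling unquote() exactly that many times is the safer choice.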
Using requests.utils.unquote() (If you already have requests)
If requests is already installed, you can use the requests.utils.unquote() function:
import requests
url = 'https://tutorialreference.com/doc%3Fpage%3D1%26offset%3D10'
result = requests.utils.unquote(url)
print(result) # Output: https://tutorialreference.com/doc?page=1&offset=10
- requests.utils.unquote() decodes the string by replacing each %xx escape with its corresponding character.
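If code has to run whether or not requests is available, one possible pattern is to fall back to the standard library. This is a sketch of that idea, not something requests itself requires.

```python
try:
    # Assumes requests is installed in the current environment.
    from requests.utils import unquote
except ImportError:
    # Fall back to the standard library, which needs no installation.
    from urllib.parse import unquote

url = 'https://tutorialreference.com/doc%3Fpage%3D1%26offset%3D10'
decoded = unquote(url)
print(decoded)  # https://tutorialreference.com/doc?page=1&offset=10
```

Because urllib.parse ships with Python, importing from it directly avoids the extra dependency when decoding is all that is needed.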