How to Convert Escaped String to JSON in Python
APIs and web services often return JSON that has been "double-encoded" or escaped within another string. This commonly occurs in nested API responses, AWS Lambda logs, HTML attributes, or when JSON is embedded in other formats.
This guide covers techniques for unescaping and parsing these problematic JSON strings.
Standard JSON String Parsing
For regular JSON strings, use json.loads() directly:
import json
# Standard JSON string
raw_string = '{"user": "Alice", "count": 5, "active": true}'
# Parse to Python dictionary
data = json.loads(raw_string)
print(data['user']) # Alice
print(data['count']) # 5
print(type(data)) # <class 'dict'>
Double-Escaped JSON
When JSON is stringified inside another JSON string, the inner quotes are escaped as \" in the raw text, which you write as \\" in a Python string literal. Recovering the data requires parsing twice:
import json
# Double-escaped: JSON string inside a JSON string
payload = '"{\\"id\\": 123, \\"name\\": \\"test\\", \\"active\\": true}"'
# Step 1: First parse removes outer quotes and unescapes
intermediate = json.loads(payload)
print(intermediate) # {"id": 123, "name": "test", "active": true}
print(type(intermediate)) # <class 'str'> - Still a string!
# Step 2: Second parse converts to dictionary
final_data = json.loads(intermediate)
print(final_data['id']) # 123
print(type(final_data)) # <class 'dict'>
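To see where such payloads come from, here is a minimal sketch of how double-encoding typically arises: a value is serialized, and the resulting text is serialized again. The `record` dictionary is made up for illustration.

```python
import json

# Illustrative data, not taken from any real API
record = {"id": 123, "name": "test", "active": True}

# Encoding once gives ordinary JSON text
once = json.dumps(record)

# Encoding that text again wraps it in quotes and escapes the inner quotes
twice = json.dumps(once)
print(twice)
# "{\"id\": 123, \"name\": \"test\", \"active\": true}"

# Two decodes undo two encodes
restored = json.loads(json.loads(twice))
print(restored == record)  # True
```

If you control the producer, removing the extra json.dumps() call is usually a cleaner fix than decoding twice on the consumer side.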
Detecting Double-Encoded JSON
import json
def parse_json_flexible(data):
    """Parse JSON, handling double-encoding automatically."""
    if isinstance(data, str):
        parsed = json.loads(data)
        # If result is still a string, it was double-encoded
        if isinstance(parsed, str):
            return json.loads(parsed)
        return parsed
    return data
# Works for both cases
normal = '{"key": "value"}'
double = '"{\\"key\\": \\"value\\"}"'
print(parse_json_flexible(normal)) # {'key': 'value'}
print(parse_json_flexible(double)) # {'key': 'value'}
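Double-encoding often affects only one field of a response rather than the whole payload. The sketch below assumes a hypothetical envelope whose "body" field holds separately serialized JSON; the field names are illustrative and not tied to any specific service.

```python
import json

# Hypothetical envelope: "body" contains JSON that was serialized on its own
response_text = '{"statusCode": 200, "body": "{\\"user\\": \\"Alice\\", \\"count\\": 5}"}'

envelope = json.loads(response_text)

# The outer parse leaves the nested payload as a string
body = envelope["body"]
if isinstance(body, str):
    envelope["body"] = json.loads(body)

print(envelope["body"]["user"])   # Alice
print(envelope["body"]["count"])  # 5
```

Decoding only the field you know to be encoded avoids accidentally parsing ordinary string values that happen to look like JSON.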
HTML Entity Escaped JSON
JSON embedded in HTML attributes uses HTML entities such as &quot; in place of double quotes:
import html
import json
# From an HTML data attribute
meta_content = '{&quot;lat&quot;: 40.7128, &quot;lon&quot;: -74.0060, &quot;city&quot;: &quot;New York&quot;}'
# Step 1: Unescape HTML entities
clean_json = html.unescape(meta_content)
print(clean_json)
# Output: {"lat": 40.7128, "lon": -74.0060, "city": "New York"}
# Step 2: Parse JSON
location = json.loads(clean_json)
print(f"City: {location['city']}")
# Output: City: New York
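When the JSON sits inside real markup, the standard library's html.parser.HTMLParser can extract it: the parser replaces entity references in attribute values before passing them to handle_starttag, so no separate html.unescape() call is needed. This is a sketch; the data-config attribute name and the DataAttrParser class are assumptions made for the example.

```python
import json
from html.parser import HTMLParser


class DataAttrParser(HTMLParser):
    """Collects JSON from a (hypothetical) data-config attribute on any tag."""

    def __init__(self):
        super().__init__()
        self.configs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "data-config" and value is not None:
                # Entity references in the value are already unescaped here
                self.configs.append(json.loads(value))


markup = '<div data-config="{&quot;lat&quot;: 40.7128, &quot;city&quot;: &quot;New York&quot;}"></div>'

parser = DataAttrParser()
parser.feed(markup)
parser.close()

print(parser.configs[0]["city"])  # New York
```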
Common HTML Entities in JSON
| Entity | Character | Description |
|---|---|---|
| &quot; | " | Double quote |
| &#39; | ' | Single quote |
| &amp; | & | Ampersand |
| &lt; | < | Less than |
| &gt; | > | Greater than |
Unicode Escape Sequences
JSON may contain Unicode escape sequences (\uXXXX):
import json
# Unicode escapes in JSON
escaped = '{"message": "Hello \\u0048\\u0065\\u006c\\u006c\\u006f", "emoji": "\\ud83d\\ude00"}'
# json.loads handles Unicode automatically
data = json.loads(escaped)
print(data['message']) # Hello Hello
print(data['emoji']) # 😀
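These escapes usually come from the encoder rather than from a bug: json.dumps() emits ASCII-only output by default. The short sketch below shows both forms and confirms that they decode to the same data; the sample dictionary is illustrative.

```python
import json

data = {"city": "Zürich"}

# Default: non-ASCII characters are written as \uXXXX escapes
escaped = json.dumps(data)
print(escaped)   # {"city": "Z\u00fcrich"}

# ensure_ascii=False keeps the characters as they are
readable = json.dumps(data, ensure_ascii=False)
print(readable)  # {"city": "Zürich"}

# Both forms decode to the same dictionary
print(json.loads(escaped) == json.loads(readable))  # True
```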
URL-Encoded JSON
JSON passed in URLs or query strings may be URL-encoded:
from urllib.parse import unquote
import json
# URL-encoded JSON
url_encoded = '%7B%22user%22%3A%20%22Alice%22%2C%20%22id%22%3A%20123%7D'
# Step 1: URL decode
decoded = unquote(url_encoded)
print(decoded)
# Output: {"user": "Alice", "id": 123}
# Step 2: Parse JSON
data = json.loads(decoded)
print(data['user'])
# Output: Alice
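If the JSON arrives as a query parameter inside a full URL, urllib.parse.parse_qs() can split the query string and decode it in one step. Unlike unquote(), it also turns + into a space (unquote_plus() does the same for a bare string). The URL and the filter parameter below are made up for illustration.

```python
import json
from urllib.parse import parse_qs, urlsplit

# Hypothetical URL with JSON in the "filter" parameter; spaces are encoded as +
url = "https://example.com/search?filter=%7B%22user%22%3A+%22Alice%22%2C+%22id%22%3A+123%7D"

# parse_qs returns a dict mapping each parameter name to a list of decoded values
query = parse_qs(urlsplit(url).query)
filter_json = query["filter"][0]
print(filter_json)
# {"user": "Alice", "id": 123}

data = json.loads(filter_json)
print(data["id"])  # 123
```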
Python String Literals (ast.literal_eval)
Sometimes you receive Python dictionary syntax instead of JSON:
import ast
import json
# Python dict syntax (single quotes, True instead of true)
python_literal = "{'name': 'Alice', 'active': True, 'count': None}"
# ast.literal_eval safely parses Python literals
data = ast.literal_eval(python_literal)
print(data['name']) # Output: Alice
print(data['active']) # Output: True
# Note: json.loads would fail on this
# json.loads(python_literal) # JSONDecodeError
Never use eval() to parse untrusted strings. Use ast.literal_eval() for Python literals or json.loads() for JSON; neither executes arbitrary code, although very large or deeply nested input can still exhaust memory or hit the recursion limit.
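Once a Python literal has been parsed, json.dumps() can turn it into real JSON for downstream consumers. A minimal sketch, reusing the literal from the example above:

```python
import ast
import json

python_literal = "{'name': 'Alice', 'active': True, 'count': None}"

# Parse the Python literal, then re-serialize it as JSON
as_json = json.dumps(ast.literal_eval(python_literal))
print(as_json)
# {"name": "Alice", "active": true, "count": null}
```

Note that the conversion is not always lossless or possible: tuples become JSON arrays, and sets or bytes raise TypeError in json.dumps().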
Comprehensive Parsing Function
import json
import html
from urllib.parse import unquote
import ast
def parse_escaped_json(data, max_depth=3):
    """
    Parse JSON from various escaped formats.
    Handles:
    - Standard JSON strings
    - Double/triple encoded JSON
    - HTML entity escaped JSON
    - URL encoded JSON
    - Python literal syntax
    """
    if not isinstance(data, str):
        return data
    original = data
    # Try URL decoding if it looks URL-encoded
    if '%' in data:
        data = unquote(data)
    # Unescape HTML entities
    if '&' in data and ';' in data:
        data = html.unescape(data)
    # Try parsing, handling multiple levels of encoding
    for _ in range(max_depth):
        try:
            parsed = json.loads(data)
            # If result is a string, might be double-encoded
            if isinstance(parsed, str):
                data = parsed
                continue
            return parsed
        except json.JSONDecodeError:
            break
    # Fall back to ast.literal_eval for Python syntax
    try:
        return ast.literal_eval(original)
    except (ValueError, SyntaxError):
        pass
    raise ValueError(f"Could not parse: {original[:100]}...")
# Test cases
test_cases = [
    '{"key": "value"}',                      # Standard
    '"{\\"key\\": \\"value\\"}"',            # Double-encoded
    '{&quot;key&quot;: &quot;value&quot;}',  # HTML entities
    '%7B%22key%22%3A%20%22value%22%7D',      # URL-encoded
    "{'key': 'value'}",                      # Python literal
]
for test in test_cases:
    result = parse_escaped_json(test)
    print(f"Input type -> {type(result).__name__}: {result}")
Output:
Input type -> dict: {'key': 'value'}
Input type -> dict: {'key': 'value'}
Input type -> dict: {'key': 'value'}
Input type -> dict: {'key': 'value'}
Input type -> dict: {'key': 'value'}
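One design caveat: decoding before parsing can silently change values in JSON that was already valid, because a legitimate string such as "a%20b" or "&lt;b&gt;" gets rewritten. The sketch below shows an alternative ordering that tries plain JSON first and decodes only on failure. The helper name loads_with_fallbacks is hypothetical, and the sketch covers HTML and URL decoding only.

```python
import html
import json
from urllib.parse import unquote


def loads_with_fallbacks(text):
    """Try plain JSON first; fall back to HTML, then URL decoding (hypothetical helper)."""
    decoders = (
        lambda s: s,    # no decoding
        html.unescape,  # HTML entities
        unquote,        # percent-encoding
    )
    for decode in decoders:
        try:
            return json.loads(decode(text))
        except json.JSONDecodeError:
            continue
    raise ValueError(f"Could not parse: {text[:100]}")


# Valid JSON whose values contain % and & sequences is returned unchanged
valid = '{"path": "a%20b", "html": "&lt;b&gt;"}'
print(loads_with_fallbacks(valid))
# {'path': 'a%20b', 'html': '&lt;b&gt;'}

# Escaped inputs still parse through the fallbacks
print(loads_with_fallbacks('{&quot;key&quot;: &quot;value&quot;}'))  # {'key': 'value'}
print(loads_with_fallbacks('%7B%22key%22%3A%20%22value%22%7D'))      # {'key': 'value'}
```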
Debugging Escaped Strings
When you're unsure why parsing fails, inspect the raw string:
import json
problematic = '{"message": "Hello\\nWorld"}'
# View the actual characters
print(repr(problematic))
# '{"message": "Hello\\nWorld"}'
# Check each character
print([char for char in problematic[:20]])
# Try parsing with helpful error
try:
    data = json.loads(problematic)
except json.JSONDecodeError as e:
    print(f"Error at position {e.pos}: {e.msg}")
    print(f"Context: ...{problematic[max(0,e.pos-10):e.pos+10]}...")
Output:
'{"message": "Hello\\nWorld"}'
['{', '"', 'm', 'e', 's', 's', 'a', 'g', 'e', '"', ':', ' ', '"', 'H', 'e', 'l', 'l', 'o', '\\', 'n']
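The string above parses without error, so the except branch never runs. For contrast, here is a sketch of an input that does fail: a real newline character inside a string value, which strict JSON forbids. JSONDecodeError also exposes lineno and colno alongside pos and msg, and passing strict=False to json.loads() tells the decoder to accept control characters inside strings.

```python
import json

# A single backslash-n in the Python literal puts a real newline inside the JSON string
broken = '{"message": "Hello\nWorld"}'

error = None
try:
    json.loads(broken)
except json.JSONDecodeError as e:
    error = e
    print(f"{e.msg} (line {e.lineno}, column {e.colno}, position {e.pos})")

# strict=False allows control characters such as newlines inside strings
data = json.loads(broken, strict=False)
print(repr(data["message"]))  # 'Hello\nWorld'
```

Comparing repr() of the failing string with repr() of a working one is often the fastest way to spot whether a newline is escaped (\\n) or literal (\n).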
Summary
| Input Example | Solution | Format |
|---|---|---|
| {"key": "val"} | json.loads(s) | Standard JSON |
| "{\"key\": ...}" | json.loads() twice | Double-encoded |
| {&quot;key&quot;} | html.unescape() + json.loads() | HTML entities |
| %7B%22key%22... | unquote() + json.loads() | URL-encoded |
| {'key': 'val'} | ast.literal_eval(s) | Python literal |
When JSON parsing fails unexpectedly, use print(repr(your_string)) to see the actual escape characters. This reveals whether you're dealing with \" (escaped quotes), \\ (escaped backslashes), or other encoding issues.