How to Check JSON Integrity in Python
In data processing, ensuring JSON integrity is crucial. A program usually needs to verify two things: Syntax Validity (is the string valid JSON?) and Structural Integrity (does the JSON contain the required keys and correct data types?). Without these checks, applications can crash unexpectedly when processing API responses or configuration files.
This guide explores methods to validate and verify JSON data structures using Python's built-in libraries and the powerful jsonschema library.
Understanding JSON Integrity
JSON integrity involves checking data at multiple levels:
- Syntax: Is the text formatted correctly? (e.g., closing braces, quoted keys).
- Structure (Schema): Does the JSON object have the expected fields? Are the values the correct type (e.g., age is a number, not a string)?
- Logic: Do the values make sense? (e.g., age > 0).
Method 1: Checking Syntax Validity (Parsing)
The most basic check is determining whether a string can be parsed into a Python object (usually a dictionary or a list). Python's built-in json module handles this.
Basic syntax check: wrap the call in a try-except block and catch json.JSONDecodeError.
import json

def check_syntax(json_input):
    try:
        # ✅ Correct: Attempt to parse the string
        data = json.loads(json_input)
        print("Syntax is Valid.")
        return True
    except json.JSONDecodeError as e:
        # ⛔️ Error: Catch malformed JSON
        print(f"Invalid JSON Syntax: {e}")
        return False

# Test Case 1: Valid JSON
valid_json = '{"name": "Alice", "age": 30}'
check_syntax(valid_json)

# Test Case 2: Invalid JSON (Missing closing brace)
invalid_json = '{"name": "Alice", "age": 30'
check_syntax(invalid_json)
Output:
Syntax is Valid.
Invalid JSON Syntax: Expecting ',' delimiter: line 1 column 28 (char 27)
If you are reading from a file, use json.load(file_object) instead of json.loads(string). The error handling remains the same.
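As a minimal sketch of the file-based variant, the helper below reads a file with json.load and, on failure, reports where the parser stopped using the msg, lineno, and colno attributes of JSONDecodeError. The name load_json_file and its return convention are assumptions made for illustration, and the temporary file exists only to keep the example self-contained.

```python
import json
import os
import tempfile

def load_json_file(path):
    """Return (data, None) on success or (None, message) on a syntax error."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f), None
    except json.JSONDecodeError as e:
        # msg, lineno and colno are attributes of JSONDecodeError
        return None, f"{e.msg} at line {e.lineno}, column {e.colno}"

# Write a deliberately broken document (missing closing brace) to a temp file
with tempfile.NamedTemporaryFile(
    "w", suffix=".json", delete=False, encoding="utf-8"
) as tmp:
    tmp.write('{"name": "Alice", "age": 30')
    tmp_path = tmp.name

data, error = load_json_file(tmp_path)
os.remove(tmp_path)  # clean up the temporary file
print(data, error)
```

Returning the message instead of printing it lets the caller decide whether to log it, raise, or fall back to a default configuration.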
Method 2: Checking Structural Integrity (Schema Validation)
Valid syntax does not guarantee the data is usable. For example, {"age": "thirty"} is valid JSON syntax, but your code might crash if it expects an integer.
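A short sketch of that failure mode: the payload parses cleanly, and the problem only surfaces later, when the value is used as a number. The arithmetic here is just an illustrative stand-in for whatever the application would do with the field.

```python
import json

data = json.loads('{"age": "thirty"}')  # parses without raising
parsed_type = type(data["age"]).__name__

try:
    next_year = data["age"] + 1  # adding an int to a str is not allowed
    failure = None
except TypeError as e:
    failure = str(e)

print(parsed_type)  # the value arrived as a string, not a number
print(failure)      # the error appears far from where the data was parsed
```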
To enforce data types and required fields, use the jsonschema library.
Installation:
pip install jsonschema
Implementing Schema Validation
Define a schema that describes your expected data structure.
import json
import jsonschema
from jsonschema import validate

# 1. Define the Schema
user_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "number"},
        "email": {"type": "string"}
    },
    "required": ["name", "age"]  # 'email' is optional
}

def validate_user_data(json_data):
    try:
        # Parse first
        data = json.loads(json_data)
        # ✅ Correct: Validate against the schema
        validate(instance=data, schema=user_schema)
        print("Data is valid and conforms to schema.")
    except json.JSONDecodeError:
        print("Error: Invalid JSON syntax.")
    except jsonschema.exceptions.ValidationError as e:
        print(f"Schema Error: {e.message}")

# Test Case: Missing 'age' (Schema violation)
bad_structure = '{"name": "Bob", "email": "bob@example.com"}'
validate_user_data(bad_structure)

# Test Case: Wrong type for 'age'
bad_type = '{"name": "Bob", "age": "twenty"}'
validate_user_data(bad_type)
Output:
Schema Error: 'age' is a required property
Schema Error: 'twenty' is not of type 'number'
Using jsonschema is significantly more robust and readable than writing dozens of if isinstance(...) checks manually.
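Note that validate() raises a single ValidationError even when the data has several problems. When a full report is more useful, a validator class such as Draft7Validator exposes iter_errors(), which yields every violation. The sketch below assumes the same user schema as above; collect_errors is a hypothetical helper name, and the "<root>" label is an arbitrary choice for errors that apply to the whole object.

```python
from jsonschema import Draft7Validator

user_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "number"},
        "email": {"type": "string"},
    },
    "required": ["name", "age"],
}

def collect_errors(instance, schema):
    """Return one message per schema violation instead of stopping at one."""
    validator = Draft7Validator(schema)
    messages = []
    for error in validator.iter_errors(instance):
        # error.path holds the keys/indexes leading to the offending value
        location = "/".join(str(part) for part in error.path) or "<root>"
        messages.append(f"{location}: {error.message}")
    return sorted(messages)

# 'name' is missing and 'age' has the wrong type: two separate violations
errors = collect_errors({"age": "twenty"}, user_schema)
for line in errors:
    print(line)
```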
Method 3: Custom Logical Validation
Sometimes data is syntactically correct and fits the schema, but is logically invalid (e.g., an age of -5). You can combine parsing with custom logic.
import json

def process_secure_data(json_input):
    try:
        data = json.loads(json_input)

        # 1. Check required keys (the top-level value must be an object)
        if not isinstance(data, dict) or "age" not in data:
            raise ValueError("Missing key: 'age'")

        # 2. Check the type, so the comparison below cannot raise TypeError
        age = data["age"]
        if isinstance(age, bool) or not isinstance(age, (int, float)):
            raise ValueError(f"'age' must be a number, got: {age!r}")

        # 3. Check logical constraints
        if age < 0 or age > 120:
            raise ValueError(f"Invalid age range: {age}")

        print(f"Processing user age: {age}")
    except (json.JSONDecodeError, ValueError) as e:
        print(f"Integrity Check Failed: {e}")

# Test Case: Logical Error
process_secure_data('{"name": "Alice", "age": -5}')
Output:
Integrity Check Failed: Invalid age range: -5
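The function above stops at the first failed rule. One possible variation, sketched here with hypothetical names (RULES, check_rules), keeps each business rule as a small predicate and reports every rule that fails. The guardian rule is an invented cross-field example, and the sketch assumes the data has already passed a type check so that "age" is present and numeric.

```python
import json

# Each rule: (message, predicate returning True when the data is acceptable).
# Both rules are illustrative assumptions, not requirements of any library.
RULES = [
    ("age must be between 0 and 120", lambda d: 0 <= d["age"] <= 120),
    ("minors must list a guardian", lambda d: d["age"] >= 18 or "guardian" in d),
]

def check_rules(json_input):
    """Return the messages of all rules that the parsed data violates."""
    data = json.loads(json_input)
    return [message for message, ok in RULES if not ok(data)]

failed = check_rules('{"name": "Alice", "age": -5}')
passed = check_rules('{"name": "Bob", "age": 30}')
print(failed)
print(passed)
```

Keeping rules in a list makes it easy to add or remove a check without touching the control flow.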
Conclusion
To ensure JSON integrity in Python:
- Use json.loads() inside a try-except block to catch syntax errors (JSONDecodeError).
- Use jsonschema to enforce structure, required fields, and data types automatically.
- Add Custom Logic for business rules that cannot be defined by types alone (e.g., value ranges).
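Put together, the three layers might look like the sketch below. The function name check_integrity and its return convention (None on success, a short description of the first failure otherwise) are assumptions made for illustration; the schema is a reduced version of the one used earlier.

```python
import json
from jsonschema import validate, ValidationError

schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "number"}},
    "required": ["name", "age"],
}

def check_integrity(json_input):
    """Return None if every check passes, else a description of the first failure."""
    # 1. Syntax: can the text be parsed at all?
    try:
        data = json.loads(json_input)
    except json.JSONDecodeError as e:
        return f"syntax: {e.msg}"

    # 2. Structure: required keys and data types
    try:
        validate(instance=data, schema=schema)
    except ValidationError as e:
        return f"schema: {e.message}"

    # 3. Logic: safe to compare now, the schema guarantees a numeric age
    if not 0 <= data["age"] <= 120:
        return f"logic: invalid age {data['age']}"

    return None

for sample in (
    '{"name": "Alice"',
    '{"name": "Alice"}',
    '{"name": "Alice", "age": -5}',
    '{"name": "Alice", "age": 30}',
):
    print(check_integrity(sample))
```

Running the checks in this order means each stage can rely on the guarantees of the one before it.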