How to Convert JSON to Pandas DataFrame in Python
JSON is the standard format for web APIs and data exchange, while Pandas DataFrames are the foundation of data analysis in Python. Converting between the two is one of the most common tasks in data engineering and analysis workflows. The right approach depends on whether your JSON is flat, nested, or stored in JSON Lines format.
In this guide, you will learn how to handle all of these scenarios with practical examples, from simple array-of-objects structures to deeply nested API responses.
Simple Flat JSON with pd.read_json()
For JSON arrays containing simple, flat objects, pd.read_json() provides the most direct conversion:
```python
import pandas as pd
from io import StringIO

json_str = '[{"name": "Alice", "score": 90}, {"name": "Bob", "score": 85}]'
df = pd.read_json(StringIO(json_str))
print(df)
```

Output:

```
    name  score
0  Alice     90
1    Bob     85
```
Reading from a JSON File
```python
import pandas as pd

# Assumes data.json contains: [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
df = pd.read_json("data.json")
print(df)
```

Output:

```
   id   name
0   1  Alice
1   2    Bob
```
Reading from an API Response
```python
import pandas as pd
import requests
from io import StringIO

response = requests.get("https://api.example.com/users")

# Option 1: Parse from the response text
df = pd.read_json(StringIO(response.text))

# Option 2: Use response.json() with the DataFrame constructor
data = response.json()
df = pd.DataFrame(data)
```
In recent versions of Pandas, passing a raw JSON string directly to pd.read_json() is deprecated. Wrap the string in StringIO to avoid deprecation warnings.
Handling Nested JSON with pd.json_normalize()
Most real-world API responses contain nested objects. The pd.json_normalize() function flattens them into separate columns with dot-separated (or custom-separated) names:
```python
import pandas as pd

data = [
    {
        "id": 1,
        "user": {
            "name": "Alice",
            "location": {"city": "New York", "zip": "10001"}
        }
    },
    {
        "id": 2,
        "user": {
            "name": "Bob",
            "location": {"city": "Los Angeles", "zip": "90001"}
        }
    }
]

# Flatten nested structure with underscore separator
df = pd.json_normalize(data, sep='_')
print(df.columns.tolist())
print(df)
```

Output:

```
['id', 'user_name', 'user_location_city', 'user_location_zip']
   id user_name user_location_city user_location_zip
0   1     Alice           New York             10001
1   2       Bob        Los Angeles             90001
```
Every nested key is flattened into its own column, making the data ready for analysis without manual extraction.
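If you only need the first level or two flattened, json_normalize also accepts a max_level parameter; dictionaries below that depth are left intact inside a single column. A quick sketch using the same data as above:

```python
import pandas as pd

data = [
    {"id": 1, "user": {"name": "Alice", "location": {"city": "New York", "zip": "10001"}}},
    {"id": 2, "user": {"name": "Bob", "location": {"city": "Los Angeles", "zip": "90001"}}},
]

# Flatten only one level deep: user.location remains a dict in its column
df = pd.json_normalize(data, max_level=1)
print(df.columns.tolist())
# ['id', 'user.name', 'user.location']
```

This is useful when a deeply nested sub-object is better handled as a single value (for example, stored as-is or processed later) than exploded into many columns.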
Extracting Nested Arrays with record_path
When JSON contains arrays that you want to expand into individual rows, use the record_path parameter along with meta to carry parent fields into each row:
```python
import pandas as pd

data = {
    "company": "TechCorp",
    "employees": [
        {"name": "Alice", "role": "Developer"},
        {"name": "Bob", "role": "Designer"}
    ]
}

# Expand the employees array, keeping the parent company field
df = pd.json_normalize(
    data,
    record_path="employees",
    meta=["company"]
)
print(df)
```

Output:

```
    name       role   company
0  Alice  Developer  TechCorp
1    Bob   Designer  TechCorp
```
Handling Multiple Nesting Levels
For more complex structures with arrays nested inside objects, you can reference deeply nested meta fields using lists:
```python
import pandas as pd

data = [
    {
        "order_id": 1,
        "customer": {"name": "Alice"},
        "items": [
            {"product": "Book", "qty": 2},
            {"product": "Pen", "qty": 5}
        ]
    },
    {
        "order_id": 2,
        "customer": {"name": "Bob"},
        "items": [
            {"product": "Laptop", "qty": 1}
        ]
    }
]

df = pd.json_normalize(
    data,
    record_path="items",
    meta=["order_id", ["customer", "name"]],
    meta_prefix="order_"
)
print(df)
```

Output:

```
  product  qty order_order_id order_customer.name
0    Book    2              1               Alice
1     Pen    5              1               Alice
2  Laptop    1              2                 Bob
```
Each item gets its own row, and the parent order and customer information is duplicated across the corresponding item rows.
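Real-world records do not always carry every meta field. By default, json_normalize raises a KeyError when a listed meta key is missing from a record; passing errors='ignore' fills the gap with NaN instead. A sketch with one order that lacks customer information:

```python
import pandas as pd

data = [
    {"order_id": 1, "customer": {"name": "Alice"},
     "items": [{"product": "Book", "qty": 2}]},
    # This order has no customer field at all
    {"order_id": 2,
     "items": [{"product": "Laptop", "qty": 1}]},
]

# errors='ignore' keeps missing meta fields as NaN instead of raising
df = pd.json_normalize(
    data,
    record_path="items",
    meta=["order_id", ["customer", "name"]],
    errors="ignore",
)
print(df)
```

The Laptop row ends up with NaN in the customer.name column, so the rest of the data still loads cleanly.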
JSON Lines Format with lines=True
Log files and streaming data often use JSON Lines (also called NDJSON), where each line in the file is a separate, self-contained JSON object:
```python
import pandas as pd

# events.jsonl content:
# {"id": 1, "event": "login"}
# {"id": 2, "event": "purchase"}
# {"id": 3, "event": "logout"}
df = pd.read_json("events.jsonl", lines=True)
print(df)
```

Output:

```
   id     event
0   1     login
1   2  purchase
2   3    logout
```
Streaming Large JSON Lines Files in Chunks
For files too large to fit in memory, use the chunksize parameter to process them incrementally:
```python
import pandas as pd

chunks = pd.read_json("large_file.jsonl", lines=True, chunksize=10000)
for chunk in chunks:
    # Each chunk is a DataFrame with up to 10,000 rows
    print(f"Processing chunk with {len(chunk)} rows")
    # process_data(chunk)
```
This approach keeps memory usage constant regardless of the total file size.
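As a minimal end-to-end sketch (using a small temporary file so the example is self-contained), per-chunk results can be aggregated incrementally — here, counting events without ever holding the whole file in memory:

```python
import os
import tempfile
from collections import Counter

import pandas as pd

# Write a small JSON Lines file to keep the example self-contained
lines = [
    '{"id": 1, "event": "login"}',
    '{"id": 2, "event": "purchase"}',
    '{"id": 3, "event": "login"}',
]
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as f:
    f.write("\n".join(lines))
    path = f.name

# Accumulate per-chunk counts instead of loading everything at once
counts = Counter()
for chunk in pd.read_json(path, lines=True, chunksize=2):
    counts.update(chunk["event"].value_counts().to_dict())

print(counts)  # Counter({'login': 2, 'purchase': 1})
os.remove(path)
```

The same pattern works for sums, filtered writes, or any other operation that can be computed chunk by chunk and combined at the end.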
Handling Different JSON Structures
Dictionary of Lists (Column-Oriented)
When JSON is structured as a dictionary where each key maps to a list of values, the DataFrame constructor handles it directly:
```python
import pandas as pd

data = {
    "names": ["Alice", "Bob", "Charlie"],
    "scores": [90, 85, 88],
    "active": [True, False, True]
}
df = pd.DataFrame(data)
print(df)
```

Output:

```
     names  scores  active
0    Alice      90    True
1      Bob      85   False
2  Charlie      88    True
```
Dictionary of Dictionaries (Record-Oriented by Key)
When records are stored as values in a dictionary with meaningful keys:
```python
import pandas as pd

data = {
    "user1": {"name": "Alice", "score": 90},
    "user2": {"name": "Bob", "score": 85}
}
df = pd.DataFrame.from_dict(data, orient='index')
print(df)
```

Output:

```
        name  score
user1  Alice     90
user2    Bob     85
```
The dictionary keys become the DataFrame index, which you can reset with df.reset_index() if needed.
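For example, to promote the keys into a regular column with a meaningful name (the user_id label here is just illustrative):

```python
import pandas as pd

data = {
    "user1": {"name": "Alice", "score": 90},
    "user2": {"name": "Bob", "score": 85},
}
df = pd.DataFrame.from_dict(data, orient='index')

# Promote the index to a column and give it a descriptive name
df = df.reset_index().rename(columns={'index': 'user_id'})
print(df.columns.tolist())  # ['user_id', 'name', 'score']
```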
Deeply Nested JSON with Custom Path Extraction
When the data you need is buried deep inside a JSON response, extract the relevant portion first and then normalize:
```python
import pandas as pd
import json

json_str = '''
{
    "response": {
        "data": {
            "users": [
                {"id": 1, "profile": {"name": "Alice"}},
                {"id": 2, "profile": {"name": "Bob"}}
            ]
        },
        "meta": {"total": 2}
    }
}
'''
data = json.loads(json_str)

# Navigate to the nested array
users = data['response']['data']['users']
df = pd.json_normalize(users)
print(df)
```

Output:

```
   id profile.name
0   1        Alice
1   2          Bob
```
A Reusable Conversion Function
For projects that regularly parse JSON from different sources, a utility function simplifies the workflow:
```python
import pandas as pd
import json
from pathlib import Path

def json_to_dataframe(source, nested_path=None, flatten=True, **kwargs):
    """
    Convert JSON from various sources to a DataFrame.

    Args:
        source: A file path, JSON string, or Python dict/list.
        nested_path: Dot-separated path to a nested array (e.g., "response.data.items").
        flatten: Whether to flatten nested objects using json_normalize.
        **kwargs: Additional arguments passed to json_normalize or DataFrame.
    """
    # Load data based on source type
    if isinstance(source, (dict, list)):
        data = source
    elif isinstance(source, (str, Path)):
        path = Path(source)
        if path.exists():
            with open(path, encoding='utf-8') as f:
                data = json.load(f)
        else:
            data = json.loads(source)
    else:
        raise ValueError(f"Unsupported source type: {type(source)}")

    # Navigate to nested path if specified
    if nested_path:
        for key in nested_path.split('.'):
            data = data[key]

    # Convert to DataFrame
    if flatten and isinstance(data, list):
        return pd.json_normalize(data, **kwargs)
    else:
        return pd.DataFrame(data)

# Usage examples
df1 = json_to_dataframe("data.json")
df2 = json_to_dataframe('{"users": [{"id": 1}]}', nested_path="users")
df3 = json_to_dataframe([{"a": 1}, {"a": 2}])
```
Converting a DataFrame Back to JSON
Pandas provides several output orientations to match different JSON consumers:
```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'score': [90, 85]
})

# Array of records (most common for APIs)
print(df.to_json(orient='records'))

# Index-keyed objects
print(df.to_json(orient='index'))

# To a formatted JSON file
df.to_json("output.json", orient='records', indent=2)

# To JSON Lines format
df.to_json("output.jsonl", orient='records', lines=True)
```

Output:

```
[{"name":"Alice","score":90},{"name":"Bob","score":85}]
{"0":{"name":"Alice","score":90},"1":{"name":"Bob","score":85}}
```
The orient='records' format is the most widely compatible with web APIs and other JSON consumers.
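A quick round-trip check confirms that orient='records' output can be read straight back into an equivalent DataFrame:

```python
import pandas as pd
from io import StringIO

df = pd.DataFrame({'name': ['Alice', 'Bob'], 'score': [90, 85]})

# Serialize to a records-oriented JSON string and read it back
json_str = df.to_json(orient='records')
df_roundtrip = pd.read_json(StringIO(json_str))

print(df.equals(df_roundtrip))  # True
```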
Quick Reference
| JSON Structure | Method | Typical Use Case |
|---|---|---|
| Flat array [{...}] | pd.read_json() | Simple API responses |
| Nested objects | pd.json_normalize() | Complex API responses with nested fields |
| JSON Lines (one object per line) | pd.read_json(lines=True) | Log files, streaming data |
| Dictionary of dictionaries | pd.DataFrame.from_dict(orient='index') | Key-indexed record collections |
| Deeply nested with target array | json.loads() then pd.json_normalize() | Wrapped API responses |
Conclusion
Converting JSON to a Pandas DataFrame in Python requires choosing the right method based on your data structure. Use pd.read_json() for simple, flat JSON arrays where every record shares the same keys. For the nested structures that are typical of real-world API responses, pd.json_normalize() automatically flattens nested objects into separate columns and can expand nested arrays into individual rows. For JSON Lines files, the lines=True parameter handles the format natively, with optional chunking for large files.
Always inspect your JSON structure before writing the conversion code. Use json.loads() to examine the data in a Python shell first, then choose between pd.read_json() for flat data and pd.json_normalize() for anything nested. This quick inspection step saves time and prevents surprises with unexpected nesting or inconsistent record structures.
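A small inspection helper along these lines (a sketch, not part of any library) can make that first step routine:

```python
import json

def describe_json(json_str):
    """Report the top-level shape of a JSON document."""
    data = json.loads(json_str)
    if isinstance(data, list):
        return f"list of {len(data)} items"
    if isinstance(data, dict):
        return f"dict with keys: {sorted(data)}"
    return type(data).__name__

print(describe_json('[{"a": 1}, {"a": 2}]'))       # list of 2 items
print(describe_json('{"users": [], "meta": {}}'))  # dict with keys: ['meta', 'users']
```

A list of objects points toward pd.read_json() or pd.json_normalize(); a dict usually means you need to navigate to the array you care about first.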