
How to Convert JSON to Pandas DataFrame in Python

JSON is the standard format for web APIs and data exchange, while Pandas DataFrames are the foundation of data analysis in Python. Converting between the two is one of the most common tasks in data engineering and analysis workflows. The right approach depends on whether your JSON is flat, nested, or stored in JSON Lines format.

In this guide, you will learn how to handle all of these scenarios with practical examples, from simple array-of-objects structures to deeply nested API responses.

Simple Flat JSON with pd.read_json()

For JSON arrays containing simple, flat objects, pd.read_json() provides the most direct conversion:

import pandas as pd
from io import StringIO

json_str = '[{"name": "Alice", "score": 90}, {"name": "Bob", "score": 85}]'

df = pd.read_json(StringIO(json_str))
print(df)

Output:

    name  score
0  Alice     90
1    Bob     85

Reading from a JSON File

import pandas as pd

# Assumes data.json contains: [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
df = pd.read_json("data.json")
print(df)

Output:

   id   name
0   1  Alice
1   2    Bob

Reading from an API Response

import pandas as pd
import requests
from io import StringIO

response = requests.get("https://api.example.com/users")

# Option 1: Parse from the response text
df = pd.read_json(StringIO(response.text))

# Option 2: Use response.json() with the DataFrame constructor
data = response.json()
df = pd.DataFrame(data)
Note: Since pandas 2.1, passing a raw JSON string directly to pd.read_json() is deprecated. Wrap the string in StringIO to avoid the FutureWarning.

Handling Nested JSON with pd.json_normalize()

Most real-world API responses contain nested objects. The pd.json_normalize() function flattens them into separate columns with dot-separated (or custom-separated) names:

import pandas as pd

data = [
    {
        "id": 1,
        "user": {
            "name": "Alice",
            "location": {"city": "New York", "zip": "10001"}
        }
    },
    {
        "id": 2,
        "user": {
            "name": "Bob",
            "location": {"city": "Los Angeles", "zip": "90001"}
        }
    }
]

# Flatten nested structure with underscore separator
df = pd.json_normalize(data, sep='_')

print(df.columns.tolist())
print(df)

Output:

['id', 'user_name', 'user_location_city', 'user_location_zip']
   id user_name user_location_city user_location_zip
0   1     Alice           New York             10001
1   2       Bob        Los Angeles             90001

Every nested key is flattened into its own column, making the data ready for analysis without manual extraction.
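
If you only want to flatten partway, json_normalize also accepts a max_level parameter; objects below that depth are kept as dict values in a single column. A minimal sketch using the same structure as above:

```python
import pandas as pd

data = [
    {"id": 1, "user": {"name": "Alice", "location": {"city": "New York", "zip": "10001"}}},
    {"id": 2, "user": {"name": "Bob", "location": {"city": "Los Angeles", "zip": "90001"}}},
]

# max_level=1 flattens one level of nesting; user.location stays a dict
df = pd.json_normalize(data, max_level=1)
print(df.columns.tolist())  # ['id', 'user.name', 'user.location']
```

This is useful when a deeply nested branch is easier to process later as raw dicts than as dozens of sparse columns.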

Extracting Nested Arrays with record_path

When JSON contains arrays that you want to expand into individual rows, use the record_path parameter along with meta to carry parent fields into each row:

import pandas as pd

data = {
    "company": "TechCorp",
    "employees": [
        {"name": "Alice", "role": "Developer"},
        {"name": "Bob", "role": "Designer"}
    ]
}

# Expand the employees array, keeping the parent company field
df = pd.json_normalize(
    data,
    record_path="employees",
    meta=["company"]
)

print(df)

Output:

    name       role   company
0  Alice  Developer  TechCorp
1    Bob   Designer  TechCorp

Handling Multiple Nesting Levels

For more complex structures with arrays nested inside objects, you can reference deeply nested meta fields using lists:

import pandas as pd

data = [
    {
        "order_id": 1,
        "customer": {"name": "Alice"},
        "items": [
            {"product": "Book", "qty": 2},
            {"product": "Pen", "qty": 5}
        ]
    },
    {
        "order_id": 2,
        "customer": {"name": "Bob"},
        "items": [
            {"product": "Laptop", "qty": 1}
        ]
    }
]

df = pd.json_normalize(
    data,
    record_path="items",
    meta=["order_id", ["customer", "name"]],
    meta_prefix="order_"
)

print(df)

Output:

  product  qty order_order_id order_customer.name
0    Book    2              1               Alice
1     Pen    5              1               Alice
2  Laptop    1              2                 Bob

Each item gets its own row, and the parent order and customer information is duplicated across the corresponding item rows.
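
Just as meta_prefix namespaces the parent fields, the record_prefix parameter namespaces the expanded record columns, which helps when item keys could clash with parent keys. A short sketch under a simplified version of the order data:

```python
import pandas as pd

data = [
    {
        "order_id": 1,
        "items": [{"product": "Book", "qty": 2}, {"product": "Pen", "qty": 5}],
    }
]

# record_prefix is applied to the columns produced from record_path
df = pd.json_normalize(data, record_path="items", meta=["order_id"], record_prefix="item_")
print(df.columns.tolist())  # ['item_product', 'item_qty', 'order_id']
```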

JSON Lines Format with lines=True

Log files and streaming data often use JSON Lines (also called NDJSON), where each line in the file is a separate, self-contained JSON object:

import pandas as pd

# events.jsonl content:
# {"id": 1, "event": "login"}
# {"id": 2, "event": "purchase"}
# {"id": 3, "event": "logout"}

df = pd.read_json("events.jsonl", lines=True)
print(df)

Output:

   id     event
0   1     login
1   2  purchase
2   3    logout
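
The same lines=True flag works on in-memory buffers, so you can parse JSON Lines received over the network without writing a file first. A self-contained sketch:

```python
import pandas as pd
from io import StringIO

# JSON Lines as a string, e.g. the body of an HTTP response
jsonl = '{"id": 1, "event": "login"}\n{"id": 2, "event": "purchase"}\n'

df = pd.read_json(StringIO(jsonl), lines=True)
print(df.shape)  # (2, 2)
```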

Streaming Large JSON Lines Files in Chunks

For files too large to fit in memory, use the chunksize parameter to process them incrementally:

import pandas as pd

chunks = pd.read_json("large_file.jsonl", lines=True, chunksize=10000)

for chunk in chunks:
    # Each chunk is a DataFrame with up to 10,000 rows
    print(f"Processing chunk with {len(chunk)} rows")
    # process_data(chunk)
Note: This approach keeps memory usage roughly constant regardless of the total file size, since only one chunk is held in memory at a time.
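
The typical pattern is to reduce each chunk to a small result (a sum, a count, a filtered subset) and combine those instead of concatenating full chunks. A runnable sketch that uses an in-memory buffer as a stand-in for a large file:

```python
import pandas as pd
from io import StringIO

# In-memory stand-in for a large JSON Lines file
buf = StringIO("\n".join(f'{{"id": {i}, "value": {i * 2}}}' for i in range(25)))

total = 0
for chunk in pd.read_json(buf, lines=True, chunksize=10):
    # Aggregate per chunk so only one chunk is in memory at a time
    total += chunk["value"].sum()

print(total)  # 600
```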

Handling Different JSON Structures

Dictionary of Lists (Column-Oriented)

When JSON is structured as a dictionary where each key maps to a list of values, the DataFrame constructor handles it directly:

import pandas as pd

data = {
    "names": ["Alice", "Bob", "Charlie"],
    "scores": [90, 85, 88],
    "active": [True, False, True]
}

df = pd.DataFrame(data)
print(df)

Output:

     names  scores  active
0    Alice      90    True
1      Bob      85   False
2  Charlie      88    True

Dictionary of Dictionaries (Record-Oriented by Key)

When records are stored as values in a dictionary with meaningful keys:

import pandas as pd

data = {
    "user1": {"name": "Alice", "score": 90},
    "user2": {"name": "Bob", "score": 85}
}

df = pd.DataFrame.from_dict(data, orient='index')
print(df)

Output:

        name  score
user1  Alice     90
user2    Bob     85

The dictionary keys become the DataFrame index, which you can reset with df.reset_index() if needed.
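
To turn those keys into an ordinary column, chain reset_index() and rename the generated "index" column to something meaningful (the name user_id below is just an illustration):

```python
import pandas as pd

data = {
    "user1": {"name": "Alice", "score": 90},
    "user2": {"name": "Bob", "score": 85},
}

df = pd.DataFrame.from_dict(data, orient="index")

# Promote the dictionary keys from the index into a regular column
df = df.reset_index().rename(columns={"index": "user_id"})
print(df.columns.tolist())  # ['user_id', 'name', 'score']
```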

Deeply Nested JSON with Custom Path Extraction

When the data you need is buried deep inside a JSON response, extract the relevant portion first and then normalize:

import pandas as pd
import json

json_str = '''
{
    "response": {
        "data": {
            "users": [
                {"id": 1, "profile": {"name": "Alice"}},
                {"id": 2, "profile": {"name": "Bob"}}
            ]
        },
        "meta": {"total": 2}
    }
}
'''

data = json.loads(json_str)

# Navigate to the nested array
users = data['response']['data']['users']
df = pd.json_normalize(users)

print(df)

Output:

   id profile.name
0   1        Alice
1   2          Bob

A Reusable Conversion Function

For projects that regularly parse JSON from different sources, a utility function simplifies the workflow:

import pandas as pd
import json
from io import StringIO
from pathlib import Path

def json_to_dataframe(source, nested_path=None, flatten=True, **kwargs):
    """
    Convert JSON from various sources to a DataFrame.

    Args:
        source: A file path, JSON string, or Python dict/list.
        nested_path: Dot-separated path to a nested array (e.g., "response.data.items").
        flatten: Whether to flatten nested objects using json_normalize.
        **kwargs: Additional arguments passed to json_normalize or DataFrame.
    """
    # Load data based on source type
    if isinstance(source, (dict, list)):
        data = source
    elif isinstance(source, (str, Path)):
        path = Path(source)
        if path.exists():
            with open(path, encoding='utf-8') as f:
                data = json.load(f)
        else:
            data = json.loads(source)
    else:
        raise ValueError(f"Unsupported source type: {type(source)}")

    # Navigate to nested path if specified
    if nested_path:
        for key in nested_path.split('.'):
            data = data[key]

    # Convert to DataFrame
    if flatten and isinstance(data, list):
        return pd.json_normalize(data, **kwargs)
    else:
        return pd.DataFrame(data)

# Usage examples
df1 = json_to_dataframe("data.json")
df2 = json_to_dataframe('{"users": [{"id": 1}]}', nested_path="users")
df3 = json_to_dataframe([{"a": 1}, {"a": 2}])

Converting a DataFrame Back to JSON

Pandas provides several output orientations to match different JSON consumers:

import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'score': [90, 85]
})

# Array of records (most common for APIs)
print(df.to_json(orient='records'))

# Index-keyed objects
print(df.to_json(orient='index'))

# To a formatted JSON file
df.to_json("output.json", orient='records', indent=2)

# To JSON Lines format
df.to_json("output.jsonl", orient='records', lines=True)

Output:

[{"name":"Alice","score":90},{"name":"Bob","score":85}]
{"0":{"name":"Alice","score":90},"1":{"name":"Bob","score":85}}

The orient='records' format is the most widely compatible with web APIs and other JSON consumers.
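
A quick way to confirm an orientation is lossless for your data is to round-trip it: serialize with to_json and read it straight back. A sketch:

```python
import pandas as pd
from io import StringIO

df = pd.DataFrame({"name": ["Alice", "Bob"], "score": [90, 85]})

# Round-trip: records-oriented JSON reads back into an equal DataFrame
json_str = df.to_json(orient="records")
df2 = pd.read_json(StringIO(json_str))

print(df.equals(df2))  # True
```

Note that JSON cannot represent every pandas dtype (e.g. categoricals and timezone-aware timestamps may not survive unchanged), so this check is worth running on your actual data.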

Quick Reference

| JSON Structure | Method | Typical Use Case |
| --- | --- | --- |
| Flat array [{...}] | pd.read_json() | Simple API responses |
| Nested objects | pd.json_normalize() | Complex API responses with nested fields |
| JSON Lines (one object per line) | pd.read_json(lines=True) | Log files, streaming data |
| Dictionary of dictionaries | pd.DataFrame.from_dict(orient='index') | Key-indexed record collections |
| Deeply nested with target array | json.loads() then pd.json_normalize() | Wrapped API responses |

Conclusion

Converting JSON to a Pandas DataFrame in Python requires choosing the right method based on your data structure. Use pd.read_json() for simple, flat JSON arrays where every record shares the same keys. For the nested structures that are typical of real-world API responses, pd.json_normalize() automatically flattens nested objects into separate columns and can expand nested arrays into individual rows. For JSON Lines files, the lines=True parameter handles the format natively, with optional chunking for large files.

Best Practice

Always inspect your JSON structure before writing the conversion code. Use json.loads() to examine the data in a Python shell first, then choose between pd.read_json() for flat data and pd.json_normalize() for anything nested. This quick inspection step saves time and prevents surprises with unexpected nesting or inconsistent record structures.