How to Convert JSON to Pandas DataFrame in Python
JSON is the standard format for web APIs and data exchange, while Pandas DataFrames are the foundation of data analysis in Python. Converting between the two is one of the most common tasks in data engineering and analysis workflows. The right approach depends on whether your JSON is flat, nested, or stored in JSON Lines format.
In this guide, you will learn how to handle all of these scenarios with practical examples, from simple array-of-objects structures to deeply nested API responses.
Simple Flat JSON with pd.read_json()
For JSON arrays containing simple, flat objects, pd.read_json() provides the most direct conversion:
```python
import pandas as pd
from io import StringIO

json_str = '[{"name": "Alice", "score": 90}, {"name": "Bob", "score": 85}]'
df = pd.read_json(StringIO(json_str))
print(df)
```

Output:

```
    name  score
0  Alice     90
1    Bob     85
```
Reading from a JSON File
```python
import pandas as pd

# Assumes data.json contains: [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
df = pd.read_json("data.json")
print(df)
```

Output:

```
   id   name
0   1  Alice
1   2    Bob
```
Reading from an API Response
```python
import pandas as pd
import requests
from io import StringIO

response = requests.get("https://api.example.com/users")

# Option 1: Parse from the response text
df = pd.read_json(StringIO(response.text))

# Option 2: Use response.json() with the DataFrame constructor
data = response.json()
df = pd.DataFrame(data)
```
In recent versions of Pandas, passing a raw JSON string directly to pd.read_json() is deprecated. Wrap the string in StringIO to avoid deprecation warnings.
Handling Nested JSON with pd.json_normalize()
Most real-world API responses contain nested objects. The pd.json_normalize() function flattens them into separate columns with dot-separated (or custom-separated) names:
```python
import pandas as pd

data = [
    {
        "id": 1,
        "user": {
            "name": "Alice",
            "location": {"city": "New York", "zip": "10001"}
        }
    },
    {
        "id": 2,
        "user": {
            "name": "Bob",
            "location": {"city": "Los Angeles", "zip": "90001"}
        }
    }
]

# Flatten nested structure with underscore separator
df = pd.json_normalize(data, sep='_')
print(df.columns.tolist())
print(df)
```

Output:

```
['id', 'user_name', 'user_location_city', 'user_location_zip']
   id user_name user_location_city user_location_zip
0   1     Alice           New York             10001
1   2       Bob        Los Angeles             90001
```
Every nested key is flattened into its own column, making the data ready for analysis without manual extraction.
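If you only need the first level or two flattened, json_normalize also accepts a max_level parameter; dictionaries below that depth are left intact inside a single column. A quick sketch using the same data as above:

```python
import pandas as pd

data = [
    {"id": 1, "user": {"name": "Alice", "location": {"city": "New York", "zip": "10001"}}},
    {"id": 2, "user": {"name": "Bob", "location": {"city": "Los Angeles", "zip": "90001"}}},
]

# Flatten only one level deep: user.location remains a dict in its column
df = pd.json_normalize(data, max_level=1)
print(df.columns.tolist())
# ['id', 'user.name', 'user.location']
```

This is useful when a deeply nested sub-object is better handled as a single value (for example, stored as-is or processed later) than exploded into many columns.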
Extracting Nested Arrays with record_path
When JSON contains arrays that you want to expand into individual rows, use the record_path parameter along with meta to carry parent fields into each row:
```python
import pandas as pd

data = {
    "company": "TechCorp",
    "employees": [
        {"name": "Alice", "role": "Developer"},
        {"name": "Bob", "role": "Designer"}
    ]
}

# Expand the employees array, keeping the parent company field
df = pd.json_normalize(
    data,
    record_path="employees",
    meta=["company"]
)
print(df)
```

Output:

```
    name       role   company
0  Alice  Developer  TechCorp
1    Bob   Designer  TechCorp
```
Handling Multiple Nesting Levels
For more complex structures with arrays nested inside objects, you can reference deeply nested meta fields using lists:
```python
import pandas as pd

data = [
    {
        "order_id": 1,
        "customer": {"name": "Alice"},
        "items": [
            {"product": "Book", "qty": 2},
            {"product": "Pen", "qty": 5}
        ]
    },
    {
        "order_id": 2,
        "customer": {"name": "Bob"},
        "items": [
            {"product": "Laptop", "qty": 1}
        ]
    }
]

df = pd.json_normalize(
    data,
    record_path="items",
    meta=["order_id", ["customer", "name"]],
    meta_prefix="order_"
)
print(df)
```

Output:

```
  product  qty order_order_id order_customer.name
0    Book    2              1               Alice
1     Pen    5              1               Alice
2  Laptop    1              2                 Bob
```
Each item gets its own row, and the parent order and customer information is duplicated across the corresponding item rows.
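Real-world records do not always carry every meta field. By default, json_normalize raises a KeyError when a listed meta key is missing from a record; passing errors='ignore' fills the gap with NaN instead. A sketch with one order that lacks customer information:

```python
import pandas as pd

data = [
    {"order_id": 1, "customer": {"name": "Alice"},
     "items": [{"product": "Book", "qty": 2}]},
    # This order has no customer field at all
    {"order_id": 2,
     "items": [{"product": "Laptop", "qty": 1}]},
]

# errors='ignore' keeps missing meta fields as NaN instead of raising
df = pd.json_normalize(
    data,
    record_path="items",
    meta=["order_id", ["customer", "name"]],
    errors="ignore",
)
print(df)
```

The Laptop row ends up with NaN in the customer.name column, so the rest of the data still loads cleanly.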
JSON Lines Format with lines=True
Log files and streaming data often use JSON Lines (also called NDJSON), where each line in the file is a separate, self-contained JSON object:
```python
import pandas as pd

# events.jsonl content:
# {"id": 1, "event": "login"}
# {"id": 2, "event": "purchase"}
# {"id": 3, "event": "logout"}
df = pd.read_json("events.jsonl", lines=True)
print(df)
```

Output:

```
   id     event
0   1     login
1   2  purchase
2   3    logout
```
Streaming Large JSON Lines Files in Chunks
For files too large to fit in memory, use the chunksize parameter to process them incrementally:
```python
import pandas as pd

chunks = pd.read_json("large_file.jsonl", lines=True, chunksize=10000)
for chunk in chunks:
    # Each chunk is a DataFrame with up to 10,000 rows
    print(f"Processing chunk with {len(chunk)} rows")
    # process_data(chunk)
```
This approach keeps memory usage constant regardless of the total file size.
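As a minimal end-to-end sketch (using a small temporary file so the example is self-contained), per-chunk results can be aggregated incrementally — here, counting events without ever holding the whole file in memory:

```python
import os
import tempfile
from collections import Counter

import pandas as pd

# Write a small JSON Lines file to keep the example self-contained
lines = [
    '{"id": 1, "event": "login"}',
    '{"id": 2, "event": "purchase"}',
    '{"id": 3, "event": "login"}',
]
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as f:
    f.write("\n".join(lines))
    path = f.name

# Accumulate per-chunk counts instead of loading everything at once
counts = Counter()
for chunk in pd.read_json(path, lines=True, chunksize=2):
    counts.update(chunk["event"].value_counts().to_dict())

print(counts)  # Counter({'login': 2, 'purchase': 1})
os.remove(path)
```

The same pattern works for sums, filtered writes, or any other operation that can be computed chunk by chunk and combined at the end.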
Handling Different JSON Structures
Dictionary of Lists (Column-Oriented)
When JSON is structured as a dictionary where each key maps to a list of values, the DataFrame constructor handles it directly:
```python
import pandas as pd

data = {
    "names": ["Alice", "Bob", "Charlie"],
    "scores": [90, 85, 88],
    "active": [True, False, True]
}
df = pd.DataFrame(data)
print(df)
```

Output:

```
     names  scores  active
0    Alice      90    True
1      Bob      85   False
2  Charlie      88    True
```
Dictionary of Dictionaries (Record-Oriented by Key)
When records are stored as values in a dictionary with meaningful keys:
```python
import pandas as pd

data = {
    "user1": {"name": "Alice", "score": 90},
    "user2": {"name": "Bob", "score": 85}
}
df = pd.DataFrame.from_dict(data, orient='index')
print(df)
```

Output:

```
        name  score
user1  Alice     90
user2    Bob     85
```
The dictionary keys become the DataFrame index, which you can reset with df.reset_index() if needed.
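For example, to promote the keys into a regular column with a meaningful name (the user_id label here is just illustrative):

```python
import pandas as pd

data = {
    "user1": {"name": "Alice", "score": 90},
    "user2": {"name": "Bob", "score": 85},
}
df = pd.DataFrame.from_dict(data, orient='index')

# Promote the index to a column and give it a descriptive name
df = df.reset_index().rename(columns={'index': 'user_id'})
print(df.columns.tolist())  # ['user_id', 'name', 'score']
```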
Deeply Nested JSON with Custom Path Extraction
When the data you need is buried deep inside a JSON response, extract the relevant portion first and then normalize:
```python
import pandas as pd
import json

json_str = '''
{
    "response": {
        "data": {
            "users": [
                {"id": 1, "profile": {"name": "Alice"}},
                {"id": 2, "profile": {"name": "Bob"}}
            ]
        },
        "meta": {"total": 2}
    }
}
'''
data = json.loads(json_str)

# Navigate to the nested array
users = data['response']['data']['users']
df = pd.json_normalize(users)
print(df)
```

Output:

```
   id profile.name
0   1        Alice
1   2          Bob
```
A Reusable Conversion Function
For projects that regularly parse JSON from different sources, a utility function simplifies the workflow:
```python
import pandas as pd
import json
from pathlib import Path

def json_to_dataframe(source, nested_path=None, flatten=True, **kwargs):
    """
    Convert JSON from various sources to a DataFrame.

    Args:
        source: A file path, JSON string, or Python dict/list.
        nested_path: Dot-separated path to a nested array (e.g., "response.data.items").
        flatten: Whether to flatten nested objects using json_normalize.
        **kwargs: Additional arguments passed to json_normalize or DataFrame.
    """
    # Load data based on source type
    if isinstance(source, (dict, list)):
        data = source
    elif isinstance(source, (str, Path)):
        path = Path(source)
        if path.exists():
            with open(path, encoding='utf-8') as f:
                data = json.load(f)
        else:
            data = json.loads(source)
    else:
        raise ValueError(f"Unsupported source type: {type(source)}")

    # Navigate to nested path if specified
    if nested_path:
        for key in nested_path.split('.'):
            data = data[key]

    # Convert to DataFrame
    if flatten and isinstance(data, list):
        return pd.json_normalize(data, **kwargs)
    else:
        return pd.DataFrame(data)

# Usage examples
df1 = json_to_dataframe("data.json")
df2 = json_to_dataframe('{"users": [{"id": 1}]}', nested_path="users")
df3 = json_to_dataframe([{"a": 1}, {"a": 2}])
```
Converting a DataFrame Back to JSON
Pandas provides several output orientations to match different JSON consumers:
```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'score': [90, 85]
})

# Array of records (most common for APIs)
print(df.to_json(orient='records'))

# Index-keyed objects
print(df.to_json(orient='index'))

# To a formatted JSON file
df.to_json("output.json", orient='records', indent=2)

# To JSON Lines format
df.to_json("output.jsonl", orient='records', lines=True)
```

Output:

```
[{"name":"Alice","score":90},{"name":"Bob","score":85}]
{"0":{"name":"Alice","score":90},"1":{"name":"Bob","score":85}}
```
The orient='records' format is the most widely compatible with web APIs and other JSON consumers.
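A quick round-trip check confirms that orient='records' output can be read straight back into an equivalent DataFrame:

```python
import pandas as pd
from io import StringIO

df = pd.DataFrame({'name': ['Alice', 'Bob'], 'score': [90, 85]})

# Serialize to a records-oriented JSON string and read it back
json_str = df.to_json(orient='records')
df_roundtrip = pd.read_json(StringIO(json_str))

print(df.equals(df_roundtrip))  # True
```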
Quick Reference
| JSON Structure | Method | Typical Use Case |
|---|---|---|
| Flat array [{...}] | pd.read_json() | Simple API responses |
| Nested objects | pd.json_normalize() | Complex API responses with nested fields |
| JSON Lines (one object per line) | pd.read_json(lines=True) | Log files, streaming data |
| Dictionary of dictionaries | pd.DataFrame.from_dict(orient='index') | Key-indexed record collections |
| Deeply nested with target array | json.loads() then pd.json_normalize() | Wrapped API responses |
Conclusion
Converting JSON to a Pandas DataFrame in Python requires choosing the right method based on your data structure. Use pd.read_json() for simple, flat JSON arrays where every record shares the same keys. For the nested structures that are typical of real-world API responses, pd.json_normalize() automatically flattens nested objects into separate columns and can expand nested arrays into individual rows. For JSON Lines files, the lines=True parameter handles the format natively, with optional chunking for large files.
Always inspect your JSON structure before writing the conversion code. Use json.loads() to examine the data in a Python shell first, then choose between pd.read_json() for flat data and pd.json_normalize() for anything nested. This quick inspection step saves time and prevents surprises with unexpected nesting or inconsistent record structures.
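A small inspection helper along these lines (a sketch, not part of any library) can make that first step routine:

```python
import json

def describe_json(json_str):
    """Report the top-level shape of a JSON document."""
    data = json.loads(json_str)
    if isinstance(data, list):
        return f"list of {len(data)} items"
    if isinstance(data, dict):
        return f"dict with keys: {sorted(data)}"
    return type(data).__name__

print(describe_json('[{"a": 1}, {"a": 2}]'))       # list of 2 items
print(describe_json('{"users": [], "meta": {}}'))  # dict with keys: ['meta', 'users']
```

A list of objects points toward pd.read_json() or pd.json_normalize(); a dict usually means you need to navigate to the array you care about first.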