Python Pandas: How to Read JSON Files with Pandas

JSON (JavaScript Object Notation) is one of the most common data formats used in web APIs, configuration files, and data exchange between services. When working with data analysis in Python, you'll frequently need to load JSON data into a Pandas DataFrame for cleaning, exploration, and visualization.

In this guide, you'll learn multiple methods to read JSON files and JSON data into Pandas DataFrames, handle nested JSON structures, and work with real-world JSON formats.

Method 1: Using `pd.read_json()` (Simplest Approach)

The pd.read_json() function is the most straightforward way to read a JSON file directly into a DataFrame. It works with both local files and remote URLs.

Reading a Local JSON File

Suppose you have a file named employees.json:

[
    {"id": 1, "name": "Alice", "age": 25, "city": "New York"},
    {"id": 2, "name": "Bob", "age": 30, "city": "Chicago"},
    {"id": 3, "name": "Charlie", "age": 22, "city": "Boston"}
]

import pandas as pd

df = pd.read_json('employees.json')
print(df)

Output:

   id     name  age      city
 1    Alice   25  New York
 2      Bob   30   Chicago
 3  Charlie   22    Boston

Reading JSON from a URL

You can also pass a URL directly to pd.read_json(), which is useful when working with web APIs:

import pandas as pd

url = 'https://api.example.com/data.json'
df = pd.read_json(url)
print(df.head())

Reading a JSON String

If your JSON data is already in a Python string, use pd.read_json() with io.StringIO or pass the string directly:

import pandas as pd

json_string = '[{"name": "Alice", "score": 85}, {"name": "Bob", "score": 92}]'

df = pd.read_json(json_string)
print(df)

Output:

    name  score
0  Alice     85
1    Bob     92

tip

pd.read_json() automatically detects whether the input is a file path, URL, or JSON string. For most standard JSON files, this single function is all you need.

Method 2: Using `pd.json_normalize()` for Nested JSON

Real-world JSON data - especially from APIs - often contains nested structures (objects within objects). pd.json_normalize() flattens nested JSON into a tabular format suitable for DataFrames.

Flattening Nested JSON

import pandas as pd

data = [
    {
        "id": 1,
        "name": "Alice",
        "address": {
            "city": "New York",
            "zip": "10001"
        }
    },
    {
        "id": 2,
        "name": "Bob",
        "address": {
            "city": "Chicago",
            "zip": "60601"
        }
    }
]

df = pd.json_normalize(data)
print(df)

Output:

   id   name address.city address.zip
0   1  Alice     New York       10001
1   2    Bob      Chicago       60601

Notice how the nested address object is flattened into address.city and address.zip columns with dot notation.

Handling Deeply Nested JSON

For more complex structures, you can control the flattening depth with the max_level parameter:

import pandas as pd

data = [
    {
        "id": 1,
        "name": "Alice",
        "address": {
            "city": "New York",
            "location": {
                "lat": 40.7128,
                "lng": -74.0060
            }
        }
    }
]

# Flatten all levels
df_full = pd.json_normalize(data)
print("Fully flattened:")
print(df_full)

# Flatten only one level deep
df_shallow = pd.json_normalize(data, max_level=1)
print("\nShallow (max_level=1):")
print(df_shallow)

Output:

Fully flattened:
   id   name address.city  address.location.lat  address.location.lng
0   1  Alice     New York               40.7128               -74.006

Shallow (max_level=1):
   id   name address.city                  address.location
0   1  Alice     New York  {'lat': 40.7128, 'lng': -74.006}

Extracting Nested Lists with `record_path`

When JSON contains arrays of records nested inside objects, use record_path to specify which array to extract and meta to include parent-level fields:

import pandas as pd

data = [
    {
        "department": "Engineering",
        "employees": [
            {"name": "Alice", "role": "Developer"},
            {"name": "Bob", "role": "Designer"}
        ]
    },
    {
        "department": "Marketing",
        "employees": [
            {"name": "Charlie", "role": "Manager"}
        ]
    }
]

df = pd.json_normalize(
    data,
    record_path='employees',
    meta=['department']
)
print(df)

Output:

      name       role   department
  Alice  Developer  Engineering
    Bob   Designer  Engineering
Charlie    Manager    Marketing

When to use json_normalize() vs read_json()

pd.read_json(): Best for flat JSON (arrays of simple objects) stored in files or URLs.
pd.json_normalize(): Best for nested JSON that needs to be flattened, especially API responses with objects-within-objects or arrays-within-objects.

Method 3: Using the `json` Module with `pd.DataFrame()`

When you need to preprocess or transform JSON data before creating a DataFrame, load it with Python's built-in json module first:

From a JSON File

import json
import pandas as pd

with open('employees.json', 'r') as f:
    data = json.load(f)

df = pd.DataFrame(data)
print(df)

Output:

   id     name  age      city
 1    Alice   25  New York
 2      Bob   30   Chicago
 3  Charlie   22    Boston

From a Dictionary

If your data is already a Python dictionary, pass it directly to pd.DataFrame():

import pandas as pd

data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 22],
    "City": ["New York", "Chicago", "Boston"]
}

df = pd.DataFrame(data)
print(df)

Output:

      Name  Age      City
  Alice   25  New York
    Bob   30   Chicago
Charlie   22    Boston

Column-Oriented JSON

Some JSON files store data in a column-oriented format (keys are column names, values are dictionaries of index-to-value):

import pandas as pd
import json

data = {
    "Score_A": {"0": 60, "1": 75, "2": 80},
    "Score_B": {"0": 110, "1": 117, "2": 103}
}

df = pd.DataFrame(data)
print(df)

Output:

   Score_A  Score_B
     60      110
     75      117
     80      103

Handling Different JSON Orientations

JSON data can be structured in different orientations. The orient parameter in pd.read_json() tells Pandas how to interpret the structure:

Orientation	Structure	Example
`'records'`	List of row objects	`[{"a": 1, "b": 2}, {"a": 3, "b": 4}]`
`'columns'`	Dict of column arrays	`{"a": [1, 3], "b": [2, 4]}`
`'index'`	Dict of row objects	`{"0": {"a": 1}, "1": {"a": 3}}`
`'values'`	List of lists	`[[1, 2], [3, 4]]`
`'split'`	Dict with keys, data, index	`{"columns": ["a"], "data": [[1]]}`

import pandas as pd

# Records orientation (most common from APIs)
json_records = '[{"name": "Alice", "age": 25}, {"name": "Bob", "age": 30}]'
df = pd.read_json(json_records, orient='records')
print(df)

Output:

    name  age
0  Alice   25
1    Bob   30

Common Mistake: Reading Non-Standard JSON

A frequent issue is trying to read a JSON file that contains one JSON object per line (JSON Lines / NDJL format) instead of a single JSON array:

Sample data.jsonl:

{"name": "Alice", "age": 25}
{"name": "Bob", "age": 30}
{"name": "Charlie", "age": 22}

import pandas as pd

# ❌ This may fail or produce unexpected results
try:
    df = pd.read_json('data.jsonl')
except ValueError as e:
    print(f"Error: {e}")

Output:

Error: Trailing data

Fix: Set lines=True to read JSON Lines format:

import pandas as pd

# ✅ Correct: use lines=True for JSON Lines format
df = pd.read_json('data.jsonl', lines=True)
print(df)

Output:

      name  age
  Alice   25
    Bob   30
Charlie   22

warning

Always check whether your JSON file is a single JSON array/object or a JSON Lines file (one JSON object per line). Using the wrong mode will cause parsing errors or corrupted data.

Summary

Method	Best For	Handles Nesting
`pd.read_json()`	Flat JSON files, URLs, strings	❌ (basic only)
`pd.json_normalize()`	Nested/complex JSON, API responses	✅ (flattens automatically)
`json.load()` + `pd.DataFrame()`	Custom preprocessing before loading	Manual handling

To read JSON files with Pandas:

Use pd.read_json() for simple, flat JSON files - it's a one-liner that handles files, URLs, and strings.
Use pd.json_normalize() for nested JSON data - it flattens hierarchical structures into clean tabular format.
Use json.load() with pd.DataFrame() when you need to transform or filter data before creating the DataFrame.
Set lines=True when reading JSON Lines (.jsonl) files, and use the orient parameter when your JSON uses a non-default structure.

Method 1: Using pd.read_json() (Simplest Approach)​

Reading a Local JSON File​

Reading JSON from a URL​

Reading a JSON String​

Method 2: Using pd.json_normalize() for Nested JSON​

Flattening Nested JSON​

Handling Deeply Nested JSON​

Extracting Nested Lists with record_path​

Method 3: Using the json Module with pd.DataFrame()​

From a JSON File​

From a Dictionary​

Column-Oriented JSON​

Handling Different JSON Orientations​

Common Mistake: Reading Non-Standard JSON​

Summary​

Table of Contents

Method 1: Using `pd.read_json()` (Simplest Approach)

Reading a Local JSON File

Reading JSON from a URL

Reading a JSON String

Method 2: Using `pd.json_normalize()` for Nested JSON

Flattening Nested JSON

Handling Deeply Nested JSON

Extracting Nested Lists with `record_path`

Method 3: Using the `json` Module with `pd.DataFrame()`

From a JSON File

From a Dictionary

Column-Oriented JSON

Handling Different JSON Orientations

Common Mistake: Reading Non-Standard JSON

Summary