Skip to main content

Python Pandas: How to Read JSON Files with Pandas

JSON (JavaScript Object Notation) is one of the most common data formats used in web APIs, configuration files, and data exchange between services. When working with data analysis in Python, you'll frequently need to load JSON data into a Pandas DataFrame for cleaning, exploration, and visualization.

In this guide, you'll learn multiple methods to read JSON files and JSON data into Pandas DataFrames, handle nested JSON structures, and work with real-world JSON formats.

Method 1: Using pd.read_json() (Simplest Approach)

The pd.read_json() function is the most straightforward way to read a JSON file directly into a DataFrame. It works with both local files and remote URLs.

Reading a Local JSON File

Suppose you have a file named employees.json:

[
{"id": 1, "name": "Alice", "age": 25, "city": "New York"},
{"id": 2, "name": "Bob", "age": 30, "city": "Chicago"},
{"id": 3, "name": "Charlie", "age": 22, "city": "Boston"}
]
import pandas as pd

df = pd.read_json('employees.json')
print(df)

Output:

   id     name  age      city
0 1 Alice 25 New York
1 2 Bob 30 Chicago
2 3 Charlie 22 Boston

Reading JSON from a URL

You can also pass a URL directly to pd.read_json(), which is useful when working with web APIs:

import pandas as pd

url = 'https://api.example.com/data.json'
df = pd.read_json(url)
print(df.head())

Reading a JSON String

If your JSON data is already in a Python string, use pd.read_json() with io.StringIO or pass the string directly:

import pandas as pd

json_string = '[{"name": "Alice", "score": 85}, {"name": "Bob", "score": 92}]'

df = pd.read_json(json_string)
print(df)

Output:

    name  score
0 Alice 85
1 Bob 92
tip

pd.read_json() automatically detects whether the input is a file path, URL, or JSON string. For most standard JSON files, this single function is all you need.

Method 2: Using pd.json_normalize() for Nested JSON

Real-world JSON data - especially from APIs - often contains nested structures (objects within objects). pd.json_normalize() flattens nested JSON into a tabular format suitable for DataFrames.

Flattening Nested JSON

import pandas as pd

data = [
{
"id": 1,
"name": "Alice",
"address": {
"city": "New York",
"zip": "10001"
}
},
{
"id": 2,
"name": "Bob",
"address": {
"city": "Chicago",
"zip": "60601"
}
}
]

df = pd.json_normalize(data)
print(df)

Output:

   id   name address.city address.zip
0 1 Alice New York 10001
1 2 Bob Chicago 60601

Notice how the nested address object is flattened into address.city and address.zip columns with dot notation.

Handling Deeply Nested JSON

For more complex structures, you can control the flattening depth with the max_level parameter:

import pandas as pd

data = [
{
"id": 1,
"name": "Alice",
"address": {
"city": "New York",
"location": {
"lat": 40.7128,
"lng": -74.0060
}
}
}
]

# Flatten all levels
df_full = pd.json_normalize(data)
print("Fully flattened:")
print(df_full)

# Flatten only one level deep
df_shallow = pd.json_normalize(data, max_level=1)
print("\nShallow (max_level=1):")
print(df_shallow)

Output:

Fully flattened:
id name address.city address.location.lat address.location.lng
0 1 Alice New York 40.7128 -74.006

Shallow (max_level=1):
id name address.city address.location
0 1 Alice New York {'lat': 40.7128, 'lng': -74.006}

Extracting Nested Lists with record_path

When JSON contains arrays of records nested inside objects, use record_path to specify which array to extract and meta to include parent-level fields:

import pandas as pd

data = [
{
"department": "Engineering",
"employees": [
{"name": "Alice", "role": "Developer"},
{"name": "Bob", "role": "Designer"}
]
},
{
"department": "Marketing",
"employees": [
{"name": "Charlie", "role": "Manager"}
]
}
]

df = pd.json_normalize(
data,
record_path='employees',
meta=['department']
)
print(df)

Output:

      name       role   department
0 Alice Developer Engineering
1 Bob Designer Engineering
2 Charlie Manager Marketing
When to use json_normalize() vs read_json()
  • pd.read_json(): Best for flat JSON (arrays of simple objects) stored in files or URLs.
  • pd.json_normalize(): Best for nested JSON that needs to be flattened, especially API responses with objects-within-objects or arrays-within-objects.

Method 3: Using the json Module with pd.DataFrame()

When you need to preprocess or transform JSON data before creating a DataFrame, load it with Python's built-in json module first:

From a JSON File

import json
import pandas as pd

with open('employees.json', 'r') as f:
data = json.load(f)

df = pd.DataFrame(data)
print(df)

Output:

   id     name  age      city
0 1 Alice 25 New York
1 2 Bob 30 Chicago
2 3 Charlie 22 Boston

From a Dictionary

If your data is already a Python dictionary, pass it directly to pd.DataFrame():

import pandas as pd

data = {
"Name": ["Alice", "Bob", "Charlie"],
"Age": [25, 30, 22],
"City": ["New York", "Chicago", "Boston"]
}

df = pd.DataFrame(data)
print(df)

Output:

      Name  Age      City
0 Alice 25 New York
1 Bob 30 Chicago
2 Charlie 22 Boston

Column-Oriented JSON

Some JSON files store data in a column-oriented format (keys are column names, values are dictionaries of index-to-value):

import pandas as pd
import json

data = {
"Score_A": {"0": 60, "1": 75, "2": 80},
"Score_B": {"0": 110, "1": 117, "2": 103}
}

df = pd.DataFrame(data)
print(df)

Output:

   Score_A  Score_B
0 60 110
1 75 117
2 80 103

Handling Different JSON Orientations

JSON data can be structured in different orientations. The orient parameter in pd.read_json() tells Pandas how to interpret the structure:

OrientationStructureExample
'records'List of row objects[{"a": 1, "b": 2}, {"a": 3, "b": 4}]
'columns'Dict of column arrays{"a": [1, 3], "b": [2, 4]}
'index'Dict of row objects{"0": {"a": 1}, "1": {"a": 3}}
'values'List of lists[[1, 2], [3, 4]]
'split'Dict with keys, data, index{"columns": ["a"], "data": [[1]]}
import pandas as pd

# Records orientation (most common from APIs)
json_records = '[{"name": "Alice", "age": 25}, {"name": "Bob", "age": 30}]'
df = pd.read_json(json_records, orient='records')
print(df)

Output:

    name  age
0 Alice 25
1 Bob 30

Common Mistake: Reading Non-Standard JSON

A frequent issue is trying to read a JSON file that contains one JSON object per line (JSON Lines / NDJL format) instead of a single JSON array:

Sample data.jsonl:

{"name": "Alice", "age": 25}
{"name": "Bob", "age": 30}
{"name": "Charlie", "age": 22}
import pandas as pd

# ❌ This may fail or produce unexpected results
try:
df = pd.read_json('data.jsonl')
except ValueError as e:
print(f"Error: {e}")

Output:

Error: Trailing data

Fix: Set lines=True to read JSON Lines format:

import pandas as pd

# ✅ Correct: use lines=True for JSON Lines format
df = pd.read_json('data.jsonl', lines=True)
print(df)

Output:

      name  age
0 Alice 25
1 Bob 30
2 Charlie 22
caution

Always check whether your JSON file is a single JSON array/object or a JSON Lines file (one JSON object per line). Using the wrong mode will cause parsing errors or corrupted data.

Summary

MethodBest ForHandles Nesting
pd.read_json()Flat JSON files, URLs, strings❌ (basic only)
pd.json_normalize()Nested/complex JSON, API responses✅ (flattens automatically)
json.load() + pd.DataFrame()Custom preprocessing before loadingManual handling

To read JSON files with Pandas:

  • Use pd.read_json() for simple, flat JSON files - it's a one-liner that handles files, URLs, and strings.
  • Use pd.json_normalize() for nested JSON data - it flattens hierarchical structures into clean tabular format.
  • Use json.load() with pd.DataFrame() when you need to transform or filter data before creating the DataFrame.
  • Set lines=True when reading JSON Lines (.jsonl) files, and use the orient parameter when your JSON uses a non-default structure.