Python Pandas: How to Read JSON Files with Pandas
JSON (JavaScript Object Notation) is one of the most common data formats used in web APIs, configuration files, and data exchange between services. When working with data analysis in Python, you'll frequently need to load JSON data into a Pandas DataFrame for cleaning, exploration, and visualization.
In this guide, you'll learn multiple methods to read JSON files and JSON data into Pandas DataFrames, handle nested JSON structures, and work with real-world JSON formats.
Method 1: Using pd.read_json() (Simplest Approach)
The pd.read_json() function is the most straightforward way to read a JSON file directly into a DataFrame. It works with both local files and remote URLs.
Reading a Local JSON File
Suppose you have a file named employees.json:
[
{"id": 1, "name": "Alice", "age": 25, "city": "New York"},
{"id": 2, "name": "Bob", "age": 30, "city": "Chicago"},
{"id": 3, "name": "Charlie", "age": 22, "city": "Boston"}
]
import pandas as pd
df = pd.read_json('employees.json')
print(df)
Output:
id name age city
0 1 Alice 25 New York
1 2 Bob 30 Chicago
2 3 Charlie 22 Boston
Reading JSON from a URL
You can also pass a URL directly to pd.read_json(), which is useful when working with web APIs:
import pandas as pd
url = 'https://api.example.com/data.json'
df = pd.read_json(url)
print(df.head())
Reading a JSON String
If your JSON data is already in a Python string, use pd.read_json() with io.StringIO or pass the string directly:
import pandas as pd
json_string = '[{"name": "Alice", "score": 85}, {"name": "Bob", "score": 92}]'
df = pd.read_json(json_string)
print(df)
Output:
name score
0 Alice 85
1 Bob 92
pd.read_json() automatically detects whether the input is a file path, URL, or JSON string. For most standard JSON files, this single function is all you need.
Method 2: Using pd.json_normalize() for Nested JSON
Real-world JSON data - especially from APIs - often contains nested structures (objects within objects). pd.json_normalize() flattens nested JSON into a tabular format suitable for DataFrames.
Flattening Nested JSON
import pandas as pd
data = [
{
"id": 1,
"name": "Alice",
"address": {
"city": "New York",
"zip": "10001"
}
},
{
"id": 2,
"name": "Bob",
"address": {
"city": "Chicago",
"zip": "60601"
}
}
]
df = pd.json_normalize(data)
print(df)
Output:
id name address.city address.zip
0 1 Alice New York 10001
1 2 Bob Chicago 60601
Notice how the nested address object is flattened into address.city and address.zip columns with dot notation.
Handling Deeply Nested JSON
For more complex structures, you can control the flattening depth with the max_level parameter:
import pandas as pd
data = [
{
"id": 1,
"name": "Alice",
"address": {
"city": "New York",
"location": {
"lat": 40.7128,
"lng": -74.0060
}
}
}
]
# Flatten all levels
df_full = pd.json_normalize(data)
print("Fully flattened:")
print(df_full)
# Flatten only one level deep
df_shallow = pd.json_normalize(data, max_level=1)
print("\nShallow (max_level=1):")
print(df_shallow)
Output:
Fully flattened:
id name address.city address.location.lat address.location.lng
0 1 Alice New York 40.7128 -74.006
Shallow (max_level=1):
id name address.city address.location
0 1 Alice New York {'lat': 40.7128, 'lng': -74.006}
Extracting Nested Lists with record_path
When JSON contains arrays of records nested inside objects, use record_path to specify which array to extract and meta to include parent-level fields:
import pandas as pd
data = [
{
"department": "Engineering",
"employees": [
{"name": "Alice", "role": "Developer"},
{"name": "Bob", "role": "Designer"}
]
},
{
"department": "Marketing",
"employees": [
{"name": "Charlie", "role": "Manager"}
]
}
]
df = pd.json_normalize(
data,
record_path='employees',
meta=['department']
)
print(df)
Output:
name role department
0 Alice Developer Engineering
1 Bob Designer Engineering
2 Charlie Manager Marketing
json_normalize() vs read_json()pd.read_json(): Best for flat JSON (arrays of simple objects) stored in files or URLs.pd.json_normalize(): Best for nested JSON that needs to be flattened, especially API responses with objects-within-objects or arrays-within-objects.
Method 3: Using the json Module with pd.DataFrame()
When you need to preprocess or transform JSON data before creating a DataFrame, load it with Python's built-in json module first:
From a JSON File
import json
import pandas as pd
with open('employees.json', 'r') as f:
data = json.load(f)
df = pd.DataFrame(data)
print(df)
Output:
id name age city
0 1 Alice 25 New York
1 2 Bob 30 Chicago
2 3 Charlie 22 Boston
From a Dictionary
If your data is already a Python dictionary, pass it directly to pd.DataFrame():
import pandas as pd
data = {
"Name": ["Alice", "Bob", "Charlie"],
"Age": [25, 30, 22],
"City": ["New York", "Chicago", "Boston"]
}
df = pd.DataFrame(data)
print(df)
Output:
Name Age City
0 Alice 25 New York
1 Bob 30 Chicago
2 Charlie 22 Boston
Column-Oriented JSON
Some JSON files store data in a column-oriented format (keys are column names, values are dictionaries of index-to-value):
import pandas as pd
import json
data = {
"Score_A": {"0": 60, "1": 75, "2": 80},
"Score_B": {"0": 110, "1": 117, "2": 103}
}
df = pd.DataFrame(data)
print(df)
Output:
Score_A Score_B
0 60 110
1 75 117
2 80 103
Handling Different JSON Orientations
JSON data can be structured in different orientations. The orient parameter in pd.read_json() tells Pandas how to interpret the structure:
| Orientation | Structure | Example |
|---|---|---|
'records' | List of row objects | [{"a": 1, "b": 2}, {"a": 3, "b": 4}] |
'columns' | Dict of column arrays | {"a": [1, 3], "b": [2, 4]} |
'index' | Dict of row objects | {"0": {"a": 1}, "1": {"a": 3}} |
'values' | List of lists | [[1, 2], [3, 4]] |
'split' | Dict with keys, data, index | {"columns": ["a"], "data": [[1]]} |
import pandas as pd
# Records orientation (most common from APIs)
json_records = '[{"name": "Alice", "age": 25}, {"name": "Bob", "age": 30}]'
df = pd.read_json(json_records, orient='records')
print(df)
Output:
name age
0 Alice 25
1 Bob 30
Common Mistake: Reading Non-Standard JSON
A frequent issue is trying to read a JSON file that contains one JSON object per line (JSON Lines / NDJL format) instead of a single JSON array:
Sample data.jsonl:
{"name": "Alice", "age": 25}
{"name": "Bob", "age": 30}
{"name": "Charlie", "age": 22}
import pandas as pd
# ❌ This may fail or produce unexpected results
try:
df = pd.read_json('data.jsonl')
except ValueError as e:
print(f"Error: {e}")
Output:
Error: Trailing data
Fix: Set lines=True to read JSON Lines format:
import pandas as pd
# ✅ Correct: use lines=True for JSON Lines format
df = pd.read_json('data.jsonl', lines=True)
print(df)
Output:
name age
0 Alice 25
1 Bob 30
2 Charlie 22
Always check whether your JSON file is a single JSON array/object or a JSON Lines file (one JSON object per line). Using the wrong mode will cause parsing errors or corrupted data.
Summary
| Method | Best For | Handles Nesting |
|---|---|---|
pd.read_json() | Flat JSON files, URLs, strings | ❌ (basic only) |
pd.json_normalize() | Nested/complex JSON, API responses | ✅ (flattens automatically) |
json.load() + pd.DataFrame() | Custom preprocessing before loading | Manual handling |
To read JSON files with Pandas:
- Use
pd.read_json()for simple, flat JSON files - it's a one-liner that handles files, URLs, and strings. - Use
pd.json_normalize()for nested JSON data - it flattens hierarchical structures into clean tabular format. - Use
json.load()withpd.DataFrame()when you need to transform or filter data before creating the DataFrame. - Set
lines=Truewhen reading JSON Lines (.jsonl) files, and use theorientparameter when your JSON uses a non-default structure.