Python Pandas: How to Load a JSON String Into a Pandas DataFrame in Python

JSON (JavaScript Object Notation) is one of the most common data interchange formats used in web APIs, configuration files, and data storage. When working with data analysis in Python, you frequently need to convert JSON data - whether from a file, a string, or an API response - into a Pandas DataFrame for efficient manipulation and analysis.

Pandas provides built-in functions like read_json() and json_normalize() that make this conversion straightforward.

In this guide, you will learn how to load JSON strings and files into DataFrames, handle different JSON orientations, and work with nested JSON structures.

Loading a JSON String Directly Into a DataFrame

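The most common scenario is converting a JSON-formatted string into a DataFrame using pd.read_json(). Since pandas 2.1, passing the literal string directly is deprecated, so wrap it in io.StringIO first:

from io import StringIO

import pandas as pd

json_string = '''
[
{"Name": "Alice", "Age": 30, "City": "New York"},
{"Name": "Bob", "Age": 25, "City": "Chicago"},
{"Name": "Charlie", "Age": 35, "City": "Houston"}
]
'''

# StringIO turns the string into a file-like object that read_json() accepts
df = pd.read_json(StringIO(json_string))
print(df)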

Output:

      Name  Age      City
0    Alice   30  New York
1      Bob   25   Chicago
2  Charlie   35   Houston

For an array of objects like this one, read_json() maps each object's keys to column names and each object to a row.
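A quick way to confirm how keys and values were mapped is to inspect the resulting column labels, dtypes, and row count. The snippet below is a minimal sketch, assuming pandas 2.1 or later (hence the io.StringIO wrapper); the sample data and variable names are illustrative.

```python
from io import StringIO

import pandas as pd

json_string = '[{"Name": "Alice", "Age": 30}, {"Name": "Bob", "Age": 25}]'

# Wrap the literal JSON so read_json() receives a file-like object
df = pd.read_json(StringIO(json_string))

print(list(df.columns))  # column labels taken from the JSON keys
print(df.dtypes)         # dtypes inferred from the JSON values
print(len(df))           # one row per JSON object
```

Checking dtypes right after loading helps catch columns that were read as text when numbers were expected.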

Loading a JSON File Into a DataFrame

To load JSON data from a file, pass the file path directly to pd.read_json():

import pandas as pd

df = pd.read_json("data.json")
print(df)

This reads the entire file, parses the JSON content, and returns a DataFrame. No manual file opening or parsing is required.
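To try this without an existing file, the following sketch writes a small JSON file to a temporary directory and reads it back. The file name data.json and the sample records are assumptions made for illustration only.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

records = [
    {"Name": "Alice", "Age": 30},
    {"Name": "Bob", "Age": 25},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    json_path = Path(tmp_dir) / "data.json"  # hypothetical file name
    json_path.write_text(json.dumps(records), encoding="utf-8")

    # read_json() accepts a string path or a path-like object
    df_from_file = pd.read_json(json_path)

print(df_from_file)
```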

Tip: pd.read_json() also accepts URLs, so you can load JSON data directly from a web API:

df = pd.read_json("https://api.example.com/data.json")

Understanding JSON Orientations

JSON data can be structured in several different ways, and Pandas needs to know the orientation to parse it correctly. The orient parameter in read_json() controls how the JSON structure is interpreted.

Records Orientation (Array of Objects)

This is the most common format - an array where each element is an object representing a row:

[
{"Name": "Alice", "Age": 30, "City": "New York"},
{"Name": "Bob", "Age": 25, "City": "Chicago"}
]

from io import StringIO

import pandas as pd

json_string = '[{"Name": "Alice", "Age": 30}, {"Name": "Bob", "Age": 25}]'
# Wrap the literal JSON in StringIO (required style since pandas 2.1)
df = pd.read_json(StringIO(json_string), orient="records")
print(df)

Output:

    Name  Age
0  Alice   30
1    Bob   25

Index Orientation

Each top-level key is a row index, and its value is an object of column-value pairs:

{
"0": {"Name": "Alice", "Age": 30},
"1": {"Name": "Bob", "Age": 25}
}

from io import StringIO

import pandas as pd

json_string = '{"0": {"Name": "Alice", "Age": 30}, "1": {"Name": "Bob", "Age": 25}}'
# Wrap the literal JSON in StringIO (required style since pandas 2.1)
df = pd.read_json(StringIO(json_string), orient="index")
print(df)

Output:

    Name  Age
0  Alice   30
1    Bob   25

Column Orientation

Each top-level key is a column name, and its value is an object mapping row indices to values:

{
"Name": {"0": "Alice", "1": "Bob"},
"Age": {"0": 30, "1": 25}
}

from io import StringIO

import pandas as pd

json_string = '{"Name": {"0": "Alice", "1": "Bob"}, "Age": {"0": 30, "1": 25}}'
# Wrap the literal JSON in StringIO (required style since pandas 2.1)
df = pd.read_json(StringIO(json_string), orient="columns")
print(df)

Output:

    Name  Age
0  Alice   30
1    Bob   25

Values Orientation

A simple 2D array with no column names or indices:

[
["Alice", 30, "New York"],
["Bob", 25, "Chicago"]
]

from io import StringIO

import pandas as pd

json_string = '[["Alice", 30, "New York"], ["Bob", 25, "Chicago"]]'
# Wrap the literal JSON in StringIO (required style since pandas 2.1)
df = pd.read_json(StringIO(json_string), orient="values")
df.columns = ["Name", "Age", "City"]  # Assign column names manually
print(df)

Output:

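    Name  Age      City
0  Alice   30  New York
1    Bob   25   Chicago

When orient is not specified, read_json() does not inspect the data to guess a layout; for DataFrames it falls back to the default, orient="columns". An array of objects still loads as expected under that default, but index-oriented JSON is read with its top-level keys as columns, which gives a transposed result. Setting orient explicitly avoids these surprises.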
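To see why an explicit orient matters, the sketch below loads the same index-oriented string twice: once with no orient argument and once with orient="index". It assumes pandas 2.1 or later, and the variable names are illustrative.

```python
from io import StringIO

import pandas as pd

index_json = '{"0": {"Name": "Alice", "Age": 30}, "1": {"Name": "Bob", "Age": 25}}'

# No orient given: the top-level keys ("0", "1") are treated as column labels
df_default = pd.read_json(StringIO(index_json))

# orient="index": the top-level keys are treated as row labels
df_index = pd.read_json(StringIO(index_json), orient="index")

print(df_default)  # rows are labelled Name and Age
print(df_index)    # columns are labelled Name and Age
```

Comparing the two printed frames makes the transposition easy to spot before it propagates into later analysis.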

Quick Reference: JSON Orientations

Orientation   Structure                                            Explicit orient needed?
records       [{col: val, ...}, ...]                               No (loads under the default)
index         {index: {col: val, ...}, ...}                        Yes (otherwise read transposed)
columns       {col: {index: val, ...}, ...}                        No (this is the default)
values        [[val, val, ...], ...]                               Recommended
split         {"index": [...], "columns": [...], "data": [...]}    Yes
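The split layout listed in the table has no example of its own in this guide, so here is a minimal sketch of it. It assumes pandas 2.1 or later, and the sample data is illustrative.

```python
from io import StringIO

import pandas as pd

split_json = '''
{
  "index": [0, 1],
  "columns": ["Name", "Age"],
  "data": [["Alice", 30], ["Bob", 25]]
}
'''

# orient="split" reads row labels, column labels, and values from separate keys
df_split = pd.read_json(StringIO(split_json), orient="split")
print(df_split)

# to_json() can write the same layout back out
print(df_split.to_json(orient="split"))
```

Because labels and values are stored separately, this layout avoids repeating column names for every row.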

Loading JSON From a Python Dictionary

If your data is already a Python dictionary (not a JSON string), pass it to pd.DataFrame() directly; there is no need to serialize it to JSON first:

import pandas as pd

data = {
"Name": ["Alice", "Bob", "Charlie"],
"Age": [30, 25, 35],
"City": ["New York", "Chicago", "Houston"]
}

df = pd.DataFrame(data)
print(df)

Output:

      Name  Age      City
0    Alice   30  New York
1      Bob   25   Chicago
2  Charlie   35   Houston

For a list of dictionaries (records format):

import pandas as pd

records = [
{"Name": "Alice", "Age": 30},
{"Name": "Bob", "Age": 25},
{"Name": "Charlie", "Age": 35}
]

df = pd.DataFrame(records)
print(df)

Output:

      Name  Age
0    Alice   30
1      Bob   25
2  Charlie   35
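A dictionary can also be keyed by row label, with each value holding that row's fields, which mirrors the index orientation described earlier. One possible way to load that shape is DataFrame.from_dict() with orient="index"; the row labels and variable names below are made up for illustration.

```python
import pandas as pd

# Outer keys are row labels; inner dicts map column names to values
rows_by_id = {
    "a1": {"Name": "Alice", "Age": 30},
    "b2": {"Name": "Bob", "Age": 25},
}

df_rows = pd.DataFrame.from_dict(rows_by_id, orient="index")
print(df_rows)
```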

Handling Nested JSON with json_normalize()

Real-world JSON data is often nested - objects contain other objects or arrays. pd.read_json() does not flatten nested structures automatically. Use pd.json_normalize() instead:

import pandas as pd

data = [
{
"Name": "Alice",
"Age": 30,
"Address": {
"City": "New York",
"State": "NY"
}
},
{
"Name": "Bob",
"Age": 25,
"Address": {
"City": "Chicago",
"State": "IL"
}
}
]

df = pd.json_normalize(data)
print(df)

Output:

    Name  Age Address.City Address.State
0  Alice   30     New York            NY
1    Bob   25      Chicago            IL

The nested Address object is automatically flattened into Address.City and Address.State columns.
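Nested arrays need one more step than nested objects. The sketch below uses the record_path and meta parameters of json_normalize() to produce one row per array element while carrying parent fields along. The Orders field and all sample values are assumptions added for illustration.

```python
import pandas as pd

orders = [
    {
        "Name": "Alice",
        "Address": {"City": "New York"},
        "Orders": [{"Item": "Book", "Qty": 2}, {"Item": "Pen", "Qty": 5}],
    },
    {
        "Name": "Bob",
        "Address": {"City": "Chicago"},
        "Orders": [{"Item": "Lamp", "Qty": 1}],
    },
]

df_orders = pd.json_normalize(
    orders,
    record_path="Orders",               # one output row per element of this list
    meta=["Name", ["Address", "City"]], # parent fields repeated on each row
)
print(df_orders)
```

Each order becomes its own row, and the parent's Name and Address.City values are repeated so the rows remain self-describing.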

Common Mistakes and How to Fix Them

Mistake 1: Using the Wrong Variable Name

import pandas

# ❌ NameError: name 'df' is not defined. read_json() belongs to the pandas module, not to a DataFrame variable
data = df.read_json("data.json")

Fix: Use the correct module reference:

import pandas as pd

# ✅ Correct
data = pd.read_json("data.json")

Mistake 2: Passing a Dict Instead of a JSON String

import pandas as pd

# ❌ This is a Python dict, not a JSON string
data = {"Name": ["Alice"], "Age": [30]}
df = pd.read_json(data)
# ValueError: Invalid file path or buffer object type: <class 'dict'>

Fix: Convert the dict to a JSON string first, or use pd.DataFrame():

import json
from io import StringIO

import pandas as pd

# ✅ Option 1: Serialize to a JSON string and wrap it in StringIO
data = {"Name": ["Alice"], "Age": [30]}
df = pd.read_json(StringIO(json.dumps(data)))

# ✅ Option 2: Use DataFrame directly
df = pd.DataFrame(data)

Mistake 3: Wrong Orientation for the Data

from io import StringIO

import pandas as pd

# This is records-oriented JSON
json_string = '[{"Name": "Alice"}, {"Name": "Bob"}]'

# ❌ orient="index" expects a top-level object keyed by row label, so records data fails or loads incorrectly
df = pd.read_json(StringIO(json_string), orient="index")

Fix: Match the orient parameter to your actual JSON structure:

# ✅ Correct orientation
df = pd.read_json(StringIO(json_string), orient="records")

Warning: When in doubt about the orientation, load your JSON string with Python's json module first to inspect its structure:

import json

data = json.loads(json_string)
print(type(data)) # list → likely 'records' or 'values'
# dict → likely 'columns', 'index', or 'split'
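The same inspection can be wrapped in a small function. guess_orient() below is a hypothetical helper, not part of pandas; it is a rough sketch that only looks at the top-level shape and cannot tell the columns and index layouts apart.

```python
import json


def guess_orient(json_string):
    """Hypothetical helper: suggest an orient value from the parsed shape."""
    data = json.loads(json_string)

    if isinstance(data, list):
        if data and all(isinstance(item, dict) for item in data):
            return "records"
        if data and all(isinstance(item, list) for item in data):
            return "values"
        return None  # empty or mixed list: no suggestion

    if isinstance(data, dict):
        if {"columns", "data"} <= set(data):
            return "split"
        # These two layouts look identical structurally; a person has to decide
        return "columns or index"

    return None


print(guess_orient('[{"Name": "Alice"}, {"Name": "Bob"}]'))
print(guess_orient('[["Alice", 30], ["Bob", 25]]'))
print(guess_orient('{"Name": {"0": "Alice"}}'))
```

Treat the result as a hint to check against the data, not as a replacement for knowing how the JSON was produced.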

Using json Module With pd.DataFrame()

For maximum control, parse the JSON manually with Python's built-in json module and then create the DataFrame:

import json
import pandas as pd

json_string = '''
{
"employees": [
{"name": "Alice", "department": "Engineering"},
{"name": "Bob", "department": "Marketing"},
{"name": "Charlie", "department": "Sales"}
]
}
'''

parsed = json.loads(json_string)
df = pd.DataFrame(parsed["employees"])
print(df)

Output:

      name   department
0    Alice  Engineering
1      Bob    Marketing
2  Charlie        Sales

This approach is useful when the data you need is nested inside a specific key of the JSON object.
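Manual parsing also makes it easy to fail with a clear message when the input is malformed or the expected key is missing. frame_from_key() below is a hypothetical helper written for this illustration, and the payload is sample data.

```python
import json

import pandas as pd


def frame_from_key(json_string, key):
    """Hypothetical helper: build a DataFrame from the list stored under `key`."""
    try:
        parsed = json.loads(json_string)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Input is not valid JSON: {exc}") from exc

    if not isinstance(parsed, dict) or key not in parsed:
        raise KeyError(f"Top-level key {key!r} not found")

    return pd.DataFrame(parsed[key])


payload = '{"employees": [{"name": "Alice"}, {"name": "Bob"}], "count": 2}'
df_employees = frame_from_key(payload, "employees")
print(df_employees)
```

Sibling keys such as count are simply ignored, which is often what you want when an API wraps the records in extra metadata.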

Conclusion

Loading JSON data into a Pandas DataFrame is straightforward with the right tools.

Use pd.read_json() for JSON strings and files with standard orientations, pd.json_normalize() for nested JSON structures that need flattening, and pd.DataFrame() with the json module for maximum control over parsing.

Understanding JSON orientations (records, index, columns, values, and split) is key to ensuring your data loads correctly.

Always match the orient parameter to your JSON structure, and inspect unfamiliar JSON data before loading to avoid silent misinterpretation.