How to Convert JSON to CSV in Python
Converting JSON data to CSV format is a common data engineering task that arises when you need to move data between web APIs and spreadsheet tools, prepare datasets for analysis, or feed information into systems that expect tabular input. While JSON supports deeply nested and hierarchical structures, CSV is inherently flat, which means the conversion often requires careful flattening of nested objects and arrays.
In this guide, you will learn how to convert JSON to CSV using both Python's built-in csv module and the Pandas library, handle nested structures, manage large files efficiently, and deal with real-world edge cases like missing keys and inconsistent records.
Using Pandas (Recommended for Complex Data)
Pandas provides the most straightforward and powerful approach for most JSON-to-CSV conversions. Install it if you have not already:
pip install pandas
Basic Conversion
For simple, flat JSON data, the conversion is just two lines:
import pandas as pd
# From a JSON file
df = pd.read_json("data.json")
df.to_csv("output.csv", index=False)
# From a JSON string
json_string = '[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]'
df = pd.read_json(json_string)
df.to_csv("output.csv", index=False)
print(df)
Output:
id name
0 1 Alice
1 2 Bob
The index=False parameter prevents Pandas from writing its internal row index as an extra column in the CSV file.
Deprecation Note: Passing Strings to pd.read_json()
In newer versions of pandas, passing a raw JSON string directly to pd.read_json() is deprecated and will be removed in a future release.
# ⚠️ Deprecated usage
df = pd.read_json(json_string)
This produces:
FutureWarning: Passing literal json to 'read_json' is deprecated...
The correct approach is to wrap the JSON string in a StringIO object:
import pandas as pd
from io import StringIO
json_string = '[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]'
df = pd.read_json(StringIO(json_string))
df.to_csv("output.csv", index=False)
Using StringIO ensures forward compatibility with future pandas versions and prevents the deprecation warning.
Handling Nested JSON with json_normalize()
Real-world JSON frequently contains nested objects. The json_normalize() function automatically flattens them into dot-separated column names:
import pandas as pd
from pandas import json_normalize
data = [
{
"id": 1,
"name": "Alice",
"address": {
"city": "New York",
"zip": "10001"
},
"orders": [{"item": "Book", "price": 15}]
},
{
"id": 2,
"name": "Bob",
"address": {
"city": "Los Angeles",
"zip": "90001"
},
"orders": [{"item": "Laptop", "price": 999}]
}
]
# Flatten nested "address" into separate columns
df = json_normalize(data)
print(df.columns.tolist())
df.to_csv("users.csv", index=False)
Output:
['id', 'name', 'orders', 'address.city', 'address.zip']
Notice how address.city and address.zip are extracted into their own columns automatically, while the orders array remains as a list because it contains multiple potential rows of data.
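If dot-separated names are inconvenient downstream (some tools treat dots specially in column names), json_normalize() accepts a sep parameter to change the separator. A small sketch:

```python
import pandas as pd
from pandas import json_normalize

data = [{"id": 1, "address": {"city": "New York", "zip": "10001"}}]

# Use an underscore separator instead of the default dot
df = json_normalize(data, sep="_")
print(df.columns.tolist())
```

This produces columns named id, address_city, and address_zip instead of the dotted versions.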
Flattening Arrays with record_path
When your JSON contains arrays that you want to expand into individual rows, use the record_path and meta parameters:
import pandas as pd
from pandas import json_normalize
data = {
"company": "TechCorp",
"employees": [
{"name": "Alice", "skills": ["Python", "SQL"]},
{"name": "Bob", "skills": ["Java", "AWS"]}
]
}
# Flatten the employees array, keeping the parent company field
df = json_normalize(
data,
record_path="employees",
meta=["company"]
)
print(df)
df.to_csv("employees.csv", index=False)
Output:
name skills company
0 Alice [Python, SQL] TechCorp
1 Bob [Java, AWS] TechCorp
If a column contains lists (like skills above), CSV cannot natively represent them. Convert lists to comma-separated strings before exporting:
df['skills'] = df['skills'].apply(
lambda x: ', '.join(x) if isinstance(x, list) else x
)
print(df)
Output:
name skills company
0 Alice Python, SQL TechCorp
1 Bob Java, AWS TechCorp
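Joining lists into strings is the usual choice for CSV, but if each skill should instead become its own row, DataFrame.explode() expands list entries into separate rows, repeating the scalar columns. A sketch (whether this shape suits your output depends on the downstream consumer):

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob"],
    "skills": [["Python", "SQL"], ["Java", "AWS"]],
    "company": ["TechCorp", "TechCorp"],
})

# Each list element becomes its own row; other columns are repeated
exploded = df.explode("skills")
print(exploded.to_string(index=False))
```

The result has four rows: two for Alice (Python, SQL) and two for Bob (Java, AWS), each carrying the company value.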
Using the csv Module (No Dependencies)
For lightweight scripts or environments where installing Pandas is not an option, Python's built-in csv and json modules handle the job without any external dependencies.
Converting Simple Flat JSON
import json
import csv
# Load JSON data
with open('users.json', 'r') as f:
data = json.load(f)
# Write to CSV
with open('users.csv', 'w', newline='', encoding='utf-8') as f:
writer = csv.DictWriter(f, fieldnames=data[0].keys())
writer.writeheader()
writer.writerows(data)
This approach works well when every record has the same keys and no nested structures.
Flattening Nested JSON Manually
When dealing with nested objects without Pandas, you need a recursive flattening function:
import json
import csv
def flatten_dict(d, parent_key='', sep='.'):
"""Recursively flatten nested dictionaries."""
items = []
for k, v in d.items():
new_key = f"{parent_key}{sep}{k}" if parent_key else k
if isinstance(v, dict):
items.extend(flatten_dict(v, new_key, sep).items())
elif isinstance(v, list):
items.append((new_key, json.dumps(v)))
else:
items.append((new_key, v))
return dict(items)
# Example data with nested objects
data = [
{"id": 1, "info": {"name": "Alice", "city": "NY"}},
{"id": 2, "info": {"name": "Bob", "city": "LA"}}
]
# Flatten all records
flat_data = [flatten_dict(record) for record in data]
# Collect all possible keys across all records
all_keys = set()
for record in flat_data:
all_keys.update(record.keys())
# Write CSV with sorted column names for consistency
with open('output.csv', 'w', newline='', encoding='utf-8') as f:
writer = csv.DictWriter(f, fieldnames=sorted(all_keys))
writer.writeheader()
writer.writerows(flat_data)
print(f"Columns: {sorted(all_keys)}")
Output:
Columns: ['id', 'info.city', 'info.name']
Collecting all keys from every record before writing ensures that records with different fields are handled correctly. DictWriter fills missing fields with empty strings by default.
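Both sides of the key-mismatch behavior are configurable on DictWriter: restval sets the placeholder written for missing fields, and extrasaction controls what happens when a record contains keys not listed in fieldnames (the default, 'raise', throws ValueError). A small sketch using an in-memory buffer:

```python
import csv
import io

records = [
    {"id": 1, "name": "Alice"},
    {"id": 2},                               # missing "name"
    {"id": 3, "name": "Bob", "extra": "x"},  # unexpected key
]

buf = io.StringIO()
writer = csv.DictWriter(
    buf,
    fieldnames=["id", "name"],
    restval="N/A",          # written in place of missing keys
    extrasaction="ignore",  # silently drop unknown keys instead of raising
)
writer.writeheader()
writer.writerows(records)
print(buf.getvalue())
```

The missing name becomes "N/A" and the unexpected "extra" key is dropped rather than raising an error.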
Handling Large JSON Files
Streaming with ijson for Memory Efficiency
When your JSON file is too large to fit in memory, the ijson library parses it incrementally:
pip install ijson
import ijson
import csv
def stream_json_to_csv(json_path, csv_path):
"""Convert a large JSON array to CSV without loading the entire file."""
with open(json_path, 'rb') as json_file, \
open(csv_path, 'w', newline='', encoding='utf-8') as csv_file:
items = ijson.items(json_file, 'item')
writer = None
for item in items:
if writer is None:
writer = csv.DictWriter(csv_file, fieldnames=item.keys())
writer.writeheader()
writer.writerow(item)
stream_json_to_csv('large_data.json', 'output.csv')
This approach processes one record at a time, keeping memory usage constant regardless of file size.
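One caveat: the sketch above takes its column set from the first record, so a later record with an extra key makes DictWriter raise ValueError. For records with varying keys, a two-pass approach works: one pass over ijson.items() to collect the union of keys, then a second pass to write rows. A hypothetical helper for the first pass, usable with any iterable of dicts:

```python
def union_fieldnames(records):
    """Collect the union of keys across an iterable of dicts,
    preserving first-seen order for stable column ordering."""
    fieldnames = []
    seen = set()
    for record in records:
        for key in record:
            if key not in seen:
                seen.add(key)
                fieldnames.append(key)
    return fieldnames

print(union_fieldnames([{"id": 1}, {"id": 2, "email": "x@y.com"}]))
# ['id', 'email']
```

The trade-off is reading the file twice, which is usually acceptable when the alternative is loading it all into memory.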
Chunked Processing with Pandas
If you need Pandas features but want to limit memory usage, process the data in chunks:
import pandas as pd
import json
def json_to_csv_chunked(json_path, csv_path, chunk_size=10000):
"""Process large JSON data in chunks to limit memory usage."""
with open(json_path, 'r') as f:
data = json.load(f)
for i, start in enumerate(range(0, len(data), chunk_size)):
chunk = data[start:start + chunk_size]
df = pd.json_normalize(chunk)
df.to_csv(
csv_path,
mode='a' if i > 0 else 'w',
header=(i == 0),
index=False
)
print(f"Converted {len(data)} records in {i + 1} chunks")
The chunked Pandas approach still loads the entire JSON into memory during json.load(). For truly massive files that cannot fit in memory at all, use the ijson streaming approach instead.
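The two techniques can be combined: feed records from an ijson stream into Pandas one batch at a time, so only one chunk of normalized data is in memory at once. A sketch (records_to_csv_chunked is a hypothetical helper; pairing it with ijson assumes the top-level JSON is an array):

```python
import itertools
import pandas as pd

def records_to_csv_chunked(records, csv_path, chunk_size=10000):
    """Write an iterable of dicts to CSV one batch at a time,
    keeping only chunk_size normalized records in memory."""
    first = True
    written = 0
    it = iter(records)
    while True:
        batch = list(itertools.islice(it, chunk_size))
        if not batch:
            break
        pd.json_normalize(batch).to_csv(
            csv_path,
            mode='w' if first else 'a',
            header=first,
            index=False,
        )
        first = False
        written += len(batch)
    return written

# Pair with ijson for end-to-end streaming:
# with open('large_data.json', 'rb') as f:
#     records_to_csv_chunked(ijson.items(f, 'item'), 'output.csv')
```

Because the function accepts any iterable, it also works with lists for small inputs, which makes it easy to test.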
Handling Special Cases
Missing Keys Across Records
JSON records do not always share the same set of keys. Pandas handles this gracefully by filling missing values with NaN:
import pandas as pd
from pandas import json_normalize
data = [
{"id": 1, "name": "Alice", "email": "alice@example.com"},
{"id": 2, "name": "Bob"},
{"id": 3, "name": "Charlie", "phone": "555-1234"}
]
df = json_normalize(data)
print(df.to_string(index=False))
df.to_csv("users.csv", index=False)
Output:
id name email phone
1 Alice alice@example.com NaN
2 Bob NaN NaN
3 Charlie NaN 555-1234
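Note that to_csv() writes NaN values as empty cells by default; if a downstream system expects an explicit placeholder, pass the na_rep parameter. A small sketch writing to an in-memory buffer:

```python
import io
import pandas as pd

df = pd.DataFrame([
    {"id": 1, "email": "alice@example.com"},
    {"id": 2},  # missing email becomes NaN
])

buf = io.StringIO()
# NaN is an empty cell by default; na_rep substitutes a placeholder string
df.to_csv(buf, index=False, na_rep="NULL")
print(buf.getvalue())
```

The second row is written as "2,NULL" instead of "2," with a trailing empty field.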
Selecting and Renaming Columns
You often need to export only specific columns or rename them for downstream consumers:
import pandas as pd
data = [
{"id": 1, "name": "Alice", "email": "alice@example.com", "internal_code": "A1"},
{"id": 2, "name": "Bob", "email": "bob@example.com", "internal_code": "B2"}
]
df = pd.DataFrame(data)
# Select only the columns you need
columns_to_export = ['id', 'name', 'email']
df_subset = df[columns_to_export]
# Rename columns for the CSV output
df_subset.columns = ['ID', 'Full Name', 'Email Address']
df_subset.to_csv("custom_output.csv", index=False)
print(df_subset.to_string(index=False))
Output:
ID Full Name Email Address
1 Alice alice@example.com
2 Bob bob@example.com
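Assigning a new list to df.columns matches by position, which silently mislabels columns if the order ever changes. An arguably safer pattern is DataFrame.rename() with an explicit mapping, which matches by name:

```python
import pandas as pd

df = pd.DataFrame([
    {"id": 1, "name": "Alice", "email": "alice@example.com"},
    {"id": 2, "name": "Bob", "email": "bob@example.com"},
])

# Rename by name, not position; unmapped columns are left unchanged
renamed = df.rename(columns={
    "id": "ID",
    "name": "Full Name",
    "email": "Email Address",
})
print(renamed.columns.tolist())
```

Columns absent from the mapping pass through untouched, so the rename is robust to reordering or new fields.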
Complete Reusable Conversion Function
Here is a flexible function that handles the most common conversion scenarios:
import pandas as pd
from pandas import json_normalize
import json
from pathlib import Path
def json_to_csv(
input_path,
output_path,
flatten=True,
columns=None,
encoding='utf-8'
):
"""
Convert a JSON file to CSV with configurable options.
Args:
input_path: Path to the input JSON file.
output_path: Path for the output CSV file.
flatten: Whether to flatten nested objects into dot-separated columns.
columns: List of column names to include. None includes all columns.
encoding: Character encoding for the output file.
Returns:
The number of records written.
"""
with open(input_path, 'r', encoding=encoding) as f:
data = json.load(f)
# Handle both a single object and an array of objects
if isinstance(data, dict):
data = [data]
# Create the DataFrame with optional flattening
if flatten:
df = json_normalize(data)
else:
df = pd.DataFrame(data)
# Filter to requested columns if specified
if columns:
available = [c for c in columns if c in df.columns]
df = df[available]
# Write to CSV
df.to_csv(output_path, index=False, encoding=encoding)
return len(df)
# Usage
rows = json_to_csv("data.json", "output.csv", flatten=True)
print(f"Converted {rows} records")
Method Comparison
| Feature | Pandas | csv Module |
|---|---|---|
| Dependencies | Requires pandas | Built-in, no installation needed |
| Nested data | json_normalize() handles automatically | Requires manual flattening function |
| Performance | C-optimized, fast for large datasets | Slower for large volumes of data |
| Memory | Loads all data into RAM | Can stream record by record |
| Missing keys | Fills with NaN automatically | Requires manual handling |
| Best for | Complex structures, analysis workflows | Simple data, minimal environments |
Conclusion
Converting JSON to CSV in Python is straightforward for flat data but requires more care when dealing with nested structures, inconsistent schemas, or large files. For most use cases, Pandas with json_normalize() provides the best combination of power and simplicity, automatically flattening nested objects and handling missing keys. When you need zero dependencies or are working in a constrained environment, the built-in csv module paired with a recursive flattening function gets the job done. For files too large to fit in memory, streaming with ijson keeps resource usage under control.
Use Pandas as your default approach for JSON-to-CSV conversion, especially when working with nested data. Reserve the built-in csv module for simple, flat JSON in environments where external dependencies are not an option. Always inspect your JSON structure before writing the conversion code so you can choose the right flattening strategy from the start.