How to Convert JSON to CSV in Python
Converting JSON data to CSV format is a common data engineering task that arises when you need to move data between web APIs and spreadsheet tools, prepare datasets for analysis, or feed information into systems that expect tabular input. While JSON supports deeply nested and hierarchical structures, CSV is inherently flat, which means the conversion often requires careful flattening of nested objects and arrays.
In this guide, you will learn how to convert JSON to CSV using both Python's built-in csv module and the Pandas library, handle nested structures, manage large files efficiently, and deal with real-world edge cases like missing keys and inconsistent records.
Using Pandas (Recommended for Complex Data)
Pandas provides the most straightforward and powerful approach for most JSON-to-CSV conversions. Install it if you have not already:
pip install pandas
Basic Conversion
For simple, flat JSON data, the conversion is just two lines:
import pandas as pd
# From a JSON file
df = pd.read_json("data.json")
df.to_csv("output.csv", index=False)
# From a JSON string
json_string = '[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]'
df = pd.read_json(json_string)
df.to_csv("output.csv", index=False)
print(df)
Output:
id name
0 1 Alice
1 2 Bob
The index=False parameter prevents Pandas from writing its internal row index as an extra column in the CSV file.
Deprecation Note: Passing Strings to pd.read_json()
In newer versions of pandas, passing a raw JSON string directly to pd.read_json() is deprecated and will be removed in a future release.
# ⚠️ Deprecated usage
df = pd.read_json(json_string)
This produces:
FutureWarning: Passing literal json to 'read_json' is deprecated...
The correct approach is to wrap the JSON string in a StringIO object:
import pandas as pd
from io import StringIO
json_string = '[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]'
df = pd.read_json(StringIO(json_string))
df.to_csv("output.csv", index=False)
Using StringIO ensures forward compatibility with future pandas versions and prevents the deprecation warning.
Handling Nested JSON with json_normalize()
Real-world JSON frequently contains nested objects. The json_normalize() function automatically flattens them into dot-separated column names:
import pandas as pd
from pandas import json_normalize
data = [
{
"id": 1,
"name": "Alice",
"address": {
"city": "New York",
"zip": "10001"
},
"orders": [{"item": "Book", "price": 15}]
},
{
"id": 2,
"name": "Bob",
"address": {
"city": "Los Angeles",
"zip": "90001"
},
"orders": [{"item": "Laptop", "price": 999}]
}
]
# Flatten nested "address" into separate columns
df = json_normalize(data)
print(df.columns.tolist())
df.to_csv("users.csv", index=False)
Output:
['id', 'name', 'orders', 'address.city', 'address.zip']
Notice how address.city and address.zip are extracted into their own columns automatically, while the orders array remains as a list because it contains multiple potential rows of data.
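If dot-separated names are inconvenient downstream (some tools treat dots specially in column names), json_normalize() accepts a sep parameter to change the separator. A small sketch:

```python
import pandas as pd
from pandas import json_normalize

data = [{"id": 1, "address": {"city": "New York", "zip": "10001"}}]

# Use an underscore separator instead of the default dot
df = json_normalize(data, sep="_")
print(df.columns.tolist())
```

This produces columns named id, address_city, and address_zip instead of the dotted versions.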
Flattening Arrays with record_path
When your JSON contains arrays that you want to expand into individual rows, use the record_path and meta parameters:
import pandas as pd
from pandas import json_normalize
data = {
"company": "TechCorp",
"employees": [
{"name": "Alice", "skills": ["Python", "SQL"]},
{"name": "Bob", "skills": ["Java", "AWS"]}
]
}
# Flatten the employees array, keeping the parent company field
df = json_normalize(
data,
record_path="employees",
meta=["company"]
)
print(df)
df.to_csv("employees.csv", index=False)
Output:
name skills company
0 Alice [Python, SQL] TechCorp
1 Bob [Java, AWS] TechCorp
If a column contains lists (like skills above), CSV cannot natively represent them. Convert lists to comma-separated strings before exporting:
df['skills'] = df['skills'].apply(
lambda x: ', '.join(x) if isinstance(x, list) else x
)
print(df)
Output:
name skills company
0 Alice Python, SQL TechCorp
1 Bob Java, AWS TechCorp
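Joining lists into strings is the usual choice for CSV, but if each skill should instead become its own row, DataFrame.explode() expands list entries into separate rows, repeating the scalar columns. A sketch (whether this shape suits your output depends on the downstream consumer):

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob"],
    "skills": [["Python", "SQL"], ["Java", "AWS"]],
    "company": ["TechCorp", "TechCorp"],
})

# Each list element becomes its own row; other columns are repeated
exploded = df.explode("skills")
print(exploded.to_string(index=False))
```

The result has four rows: two for Alice (Python, SQL) and two for Bob (Java, AWS), each carrying the company value.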
Using the csv Module (No Dependencies)
For lightweight scripts or environments where installing Pandas is not an option, Python's built-in csv and json modules handle the job without any external dependencies.
Converting Simple Flat JSON
import json
import csv
# Load JSON data
with open('users.json', 'r') as f:
data = json.load(f)
# Write to CSV
with open('users.csv', 'w', newline='', encoding='utf-8') as f:
writer = csv.DictWriter(f, fieldnames=data[0].keys())
writer.writeheader()
writer.writerows(data)
This approach works well when every record has the same keys and no nested structures.
Flattening Nested JSON Manually
When dealing with nested objects without Pandas, you need a recursive flattening function:
import json
import csv
def flatten_dict(d, parent_key='', sep='.'):
"""Recursively flatten nested dictionaries."""
items = []
for k, v in d.items():
new_key = f"{parent_key}{sep}{k}" if parent_key else k
if isinstance(v, dict):
items.extend(flatten_dict(v, new_key, sep).items())
elif isinstance(v, list):
items.append((new_key, json.dumps(v)))
else:
items.append((new_key, v))
return dict(items)
# Example data with nested objects
data = [
{"id": 1, "info": {"name": "Alice", "city": "NY"}},
{"id": 2, "info": {"name": "Bob", "city": "LA"}}
]
# Flatten all records
flat_data = [flatten_dict(record) for record in data]
# Collect all possible keys across all records
all_keys = set()
for record in flat_data:
all_keys.update(record.keys())
# Write CSV with sorted column names for consistency
with open('output.csv', 'w', newline='', encoding='utf-8') as f:
writer = csv.DictWriter(f, fieldnames=sorted(all_keys))
writer.writeheader()
writer.writerows(flat_data)
print(f"Columns: {sorted(all_keys)}")
Output:
Columns: ['id', 'info.city', 'info.name']
Collecting all keys from every record before writing ensures that records with different fields are handled correctly. DictWriter fills missing fields with empty strings by default.
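Both sides of the key-mismatch behavior are configurable on DictWriter: restval sets the placeholder written for missing fields, and extrasaction controls what happens when a record contains keys not listed in fieldnames (the default, 'raise', throws ValueError). A small sketch using an in-memory buffer:

```python
import csv
import io

records = [
    {"id": 1, "name": "Alice"},
    {"id": 2},                               # missing "name"
    {"id": 3, "name": "Bob", "extra": "x"},  # unexpected key
]

buf = io.StringIO()
writer = csv.DictWriter(
    buf,
    fieldnames=["id", "name"],
    restval="N/A",          # written in place of missing keys
    extrasaction="ignore",  # silently drop unknown keys instead of raising
)
writer.writeheader()
writer.writerows(records)
print(buf.getvalue())
```

The missing name becomes "N/A" and the unexpected "extra" key is dropped rather than raising an error.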
Handling Large JSON Files
Streaming with ijson for Memory Efficiency
When your JSON file is too large to fit in memory, the ijson library parses it incrementally:
pip install ijson
import ijson
import csv
def stream_json_to_csv(json_path, csv_path):
"""Convert a large JSON array to CSV without loading the entire file."""
with open(json_path, 'rb') as json_file, \
open(csv_path, 'w', newline='', encoding='utf-8') as csv_file:
items = ijson.items(json_file, 'item')
writer = None
for item in items:
if writer is None:
writer = csv.DictWriter(csv_file, fieldnames=item.keys())
writer.writeheader()
writer.writerow(item)
stream_json_to_csv('large_data.json', 'output.csv')
This approach processes one record at a time, keeping memory usage constant regardless of file size.
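One caveat: the sketch above takes its column set from the first record, so a later record with an extra key makes DictWriter raise ValueError. For records with varying keys, a two-pass approach works: one pass over ijson.items() to collect the union of keys, then a second pass to write rows. A hypothetical helper for the first pass, usable with any iterable of dicts:

```python
def union_fieldnames(records):
    """Collect the union of keys across an iterable of dicts,
    preserving first-seen order for stable column ordering."""
    fieldnames = []
    seen = set()
    for record in records:
        for key in record:
            if key not in seen:
                seen.add(key)
                fieldnames.append(key)
    return fieldnames

print(union_fieldnames([{"id": 1}, {"id": 2, "email": "x@y.com"}]))
# ['id', 'email']
```

The trade-off is reading the file twice, which is usually acceptable when the alternative is loading it all into memory.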
Chunked Processing with Pandas
If you need Pandas features but want to limit memory usage, process the data in chunks:
import pandas as pd
import json
def json_to_csv_chunked(json_path, csv_path, chunk_size=10000):
"""Process large JSON data in chunks to limit memory usage."""
with open(json_path, 'r') as f:
data = json.load(f)
for i, start in enumerate(range(0, len(data), chunk_size)):
chunk = data[start:start + chunk_size]
df = pd.json_normalize(chunk)
df.to_csv(
csv_path,
mode='a' if i > 0 else 'w',
header=(i == 0),
index=False
)
print(f"Converted {len(data)} records in {i + 1} chunks")
The chunked Pandas approach still loads the entire JSON into memory during json.load(). For truly massive files that cannot fit in memory at all, use the ijson streaming approach instead.
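The two techniques can be combined: feed records from an ijson stream into Pandas one batch at a time, so only one chunk of normalized data is in memory at once. A sketch (records_to_csv_chunked is a hypothetical helper; pairing it with ijson assumes the top-level JSON is an array):

```python
import itertools
import pandas as pd

def records_to_csv_chunked(records, csv_path, chunk_size=10000):
    """Write an iterable of dicts to CSV one batch at a time,
    keeping only chunk_size normalized records in memory."""
    first = True
    written = 0
    it = iter(records)
    while True:
        batch = list(itertools.islice(it, chunk_size))
        if not batch:
            break
        pd.json_normalize(batch).to_csv(
            csv_path,
            mode='w' if first else 'a',
            header=first,
            index=False,
        )
        first = False
        written += len(batch)
    return written

# Pair with ijson for end-to-end streaming:
# with open('large_data.json', 'rb') as f:
#     records_to_csv_chunked(ijson.items(f, 'item'), 'output.csv')
```

Because the function accepts any iterable, it also works with lists for small inputs, which makes it easy to test.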
Handling Special Cases
Missing Keys Across Records
JSON records do not always share the same set of keys. Pandas handles this gracefully by filling missing values with NaN:
import pandas as pd
from pandas import json_normalize
data = [
{"id": 1, "name": "Alice", "email": "alice@example.com"},
{"id": 2, "name": "Bob"},
{"id": 3, "name": "Charlie", "phone": "555-1234"}
]
df = json_normalize(data)
print(df.to_string(index=False))
df.to_csv("users.csv", index=False)
Output:
id name email phone
1 Alice alice@example.com NaN
2 Bob NaN NaN
3 Charlie NaN 555-1234
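Note that to_csv() writes NaN values as empty cells by default; if a downstream system expects an explicit placeholder, pass the na_rep parameter. A small sketch writing to an in-memory buffer:

```python
import io
import pandas as pd

df = pd.DataFrame([
    {"id": 1, "email": "alice@example.com"},
    {"id": 2},  # missing email becomes NaN
])

buf = io.StringIO()
# NaN is an empty cell by default; na_rep substitutes a placeholder string
df.to_csv(buf, index=False, na_rep="NULL")
print(buf.getvalue())
```

The second row is written as "2,NULL" instead of "2," with a trailing empty field.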
Selecting and Renaming Columns
You often need to export only specific columns or rename them for downstream consumers:
import pandas as pd
data = [
{"id": 1, "name": "Alice", "email": "alice@example.com", "internal_code": "A1"},
{"id": 2, "name": "Bob", "email": "bob@example.com", "internal_code": "B2"}
]
df = pd.DataFrame(data)
# Select only the columns you need
columns_to_export = ['id', 'name', 'email']
df_subset = df[columns_to_export]
# Rename columns for the CSV output
df_subset.columns = ['ID', 'Full Name', 'Email Address']
df_subset.to_csv("custom_output.csv", index=False)
print(df_subset.to_string(index=False))
Output:
ID Full Name Email Address
1 Alice alice@example.com
2 Bob bob@example.com
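Assigning a new list to df.columns matches by position, which silently mislabels columns if the order ever changes. An arguably safer pattern is DataFrame.rename() with an explicit mapping, which matches by name:

```python
import pandas as pd

df = pd.DataFrame([
    {"id": 1, "name": "Alice", "email": "alice@example.com"},
    {"id": 2, "name": "Bob", "email": "bob@example.com"},
])

# Rename by name, not position; unmapped columns are left unchanged
renamed = df.rename(columns={
    "id": "ID",
    "name": "Full Name",
    "email": "Email Address",
})
print(renamed.columns.tolist())
```

Columns absent from the mapping pass through untouched, so the rename is robust to reordering or new fields.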
Complete Reusable Conversion Function
Here is a flexible function that handles the most common conversion scenarios:
import pandas as pd
from pandas import json_normalize
import json
from pathlib import Path
def json_to_csv(
input_path,
output_path,
flatten=True,
columns=None,
encoding='utf-8'
):
"""
Convert a JSON file to CSV with configurable options.
Args:
input_path: Path to the input JSON file.
output_path: Path for the output CSV file.
flatten: Whether to flatten nested objects into dot-separated columns.
columns: List of column names to include. None includes all columns.
encoding: Character encoding for the output file.
Returns:
The number of records written.
"""
with open(input_path, 'r', encoding=encoding) as f:
data = json.load(f)
# Handle both a single object and an array of objects
if isinstance(data, dict):
data = [data]
# Create the DataFrame with optional flattening
if flatten:
df = json_normalize(data)
else:
df = pd.DataFrame(data)
# Filter to requested columns if specified
if columns:
available = [c for c in columns if c in df.columns]
df = df[available]
# Write to CSV
df.to_csv(output_path, index=False, encoding=encoding)
return len(df)
# Usage
rows = json_to_csv("data.json", "output.csv", flatten=True)
print(f"Converted {rows} records")
Method Comparison
| Feature | Pandas | csv Module |
|---|---|---|
| Dependencies | Requires pandas | Built-in, no installation needed |
| Nested data | json_normalize() handles automatically | Requires manual flattening function |
| Performance | C-optimized, fast for large datasets | Slower for large volumes of data |
| Memory | Loads all data into RAM | Can stream record by record |
| Missing keys | Fills with NaN automatically | Requires manual handling |
| Best for | Complex structures, analysis workflows | Simple data, minimal environments |
Conclusion
Converting JSON to CSV in Python is straightforward for flat data but requires more care when dealing with nested structures, inconsistent schemas, or large files. For most use cases, Pandas with json_normalize() provides the best combination of power and simplicity, automatically flattening nested objects and handling missing keys. When you need zero dependencies or are working in a constrained environment, the built-in csv module paired with a recursive flattening function gets the job done. For files too large to fit in memory, streaming with ijson keeps resource usage under control.
Use Pandas as your default approach for JSON-to-CSV conversion, especially when working with nested data. Reserve the built-in csv module for simple, flat JSON in environments where external dependencies are not an option. Always inspect your JSON structure before writing the conversion code so you can choose the right flattening strategy from the start.