How to Merge Multiple JSON Files Using Python
When working with data pipelines, APIs, or configuration systems, you'll often encounter situations where related data is spread across multiple JSON files. For example, you might have daily log exports, per-user configuration files, or paginated API responses that need to be combined into a single unified file for analysis or processing.
In this guide, you'll learn several approaches to merging JSON files in Python, from the built-in json module to automated directory scanning, along with best practices for handling different JSON structures.
Sample JSON Files
Throughout this guide, we'll use three sample JSON files, each containing a single object:

users.json:
{"name": "Alice", "age": 30, "city": "New York"}

orders.json:
{"name": "Bob", "age": 25, "city": "Chicago"}

products.json:
{"name": "Charlie", "age": 35, "city": "Boston"}
Method 1: Using the json Module with Explicit File Paths
The most straightforward approach uses Python's built-in json module to read each file and append its data to a list:
import json

def merge_json_files(file_paths, output_file):
    merged_data = []
    for path in file_paths:
        with open(path, 'r') as file:
            data = json.load(file)
            merged_data.append(data)
    with open(output_file, 'w') as outfile:
        json.dump(merged_data, outfile, indent=2)
    print(f"Merged {len(file_paths)} files into '{output_file}'")
    return merged_data

# Specify files explicitly
file_paths = ["users.json", "orders.json", "products.json"]
result = merge_json_files(file_paths, "merged.json")
print(result)
Output (merged.json, abridged):

[
  {"name": "Alice", "age": 30, "city": "New York"},
  {"name": "Bob", "age": 25, "city": "Chicago"},
  {"name": "Charlie", "age": 35, "city": "Boston"}
]
Always use indent=2 (or indent=4) in json.dump() when the output file needs to be human-readable. Without it, the entire JSON is written on a single line, making it difficult to inspect.
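To see the difference, compare the same data serialized with and without indent:

```python
import json

data = [{"name": "Alice", "age": 30}]

compact = json.dumps(data)           # everything on one long line
pretty = json.dumps(data, indent=2)  # one key per line, nested indentation

print(compact)
print(pretty)
```

Both strings parse back to identical data; only the whitespace differs.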
Method 2: Using List Comprehension for Concise Code
If you prefer a more compact style, a list comprehension combined with a small helper function reads and merges the files in fewer lines, while still closing each file properly:

import json

def load_json(path):
    with open(path, 'r') as f:
        return json.load(f)

file_paths = ["users.json", "orders.json", "products.json"]
merged_data = [load_json(path) for path in file_paths]

with open("merged.json", 'w') as outfile:
    json.dump(merged_data, outfile, indent=2)

print(merged_data)
Output:
[
{'name': 'Alice', 'age': 30, 'city': 'New York'},
{'name': 'Bob', 'age': 25, 'city': 'Chicago'},
{'name': 'Charlie', 'age': 35, 'city': 'Boston'}
]
You might see examples that use json.load(open(path, 'r')) inside a list comprehension without a with statement. This is bad practice because it doesn't guarantee the file handle is properly closed:
# ❌ Bad: file handles may not be closed properly
merged_data = [json.load(open(path, 'r')) for path in file_paths]
Always use with open(...) to ensure files are closed correctly, even if an error occurs during reading.
Method 3: Scanning a Directory with os
When you have all your JSON files in a single directory and don't want to list them manually, use os.listdir() to discover them automatically:
import json
import os

def merge_json_from_directory(directory_path, output_file):
    merged_data = []
    for filename in sorted(os.listdir(directory_path)):
        if filename.endswith('.json'):
            filepath = os.path.join(directory_path, filename)
            with open(filepath, 'r') as file:
                data = json.load(file)
                merged_data.append(data)
            print(f"  Read: {filename}")
    with open(output_file, 'w') as outfile:
        json.dump(merged_data, outfile, indent=2)
    print(f"\nMerged {len(merged_data)} files into '{output_file}'")
    return merged_data

result = merge_json_from_directory("./data", "merged.json")
Output:
Read: orders.json
Read: products.json
Read: users.json
Merged 3 files into 'merged.json'
Using sorted() on os.listdir() ensures files are processed in alphabetical order. Without sorting, the order depends on the filesystem and may vary across operating systems.
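If you prefer pathlib over os, the same directory scan can be sketched like this (equivalent behavior, standard library only):

```python
import json
from pathlib import Path

def merge_json_from_directory(directory_path, output_file):
    merged_data = []
    # Path.glob('*.json') replaces the listdir() + endswith() filter;
    # sorted() again fixes the processing order
    for filepath in sorted(Path(directory_path).glob("*.json")):
        merged_data.append(json.loads(filepath.read_text()))
    Path(output_file).write_text(json.dumps(merged_data, indent=2))
    return merged_data
```

Note that Path.glob returns Path objects, which sort by their string form, so the ordering matches the os.listdir() version.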
Method 4: Using glob for Pattern Matching
The glob module offers more flexible file discovery with wildcard patterns. This is especially useful when your JSON files follow a naming convention:
import json
import glob

def merge_json_with_glob(pattern, output_file):
    merged_data = []
    file_paths = sorted(glob.glob(pattern))
    if not file_paths:
        print(f"No files found matching pattern: {pattern}")
        return []
    for path in file_paths:
        with open(path, 'r') as file:
            data = json.load(file)
            merged_data.append(data)
    with open(output_file, 'w') as outfile:
        json.dump(merged_data, outfile, indent=2)
    print(f"Merged {len(file_paths)} files into '{output_file}'")
    return merged_data

# Merge all JSON files in the "data" directory
result = merge_json_with_glob("data/*.json", "merged.json")

# Or match a specific pattern
# result = merge_json_with_glob("data/report_2025_*.json", "merged_reports.json")
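glob can also descend into subdirectories with the ** wildcard, provided you pass recursive=True (the data/ layout below is hypothetical):

```python
import glob

# Matches data/a.json as well as data/2025/01/a.json, at any depth
paths = sorted(glob.glob("data/**/*.json", recursive=True))
print(paths)
```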
Method 5: Merging into a Dictionary Instead of a List
The previous methods merge JSON objects into a list (array). Sometimes you need to merge them into a single dictionary where all key-value pairs are combined:
import json
import glob

def merge_json_as_dict(pattern, output_file):
    merged_data = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, 'r') as file:
            data = json.load(file)
            merged_data.update(data)
    with open(output_file, 'w') as outfile:
        json.dump(merged_data, outfile, indent=2)
    return merged_data

result = merge_json_as_dict("data/*.json", "merged_dict.json")
print(result)
Output (merged_dict.json):

{
  "name": "Alice",
  "age": 30,
  "city": "New York"
}

When using dict.update(), duplicate keys are overwritten by the last file processed. In the example above, since all three files share the "name", "age", and "city" keys, only the values from the file processed last (users.json, last in alphabetical order) survive. Use the list-based approach if you need to keep all records.
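If you want a single dictionary but cannot afford to lose colliding keys, one workaround (a sketch, not one of the methods above) is to nest each file's data under its own filename:

```python
import json
import glob
import os

def merge_json_namespaced(pattern, output_file):
    merged_data = {}
    for path in sorted(glob.glob(pattern)):
        # The filename without its extension becomes a top-level key,
        # so identical keys in different files can no longer collide
        key = os.path.splitext(os.path.basename(path))[0]
        with open(path, 'r') as file:
            merged_data[key] = json.load(file)
    with open(output_file, 'w') as outfile:
        json.dump(merged_data, outfile, indent=2)
    return merged_data
```

With the sample files, this produces top-level "orders", "products", and "users" keys, each holding that file's full object.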
Method 6: Using Pandas for Structured Data
When your JSON files contain records with the same schema (same keys), Pandas can merge them into a structured DataFrame, which is ideal for further analysis:
import pandas as pd
import glob

def merge_json_with_pandas(pattern, output_file):
    file_paths = sorted(glob.glob(pattern))
    dataframes = []
    for path in file_paths:
        # typ='series' reads a single JSON object as a pandas Series
        df = pd.read_json(path, typ='series')
        dataframes.append(df)
    merged_df = pd.DataFrame(dataframes).reset_index(drop=True)
    # Save as a JSON array of records
    merged_df.to_json(output_file, orient='records', indent=2)
    print(f"Merged {len(file_paths)} files into '{output_file}'")
    return merged_df

result = merge_json_with_pandas("data/*.json", "merged_pandas.json")
print(result)
Output (rows follow the sorted filename order: orders.json, products.json, users.json):

Merged 3 files into 'merged_pandas.json'
      name  age      city
0      Bob   25   Chicago
1  Charlie   35    Boston
2    Alice   30  New York

merged_pandas.json (abridged):

[
  {"name": "Bob", "age": 25, "city": "Chicago"},
  {"name": "Charlie", "age": 35, "city": "Boston"},
  {"name": "Alice", "age": 30, "city": "New York"}
]
Use orient='records' with to_json() to produce a clean JSON array where each object represents one row. Other orientations like 'columns' or 'index' produce structures that are harder to work with downstream.
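For nested objects, pandas.json_normalize flattens inner keys into dotted column names before merging (a small sketch; the nested shape shown is hypothetical):

```python
import pandas as pd

record = {"name": "Alice", "address": {"city": "New York", "zip": "10001"}}

# Nested keys become 'address.city' and 'address.zip' columns
df = pd.json_normalize(record)
print(df.columns.tolist())  # ['name', 'address.city', 'address.zip']
```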
Handling Edge Cases
Files Containing JSON Arrays
If your JSON files already contain arrays (lists of objects) rather than single objects, you need to extend rather than append:
import json

def merge_json_arrays(file_paths, output_file):
    merged_data = []
    for path in file_paths:
        with open(path, 'r') as file:
            data = json.load(file)
        if isinstance(data, list):
            merged_data.extend(data)   # Flatten arrays
        else:
            merged_data.append(data)   # Single objects
    with open(output_file, 'w') as outfile:
        json.dump(merged_data, outfile, indent=2)
    return merged_data
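To illustrate the branch, here is the same extend-vs-append logic applied to two hypothetical file contents held as strings:

```python
import json

merged = []
# One "file" holds an array of objects, the other a single object
for raw in ['[{"id": 1}, {"id": 2}]', '{"id": 3}']:
    data = json.loads(raw)
    if isinstance(data, list):
        merged.extend(data)   # splice the array's elements in
    else:
        merged.append(data)   # add the lone object

print(merged)  # [{'id': 1}, {'id': 2}, {'id': 3}]
```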
Handling Invalid JSON Files
In real-world scenarios, some files might contain malformed JSON. Add error handling to skip problematic files gracefully:
import json
import glob

def safe_merge_json(pattern, output_file):
    merged_data = []
    errors = []
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path, 'r') as file:
                data = json.load(file)
            merged_data.append(data)
        except json.JSONDecodeError as e:
            errors.append(path)
            print(f"Warning: Skipping '{path}': invalid JSON ({e})")
    with open(output_file, 'w') as outfile:
        json.dump(merged_data, outfile, indent=2)
    print(f"\nMerged: {len(merged_data)} files | Skipped: {len(errors)} files")
    return merged_data

result = safe_merge_json("data/*.json", "merged_safe.json")
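Log exports often arrive as JSON Lines (one object per line) rather than standard JSON, and json.load() would reject such files outright. A sketch for merging them, assuming a .jsonl extension:

```python
import json
import glob

def merge_jsonl(pattern, output_file):
    merged_data = []
    for path in sorted(glob.glob(pattern)):
        with open(path, 'r') as file:
            for line in file:
                line = line.strip()
                if line:  # tolerate blank lines between records
                    merged_data.append(json.loads(line))
    with open(output_file, 'w') as outfile:
        json.dump(merged_data, outfile, indent=2)
    return merged_data
```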
Choosing the Right Approach
| Method | Best For |
|---|---|
| json module with explicit paths | Small number of known files |
| os.listdir() | All JSON files in a single directory |
| glob.glob() | Pattern-based file discovery |
| Dictionary merge (update()) | Combining key-value pairs (last value wins on duplicates) |
| Pandas | Structured/tabular JSON data for analysis |
Summary
Merging multiple JSON files in Python is straightforward with the built-in json module. Use explicit file lists for small, known sets of files, or glob/os for automatic discovery in directories.
Choose between list-based merging (preserves all records) and dictionary-based merging (combines key-value pairs) depending on your data structure.
Always handle potential issues like malformed JSON and duplicate keys, and use indent in json.dump() to keep the output readable.