Python Pandas: How to Resolve "ValueError: Trailing data" Error
When using the pandas.read_json() function, you might encounter the ValueError: Trailing data. This error occurs when the JSON parser successfully reads a complete JSON object or array but then finds additional, unexpected data before the end of the file. Standard JSON files are only allowed to have a single root element (either one object {...} or one array [...]).
The most common cause of this error is trying to read a JSON Lines file (.jsonl, .ndjson), where each line is a separate, valid JSON object. This guide explains the difference between the two formats and shows the simple fix: the lines=True parameter.
Understanding the Error: Standard JSON vs. JSON Lines
The key to solving this error is to understand the format of your file:
- Standard JSON: A single file must contain exactly one JSON element. This can be a single object or an array that contains multiple objects. The parser expects to reach the end of the file after this single element is closed.
[
{"name": "Tom"},
{"name": "John"}
]
- JSON Lines (.jsonl, .ndjson): A text format where each line is a separate, valid JSON object. This format is common for streaming data and logs.
{"name": "Tom"}
{"name": "John"}
The ValueError: Trailing data occurs when you use the default read_json() on a JSON Lines file. The parser reads the first object ({"name": "Tom"}), considers its job done, and then unexpectedly finds more data (the second object) on the next line.
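You can see the same distinction with Python's standard-library json module. It is just as strict, but it reports the problem as "Extra data" rather than "Trailing data". The sketch below uses an inline string instead of a file, and the variable names are illustrative only.

```python
import json

# Two objects on separate lines: valid JSON Lines, but not one valid JSON document
jsonl_text = '{"name": "Tom"}\n{"name": "John"}\n'

error_message = None
try:
    # json.loads parses the first object, then finds more data after it
    json.loads(jsonl_text)
except json.JSONDecodeError as exc:
    error_message = exc.msg
    print("Whole-document parse failed:", error_message)

# Parsing each non-empty line on its own succeeds, which is what JSON Lines readers do
records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
print(records)  # [{'name': 'Tom'}, {'name': 'John'}]
```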
Reproducing the ValueError
Let's assume you have a file named data.json in the JSON Lines format.
data.json file content:
{ "name": "Tom", "about": "29 years old. A programmer." }
{ "name": "John", "about": "32 years old. A designer." }
{ "name": "Susan", "about": "25 years old. A writer." }
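To reproduce the error locally, you can generate the sample file with a short script. This is a convenience sketch, and the filename data.json matches the examples in this guide.

```python
from pathlib import Path

# Write the three sample records in JSON Lines format: one complete object per line
sample = (
    '{ "name": "Tom", "about": "29 years old. A programmer." }\n'
    '{ "name": "John", "about": "32 years old. A designer." }\n'
    '{ "name": "Susan", "about": "25 years old. A writer." }\n'
)
Path("data.json").write_text(sample, encoding="utf-8")
```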
Example of code causing the error:
import pandas as pd
# Incorrect: Default read_json() expects a standard JSON file.
data = pd.read_json('data.json')
print(data)
Output:
Traceback (most recent call last):
File "main.py", line 9, in <module>
data = pd.read_json('data.json')
File "/usr/lib/python3.8/site-packages/pandas/util/_decorators.py", line 199, in wrapper
return func(*args, **kwargs)
File "/usr/lib/python3.8/site-packages/pandas/util/_decorators.py", line 296, in wrapper
return func(*args, **kwargs)
File "/usr/lib/python3.8/site-packages/pandas/io/json/_json.py", line 618, in read_json
result = json_reader.read()
File "/usr/lib/python3.8/site-packages/pandas/io/json/_json.py", line 755, in read
obj = self._get_object_parser(self.data)
File "/usr/lib/python3.8/site-packages/pandas/io/json/_json.py", line 777, in _get_object_parser
obj = FrameParser(json, **kwargs).parse()
File "/usr/lib/python3.8/site-packages/pandas/io/json/_json.py", line 886, in parse
self._parse_no_numpy()
File "/usr/lib/python3.8/site-packages/pandas/io/json/_json.py", line 1119, in _parse_no_numpy
loads(json, precise_float=self.precise_float), dtype=None
ValueError: Trailing data
Solution 1: Use lines=True for JSON Lines Format (Recommended)
The pandas.read_json() function has a specific parameter, lines=True, designed to handle the JSON Lines format correctly. This tells pandas to treat each line in the file as an individual JSON object.
Solution:
import pandas as pd
# ✅ Correct: Use the lines=True parameter to parse each line as a JSON object.
data = pd.read_json('data.json', lines=True)
print(data)
Output:
name about
0 Tom 29 years old. A programmer.
1 John 32 years old. A designer.
2 Susan 25 years old. A writer.
This is the idiomatic fix, and it does not require modifying the source file.
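For large JSON Lines files, read_json() also accepts a chunksize argument when lines=True. In that case it returns a reader that yields DataFrames of at most that many rows, so you never have to hold the whole file in memory. This sketch assumes pandas 1.2 or later, where the reader can be used as a context manager. It also uses io.StringIO in place of a real file.

```python
import io

import pandas as pd

# Inline JSON Lines text standing in for a large file on disk
jsonl_text = (
    '{"name": "Tom", "about": "29 years old. A programmer."}\n'
    '{"name": "John", "about": "32 years old. A designer."}\n'
    '{"name": "Susan", "about": "25 years old. A writer."}\n'
)

row_count = 0
names = []
# With lines=True, chunksize makes read_json return an iterator of DataFrames
with pd.read_json(io.StringIO(jsonl_text), lines=True, chunksize=2) as reader:
    for chunk in reader:
        # Process each chunk (here, a running row count) instead of concatenating everything
        row_count += len(chunk)
        names.extend(chunk["name"].tolist())

print(row_count, names)
```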
Solution 2: Manually Fix the JSON File Structure
If you have control over the source file and it is small enough to edit, you can convert it to standard JSON by wrapping all the objects in a single array ([ ... ]) and separating them with commas.
Modified data.json file content:
[
{ "name": "Tom", "about": "29 years old. A programmer." },
{ "name": "John", "about": "32 years old. A designer." },
{ "name": "Susan", "about": "25 years old. A writer." }
]
With this corrected format, the default read_json() command will work without any extra parameters.
import pandas as pd
# This now works because the file is in standard JSON format.
data = pd.read_json('data.json')
print(data)
Output:
name about
0 Tom 29 years old. A programmer.
1 John 32 years old. A designer.
2 Susan 25 years old. A writer.
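If the file is too large to edit by hand, you can script the same conversion with the standard json module. This is a minimal sketch: jsonl_to_json_array is a hypothetical helper name. It also assumes the whole file fits in memory, because the records are collected into one list before json.dump writes them out.

```python
import json
from pathlib import Path


def jsonl_to_json_array(src: str, dst: str) -> int:
    """Rewrite a JSON Lines file as a single JSON array; return the record count."""
    records = []
    with open(src, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines, e.g. a trailing newline at end of file
            records.append(json.loads(line))
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2, ensure_ascii=False)
    return len(records)


# Usage: build a small JSON Lines file, then convert it
Path("data.jsonl").write_text('{"name": "Tom"}\n{"name": "John"}\n', encoding="utf-8")
count = jsonl_to_json_array("data.jsonl", "data_array.json")
print(count)  # 2
```

The resulting data_array.json can then be read with the default pd.read_json().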
Bonus: Cleaning Up Embedded Newline Characters
Sometimes, string values in your JSON data contain newline escape sequences (\n). When you read this with lines=True, these newlines are preserved in your DataFrame.
data_with_newlines.json file content:
{ "name": "Tom", "about": "29 years old.\n A programmer." }
{ "name": "John", "about": "32 years old.\n A designer." }
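The \n in the file is a JSON escape sequence. The parser decodes it into a real newline character, so the cleanup step searches for '\n' in Python rather than for a literal backslash followed by n. You can confirm this with the standard json module on one of the sample records:

```python
import json

# Raw string keeps the backslash, so the JSON text contains the two-character escape \n
record_text = r'{ "name": "Tom", "about": "29 years old.\n A programmer." }'
about = json.loads(record_text)["about"]

print(repr(about))     # '29 years old.\n A programmer.'
print("\n" in about)   # True: a real newline character
print("\\" in about)   # False: no backslash survives decoding
```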
You can easily clean these up after loading the data using the .str.replace() method.
Solution:
import pandas as pd
data = pd.read_json('data_with_newlines.json', lines=True)
print("Before cleaning:\n", data)
# Replace each newline (plus any spaces around it) with a single space
data['about'] = data['about'].str.replace(r'\s*\n\s*', ' ', regex=True)
print("\nAfter cleaning:\n", data)
Output:
Before cleaning:
name about
0 Tom 29 years old.\n A programmer.
1 John 32 years old.\n A designer.
After cleaning:
name about
0 Tom 29 years old. A programmer.
1 John 32 years old. A designer.
Conclusion
| If your JSON file has... | The best solution is... |
|---|---|
| Multiple JSON objects, one on each line. | Use the lines=True parameter in pd.read_json(). |
| A single JSON object or array. | Ensure there is no extra text or data after the closing } or ]. The lines=True parameter is not needed. |
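Sometimes a script must accept files in either format. One option is to try standard JSON first and fall back to lines=True if parsing fails. This is a sketch, not a pandas feature: read_json_auto is a hypothetical helper. It also assumes the source is a path or a seekable file-like object, because the fallback reads the input a second time.

```python
import io

import pandas as pd


def read_json_auto(source) -> pd.DataFrame:
    """Hypothetical helper: try standard JSON, then fall back to JSON Lines."""
    try:
        return pd.read_json(source)
    except ValueError:
        # "Trailing data" (and other parse errors) land here; retry as JSON Lines
        if hasattr(source, "seek"):
            source.seek(0)  # rewind a file-like object consumed by the first attempt
        return pd.read_json(source, lines=True)


# Usage with in-memory buffers standing in for files
df_lines = read_json_auto(io.StringIO('{"name": "Tom"}\n{"name": "John"}\n'))
df_array = read_json_auto(io.StringIO('[{"name": "Tom"}, {"name": "John"}]'))
print(df_lines)
print(df_array)
```

The trade-off is that catching every ValueError also catches genuine errors in a malformed standard JSON file. Such a file is parsed twice, and the error you finally see comes from the JSON Lines attempt, with the original error chained above it in the traceback.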
The ValueError: Trailing data usually indicates that your file is in the JSON Lines format. Passing lines=True to pd.read_json() tells pandas to parse each line as a separate record, which resolves the error in a single step.