Python Pandas: How to Resolve "ValueError: Trailing data" Error

When using the pandas.read_json() function, you might encounter the ValueError: Trailing data. This error occurs when the JSON parser successfully reads a complete JSON object or array but then finds additional, unexpected data before the end of the file. Standard JSON files are only allowed to have a single root element (either one object {...} or one array [...]).

The most common cause of this error is trying to read a JSON Lines file (.jsonl, .ndjson), where each line is a separate, valid JSON object. This guide will explain the difference and show you the simple fix by using the lines=True parameter.

Understanding the Error: Standard JSON vs. JSON Lines

The key to solving this error is to understand the format of your file:

  • Standard JSON: A single file must contain exactly one JSON element. This can be a single object or an array that contains multiple objects. The parser expects to reach the end of the file after this single element is closed.
    [
    {"name": "Tom"},
    {"name": "John"}
    ]
  • JSON Lines (.jsonl): A text format where each line is a separate, valid JSON object. This format is common for streaming data and logs.
    {"name": "Tom"}
    {"name": "John"}

The ValueError: Trailing data occurs when you use the default read_json() on a JSON Lines file. The parser reads the first object ({"name": "Tom"}), considers its job done, and then unexpectedly finds more data (the second object) on the next line.
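
The same "one root element" rule can be seen without pandas. The following is a minimal sketch using Python's standard json module, not the parser pandas uses internally; the sample text and variable names are made up for illustration. It parses the first object, shows what is left over, and confirms that a strict parse of the whole text fails.

```python
import json

# Two JSON objects, one per line (JSON Lines layout). Illustrative sample data.
jsonl_text = '{"name": "Tom"}\n{"name": "John"}\n'

decoder = json.JSONDecoder()

# raw_decode parses ONE JSON value from the start of the string and returns
# the value together with the index where parsing stopped.
first_value, end = decoder.raw_decode(jsonl_text)
leftover = jsonl_text[end:].strip()

print(first_value)     # the first object only
print(repr(leftover))  # whatever follows it: the "trailing data"

# json.loads insists on a single root element, so it rejects the same text.
try:
    json.loads(jsonl_text)
    strict_parse_ok = True
except json.JSONDecodeError as exc:
    strict_parse_ok = False
    print("json.loads failed:", exc.msg)
```

If leftover is non-empty after the first value, the text is not a standard single-root JSON document, which is exactly the situation pandas reports as trailing data.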

Reproducing the ValueError

Let's assume you have a file named data.json in the JSON Lines format.

data.json file content:

{ "name": "Tom", "about": "29 years old. A programmer." }
{ "name": "John", "about": "32 years old. A designer." }
{ "name": "Susan", "about": "25 years old. A writer." }

Example of code causing the error:

import pandas as pd

# Incorrect: Default read_json() expects a standard JSON file.
data = pd.read_json('data.json')
print(data)

Output:

Traceback (most recent call last):
File "main.py", line 9, in <module>
data = pd.read_json('data.json')
File "/usr/lib/python3.8/site-packages/pandas/util/_decorators.py", line 199, in wrapper
return func(*args, **kwargs)
File "/usr/lib/python3.8/site-packages/pandas/util/_decorators.py", line 296, in wrapper
return func(*args, **kwargs)
File "/usr/lib/python3.8/site-packages/pandas/io/json/_json.py", line 618, in read_json
result = json_reader.read()
File "/usr/lib/python3.8/site-packages/pandas/io/json/_json.py", line 755, in read
obj = self._get_object_parser(self.data)
File "/usr/lib/python3.8/site-packages/pandas/io/json/_json.py", line 777, in _get_object_parser
obj = FrameParser(json, **kwargs).parse()
File "/usr/lib/python3.8/site-packages/pandas/io/json/_json.py", line 886, in parse
self._parse_no_numpy()
File "/usr/lib/python3.8/site-packages/pandas/io/json/_json.py", line 1119, in _parse_no_numpy
loads(json, precise_float=self.precise_float), dtype=None
ValueError: Trailing data

Solution 1: Use the lines=True Parameter

The pandas.read_json() function has a lines parameter designed for the JSON Lines format. Passing lines=True tells pandas to treat each line in the file as an individual JSON object.

Solution:

import pandas as pd

# ✅ Correct: Use the lines=True parameter to parse each line as a JSON object.
data = pd.read_json('data.json', lines=True)

print(data)

Output:

    name                        about
0    Tom  29 years old. A programmer.
1   John    32 years old. A designer.
2  Susan      25 years old. A writer.

This is the idiomatic fix, and it works without modifying the source file.
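
When you do not know in advance which layout a file uses, one option is to try the default parser first and fall back to lines=True only when pandas reports trailing data. The sketch below is one possible approach, not an official pandas recipe. The helper name read_json_flexible is hypothetical, and it assumes the error message contains the text "Trailing data", as in the traceback above. It reads from in-memory strings via io.StringIO so that it is self-contained.

```python
import io

import pandas as pd


def read_json_flexible(source_text):
    """Parse standard JSON first; fall back to JSON Lines on trailing data.

    Hypothetical helper for illustration only.
    """
    try:
        return pd.read_json(io.StringIO(source_text))
    except ValueError as exc:
        # Assumption: pandas words the error as "Trailing data".
        # Any other ValueError is a different problem, so re-raise it.
        if "Trailing data" not in str(exc):
            raise
        return pd.read_json(io.StringIO(source_text), lines=True)


jsonl_text = (
    '{"name": "Tom", "about": "29 years old. A programmer."}\n'
    '{"name": "John", "about": "32 years old. A designer."}\n'
    '{"name": "Susan", "about": "25 years old. A writer."}\n'
)
array_text = '[{"name": "Tom"}, {"name": "John"}]'

df_lines = read_json_flexible(jsonl_text)  # falls back to lines=True
df_array = read_json_flexible(array_text)  # parsed by the default path

print(df_lines)
print(df_array)
```

If you already know the file is JSON Lines, passing lines=True directly is simpler and avoids parsing the input twice.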

Solution 2: Manually Fix the JSON File Structure

If you have control over the source file and it is small enough to edit, you can convert it into a standard JSON format by wrapping all the objects in a single list and separating them with commas.

Solution: modified data.json file content:

[
{ "name": "Tom", "about": "29 years old. A programmer." },
{ "name": "John", "about": "32 years old. A designer." },
{ "name": "Susan", "about": "25 years old. A writer." }
]

With this corrected format, the default read_json() command will work without any extra parameters.

import pandas as pd

# This now works because the file is in standard JSON format.
data = pd.read_json('data.json')

print(data)

Output:

    name                        about
0    Tom  29 years old. A programmer.
1   John    32 years old. A designer.
2  Susan      25 years old. A writer.
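
If the file is too large to edit by hand, the same conversion can be scripted. The following is a minimal sketch using only the standard json module. The function name jsonl_to_json_array is hypothetical, and it assumes every non-blank line holds exactly one valid JSON value.

```python
import json


def jsonl_to_json_array(jsonl_text):
    """Convert JSON Lines text into a standard JSON array string.

    Hypothetical helper; assumes one JSON value per non-blank line.
    """
    records = [
        json.loads(line)
        for line in jsonl_text.split("\n")
        if line.strip()  # skip blank lines, e.g. after a trailing newline
    ]
    return json.dumps(records, indent=2)


jsonl_text = (
    '{"name": "Tom", "about": "29 years old. A programmer."}\n'
    '{"name": "John", "about": "32 years old. A designer."}\n'
    '{"name": "Susan", "about": "25 years old. A writer."}\n'
)

array_text = jsonl_to_json_array(jsonl_text)
print(array_text)
```

The resulting text has a single root element (one array), so the default pd.read_json() call can read it once it is saved to a file.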

Bonus: Cleaning Up Embedded Newline Characters

Sometimes the string values in your JSON data contain escaped newline characters (\n). When you read the file with lines=True, each \n is decoded into a real newline character and preserved in your DataFrame.

data_with_newlines.json file content:

{ "name": "Tom", "about": "29 years old.\n A programmer." }
{ "name": "John", "about": "32 years old.\n A designer." }

You can easily clean these up after loading the data using the .str.replace() method.

Solution:

import pandas as pd

data = pd.read_json('data_with_newlines.json', lines=True)
print("Before cleaning:")
print(data)

# Replace each newline character with a space (literal match, not a regex).
# A newline that is already followed by a space leaves two spaces behind.
data['about'] = data['about'].str.replace('\n', ' ', regex=False)

print("\nAfter cleaning:")
print(data)

Output:

Before cleaning:
   name                          about
0   Tom  29 years old.\n A programmer.
1  John    32 years old.\n A designer.

After cleaning:
   name                          about
0   Tom  29 years old.  A programmer.
1  John    32 years old.  A designer.
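
Because the sample strings have a space right after the newline, a one-for-one replacement of the newline with a space leaves two consecutive spaces. The following is a minimal sketch, assuming you want every run of whitespace collapsed into a single space. It builds the DataFrame in memory instead of reading data_with_newlines.json so that it runs on its own.

```python
import pandas as pd

# Same values that read_json(..., lines=True) would produce for the sample file.
data = pd.DataFrame(
    {
        "name": ["Tom", "John"],
        "about": ["29 years old.\n A programmer.", "32 years old.\n A designer."],
    }
)

# \s+ matches any run of whitespace (newlines, tabs, spaces), so a newline
# followed by a space becomes one space. strip() removes leading and
# trailing whitespace left at the ends of the string.
data["about"] = data["about"].str.replace(r"\s+", " ", regex=True).str.strip()

print(data["about"].tolist())
```

Passing regex=True explicitly matters here, since the default value of this parameter has changed between pandas versions.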

Conclusion

The best solution depends on what your JSON file contains:

  • Multiple JSON objects, one on each line: Use the lines=True parameter in pd.read_json().
  • A single JSON object or array: Ensure there is no extra text or data after the closing } or ]. The lines=True parameter is not needed.

The ValueError: Trailing data usually means that your file is in the JSON Lines format. Passing lines=True tells pandas to parse it line by line, which resolves the error in a single step.