How to Extract and Sort Data from Files in Python
JSON (JavaScript Object Notation) is a standard format for data exchange in web applications and APIs. A common task in Python data processing is to read a JSON file, extract a list of records, and sort it by a particular field (such as a date or a name).
This guide demonstrates how to read a JSON file containing movie data and print a sorted list of titles ordered by their release date.
Understanding the Data Structure
Before coding, we must understand the structure of our input file, movie.json. In this guide, the file contains a root object whose "movies" key holds a list of movie records, each with a name and a published_at date.
Example movie.json:
{
    "movies": [
        {"name": "Inception", "published_at": "2010-07-16"},
        {"name": "Pulp Fiction", "published_at": "1994-10-14"},
        {"name": "The Dark Knight", "published_at": "2008-07-18"}
    ]
}
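To check that a file really has this shape before writing any sorting logic, it can help to inspect the parsed result. The sketch below inlines the example document as a string (the variable name raw is only for illustration) and uses json.loads() so that it runs without a file on disk.

```python
import json

# The example document, inlined as a string so the sketch needs no file.
raw = """
{
    "movies": [
        {"name": "Inception", "published_at": "2010-07-16"},
        {"name": "Pulp Fiction", "published_at": "1994-10-14"},
        {"name": "The Dark Knight", "published_at": "2008-07-18"}
    ]
}
"""

data = json.loads(raw)

print(type(data).__name__)            # dict: the root JSON object
print(list(data.keys()))              # ['movies']
print(type(data["movies"]).__name__)  # list: the JSON array of records
print(len(data["movies"]))            # 3
print(data["movies"][0])              # the first record, itself a dict
```

A JSON object becomes a Python dict and a JSON array becomes a Python list, which is why the later steps index with data["movies"] and then iterate over the result.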
Step 1: Loading and Parsing JSON
Python provides the built-in json module to handle this format. We use json.load() to parse a file directly into Python objects. Because the root of movie.json is a JSON object, the result here is a dictionary.
import json

# ✅ Correct: Open the file and load it
with open("movie.json", "r") as file:
    data = json.load(file)

# Access the list using the dictionary key "movies"
movies_list = data["movies"]
json.load() vs json.loads(): Use json.load(file_object) to read from a file. Use json.loads(string) to read from a JSON string variable.
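The difference between the two functions can be seen side by side in a small sketch. A temporary file stands in for movie.json here, and the variable names are illustrative only.

```python
import json
import os
import tempfile

text = '{"movies": [{"name": "Inception", "published_at": "2010-07-16"}]}'

# json.loads(): parse a JSON string that is already in memory.
from_string = json.loads(text)

# json.load(): parse from an open file object.
# The temporary directory is removed again when the block ends.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "movie.json")
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with open(path, "r", encoding="utf-8") as f:
        from_file = json.load(f)

print(from_string == from_file)  # True: both calls produce the same dictionary
```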
Step 2: Sorting Lists of Dictionaries
We cannot simply call sorted() on a list of dictionaries because dictionaries do not support ordering comparisons such as <, so Python has no way to decide which record comes first. We must pass a key function, typically a lambda, that returns the value to sort by.
Sorting Logic:
# ⛔️ Incorrect: Raises TypeError ('<' not supported between instances of 'dict' and 'dict')
# sorted_movies = sorted(movies_list)
# ✅ Correct: Sort by the 'published_at' key
# In the lambda, x is an individual dictionary from the list
sorted_movies = sorted(movies_list, key=lambda x: x["published_at"])
Since the dates are strings in "YYYY-MM-DD" format (ISO 8601), plain string comparison yields chronological order, provided every date uses the same zero-padded format. In that case you do not need to convert them to datetime objects for simple sorting.
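The claim about string comparison can be checked against a datetime-based sort. The following sketch does that, and then shows a hypothetical day/month/year list (not part of movie.json) where string order is no longer chronological and parsing becomes necessary.

```python
from datetime import datetime

movies = [
    {"name": "Inception", "published_at": "2010-07-16"},
    {"name": "Pulp Fiction", "published_at": "1994-10-14"},
    {"name": "The Dark Knight", "published_at": "2008-07-18"},
]

# Sort by the raw string, then by a parsed datetime, and compare.
by_string = sorted(movies, key=lambda m: m["published_at"])
by_date = sorted(
    movies, key=lambda m: datetime.strptime(m["published_at"], "%Y-%m-%d")
)
print(by_string == by_date)  # True

# Hypothetical dates in day/month/year order: string sorting compares the
# day first, so the result is not chronological.
dmy = ["16/07/2010", "14/10/1994", "18/07/2008"]
print(sorted(dmy))
# ['14/10/1994', '16/07/2010', '18/07/2008']
print(sorted(dmy, key=lambda s: datetime.strptime(s, "%d/%m/%Y")))
# ['14/10/1994', '18/07/2008', '16/07/2010']
```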
Complete Code Solution
Here is the complete movie.py script. It reads the file, sorts the data, and prints the formatted output.
import json


def extract_movie_info(file_path):
    """
    Reads a JSON file, extracts movie data, sorts by release date,
    and prints the details.
    """
    try:
        # 1. Read and Parse
        with open(file_path, "r") as file:
            data = json.load(file)

        # 2. Extract the list
        movies = data["movies"]

        # 3. Sort the list of dictionaries
        # key=lambda x: x["published_at"] tells sort to look at the date string
        sorted_movies = sorted(movies, key=lambda x: x["published_at"])

        # 4. Iterate and Print
        for movie in sorted_movies:
            name = movie["name"]
            published_at = movie["published_at"]
            print(f"movie: {name}, published: {published_at}")
    except FileNotFoundError:
        print(f"Error: The file '{file_path}' was not found.")
    except KeyError:
        print("Error: The JSON structure is missing required keys.")


if __name__ == "__main__":
    # Ensure movie.json exists in the same directory
    extract_movie_info("movie.json")
Execution Output:
movie: Pulp Fiction, published: 1994-10-14
movie: The Dark Knight, published: 2008-07-18
movie: Inception, published: 2010-07-16
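The script above handles a missing file and missing keys, but a file that exists and contains malformed JSON would still raise json.JSONDecodeError. One possible variant, sketched below, catches that case as well and returns the sorted list instead of printing it. The function name load_sorted_movies and the file name broken.json are hypothetical and chosen only for this illustration.

```python
import json
import os
import tempfile


def load_sorted_movies(file_path):
    """Return the movie records sorted by 'published_at', or None on failure."""
    try:
        with open(file_path, "r", encoding="utf-8") as file:
            data = json.load(file)
        return sorted(data["movies"], key=lambda m: m["published_at"])
    except FileNotFoundError:
        print(f"Error: The file '{file_path}' was not found.")
    except json.JSONDecodeError as exc:
        # Raised by json.load() when the file content is not valid JSON.
        print(f"Error: '{file_path}' is not valid JSON ({exc.msg}).")
    except KeyError as exc:
        print(f"Error: The JSON structure is missing the key {exc}.")
    return None


# Demonstration with a deliberately truncated file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    broken = os.path.join(tmp_dir, "broken.json")
    with open(broken, "w", encoding="utf-8") as f:
        f.write('{"movies": [')  # truncated on purpose
    result = load_sorted_movies(broken)

print(result)  # None
```

Returning the list rather than printing it keeps the loading logic reusable, since the caller can decide how to format or further filter the records.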
Conclusion
Processing JSON data in Python follows a standard pipeline:
- Context Manager: Use with open(...) to safely access the file.
- Parse: Use json.load() to convert JSON to Python lists and dictionaries.
- Sort: Use sorted(list, key=lambda ...) to order a list of dictionaries by a chosen field.