How to Extract and Sort Data from Files in Python

JSON (JavaScript Object Notation) is a standard format for data exchange in web applications and APIs. A common task in Python data processing is reading a JSON file, extracting a list of records, and sorting that list by one field (such as a date or a name).

This guide demonstrates how to read a JSON file containing movie data and print a sorted list of titles ordered by their release date.

Understanding the Data Structure

Before coding, we must understand the structure of our input file, movie.json. In this guide, the file contains a root object whose "movies" key holds a list of movie records. Each record has a name and a published_at date.

Example movie.json:

{
  "movies": [
    {"name": "Inception", "published_at": "2010-07-16"},
    {"name": "Pulp Fiction", "published_at": "1994-10-14"},
    {"name": "The Dark Knight", "published_at": "2008-07-18"}
  ]
}
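To follow along, the sample file can be created from Python itself. The sketch below is one way to do that. The helper name write_sample_file and the temporary-directory location are illustrative choices, not part of the original script.

```python
import json
import tempfile
from pathlib import Path

# Sample records copied from the example movie.json above.
SAMPLE = {
    "movies": [
        {"name": "Inception", "published_at": "2010-07-16"},
        {"name": "Pulp Fiction", "published_at": "1994-10-14"},
        {"name": "The Dark Knight", "published_at": "2008-07-18"},
    ]
}


def write_sample_file(path):
    """Hypothetical helper: write the sample records to `path` as JSON."""
    with open(path, "w", encoding="utf-8") as file:
        json.dump(SAMPLE, file, indent=2)


# Written to a temporary directory here so the sketch has no side effects.
# Pass "movie.json" instead to create the file next to your script.
sample_path = Path(tempfile.mkdtemp()) / "movie.json"
write_sample_file(sample_path)
print(sample_path.read_text(encoding="utf-8"))
```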

Step 1: Loading and Parsing JSON

Python's standard library includes the json module for this format. json.load() reads from an open file object and parses its contents into Python objects. For our file, the root object becomes a dictionary.

import json

# ✅ Correct: Open the file and load it
with open("movie.json", "r") as file:
    data = json.load(file)

# Access the list using the dictionary key "movies"
movies_list = data["movies"]

Note

json.load() vs json.loads(): Use json.load(file_object) to read from a file. Use json.loads(string) to parse JSON text that is already held in a string.
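The difference can be seen in a small sketch. Here io.StringIO stands in for an open file so the example needs no file on disk, and the one-movie JSON string is made up for illustration.

```python
import io
import json

raw = '{"movies": [{"name": "Inception", "published_at": "2010-07-16"}]}'

# json.loads() parses JSON text that is already in memory as a string.
from_string = json.loads(raw)

# json.load() reads from a file-like object; io.StringIO mimics an open file.
from_file = json.load(io.StringIO(raw))

print(from_string == from_file)          # True
print(from_string["movies"][0]["name"])  # Inception
```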

Step 2: Sorting Lists of Dictionaries

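We cannot simply call sorted() on a list of dictionaries, because dictionaries do not support ordering comparisons such as <, and Python has no way of knowing which value inside each dictionary to compare. We must pass a key function, here written as a lambda, that returns the value to sort by.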

Sorting Logic:

# ⛔️ Incorrect: Raises TypeError ('<' not supported between instances of 'dict' and 'dict')
# sorted_movies = sorted(movies_list)

# ✅ Correct: Sort by the 'published_at' key
# The lambda's parameter x is one dictionary from the list
sorted_movies = sorted(movies_list, key=lambda x: x["published_at"])
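As an alternative sketch, operator.itemgetter from the standard library builds the same key function without a lambda, and reverse=True flips the order. The movies_list below repeats the sample data so the snippet runs on its own.

```python
from operator import itemgetter

movies_list = [
    {"name": "Inception", "published_at": "2010-07-16"},
    {"name": "Pulp Fiction", "published_at": "1994-10-14"},
    {"name": "The Dark Knight", "published_at": "2008-07-18"},
]

# itemgetter("published_at") behaves like lambda x: x["published_at"].
by_lambda = sorted(movies_list, key=lambda x: x["published_at"])
by_itemgetter = sorted(movies_list, key=itemgetter("published_at"))
print(by_lambda == by_itemgetter)  # True

# reverse=True sorts newest first. sorted() returns a new list and
# leaves movies_list unchanged.
newest_first = sorted(movies_list, key=itemgetter("published_at"), reverse=True)
print([m["name"] for m in newest_first])
# ['Inception', 'The Dark Knight', 'Pulp Fiction']
```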

Tip

Since the dates are zero-padded strings in "YYYY-MM-DD" format (ISO 8601), standard string comparison orders them chronologically. As long as every record uses this format, you do not need to convert them to datetime objects for simple sorting.
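The shortcut depends on the date format. As a contrast, the sketch below assumes a hypothetical feed that stores the same dates as "DD/MM/YYYY". That format is not used in movie.json. String comparison then orders by day of the month, so the key function has to parse each date with datetime.strptime.

```python
from datetime import datetime

# Hypothetical records using day-first dates instead of ISO 8601.
records = [
    {"name": "Inception", "published_at": "16/07/2010"},
    {"name": "Pulp Fiction", "published_at": "14/10/1994"},
    {"name": "The Dark Knight", "published_at": "18/07/2008"},
]

# String comparison looks at the day first: "14/..." < "16/..." < "18/..."
by_string = sorted(records, key=lambda x: x["published_at"])
print([r["name"] for r in by_string])
# ['Pulp Fiction', 'Inception', 'The Dark Knight']  (not chronological)

# Parsing to datetime restores chronological order.
by_date = sorted(
    records,
    key=lambda x: datetime.strptime(x["published_at"], "%d/%m/%Y"),
)
print([r["name"] for r in by_date])
# ['Pulp Fiction', 'The Dark Knight', 'Inception']
```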

Complete Code Solution

Here is the complete movie.py script. It reads the file, sorts the data, and prints the formatted output.

import json

def extract_movie_info(file_path):
    """
    Reads a JSON file, extracts movie data, sorts by release date,
    and prints the details.
    """
    try:
        # 1. Read and Parse
        with open(file_path, "r") as file:
            data = json.load(file)

        # 2. Extract the list
        movies = data["movies"]

        # 3. Sort the list of dictionaries
        # key=lambda x: x["published_at"] tells sort to look at the date string
        sorted_movies = sorted(movies, key=lambda x: x["published_at"])

        # 4. Iterate and Print
        for movie in sorted_movies:
            name = movie["name"]
            published_at = movie["published_at"]
            print(f"movie: {name}, published: {published_at}")

    except FileNotFoundError:
        print(f"Error: The file '{file_path}' was not found.")
    except KeyError:
        print("Error: The JSON structure is missing required keys.")

if __name__ == "__main__":
    # Ensure movie.json exists in the same directory
    extract_movie_info("movie.json")

Execution Output:

movie: Pulp Fiction, published: 1994-10-14
movie: The Dark Knight, published: 2008-07-18
movie: Inception, published: 2010-07-16

Conclusion

Processing JSON data in Python follows a standard pipeline:

  1. Context Manager: Use with open(...) to safely access the file.
  2. Parse: Use json.load() to convert JSON to Python lists and dictionaries.
  3. Sort: Use sorted(list, key=lambda ...) to order complex data structures such as lists of dictionaries.
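
The three steps above can also be sketched as a function that returns the sorted titles instead of printing them, which makes the result easier to reuse or test. The name load_sorted_titles and the temporary files are illustrative assumptions. The sketch also shows json.JSONDecodeError, the exception json.load() raises when the file does not contain valid JSON.

```python
import json
import tempfile
from pathlib import Path


def load_sorted_titles(file_path):
    """Hypothetical helper: return movie names ordered by release date."""
    # 1. Context manager: the file is closed even if parsing fails.
    with open(file_path, "r", encoding="utf-8") as file:
        # 2. Parse: JSON objects become dicts, JSON arrays become lists.
        data = json.load(file)
    # 3. Sort: order the list of dictionaries by the ISO date string.
    movies = sorted(data["movies"], key=lambda m: m["published_at"])
    return [m["name"] for m in movies]


# Demo files live in a temporary directory so nothing is left behind
# in the working directory.
demo_dir = Path(tempfile.mkdtemp())
good_path = demo_dir / "movie.json"
good_path.write_text(
    json.dumps({
        "movies": [
            {"name": "Inception", "published_at": "2010-07-16"},
            {"name": "Pulp Fiction", "published_at": "1994-10-14"},
            {"name": "The Dark Knight", "published_at": "2008-07-18"},
        ]
    }),
    encoding="utf-8",
)
titles = load_sorted_titles(good_path)
print(titles)  # ['Pulp Fiction', 'The Dark Knight', 'Inception']

# A file with malformed JSON raises json.JSONDecodeError.
bad_path = demo_dir / "broken.json"
bad_path.write_text("{not valid json", encoding="utf-8")
try:
    load_sorted_titles(bad_path)
    decode_failed = False
except json.JSONDecodeError:
    decode_failed = True
print(decode_failed)  # True
```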