Python Pandas: How to Sort a Pandas DataFrame by Date
Datasets frequently contain date columns - order dates, timestamps, expiration dates, or event schedules. Sorting by date is essential for time-series analysis, chronological reporting, and identifying trends. However, date columns are often read as strings by default, which causes incorrect sorting (alphabetical instead of chronological).
This guide walks you through converting date strings to proper datetime objects and sorting your DataFrame by one or multiple date columns.
Why Date Conversion Matters Before Sorting
When Pandas reads a CSV or when you create a DataFrame with date strings, the values are stored as strings, not datetime objects. Sorting strings alphabetically produces different results than sorting dates chronologically:
import pandas as pd
dates = pd.Series(['2024-12-01', '2024-02-15', '2024-11-20'])
# String sorting (alphabetical), works correctly for YYYY-MM-DD format
print("String sort:", dates.sort_values().tolist())
# But this fails with other formats
dates_us = pd.Series(['12/01/2024', '02/15/2024', '11/20/2024'])
print("String sort (MM/DD):", dates_us.sort_values().tolist())
Output:
String sort: ['2024-02-15', '2024-11-20', '2024-12-01']
String sort (MM/DD): ['02/15/2024', '11/20/2024', '12/01/2024']
The YYYY-MM-DD format happens to sort correctly as strings, but MM/DD/YYYY does not - February would come before November alphabetically, which is coincidentally correct here, but 12/01 would sort after 02/15 for the wrong reason. Converting to datetime ensures reliable chronological sorting regardless of the string format.
Sorting by a Single Date Column
Step 1: Create or Load the DataFrame
import pandas as pd
data = pd.DataFrame({
'AdmissionDate': ['2024-01-25', '2024-01-22', '2024-01-20',
'2024-01-18', '2024-01-22', '2024-01-17', '2024-01-21'],
'StudentID': [7, 5, 3, 2, 6, 1, 4],
'Name': ['Ram', 'Shyam', 'Mohan', 'Sohan', 'Lucky', 'Abhinav', 'Danny'],
'Stream': ['CSE', 'ECE', 'Civil', 'Mechanical', 'CSE', 'IT', 'EEE']
})
print(data)
print("\nAdmissionDate dtype:", data['AdmissionDate'].dtype)
Output:
AdmissionDate StudentID Name Stream
0 2024-01-25 7 Ram CSE
1 2024-01-22 5 Shyam ECE
2 2024-01-20 3 Mohan Civil
3 2024-01-18 2 Sohan Mechanical
4 2024-01-22 6 Lucky CSE
5 2024-01-17 1 Abhinav IT
6 2024-01-21 4 Danny EEE
AdmissionDate dtype: object
The AdmissionDate column has dtype object, meaning it contains strings.
Step 2: Convert Strings to Datetime
Use pd.to_datetime() to convert the column:
data['AdmissionDate'] = pd.to_datetime(data['AdmissionDate'])
print("AdmissionDate dtype:", data['AdmissionDate'].dtype)
Output:
AdmissionDate dtype: datetime64[ns]
The column is now a proper datetime type and will sort chronologically.
Step 3: Sort by Date
# Sort in ascending order (earliest first)
sorted_asc = data.sort_values(by='AdmissionDate')
print(sorted_asc)
Output:
AdmissionDate StudentID Name Stream
5 2024-01-17 1 Abhinav IT
3 2024-01-18 2 Sohan Mechanical
2 2024-01-20 3 Mohan Civil
6 2024-01-21 4 Danny EEE
1 2024-01-22 5 Shyam ECE
4 2024-01-22 6 Lucky CSE
0 2024-01-25 7 Ram CSE
Records are now in chronological order from the earliest to the latest admission date.
Sorting in Descending Order (Most Recent First)
sorted_desc = data.sort_values(by='AdmissionDate', ascending=False)
print(sorted_desc)
Output:
AdmissionDate StudentID Name Stream
0 2024-01-25 7 Ram CSE
1 2024-01-22 5 Shyam ECE
4 2024-01-22 6 Lucky CSE
6 2024-01-21 4 Danny EEE
2 2024-01-20 3 Mohan Civil
3 2024-01-18 2 Sohan Mechanical
5 2024-01-17 1 Abhinav IT
Resetting the Index After Sorting
After sorting, the original index values are preserved (e.g., 5, 3, 2, 6, ...). To get a clean sequential index:
sorted_clean = data.sort_values(by='AdmissionDate', ignore_index=True)
print(sorted_clean)
Output:
AdmissionDate StudentID Name Stream
0 2024-01-17 1 Abhinav IT
1 2024-01-18 2 Sohan Mechanical
2 2024-01-20 3 Mohan Civil
3 2024-01-21 4 Danny EEE
4 2024-01-22 5 Shyam ECE
5 2024-01-22 6 Lucky CSE
6 2024-01-25 7 Ram CSE
Sorting by Multiple Date Columns
When a DataFrame has more than one date column, you can specify a priority order. The DataFrame is sorted by the first column, and ties are broken by the second.
import pandas as pd
products = pd.DataFrame({
'Mfg_Date': ['2024-01-25', '2024-01-22', '2024-01-20',
'2024-01-18', '2024-01-22', '2024-01-17', '2024-01-21'],
'ProductID': [7, 5, 3, 2, 6, 1, 4],
'Product': ['Paracetamol', 'Moov', 'Volini', 'Crocin',
'Aciloc', 'Iodex', 'Combiflam'],
'Expiry_Date': ['2025-01-25', '2026-01-22', '2024-05-20',
'2025-03-18', '2025-01-22', '2024-05-17', '2025-01-30']
})
# Convert both columns to datetime
products['Mfg_Date'] = pd.to_datetime(products['Mfg_Date'])
products['Expiry_Date'] = pd.to_datetime(products['Expiry_Date'])
# Sort by Expiry_Date first, then by Mfg_Date for ties
sorted_products = products.sort_values(
by=['Expiry_Date', 'Mfg_Date'],
ignore_index=True
)
print(sorted_products)
Output:
Mfg_Date ProductID Product Expiry_Date
0 2024-01-17 1 Iodex 2024-05-17
1 2024-01-20 3 Volini 2024-05-20
2 2024-01-22 6 Aciloc 2025-01-22
3 2024-01-25 7 Paracetamol 2025-01-25
4 2024-01-21 4 Combiflam 2025-01-30
5 2024-01-18 2 Crocin 2025-03-18
6 2024-01-22 5 Moov 2026-01-22
Products are sorted by expiry date (soonest first), with manufacturing date as the tiebreaker.
Mixed Ascending and Descending
You can sort each date column in a different direction:
# Expiry date ascending, manufacturing date descending
sorted_mixed = products.sort_values(
by=['Expiry_Date', 'Mfg_Date'],
ascending=[True, False],
ignore_index=True
)
print(sorted_mixed)
Output:
Mfg_Date ProductID Product Expiry_Date
0 2024-01-17 1 Iodex 2024-05-17
1 2024-01-20 3 Volini 2024-05-20
2 2024-01-22 6 Aciloc 2025-01-22
3 2024-01-25 7 Paracetamol 2025-01-25
4 2024-01-21 4 Combiflam 2025-01-30
5 2024-01-18 2 Crocin 2025-03-18
6 2024-01-22 5 Moov 2026-01-22
Converting Multiple Date Columns at Once
When you have several date columns, convert them all in a single operation using apply():
date_columns = ['Mfg_Date', 'Expiry_Date']
products[date_columns] = products[date_columns].apply(pd.to_datetime)
Output:
Mfg_Date ProductID Product Expiry_Date
0 2024-01-25 7 Paracetamol 2025-01-25
1 2024-01-22 5 Moov 2026-01-22
2 2024-01-20 3 Volini 2024-05-20
3 2024-01-18 2 Crocin 2025-03-18
4 2024-01-22 6 Aciloc 2025-01-22
5 2024-01-17 1 Iodex 2024-05-17
6 2024-01-21 4 Combiflam 2025-01-30
Parsing Dates During CSV Loading
Instead of converting after loading, you can parse dates directly in read_csv():
import pandas as pd
df = pd.read_csv('data.csv', parse_dates=['AdmissionDate'])
print(df['AdmissionDate'].dtype)
Output:
datetime64[ns]
The column is already a datetime type - no separate conversion step needed.
Using parse_dates in read_csv() is the cleanest approach. It handles conversion automatically and works with most standard date formats. For non-standard formats, combine it with date_format:
df = pd.read_csv('data.csv', parse_dates=['date_col'], date_format='%d/%m/%Y')
Common Mistake: Sorting Without Converting to Datetime First
Sorting a date column that is still stored as strings can produce incorrect results with certain formats:
import pandas as pd
df = pd.DataFrame({
'Date': ['03/15/2024', '12/01/2023', '01/20/2024'],
'Event': ['Conference', 'Workshop', 'Meetup']
})
# WRONG: sorting strings, alphabetical order, not chronological
sorted_wrong = df.sort_values(by='Date')
print("String sort (incorrect):")
print(sorted_wrong)
Output:
String sort (incorrect):
Date Event
2 01/20/2024 Meetup
0 03/15/2024 Conference
1 12/01/2023 Workshop
December 2023 should come first chronologically, but it appears last because "12" comes after "01" and "03" alphabetically.
The correct approach:
import pandas as pd
df = pd.DataFrame({
'Date': ['03/15/2024', '12/01/2023', '01/20/2024'],
'Event': ['Conference', 'Workshop', 'Meetup']
})
# CORRECT: convert to datetime first, then sort
df['Date'] = pd.to_datetime(df['Date'])
sorted_correct = df.sort_values(by='Date')
print("Datetime sort (correct):")
print(sorted_correct)
Output:
Datetime sort (correct):
Date Event
1 2023-12-01 Workshop
2 2024-01-20 Meetup
0 2024-03-15 Conference
Always convert date columns to datetime64 before sorting. String-based sorting only works correctly for the YYYY-MM-DD format and fails silently with other formats, producing results that look plausible but are chronologically wrong.
Quick Reference
| Step | Code |
|---|---|
| Convert string to datetime | df['date'] = pd.to_datetime(df['date']) |
| Sort ascending (oldest first) | df.sort_values(by='date') |
| Sort descending (newest first) | df.sort_values(by='date', ascending=False) |
| Sort by multiple date columns | df.sort_values(by=['date1', 'date2']) |
| Reset index after sorting | df.sort_values(by='date', ignore_index=True) |
| Parse dates during CSV load | pd.read_csv('file.csv', parse_dates=['date']) |
| Convert multiple columns at once | df[cols].apply(pd.to_datetime) |
Converting date columns to proper datetime objects and then using sort_values() is the reliable pattern for chronological sorting in Pandas. This two-step process ensures correct ordering regardless of the original date string format.