Python Pandas: How to Process Dates and Times in a Pandas DataFrame
Working with dates and times is one of the most common tasks in data analysis. Raw datetime values in a dataset often need to be transformed - extracting components like year, month, or hour, converting string formats, or filtering rows by time ranges - before they can be used effectively in analysis or modeling.
Pandas was originally built with financial modeling in mind, so it offers a rich set of tools for datetime processing. In this guide, you will learn how to extract datetime features, convert strings to timestamps, reformat date patterns, identify days of the week, and filter data by date and time ranges using Pandas.
Extracting Date and Time Components
One of the most frequent preprocessing steps is splitting a single datetime column into individual components such as year, month, day, hour, and minute. Pandas provides the .dt accessor on datetime Series for this purpose.
import pandas as pd
df = pd.DataFrame()
# Generate 6 timestamps spaced 2 hours apart
df["time"] = pd.date_range("2/5/2019", periods=6, freq="2h")
# Extract individual datetime components
df["year"] = df["time"].dt.year
df["month"] = df["time"].dt.month
df["day"] = df["time"].dt.day
df["hour"] = df["time"].dt.hour
df["minute"] = df["time"].dt.minute
print(df)
Output:
time year month day hour minute
0 2019-02-05 00:00:00 2019 2 5 0 0
1 2019-02-05 02:00:00 2019 2 5 2 0
2 2019-02-05 04:00:00 2019 2 5 4 0
3 2019-02-05 06:00:00 2019 2 5 6 0
4 2019-02-05 08:00:00 2019 2 5 8 0
5 2019-02-05 10:00:00 2019 2 5 10 0
The .dt accessor provides many more properties beyond the ones shown above. Some useful ones include:
- df["time"].dt.second - seconds
- df["time"].dt.day_of_week - day of the week as an integer (Monday=0, Sunday=6)
- df["time"].dt.day_name() - full day name (e.g., "Monday")
- df["time"].dt.quarter - quarter of the year (1–4)
- df["time"].dt.is_month_end - whether the date is the last day of the month
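As a quick illustration of the properties listed above, the sketch below applies them to two hand-picked timestamps (a Monday and a month-end Sunday). The Series name `s` and the frame name `extra` are placeholders chosen for this example only.

```python
import pandas as pd

# Two illustrative timestamps: Monday 4 Feb 2019 and Sunday 31 Mar 2019
s = pd.Series(pd.to_datetime(["2019-02-04 08:15:30", "2019-03-31 23:59:59"]))

# Collect several .dt properties side by side
extra = pd.DataFrame({
    "second": s.dt.second,
    "day_of_week": s.dt.day_of_week,    # Monday=0, Sunday=6
    "day_name": s.dt.day_name(),
    "quarter": s.dt.quarter,
    "is_month_end": s.dt.is_month_end,  # True only for the 31 March row
})
print(extra)
```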
Converting Strings to Timestamps
Real-world datasets often store dates as strings in various formats. Use pd.to_datetime() to parse these strings into proper Timestamp objects that Pandas can work with.
import numpy as np
import pandas as pd
# Date strings in a non-standard format
date_strings = np.array([
"04-03-2019 12:35 PM",
"22-06-2017 11:01 AM",
"05-09-2009 07:09 PM",
])
# Convert to datetime using the correct format string
timestamps = pd.to_datetime(date_strings, format="%d-%m-%Y %I:%M %p")
print(timestamps)
Output:
DatetimeIndex(['2019-03-04 12:35:00', '2017-06-22 11:01:00',
'2009-09-05 19:09:00'],
dtype='datetime64[ns]', freq=None)
Common Mistake: Wrong Format String
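If your format string does not match the actual date pattern, pd.to_datetime() raises a ValueError when the string cannot be matched at all. The more dangerous case is when the string still fits the wrong pattern (for example, day and month are both 12 or less): parsing succeeds silently and returns the wrong date.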
# Wrong format: month and day swapped
wrong = pd.to_datetime("04-03-2019 12:35 PM", format="%m-%d-%Y %I:%M %p")
print(wrong) # 2019-04-03 12:35:00: April 3rd instead of March 4th!
Correct approach:
# Correct format: day first, then month
correct = pd.to_datetime("04-03-2019 12:35 PM", format="%d-%m-%Y %I:%M %p")
print(correct) # 2019-03-04 12:35:00: March 4th ✓
Use the errors parameter to handle unparseable dates gracefully:
- errors="raise" (default) - raises an exception on invalid dates.
- errors="coerce" - converts invalid dates to NaT (Not a Time) instead of failing.
- errors="ignore" - returns the original input unchanged if parsing fails. This option is deprecated since pandas 2.2; catch the exception explicitly instead.
pd.to_datetime("not-a-date", errors="coerce")
# Returns: NaT
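In practice, errors="coerce" is most useful on a whole column, because the resulting NaT values can be used to locate the rows that failed to parse. The following is a minimal sketch under that assumption; the sample strings and the names `raw`, `parsed`, and `bad_rows` are made up for illustration.

```python
import pandas as pd

# Illustrative input: one valid string, one non-date, one impossible date
raw = pd.Series(["04-03-2019 12:35 PM", "not-a-date", "31-02-2020 09:00 AM"])

# Unparseable entries become NaT instead of raising
parsed = pd.to_datetime(raw, format="%d-%m-%Y %I:%M %p", errors="coerce")

# Use the NaT mask to inspect the original strings that failed
bad_rows = raw[parsed.isna()]
print(parsed)
print(bad_rows)
```

Inspecting `bad_rows` before dropping or imputing them helps distinguish genuinely bad data from a format string that is simply wrong.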
Common Format Codes Reference
| Code | Meaning | Example |
|---|---|---|
| %Y | 4-digit year | 2019 |
| %m | Month as zero-padded number | 03 |
| %d | Day as zero-padded number | 04 |
| %H | Hour (24-hour clock) | 14 |
| %I | Hour (12-hour clock) | 02 |
| %M | Minute | 35 |
| %S | Second | 09 |
| %p | AM/PM | PM |
| %B | Full month name | February |
| %A | Full day name | Monday |
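The codes in the table can be combined freely. The sketch below parses a string that uses a full month name and a 12-hour clock, then formats the result with a day name and a 24-hour clock. The input string is invented for this example, and name-based codes such as %B and %A depend on the active locale (English is assumed here).

```python
import pandas as pd

# Parse: day, full month name, year, 12-hour time with seconds and AM/PM
ts = pd.to_datetime(
    "04 February 2019 02:35:09 PM",
    format="%d %B %Y %I:%M:%S %p",
)
print(ts)  # the PM hour 02 is stored as 14 on the 24-hour clock

# Format: full day name followed by 24-hour time
label = ts.strftime("%A, %H:%M")
print(label)
```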
Changing the Display Format of Timestamps
You can reformat datetime values into custom string patterns using the strftime() method. This is useful for creating human-readable date labels or preparing data for export.
import pandas as pd
df = pd.DataFrame()
df["time"] = pd.date_range("2/5/2019", periods=6, freq="2h")
# Reformat to "day-MonthName" pattern
df["formatted"] = df["time"].dt.strftime("%d-%B")
# Another format: "Year/Month/Day Hour:Minute"
df["custom"] = df["time"].dt.strftime("%Y/%m/%d %H:%M")
print(df)
Output:
time formatted custom
0 2019-02-05 00:00:00 05-February 2019/02/05 00:00
1 2019-02-05 02:00:00 05-February 2019/02/05 02:00
2 2019-02-05 04:00:00 05-February 2019/02/05 04:00
3 2019-02-05 06:00:00 05-February 2019/02/05 06:00
4 2019-02-05 08:00:00 05-February 2019/02/05 08:00
5 2019-02-05 10:00:00 05-February 2019/02/05 10:00
The strftime() method converts datetime values to strings (dtype: object). Once converted, you can no longer use .dt accessor methods on that column. Only apply strftime() as a final formatting step, not during intermediate processing.
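One way to follow this advice is to write the formatted strings to a separate column and leave the datetime column untouched. The sketch below shows that pattern and confirms that the .dt accessor is unavailable on the string column; the column name `label` is a placeholder for this example.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

df = pd.DataFrame({"time": pd.date_range("2/5/2019", periods=3, freq="2h")})

# Keep the datetime column; store the display string separately
df["label"] = df["time"].dt.strftime("%d-%B")

print(is_datetime64_any_dtype(df["time"]))   # still a datetime column
print(is_datetime64_any_dtype(df["label"]))  # now plain strings

# The .dt accessor only works on datetime-like values
try:
    df["label"].dt.year
    dt_failed = False
except AttributeError:
    dt_failed = True
print("dt accessor failed on label:", dt_failed)
```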
Extracting Day Names from Dates
To determine which day of the week a date falls on, use the day_name() method:
import pandas as pd
# "ME" (month end) requires pandas 2.2 or later; older versions use "M"
dates = pd.Series(pd.date_range("2/5/2019", periods=6, freq="ME"))
print("Dates:")
print(dates)
print("\nDay names:")
print(dates.dt.day_name())
Output:
Dates:
0 2019-02-28
1 2019-03-31
2 2019-04-30
3 2019-05-31
4 2019-06-30
5 2019-07-31
dtype: datetime64[ns]
Day names:
0 Thursday
1 Sunday
2 Tuesday
3 Friday
4 Sunday
5 Wednesday
dtype: object
In older versions of Pandas, the attribute dt.weekday_name was used instead of the method dt.day_name(). The attribute was deprecated in 0.23 and removed in 1.0. If you encounter AttributeError: 'DatetimeProperties' object has no attribute 'weekday_name', switch to dt.day_name().
You can also get the day of the week as an integer (Monday=0, Sunday=6):
print(dates.dt.dayofweek)
Output:
0 3
1 6
2 1
3 4
4 6
5 2
dtype: int32
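The integer form is convenient for comparisons. As a small sketch, a weekend flag can be derived by checking for values of 5 (Saturday) or 6 (Sunday); the names `is_weekend` and `weekend_dates` are chosen for this example.

```python
import pandas as pd

# First three month-end dates from the example above
dates = pd.Series(pd.to_datetime(["2019-02-28", "2019-03-31", "2019-04-30"]))

# Saturday=5 and Sunday=6, so >= 5 marks weekends
is_weekend = dates.dt.dayofweek >= 5
weekend_dates = dates[is_weekend]

print(is_weekend)
print(weekend_dates)
```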
Filtering Data by Date and Time Ranges
A common analysis task is selecting rows that fall within a specific date or time window. There are two main approaches depending on whether your DataFrame is indexed by datetime.
Method 1: Filtering with Boolean Conditions (No DateTime Index)
When the datetime column is a regular column (not the index), use standard comparison operators:
import pandas as pd
df = pd.DataFrame()
df["date"] = pd.date_range("1/1/2012", periods=1000, freq="h")
print("Total rows:", len(df))
print(df.head())
# Select rows between two specific datetimes
filtered = df[(df["date"] > "2012-01-01 01:00:00") &
(df["date"] <= "2012-01-01 11:00:00")]
print("\nFiltered rows:")
print(filtered)
Output:
Total rows: 1000
date
0 2012-01-01 00:00:00
1 2012-01-01 01:00:00
2 2012-01-01 02:00:00
3 2012-01-01 03:00:00
4 2012-01-01 04:00:00
Filtered rows:
date
2 2012-01-01 02:00:00
3 2012-01-01 03:00:00
4 2012-01-01 04:00:00
5 2012-01-01 05:00:00
6 2012-01-01 06:00:00
7 2012-01-01 07:00:00
8 2012-01-01 08:00:00
9 2012-01-01 09:00:00
10 2012-01-01 10:00:00
11 2012-01-01 11:00:00
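Boolean conditions can also be built from .dt components, which makes it possible to filter by time of day independently of the calendar date. The sketch below is one possible combination, assuming a 48-hour sample and a 09:00 to 16:59 window picked purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({"date": pd.date_range("1/1/2012", periods=48, freq="h")})

# Rows whose clock time falls in 09:00-16:59 on any day
in_hours = df["date"].dt.hour.between(9, 16)

# Rows on or after 2 January 2012
second_day = df["date"] >= "2012-01-02"

# Combine conditions with &; each one is already a boolean Series
result = df[in_hours & second_day]
print(result)
```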
Method 2: Filtering with .loc (DateTime Index)
When the DataFrame is indexed by datetime, you can use .loc with string-based slicing for cleaner, more readable code:
import pandas as pd
df = pd.DataFrame()
df["date"] = pd.date_range("1/1/2012", periods=1000, freq="h")
# Set datetime column as index
df = df.set_index("date")
# Slice using .loc: both endpoints are inclusive
filtered = df.loc["2012-01-01 04:00:00":"2012-01-01 12:00:00"]
print(filtered)
Output:
Empty DataFrame
Columns: []
Index: [2012-01-01 04:00:00, 2012-01-01 05:00:00, 2012-01-01 06:00:00, 2012-01-01 07:00:00, 2012-01-01 08:00:00, 2012-01-01 09:00:00, 2012-01-01 10:00:00, 2012-01-01 11:00:00, 2012-01-01 12:00:00]
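The output reads "Empty DataFrame" only because this frame has no columns left once date becomes the index; the Index line still lists the nine selected timestamps, with both endpoints included. With a datetime index, .loc also supports partial string indexing for broader selections: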
# Select all rows from January 2012
jan_data = df.loc["2012-01"]
# Select all rows from 2012
year_data = df.loc["2012"]
This is one of the most convenient features of having a DatetimeIndex.
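To make the effect of partial string indexing concrete, the sketch below counts the rows each selection returns for the same 1000-hour range. The `value` column is a hypothetical payload added so the frame is not empty.

```python
import pandas as pd

idx = pd.date_range("1/1/2012", periods=1000, freq="h")
df = pd.DataFrame({"value": range(1000)}, index=idx)  # "value" is illustrative

jan = df.loc["2012-01"]         # every hour in January 2012
feb = df.loc["2012-02"]         # the remaining hours, which fall in February
one_day = df.loc["2012-01-15"]  # all 24 hours of a single day

print(len(jan), len(feb), len(one_day))
```

January has 31 days, so the first selection covers 31 × 24 = 744 rows, leaving 256 of the 1000 rows in February.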
Method 3: Using between() for Cleaner Syntax
For a more readable alternative to chained comparisons, use Series.between(), which includes both endpoints by default:
import pandas as pd
df = pd.DataFrame()
df["date"] = pd.date_range("1/1/2012", periods=1000, freq="h")
filtered = df[df["date"].between("2012-01-01 04:00:00", "2012-01-01 08:00:00")]
print(filtered)
Output:
date
4 2012-01-01 04:00:00
5 2012-01-01 05:00:00
6 2012-01-01 06:00:00
7 2012-01-01 07:00:00
8 2012-01-01 08:00:00
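When one endpoint should be excluded, between() accepts an inclusive argument ("both", "neither", "left", or "right") in pandas 1.3 and later. A minimal sketch of a half-open window, assuming the same hourly data:

```python
import pandas as pd

df = pd.DataFrame({"date": pd.date_range("1/1/2012", periods=1000, freq="h")})

# inclusive="left" keeps the start (04:00) but drops the end (08:00)
half_open = df[df["date"].between(
    "2012-01-01 04:00:00",
    "2012-01-01 08:00:00",
    inclusive="left",
)]
print(half_open)
```

Half-open windows are handy when consecutive ranges must not overlap, since a row on the boundary then belongs to exactly one range.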
Calculating Time Differences
Another common operation is computing the difference between two datetime values, which produces a Timedelta object:
import pandas as pd
df = pd.DataFrame({
"start": pd.to_datetime(["2023-01-01 08:00", "2023-01-02 09:30"]),
"end": pd.to_datetime(["2023-01-01 17:30", "2023-01-02 18:00"]),
})
df["duration"] = df["end"] - df["start"]
df["hours_worked"] = df["duration"].dt.total_seconds() / 3600
print(df)
Output:
start end duration hours_worked
0 2023-01-01 08:00:00 2023-01-01 17:30:00 0 days 09:30:00 9.5
1 2023-01-02 09:30:00 2023-01-02 18:00:00 0 days 08:30:00 8.5
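Timedelta columns can be compared and decomposed directly, without converting to seconds first. The sketch below filters for durations longer than nine hours and splits each duration into hours and minutes; the names `long_shifts` and `parts` are placeholders for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "start": pd.to_datetime(["2023-01-01 08:00", "2023-01-02 09:30"]),
    "end": pd.to_datetime(["2023-01-01 17:30", "2023-01-02 18:00"]),
})
df["duration"] = df["end"] - df["start"]

# Compare against a Timedelta to filter rows
long_shifts = df[df["duration"] > pd.Timedelta(hours=9)]

# Break each duration into its hour and minute parts
parts = df["duration"].dt.components[["hours", "minutes"]]

print(long_shifts)
print(parts)
```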
Summary
Pandas provides a comprehensive toolkit for processing dates and times in DataFrames. Here is a quick reference of the key techniques covered:
| Task | Method |
|---|---|
| Generate datetime sequences | pd.date_range() |
| Extract components (year, month, etc.) | .dt.year, .dt.month, .dt.hour, etc. |
| Convert strings to datetime | pd.to_datetime() with format parameter |
| Reformat datetime display | .dt.strftime() |
| Get day of week name | .dt.day_name() |
| Filter by date range (no index) | Boolean conditions or .between() |
| Filter by date range (datetime index) | .loc[] with string slicing |
| Calculate time differences | Subtraction → Timedelta, then .dt.total_seconds() |
By mastering these operations, you can efficiently preprocess and analyze any time-series or datetime-based dataset in Python.