Python Pandas: How to Calculate Cumulative Sum (Running Total) in Pandas
Running totals, also known as cumulative sums, are essential for tracking account balances, monitoring growth metrics, analyzing inventory levels, and working with time-series data. Instead of looking at individual values in isolation, a cumulative sum shows the progressive accumulation of values up to each point in the dataset.
In this guide, you will learn how to compute cumulative sums along columns and rows, calculate running totals within groups, handle missing data, and apply these techniques to practical scenarios.
Basic Column-Wise Running Total
The .cumsum() method computes the cumulative sum down through the rows of a column by default. Each value in the result represents the sum of all values from the top of the column up to and including the current row:
import pandas as pd
df = pd.DataFrame({
'Date': ['Mon', 'Tue', 'Wed', 'Thu'],
'Sales': [100, 150, 200, 50]
})
df['Running_Total'] = df['Sales'].cumsum()
print(df)
Output:
Date Sales Running_Total
0 Mon 100 100
1 Tue 150 250
2 Wed 200 450
3 Thu 50 500
On Monday the running total equals the day's sales (100). By Tuesday it becomes 100 + 150 = 250, and so on. The final value (500) equals the total sum of the entire column.
Row-Wise (Horizontal) Cumulative Sum
Setting axis=1 computes the cumulative sum across columns for each row. This is useful when columns represent sequential time periods and you want to see progressive accumulation:
import pandas as pd
df = pd.DataFrame({
'Q1': [100, 200],
'Q2': [150, 180],
'Q3': [200, 220],
'Q4': [175, 195]
})
cumulative = df.cumsum(axis=1)
print(cumulative)
Output:
Q1 Q2 Q3 Q4
0 100 250 450 625
1 200 380 600 795
Each row now shows the year-to-date total at the end of each quarter. For the first row, Q2 shows 100 + 150 = 250, Q3 shows 250 + 200 = 450, and so on.
Grouped Cumulative Sum
When your data contains categories, you often need running totals calculated independently within each group. Combine groupby() with cumsum() to achieve this:
import pandas as pd
df = pd.DataFrame({
'Category': ['A', 'A', 'A', 'B', 'B', 'B'],
'Sales': [100, 150, 200, 80, 120, 100]
})
df['Group_Total'] = df.groupby('Category')['Sales'].cumsum()
print(df)
Output:
Category Sales Group_Total
0 A 100 100
1 A 150 250
2 A 200 450
3 B 80 80
4 B 120 200
5 B 100 300
Each category's running total starts fresh. Category A accumulates to 450 and Category B accumulates to 300, with no cross-contamination between groups.
Handling Missing Data
The skipna parameter controls how NaN values are treated during cumulative summation. The default behavior skips them, but you can choose to let them propagate:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'Sales': [100, np.nan, 200, 50]
})
# Default: skipna=True ignores NaN values
df['Skip_NaN'] = df['Sales'].cumsum(skipna=True)
# skipna=False causes NaN to propagate through all subsequent values
df['Keep_NaN'] = df['Sales'].cumsum(skipna=False)
print(df)
Output:
Sales Skip_NaN Keep_NaN
0 100.0 100.0 100.0
1 NaN NaN NaN
2 200.0 300.0 NaN
3 50.0 350.0 NaN
With skipna=True (the default), the NaN in row 1 is simply ignored and the running total continues from the previous value. With skipna=False, once a NaN is encountered, every subsequent value also becomes NaN because adding anything to NaN produces NaN. Choose skipna=False when missing data should invalidate all downstream totals.
Practical Examples
Account Balance Tracking
A classic use of cumulative sums is tracking an account balance where deposits are positive and withdrawals are negative:
import pandas as pd
df = pd.DataFrame({
'Transaction': ['Deposit', 'Withdrawal', 'Deposit', 'Withdrawal'],
'Amount': [1000, -200, 500, -150]
})
df['Balance'] = df['Amount'].cumsum()
print(df)
Output:
Transaction Amount Balance
0 Deposit 1000 1000
1 Withdrawal -200 800
2 Deposit 500 1300
3 Withdrawal -150 1150
The running total naturally handles both additions and subtractions, giving you the balance after each transaction.
Inventory Stock Level Tracking
Similarly, inventory levels can be tracked with received units as positive and sold units as negative:
import pandas as pd
df = pd.DataFrame({
'Action': ['Receive', 'Sell', 'Sell', 'Receive'],
'Units': [100, -30, -25, 50]
})
df['Stock'] = df['Units'].cumsum()
print(df)
Output:
Action Units Stock
0 Receive 100 100
1 Sell -30 70
2 Sell -25 45
3 Receive 50 95
Monthly Revenue with Year-to-Date Totals
import pandas as pd
df = pd.DataFrame({
'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
'Revenue': [12000, 15000, 11000, 18000, 16000, 20000]
})
df['YTD_Revenue'] = df['Revenue'].cumsum()
df['YTD_Avg'] = df['YTD_Revenue'] / (df.index + 1)
print(df)
Output:
Month Revenue YTD_Revenue YTD_Avg
0 Jan 12000 12000 12000.000000
1 Feb 15000 27000 13500.000000
2 Mar 11000 38000 12666.666667
3 Apr 18000 56000 14000.000000
4 May 16000 72000 14400.000000
5 Jun 20000 92000 15333.333333
The year-to-date average uses the cumulative sum divided by the number of months elapsed, showing how the average revenue trends over time.
Common Mistake: Forgetting to Group
When working with multi-category data, applying .cumsum() without groupby() produces a single running total across all categories, which is usually not the intended behavior:
import pandas as pd
df = pd.DataFrame({
'Store': ['NYC', 'NYC', 'LA', 'LA'],
'Sales': [100, 200, 150, 250]
})
# Wrong: cumulative sum crosses store boundaries
df['Wrong_Total'] = df['Sales'].cumsum()
# Correct: cumulative sum within each store
df['Correct_Total'] = df.groupby('Store')['Sales'].cumsum()
print(df)
Output:
Store Sales Wrong_Total Correct_Total
0 NYC 100 100 100
1 NYC 200 300 300
2 LA 150 450 150
3 LA 250 700 400
Always use groupby() before cumsum() when your data contains distinct categories that should have independent running totals. Without grouping, values from one category bleed into another.
Quick Reference
| Parameter | Default | Effect |
|---|---|---|
axis | 0 | 0 = down rows, 1 = across columns |
skipna | True | True = ignore NaN, False = propagate NaN |
| Use Case | Code |
|---|---|
| Running total | df['col'].cumsum() |
| Horizontal cumsum | df.cumsum(axis=1) |
| Grouped cumsum | df.groupby('cat')['col'].cumsum() |
- Use
.cumsum()for running totals down columns and.cumsum(axis=1)for horizontal accumulation across columns. - For category-specific running totals, always combine with
groupby()to keep each group's accumulation independent. - Set
skipna=Falseonly when missing data should invalidate all subsequent values in the running total.