Python Pandas: How to Calculate Cumulative Percentage
Cumulative percentages show the running total of a value expressed as a proportion of the overall total. They are essential for Pareto analysis (the 80/20 rule), understanding distribution patterns, identifying top contributors, and tracking progressive totals across sorted data. In fields ranging from sales analytics to inventory management, cumulative percentages help answer questions like "which products account for 80% of our revenue?"
In this guide, you will learn how to calculate cumulative percentages in Pandas, perform Pareto analysis to identify top contributors, and compute grouped cumulative percentages within categories.
The Standard Pattern
The formula for cumulative percentage is straightforward: (Cumulative Sum / Total Sum) × 100. Pandas makes this easy with the .cumsum() and .sum() methods:
import pandas as pd
df = pd.DataFrame({
'Product': ['A', 'B', 'C', 'D', 'E'],
'Sales': [500, 300, 100, 50, 50]
})
df['Cum_Sum'] = df['Sales'].cumsum()
df['Cum_Percent'] = (df['Sales'].cumsum() / df['Sales'].sum()) * 100
print(df)
Output:
Product Sales Cum_Sum Cum_Percent
0 A 500 500 50.0
1 B 300 800 80.0
2 C 100 900 90.0
3 D 50 950 95.0
4 E 50 1000 100.0
Each row shows what proportion of the total has been accumulated up to and including that row. By the last row, the cumulative percentage always reaches 100%.
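The same pattern extends to frequency distributions. As a minimal sketch (with made-up survey data), value_counts(normalize=True) already sorts by frequency in descending order, so chaining .cumsum() gives a cumulative percentage directly:

```python
import pandas as pd

# Hypothetical survey responses (illustrative data only)
s = pd.Series(['red', 'blue', 'red', 'green', 'red', 'blue'])

# value_counts(normalize=True) sorts by frequency descending,
# so its cumsum() is already a descending-order cumulative percentage
cum_pct = s.value_counts(normalize=True).cumsum() * 100
print(cum_pct.round(1))
```

Here 'red' covers 50% of responses on its own, and 'red' plus 'blue' together cover about 83.3%.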
Pareto Analysis (80/20 Rule)
Pareto analysis identifies the small number of items that contribute the most to a total. The key requirement is to sort the data in descending order before calculating cumulative percentages. Without sorting, the cumulative values will not reveal which items are the top contributors.
import pandas as pd
df = pd.DataFrame({
'Product': ['C', 'A', 'E', 'B', 'D'],
'Sales': [100, 500, 50, 300, 50]
})
# Step 1: Sort by sales descending
df = df.sort_values('Sales', ascending=False).reset_index(drop=True)
# Step 2: Calculate cumulative percentage
df['Cum_Percent'] = (df['Sales'].cumsum() / df['Sales'].sum()) * 100
print(df)
Output:
Product Sales Cum_Percent
0 A 500 50.0
1 B 300 80.0
2 C 100 90.0
3 E 50 95.0
4 D 50 100.0
Products A and B together account for 80% of total sales, illustrating the classic 80/20 pattern where a small subset of items drives the majority of results.
Forgetting to sort before calculating cumulative percentages is a common mistake. Without descending sort, the cumulative values will not correctly identify which items are the biggest contributors, making Pareto analysis meaningless.
Finding the 80% Threshold Programmatically
Rather than reading the table manually, you can filter the DataFrame to extract exactly which items fall within the 80% threshold:
import pandas as pd
df = pd.DataFrame({
'Product': ['A', 'B', 'C', 'D', 'E'],
'Sales': [500, 300, 100, 50, 50]
})
df = df.sort_values('Sales', ascending=False).reset_index(drop=True)
df['Cum_Percent'] = (df['Sales'].cumsum() / df['Sales'].sum()) * 100
# Products contributing to 80% of sales
top_products = df[df['Cum_Percent'] <= 80]
print(f"Top products (80% of sales): {top_products['Product'].tolist()}")
print(f"Number of products: {len(top_products)} out of {len(df)}")
Output:
Top products (80% of sales): ['A', 'B']
Number of products: 2 out of 5
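The `<= 80` filter works cleanly above because one row lands exactly on 80%. With real data that rarely happens, and the filter then excludes the row that crosses the threshold. One way to include the crossing row is to test the previous row's cumulative percentage instead, as in this sketch (with illustrative numbers chosen so no row lands exactly on 80%):

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['A', 'B', 'C', 'D', 'E'],
    'Sales': [500, 310, 100, 50, 40]  # illustrative data; no row hits exactly 80%
})
df = df.sort_values('Sales', ascending=False).reset_index(drop=True)
df['Cum_Percent'] = (df['Sales'].cumsum() / df['Sales'].sum()) * 100

# Keep every row whose *previous* cumulative percentage was still below 80,
# so the row that crosses the threshold is included
mask = df['Cum_Percent'].shift(fill_value=0) < 80
top_products = df[mask]
print(top_products['Product'].tolist())
```

Here the cumulative percentages are 50, 81, 91, 96, 100: a plain `<= 80` filter would return only product A, while the shifted test correctly returns A and B, the set needed to reach at least 80% of sales.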
Building a Complete Pareto Table
A full Pareto analysis table includes both individual percentages and cumulative percentages, giving a complete picture of each item's contribution:
import pandas as pd
df = pd.DataFrame({
'Category': ['Electronics', 'Clothing', 'Food', 'Books', 'Other'],
'Revenue': [50000, 30000, 15000, 3000, 2000]
})
# Sort and calculate all metrics
df = df.sort_values('Revenue', ascending=False).reset_index(drop=True)
total = df['Revenue'].sum()
df['Percent'] = (df['Revenue'] / total) * 100
df['Cum_Revenue'] = df['Revenue'].cumsum()
df['Cum_Percent'] = (df['Cum_Revenue'] / total) * 100
print(df.round(1))
Output:
Category Revenue Percent Cum_Revenue Cum_Percent
0 Electronics 50000 50.0 50000 50.0
1 Clothing 30000 30.0 80000 80.0
2 Food 15000 15.0 95000 95.0
3 Books 3000 3.0 98000 98.0
4 Other 2000 2.0 100000 100.0
This format makes it easy to see that Electronics alone accounts for 50% of revenue, and Electronics plus Clothing together reach the 80% mark.
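A Pareto table like this also supports ABC classification, which buckets items by cumulative contribution. As a sketch, pd.cut can assign the classes; the 80%/95% cut-offs used here are a common convention, not a fixed rule:

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['Electronics', 'Clothing', 'Food', 'Books', 'Other'],
    'Revenue': [50000, 30000, 15000, 3000, 2000]
})
df = df.sort_values('Revenue', ascending=False).reset_index(drop=True)
df['Cum_Percent'] = (df['Revenue'].cumsum() / df['Revenue'].sum()) * 100

# Bucket by cumulative contribution: A = first 80%, B = next 15%, C = the rest
# (the 80/95 thresholds are a common ABC-analysis convention, adjust as needed)
df['Class'] = pd.cut(df['Cum_Percent'], bins=[0, 80, 95, 100], labels=['A', 'B', 'C'])
print(df[['Category', 'Cum_Percent', 'Class']])
```

Electronics and Clothing land in class A, Food in class B, and the long tail of Books and Other in class C.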
Grouped Cumulative Percentage
When your data spans multiple categories or regions, you often need cumulative percentages calculated within each group rather than across the entire dataset. Use groupby() combined with a cumulative sum:
import pandas as pd
df = pd.DataFrame({
'Region': ['East', 'East', 'East', 'West', 'West'],
'Product': ['A', 'B', 'C', 'A', 'B'],
'Sales': [100, 200, 50, 150, 100]
})
# Sort within each group by sales descending
df = df.sort_values(['Region', 'Sales'], ascending=[True, False])
# Calculate cumulative percentage within each region
df['Cum_Pct'] = df.groupby('Region')['Sales'].transform(
lambda x: (x.cumsum() / x.sum()) * 100
)
print(df.round(1))
Output:
Region Product Sales Cum_Pct
1 East B 200 57.1
0 East A 100 85.7
2 East C 50 100.0
3 West A 150 60.0
4 West B 100 100.0
Each region's cumulative percentages independently reach 100%, giving you a Pareto breakdown within each group.
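The lambda can also be avoided entirely. As an alternative sketch of the same calculation, groupby().cumsum() gives the per-group running total and transform('sum') broadcasts each group's total back to every row, both aligned to the DataFrame's index:

```python
import pandas as pd

df = pd.DataFrame({
    'Region': ['East', 'East', 'East', 'West', 'West'],
    'Product': ['A', 'B', 'C', 'A', 'B'],
    'Sales': [100, 200, 50, 150, 100]
})
df = df.sort_values(['Region', 'Sales'], ascending=[True, False])

# Per-group running total divided by per-group total, both index-aligned
group_cumsum = df.groupby('Region')['Sales'].cumsum()
group_total = df.groupby('Region')['Sales'].transform('sum')
df['Cum_Pct'] = group_cumsum / group_total * 100
print(df.round(1))
```

Because both intermediate Series keep the original index, this version is robust to row order and avoids the per-group Python lambda, which can matter on large DataFrames.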
Common Mistake: Calculating Without Sorting
To see why sorting matters, compare the results with and without descending sort:
import pandas as pd
df = pd.DataFrame({
'Product': ['C', 'A', 'E', 'B', 'D'],
'Sales': [100, 500, 50, 300, 50]
})
# Without sorting (misleading)
df['Cum_Pct_Unsorted'] = (df['Sales'].cumsum() / df['Sales'].sum()) * 100
print("Without sorting:")
print(df[['Product', 'Sales', 'Cum_Pct_Unsorted']])
Output:
Without sorting:
Product Sales Cum_Pct_Unsorted
0 C 100 10.0
1 A 500 60.0
2 E 50 65.0
3 B 300 95.0
4 D 50 100.0
The unsorted version makes it appear that product C is a top contributor simply because it happens to be first in the data. After sorting, the true picture emerges with product A clearly leading.
Always sort your data in descending order by the value column before computing cumulative percentages for Pareto analysis. The cumulative percentage is only meaningful for identifying top contributors when the largest values come first.
Quick Reference
| Step | Method | Purpose |
|---|---|---|
| Sort | .sort_values(ascending=False) | Required for meaningful Pareto analysis |
| Running total | .cumsum() | Accumulate values progressively |
| Total | .sum() | Denominator for percentage |
| Normalize | / total * 100 | Convert to percentage |
| Group | .groupby().transform() | Cumulative percentage within categories |
- Calculate cumulative percentages with (df['col'].cumsum() / df['col'].sum()) * 100.
- For Pareto analysis, always sort in descending order first so the largest contributors appear at the top.
- Use groupby() with a per-group cumulative sum when you need independent cumulative percentages within separate categories or regions.