
Python Pandas: How to Calculate Weighted Average in Pandas

A weighted average assigns different levels of importance (weights) to each value before computing the average. Unlike a simple average where all values contribute equally, a weighted average gives more influence to values with higher weights.

Formula:

Weighted Average = Σ(value × weight) / Σ(weight)

Weighted averages are essential in finance (portfolio returns), education (grade calculation), surveys (population-weighted responses), and data analysis (importance-adjusted metrics).
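
As a quick sanity check before involving Pandas, the formula can be applied to plain Python lists (the numbers here are purely illustrative):

```python
# Illustrative values and weights
values = [80, 90]
weights = [1, 3]

# Σ(value × weight) / Σ(weight)
weighted = sum(v * w for v, w in zip(values, weights)) / sum(weights)
print(weighted)  # (80*1 + 90*3) / 4 = 87.5
```

The second value dominates because it carries three times the weight of the first.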

Basic Weighted Average Calculation

The simplest approach is to implement the formula directly using Pandas operations:

import pandas as pd

df = pd.DataFrame({
    'item': ['A', 'B', 'C', 'D'],
    'price': [100, 200, 150, 300],
    'quantity': [10, 5, 8, 3]
})

# Weighted average price (weighted by quantity)
weighted_avg = (df['price'] * df['quantity']).sum() / df['quantity'].sum()

print(f"Simple average: {df['price'].mean():.2f}")
print(f"Weighted average: {weighted_avg:.2f}")

Output:

Simple average: 187.50
Weighted average: 157.69

The weighted average (157.69) is lower than the simple average (187.50) because the cheaper items (A and C) have higher quantities (weights), pulling the average down.
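
To see that pull directly, you can inspect each item's share of the total weight (a quick diagnostic reusing the same DataFrame; the column name weight_share is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'item': ['A', 'B', 'C', 'D'],
    'price': [100, 200, 150, 300],
    'quantity': [10, 5, 8, 3]
})

# Each item's fraction of the total weight
df['weight_share'] = df['quantity'] / df['quantity'].sum()
print(df[['item', 'price', 'weight_share']])
```

Items A and C together carry roughly 69% of the total weight, so the weighted average lands much closer to their prices.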

Creating a Reusable Function

Wrap the calculation in a function for reusability:

import pandas as pd

def weighted_average(df, value_col, weight_col):
    """Calculate the weighted average of a column."""
    values = df[value_col]
    weights = df[weight_col]
    return (values * weights).sum() / weights.sum()

df = pd.DataFrame({
    'item': ['A', 'B', 'C', 'D'],
    'price': [100, 200, 150, 300],
    'quantity': [10, 5, 8, 3]
})

result = weighted_average(df, 'price', 'quantity')
print(f"Weighted average price: {result:.2f}")

Output:

Weighted average price: 157.69

Weighted Average by Group

A common use case is computing the weighted average for each group separately. Use groupby() with .apply():

import pandas as pd

def weighted_average(group, value_col, weight_col):
    values = group[value_col]
    weights = group[weight_col]
    return (values * weights).sum() / weights.sum()

df = pd.DataFrame({
    'store': ['North', 'North', 'North', 'South', 'South', 'South'],
    'product': ['Chocolate', 'Biscuit', 'IceCream',
                'Chocolate', 'Biscuit', 'IceCream'],
    'price': [90, 42, 68, 86, 48, 102],
    'quantity': [4, 6, 3, 3, 5, 5]
})

# Weighted average price per store
result = df.groupby('store').apply(
    weighted_average, 'price', 'quantity'
)

print(result)

Output:

store
North    62.769231
South    77.538462
dtype: float64

Weighted Average by Product

import pandas as pd

def weighted_average(group, value_col, weight_col):
    values = group[value_col]
    weights = group[weight_col]
    return (values * weights).sum() / weights.sum()

df = pd.DataFrame({
    'store': ['North', 'North', 'North', 'South', 'South', 'South'],
    'product': ['Chocolate', 'Biscuit', 'IceCream',
                'Chocolate', 'Biscuit', 'IceCream'],
    'price': [90, 42, 68, 86, 48, 102],
    'quantity': [4, 6, 3, 3, 5, 5]
})

# Weighted average price per product (weighted by quantity)
result = df.groupby('product').apply(
    weighted_average, 'price', 'quantity'
)

print(result)

Output:

product
Biscuit      44.727273
Chocolate    88.285714
IceCream     89.250000
dtype: float64

Using numpy.average() with Pandas

NumPy's average() function has a built-in weights parameter that makes this even simpler:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'item': ['A', 'B', 'C', 'D'],
    'price': [100, 200, 150, 300],
    'quantity': [10, 5, 8, 3]
})

# Single weighted average
result = np.average(df['price'], weights=df['quantity'])
print(f"Weighted average: {result:.2f}")

Output:

Weighted average: 157.69

With groupby()

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'store': ['North', 'North', 'South', 'South'],
    'price': [100, 150, 200, 120],
    'quantity': [10, 5, 3, 8]
})

result = df.groupby('store').apply(
    lambda g: np.average(g['price'], weights=g['quantity'])
)

print(result)

Output:

store
North    116.666667
South    141.818182
dtype: float64

Alternative: Using groupby() Without apply()

For better performance on large DataFrames, avoid apply() and use vectorized operations:

import pandas as pd

df = pd.DataFrame({
    'product': ['Chocolate', 'Chocolate', 'Chocolate',
                'Biscuit', 'Biscuit', 'Biscuit'],
    'price': [90, 50, 86, 87, 42, 48],
    'weight': [4, 2, 3, 5, 6, 5]
})

# Compute weighted sum and total weight per group
df['weighted_value'] = df['price'] * df['weight']

result = (
    df.groupby('product')['weighted_value'].sum()
    / df.groupby('product')['weight'].sum()
)

print(result)

Output:

product
Biscuit      57.937500
Chocolate    79.777778
dtype: float64

Performance tip

This vectorized approach is significantly faster than using apply() for large DataFrames because it avoids Python-level function calls for each group.
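
If you prefer not to mutate the DataFrame with a helper column, the same vectorized pattern can be written with assign(), which builds the intermediate column on a copy (a sketch reusing the data above; the column name wv is arbitrary):

```python
import pandas as pd

df = pd.DataFrame({
    'product': ['Chocolate', 'Chocolate', 'Chocolate',
                'Biscuit', 'Biscuit', 'Biscuit'],
    'price': [90, 50, 86, 87, 42, 48],
    'weight': [4, 2, 3, 5, 6, 5]
})

# assign() adds the weighted column on a copy, leaving df untouched
grouped = df.assign(wv=df['price'] * df['weight']).groupby('product')
result = grouped['wv'].sum() / grouped['weight'].sum()
print(result)
```

This produces the same per-group results while keeping the original DataFrame free of temporary columns.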

Practical Example: Student Grade Calculation

A classic weighted average use case is calculating a student's final grade where different assignments carry different weights:

import pandas as pd
import numpy as np

grades = pd.DataFrame({
    'student': ['Alice', 'Alice', 'Alice', 'Bob', 'Bob', 'Bob'],
    'category': ['Homework', 'Midterm', 'Final',
                 'Homework', 'Midterm', 'Final'],
    'score': [92, 85, 78, 75, 88, 95],
    'weight': [20, 30, 50, 20, 30, 50]
})

# Calculate weighted average grade per student
result = grades.groupby('student').apply(
    lambda g: np.average(g['score'], weights=g['weight'])
)

print("Weighted Final Grades:")
print(result)

# Compare with simple average
simple_avg = grades.groupby('student')['score'].mean()
print("\nSimple Average Grades:")
print(simple_avg)

Output:

Weighted Final Grades:
student
Alice    82.9
Bob      88.9
dtype: float64

Simple Average Grades:
student
Alice    85.0
Bob      86.0
Name: score, dtype: float64

Alice's weighted grade (82.9) is lower than her simple average (85.0) because her weakest score was on the Final, which carries the highest weight (50%). Bob's weighted grade is higher because he performed best on the Final.
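
Because the weights sum to 100, you can verify Alice's grade by hand with the weights expressed as fractions:

```python
# Alice: Homework 92 (20%), Midterm 85 (30%), Final 78 (50%)
alice = 0.20 * 92 + 0.30 * 85 + 0.50 * 78
print(alice)
```

The 50%-weighted Final score of 78 drags the result well below her simple average.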

Practical Example: Portfolio Weighted Return

import pandas as pd
import numpy as np

portfolio = pd.DataFrame({
    'stock': ['AAPL', 'GOOGL', 'MSFT', 'AMZN'],
    'return_pct': [12.5, 8.3, 15.2, -3.1],
    'investment': [50000, 30000, 40000, 20000]
})

# Portfolio weighted return
weighted_return = np.average(
    portfolio['return_pct'],
    weights=portfolio['investment']
)

simple_return = portfolio['return_pct'].mean()

print(f"Portfolio weighted return: {weighted_return:.2f}%")
print(f"Simple average return: {simple_return:.2f}%")

Output:

Portfolio weighted return: 10.14%
Simple average return: 8.22%
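
Note that only the relative sizes of the weights matter: dividing each investment by the total (so the weights become portfolio fractions summing to 1) yields the same result, since the scaling factor cancels in the formula. A quick check reusing the portfolio above:

```python
import pandas as pd
import numpy as np

portfolio = pd.DataFrame({
    'stock': ['AAPL', 'GOOGL', 'MSFT', 'AMZN'],
    'return_pct': [12.5, 8.3, 15.2, -3.1],
    'investment': [50000, 30000, 40000, 20000]
})

# Normalize investments into fractions of the total portfolio
fractions = portfolio['investment'] / portfolio['investment'].sum()

# Both weightings give the same weighted return
r1 = np.average(portfolio['return_pct'], weights=portfolio['investment'])
r2 = np.average(portfolio['return_pct'], weights=fractions)
print(f"{r1:.4f} == {r2:.4f}")
```

This is why you can use raw dollar amounts directly as weights without converting them to percentages first.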

Handling Edge Cases

Watch out for zero weights

If all weights in a group sum to zero, the denominator is zero and the division produces NaN (with a RuntimeWarning) instead of a meaningful result:

import pandas as pd

df = pd.DataFrame({
    'value': [10, 20],
    'weight': [0, 0]  # Both weights are zero!
})

# With Pandas/NumPy sums, 0/0 yields NaN rather than raising an error
result = (df['value'] * df['weight']).sum() / df['weight'].sum()
# RuntimeWarning: invalid value encountered in scalar divide

Fix: add a check

def safe_weighted_average(df, value_col, weight_col):
    weights_sum = df[weight_col].sum()
    if weights_sum == 0:
        return float('nan')
    return (df[value_col] * df[weight_col]).sum() / weights_sum
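
The guarded function works per group as well: a zero-weight group simply comes back as NaN. A minimal sketch (the group labels here are made up for illustration):

```python
import pandas as pd

def safe_weighted_average(df, value_col, weight_col):
    weights_sum = df[weight_col].sum()
    if weights_sum == 0:
        return float('nan')
    return (df[value_col] * df[weight_col]).sum() / weights_sum

df = pd.DataFrame({
    'group': ['ok', 'ok', 'bad', 'bad'],
    'value': [10, 20, 30, 40],
    'weight': [1, 3, 0, 0]  # 'bad' has zero total weight
})

# 'ok' gets a real weighted average; 'bad' comes back as NaN
result = df.groupby('group').apply(
    safe_weighted_average, 'value', 'weight'
)
print(result)
```

By contrast, numpy.average() raises ZeroDivisionError ("Weights sum to zero, can't be normalized") when the weights sum to zero, so the explicit check is worth keeping.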

Comparison of Methods

Method               | Grouped Support | Performance | Readability
---------------------|-----------------|-------------|------------
Manual formula       | With apply()    | ⭐⭐⭐       | ⭐⭐⭐⭐⭐
numpy.average()      | With apply()    | ⭐⭐⭐       | ⭐⭐⭐⭐⭐
Vectorized groupby() | Built-in        | ⭐⭐⭐⭐⭐    | ⭐⭐⭐

Conclusion

Calculating weighted averages in Pandas is straightforward using the formula (values × weights).sum() / weights.sum().

  • For single calculations, numpy.average() with the weights parameter is the cleanest approach.
  • For grouped weighted averages, combine groupby() with either apply() for clarity or vectorized operations for performance.
  • Always handle the edge case of zero total weights to prevent division errors.
  • Weighted averages give you a more accurate picture of your data by accounting for the relative importance of each observation.