How to Calculate the Average of a List of Floats in Python
Finding the arithmetic mean (average) is one of the most common operations in programming. Python offers several approaches depending on your needs: built-in functions for simplicity, the statistics module for correctness, and NumPy for performance.
Using Built-in sum() and len()
For standard lists, combining sum() and len() is the most straightforward native approach.
prices = [19.99, 5.50, 4.25, 10.00]
average = sum(prices) / len(prices)
print(f"Average Price: ${average:.2f}")
Output:
Average Price: $9.93
If the list is empty, len() returns 0 and the division raises a ZeroDivisionError. Always validate your input:
def calculate_average(numbers: list[float]) -> float | None:
    if not numbers:
        return None
    return sum(numbers) / len(numbers)
result = calculate_average([])
print(result) # None
Using a Conditional Expression
prices = [19.99, 5.50, 4.25, 10.00]
average = sum(prices) / len(prices) if prices else 0.0
print(f"Average: {average:.2f}") # Output: Average: 9.93
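Because the inputs are floats, a long running sum can pick up small rounding errors. The following is a minimal sketch of one way to limit that, assuming the standard library's math.fsum is an acceptable replacement for sum(). The helper name precise_average is hypothetical and only used for illustration.

```python
import math
from typing import Optional

def precise_average(numbers: list[float]) -> Optional[float]:
    """Average a list using math.fsum, which tracks partial sums to limit rounding error."""
    if not numbers:
        return None  # Same empty-list convention as calculate_average
    return math.fsum(numbers) / len(numbers)

values = [0.1] * 10
print(precise_average(values))  # 0.1
```

Recent Python releases have also improved the accuracy of sum() for floats, so the difference between the two may not be visible on every interpreter.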
Using the statistics Module
Python's standard library includes a statistics module designed for statistical calculations. Its mean() function raises a descriptive StatisticsError on empty input instead of a ZeroDivisionError, and it keeps the intent of the code obvious.
import statistics
data = [1.5, 2.5, 3.5, 4.5, 5.5]
average = statistics.mean(data)
print(f"Mean: {average}")
Output:
Mean: 3.5
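statistics.mean() does not return a default for empty input; it raises statistics.StatisticsError. A small sketch of handling that case follows, assuming None is the fallback you want. The wrapper name safe_mean is hypothetical.

```python
import statistics
from typing import Optional

def safe_mean(data: list[float]) -> Optional[float]:
    """Return the mean, or None when the data is empty."""
    try:
        return statistics.mean(data)
    except statistics.StatisticsError:
        # Raised by statistics.mean() when it receives no data points
        return None

print(safe_mean([1.5, 2.5, 3.5, 4.5, 5.5]))  # 3.5
print(safe_mean([]))                          # None
```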
Additional Statistical Functions
The statistics module offers more than just the mean:
import statistics
data = [2.5, 3.0, 3.5, 4.0, 4.5, 100.0] # Note the outlier
print(f"Mean: {statistics.mean(data):.2f}")
print(f"Median: {statistics.median(data):.2f}")
print(f"Stdev: {statistics.stdev(data):.2f}")
Output:
Mean: 19.58
Median: 3.75
Stdev: 39.40
When your data contains outliers (like 100.0 above), the median often represents the "typical" value better than the mean.
Using NumPy for Large Datasets
For lists with millions of elements, pure-Python approaches become noticeably slower. NumPy stores values in contiguous typed arrays and runs the computation in compiled C code, which makes numerical operations much faster.
import numpy as np
data = np.array([1.1, 2.2, 3.3, 4.4, 5.5])
average = np.mean(data)
print(f"Average: {average}")
Output:
Average: 3.3
Working with Large Datasets
import numpy as np
# Generate 1 million random values
large_dataset = np.random.uniform(0, 100, size=1_000_000)
average = np.mean(large_dataset)
print(f"Average of 1M values: {average:.4f}")
Example output (the exact value varies between runs):
Average of 1M values: 49.9872
NumPy with Multi-dimensional Data
import numpy as np
# Sales data: rows = months, columns = products
sales = np.array([
    [100.5, 200.3, 150.2],
    [110.2, 190.5, 160.8],
    [105.8, 210.1, 155.5]
])
# Average across all values
total_avg = np.mean(sales)
# Average per product (column)
product_avg = np.mean(sales, axis=0)
# Average per month (row)
monthly_avg = np.mean(sales, axis=1)
print(f"Overall average: {total_avg:.2f}")
print(f"Per product: {product_avg}")
print(f"Per month: {monthly_avg}")
Output:
Overall average: 153.77
Per product: [105.5 200.3 155.5]
Per month: [150.33333333 153.83333333 157.13333333]
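To make the axis argument concrete, here is a rough pure-Python equivalent of the three calls above. It is only an illustration of which values get grouped together, not how NumPy computes the result internally. The variable name sales_rows is an assumption introduced for this sketch.

```python
# Same sales figures as a plain list of rows (months)
sales_rows = [
    [100.5, 200.3, 150.2],
    [110.2, 190.5, 160.8],
    [105.8, 210.1, 155.5],
]

# axis=1 equivalent: one mean per row (month)
monthly_avg = [sum(row) / len(row) for row in sales_rows]

# axis=0 equivalent: zip(*rows) regroups the values by column (product)
product_avg = [sum(col) / len(col) for col in zip(*sales_rows)]

# No axis: flatten all values, then average them
all_values = [value for row in sales_rows for value in row]
total_avg = sum(all_values) / len(all_values)

print([round(m, 2) for m in monthly_avg])   # [150.33, 153.83, 157.13]
print([round(p, 2) for p in product_avg])   # [105.5, 200.3, 155.5]
print(round(total_avg, 2))                  # 153.77
```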
Performance Comparison
import time
import statistics
import numpy as np
# Create test data
data_list = [float(i) for i in range(1_000_000)]
data_array = np.array(data_list)
def benchmark(name, func):
    start = time.perf_counter()
    result = func()
    elapsed = time.perf_counter() - start
    print(f"{name}: {elapsed:.4f}s (result: {result:.2f})")
benchmark("sum/len", lambda: sum(data_list) / len(data_list))
benchmark("statistics.mean", lambda: statistics.mean(data_list))
benchmark("numpy.mean", lambda: np.mean(data_array))
Typical output (timings vary by machine and Python version):
sum/len: 0.0080s (result: 499999.50)
statistics.mean: 0.3006s (result: 499999.50)
numpy.mean: 0.0007s (result: 499999.50)
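If you want to stay in the standard library but statistics.mean() is too slow, statistics.fmean() (Python 3.8+) converts the data to floats and is usually much quicker. The sketch below compares the two with timeit.repeat, which gives steadier numbers than a single timed run. The dataset size and repeat counts are arbitrary choices for illustration, and no timings are shown because they depend on the machine.

```python
import statistics
import timeit

# Smaller than the benchmark above to keep the run short (arbitrary size)
data_list = [float(i) for i in range(100_000)]

# Run each measurement several times and keep the fastest, least noisy one
mean_time = min(timeit.repeat(lambda: statistics.mean(data_list), number=1, repeat=3))
fmean_time = min(timeit.repeat(lambda: statistics.fmean(data_list), number=1, repeat=3))

print(f"statistics.mean:  {mean_time:.4f}s")
print(f"statistics.fmean: {fmean_time:.4f}s")
print(f"fmean result: {statistics.fmean(data_list)}")  # fmean result: 49999.5
```

Note that fmean() always returns a float, whereas mean() tries to preserve the input type (for example Fraction or Decimal).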
Handling Special Cases
Filtering Before Averaging
import statistics
scores = [85.5, 90.0, None, 78.5, None, 92.0]
# Filter out None values
valid_scores = [s for s in scores if s is not None]
if valid_scores:
    average = statistics.mean(valid_scores)
    print(f"Average score: {average:.1f}")
Output:
Average score: 86.5
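Missing values in float data are sometimes encoded as NaN rather than None, and a single NaN makes the mean NaN as well. The sketch below extends the filter to cover both, assuming NaN entries should simply be dropped. The sample list is made up for illustration.

```python
import math
import statistics

scores = [85.5, 90.0, None, float("nan"), 78.5, 92.0]

# Drop None first, then NaN (math.isnan would raise TypeError on None)
valid_scores = [s for s in scores if s is not None and not math.isnan(s)]

if valid_scores:
    average = statistics.mean(valid_scores)
    print(f"Average score: {average:.1f}")  # Average score: 86.5
```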
Weighted Average
def weighted_average(values: list[float], weights: list[float]) -> float:
    """Calculate weighted average."""
    if len(values) != len(weights):
        raise ValueError("Values and weights must have same length")
    total = sum(v * w for v, w in zip(values, weights))
    return total / sum(weights)
grades = [85.0, 90.0, 78.0]
weights = [0.3, 0.5, 0.2] # 30%, 50%, 20%
result = weighted_average(grades, weights)
print(f"Weighted average: {result:.1f}")
Output:
Weighted average: 86.1
Method Comparison
| Method | Speed | Dependencies | Best For |
|---|---|---|---|
| sum()/len() | Fast | None | Simple scripts, small data |
| statistics.mean() | Moderate | None (stdlib) | Readability, additional stats |
| numpy.mean() | Ultra Fast | NumPy | Large datasets, data science |
Summary
- Use sum()/len() for simple cases with small lists; just remember to check for empty lists.
- Use statistics.mean() when readability matters or you need additional statistical functions.
- Use numpy.mean() for large datasets or when already working in a data science context.