How to Find the Median of a List in Python

The median represents the "middle" value in a dataset and provides a robust measure of central tendency that is less affected by outliers than the mean. For example, in the dataset [1, 2, 3, 4, 1000], the mean is 202 but the median is 3, which better represents the typical value.
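The figures in this example can be reproduced with the standard library. A minimal sketch:

```python
import statistics

data = [1, 2, 3, 4, 1000]

# The single outlier (1000) drags the mean far above most of the values...
print(statistics.mean(data))    # 202
# ...while the median stays at the middle value of the sorted data.
print(statistics.median(data))  # 3
```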

In this guide, you will learn how to calculate the median using Python's standard library, a manual implementation, and NumPy for large datasets. Each approach is explained with clear examples and output so you can choose the right one for your situation.

Using the statistics Module

The statistics module from Python's standard library provides the most straightforward solution. It handles sorting automatically and correctly manages both odd-length and even-length lists:

import statistics

# Odd-length list: the middle value is returned directly
data = [10, 2, 38, 23, 38]
print(statistics.median(data))

# Even-length list: the average of the two middle values is returned
even_data = [10, 2, 38, 23]
print(statistics.median(even_data))

Output:

23
16.5

For the odd-length list [10, 2, 38, 23, 38], sorting gives [2, 10, 23, 38, 38], and the middle element is 23. For the even-length list [10, 2, 38, 23], sorting gives [2, 10, 23, 38], and the median is the average of the two middle values: (10 + 23) / 2 = 16.5.

median_low() and median_high()

The statistics module also provides median_low() and median_high() for even-length lists when you need the actual lower or upper middle value instead of their average:

import statistics

data = [10, 2, 38, 23]

print(statistics.median_low(data)) # 10 (lower middle value)
print(statistics.median_high(data)) # 23 (upper middle value)
print(statistics.median(data)) # 16.5 (average of both)

These are useful when your data consists of discrete, ordered values where the average of two values would not itself be a valid data point, such as letter grades or ratings on a fixed scale.
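As an illustration, consider ratings on a 1 to 5 scale. The data below is hypothetical and exists only to show how the three functions differ on an even-length list:

```python
import statistics

# Hypothetical survey ratings on a 1-5 scale (illustrative data only)
ratings = [1, 2, 4, 5]

average_median = statistics.median(ratings)     # 3.0, a rating nobody gave
low_median = statistics.median_low(ratings)     # 2, an actual data point
high_median = statistics.median_high(ratings)   # 4, an actual data point

print(average_median, low_median, high_median)
```

For odd-length lists all three functions return the same middle element, so the distinction only matters when the length is even.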

Manual Calculation

Understanding the underlying algorithm is valuable for interviews, educational purposes, or environments where external libraries are not available:

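def median(nums):
    # The median of no data is undefined, so fail loudly
    if not nums:
        raise ValueError("Cannot compute median of empty list")

    sorted_nums = sorted(nums)
    mid = len(sorted_nums) // 2

    # Even length: average the two middle values
    if len(sorted_nums) % 2 == 0:
        return (sorted_nums[mid - 1] + sorted_nums[mid]) / 2
    # Odd length: return the single middle value
    return sorted_nums[mid]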

# Odd-length list
print(median([1, 3, 5, 7, 9]))

# Even-length list
print(median([1, 3, 5, 7]))

Output:

5
4.0

How It Works Step by Step

The algorithm follows three steps:

  1. Sort the list to arrange values in ascending order.
  2. Find the middle index using integer division (len // 2).
  3. Check if the length is even or odd:
    • Odd: return the element at the middle index directly.
    • Even: return the average of the two middle elements.

Here is a visual breakdown for both cases:

# Odd-length list: [10, 2, 38, 23, 38]
# Sorted: [2, 10, 23, 38, 38]
#                 ^
# Length: 5, Mid index: 2
# Result: sorted_nums[2] = 23

# Even-length list: [10, 2, 38, 23]
# Sorted: [2, 10, 23, 38]
#             ^   ^
# Length: 4, Mid index: 2
# Result: (sorted_nums[1] + sorted_nums[2]) / 2 = (10 + 23) / 2 = 16.5
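One way to gain confidence in a hand-written version is to compare it against statistics.median() on random input. The sketch below restates the algorithm under the hypothetical name manual_median so that it is self-contained. The seed, list sizes, and value range are arbitrary choices:

```python
import random
import statistics

def manual_median(nums):
    """Sort, then return the middle value or the average of the two middle values."""
    if not nums:
        raise ValueError("Cannot compute median of empty list")
    sorted_nums = sorted(nums)
    mid = len(sorted_nums) // 2
    if len(sorted_nums) % 2 == 0:
        return (sorted_nums[mid - 1] + sorted_nums[mid]) / 2
    return sorted_nums[mid]

rng = random.Random(0)  # fixed seed so the check is repeatable
for _ in range(1000):
    size = rng.randint(1, 50)
    sample = [rng.randint(-100, 100) for _ in range(size)]
    # Both implementations should agree on every non-empty list of integers
    assert manual_median(sample) == statistics.median(sample)

print("manual_median agrees with statistics.median on 1000 random lists")
```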

Using NumPy for Large Datasets

When working with millions of values, NumPy provides optimized performance through vectorized operations written in C:

import numpy as np

data = [10, 2, 38, 23, 38]
print(np.median(data))

even_data = [10, 2, 38, 23]
print(np.median(even_data))

Output:

23.0
16.5

Note: For integer input like this, np.median() returns a NumPy floating-point value (numpy.float64), even when the median is a whole number (23.0 instead of 23). Keep this in mind if you need integer results or are comparing values with strict type checking.
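The difference in return type is easy to inspect. The following sketch shows one possible way to convert the result back to an int when the value is whole. Whether that conversion is appropriate depends on your application:

```python
import numpy as np

result = np.median([10, 2, 38, 23, 38])

print(type(result).__name__)      # float64
print(result == 23)               # True: numeric comparison ignores the type difference
print(isinstance(result, int))    # False: a strict type check fails

# Convert explicitly when an int is required and the value is whole
if float(result).is_integer():
    as_int = int(result)
    print(as_int)                 # 23
```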

NumPy's advantage becomes clear with large datasets:

import numpy as np

# Generate a large dataset
large_data = np.random.randint(0, 1000, size=1_000_000)
result = np.median(large_data)
print(f"Median of 1,000,000 values: {result}")

Example output:

Median of 1,000,000 values: 499.5

Handling Edge Cases

A robust median implementation should account for special scenarios:

import statistics

# Single element
print(statistics.median([42]))

# Two elements
print(statistics.median([10, 20]))

# Negative numbers
print(statistics.median([-5, -1, 0, 3, 10]))

# Floating-point numbers
print(statistics.median([1.5, 2.7, 3.2]))

Output:

42
15.0
0
2.7

An empty list raises an exception, which is the correct behavior since the median of no data is undefined:

import statistics

statistics.median([])

Output:

statistics.StatisticsError: no median for empty data
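If an empty list is a normal occurrence in your program rather than a bug, you can catch the exception and fall back to a default. The helper name safe_median and its default parameter are hypothetical and not part of the statistics module:

```python
import statistics

def safe_median(values, default=None):
    """Return the median of values, or default when values is empty."""
    try:
        return statistics.median(values)
    except statistics.StatisticsError:
        # Raised by statistics.median() when there is no data
        return default

print(safe_median([3, 1, 2]))  # 2
print(safe_median([]))         # None
```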

Why the Median Resists Outliers

The median is preferred over the mean when your data may contain extreme values:

import statistics

salaries = [35000, 40000, 42000, 45000, 50000, 5000000]

print(f"Mean: {statistics.mean(salaries):,.0f}")
print(f"Median: {statistics.median(salaries):,.0f}")

Output:

Mean: 868,667
Median: 43,500

The single salary of 5,000,000 pulls the mean up to 868,667, which does not represent any typical salary in the dataset. The median of 43,500 is a much more representative measure of the central tendency.

Performance Comparison

For large datasets, the choice of method affects execution speed:

import timeit
import statistics
import numpy as np

data = list(range(10000))

stats_time = timeit.timeit(lambda: statistics.median(data), number=1000)
np_time = timeit.timeit(lambda: np.median(data), number=1000)

print(f"statistics.median: {stats_time:.4f}s")
print(f"np.median: {np_time:.4f}s")

Example output (times vary by system):

statistics.median: 2.1543s
np.median: 0.4821s

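NumPy can be significantly faster for large datasets because its selection and arithmetic run in compiled C code. The gain is largest when the data is already a NumPy array: passing a Python list, as this benchmark does, makes every np.median() call pay for converting the list first. The input here is also already sorted, which is a best case for Python's built-in sort, so measure with data that resembles your own before choosing.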
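Timings depend on the shape of the input as well as the machine. The sketch below is one way to separate those effects. It makes its own assumptions: shuffled random floats instead of a sorted range, 200 runs, and a third measurement where the list is converted to an array once, outside the timer. No output is shown because the numbers will differ on every system:

```python
import random
import statistics
import timeit

import numpy as np

random.seed(0)
data = [random.random() for _ in range(10_000)]  # unsorted Python list
array = np.array(data)                            # converted once, outside the timer

runs = 200
timings = {
    "statistics.median(list)": timeit.timeit(lambda: statistics.median(data), number=runs),
    "np.median(list)": timeit.timeit(lambda: np.median(data), number=runs),
    "np.median(array)": timeit.timeit(lambda: np.median(array), number=runs),
}

for label, seconds in timings.items():
    print(f"{label}: {seconds:.4f}s")
```

Comparing the second and third lines shows how much of NumPy's time goes to list conversion rather than to computing the median itself.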

Method Comparison

Method              | Auto-Sorts    | Return Type    | Best For
statistics.median() | Yes           | int or float   | General-purpose, standard library
Manual calculation  | No (you sort) | int or float   | Learning, interviews, no dependencies
np.median()         | Yes           | float (always) | Large datasets, scientific computing

Conclusion

  • For most applications, statistics.median() offers the best balance of simplicity, readability, and reliability. It is part of the standard library, handles odd-length, even-length, and single-element lists, raises a clear error on empty input, and requires no installation.
  • Switch to NumPy when processing large datasets where performance becomes critical, keeping in mind that it always returns a float.

Understanding the manual approach is valuable for interviews and for situations where you cannot use external libraries or need to customize the algorithm for specific requirements.