How to Calculate Absolute Frequency and Relative Frequency in Python Pandas

Understanding value distributions is the first step in data exploration. How many users are "Active"? What percentage of transactions failed? Pandas provides elegant solutions for both absolute counts and relative percentages.

Core Concepts

  • Absolute Frequency: Raw counts (e.g., "50 users")
  • Relative Frequency: Proportions or percentages (e.g., "25% of users")

Pandas handles both with the versatile .value_counts() method.

Absolute Frequency

Count distinct values by calling .value_counts() on a Series:

import pandas as pd

data = ['Pass', 'Pass', 'Fail', 'Pass', 'Fail', 'Pass']
df = pd.DataFrame(data, columns=['Status'])

counts = df['Status'].value_counts()
print(counts)

Output:

Status
Pass    4
Fail    2
Name: count, dtype: int64
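Since pandas 1.1, .value_counts() also works on an entire DataFrame, counting unique row combinations rather than values in a single column. A minimal sketch with a hypothetical second column:

```python
import pandas as pd

df = pd.DataFrame({
    'Status': ['Pass', 'Pass', 'Fail', 'Pass'],
    'Retake': ['No', 'Yes', 'No', 'No'],
})

# Count each unique (Status, Retake) row combination;
# the result is a Series with a MultiIndex
combo_counts = df.value_counts()
print(combo_counts)
```

The most common combination comes first, so this is a quick way to spot dominant row patterns.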

Sorting Options

import pandas as pd

df = pd.DataFrame({'Grade': ['B', 'A', 'C', 'A', 'B', 'A', 'B', 'C']})

# Default: sorted by count (descending)
print(df['Grade'].value_counts())
# B    3
# A    3
# C    2

# Sort alphabetically by index
print(df['Grade'].value_counts().sort_index())
# A    3
# B    3
# C    2

# Ascending count order
print(df['Grade'].value_counts(ascending=True))
# C    2
# B    3
# A    3

Output:

Grade
B    3
A    3
C    2
Name: count, dtype: int64
Grade
A    3
B    3
C    2
Name: count, dtype: int64
Grade
C    2
B    3
A    3
Name: count, dtype: int64

Relative Frequency (Percentages)

Use normalize=True to get proportions instead of raw counts:

import pandas as pd

data = ['Pass', 'Pass', 'Fail', 'Pass', 'Fail', 'Pass']
df = pd.DataFrame(data, columns=['Status'])

percentages = df['Status'].value_counts(normalize=True)
print(percentages)

Output:

Status
Pass    0.666667
Fail    0.333333
Name: proportion, dtype: float64

Formatting as Readable Percentages

import pandas as pd

df = pd.DataFrame({'Status': ['Pass', 'Pass', 'Fail', 'Pass', 'Fail', 'Pass']})

# Multiply by 100
pct = df['Status'].value_counts(normalize=True) * 100
print(pct)

# Format with % symbol
formatted = df['Status'].value_counts(normalize=True).apply(lambda x: f"{x:.1%}")
print(formatted)

Output:

Status
Pass    66.666667
Fail    33.333333
Name: proportion, dtype: float64
Status
Pass    66.7%
Fail    33.3%
Name: proportion, dtype: object
Tip: Use normalize=True to quickly identify dominant categories. A value appearing in 95% of rows behaves differently than one appearing in 5%.

Handling Missing Values

By default, value_counts() ignores NaN values. Include them with dropna=False:

import pandas as pd

data = ['A', 'B', None, 'A', None, 'B', 'A']
s = pd.Series(data)

# Default: excludes NaN
print(s.value_counts())
print()

# Include NaN counts
print(s.value_counts(dropna=False))

Output:

A    3
B    2
Name: count, dtype: int64

A      3
B      2
NaN    2
Name: count, dtype: int64
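If all you need is the overall missing rate rather than a full frequency table, isna().mean() gives it directly, since the mean of a boolean mask is the fraction of True values. A small sketch reusing the same data:

```python
import pandas as pd

s = pd.Series(['A', 'B', None, 'A', None, 'B', 'A'])

# isna() yields a boolean mask; its mean is the share of missing values
missing_rate = s.isna().mean()
print(f"{missing_rate:.1%} missing")  # 2 of 7 values
```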

Percentage Including Missing

import pandas as pd
import numpy as np

df = pd.DataFrame({'Category': ['X', 'Y', np.nan, 'X', np.nan]})

# What percentage of data is missing?
pct_with_na = df['Category'].value_counts(normalize=True, dropna=False)
print(pct_with_na)

Output:

Category
X      0.4
NaN    0.4
Y      0.2
Name: proportion, dtype: float64

Binning Numeric Data

For continuous data, use bins to create frequency distributions:

import pandas as pd

ages = pd.Series([22, 35, 45, 23, 38, 52, 28, 41, 55, 33])

# Create age groups
age_counts = ages.value_counts(bins=3, sort=False)
print(age_counts)

Output:

(21.965999999999998, 33.0]    4
(33.0, 44.0]                  3
(44.0, 55.0]                  3
Name: count, dtype: int64

Custom Bins

import pandas as pd

scores = pd.Series([45, 67, 82, 91, 73, 55, 88, 76, 62, 95])

# Define grade boundaries
bins = [0, 60, 70, 80, 90, 100]
labels = ['F', 'D', 'C', 'B', 'A']

grades = pd.cut(scores, bins=bins, labels=labels)
print(grades.value_counts().sort_index())

Output:

F    2
D    2
C    2
B    2
A    2
Name: count, dtype: int64
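normalize=True composes with binning as well: apply it to the binned Series to get the share of values in each band instead of raw counts. A sketch reusing the grade boundaries above:

```python
import pandas as pd

scores = pd.Series([45, 67, 82, 91, 73, 55, 88, 76, 62, 95])
bins = [0, 60, 70, 80, 90, 100]
labels = ['F', 'D', 'C', 'B', 'A']

grades = pd.cut(scores, bins=bins, labels=labels)

# Proportion of scores falling in each grade band
pct = grades.value_counts(normalize=True).sort_index()
print(pct)
```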

Both Counts and Percentages Together

Create a summary DataFrame with both metrics:

import pandas as pd

df = pd.DataFrame({
    'Product': ['A', 'B', 'A', 'C', 'B', 'A', 'A', 'C', 'B', 'A']
})

counts = df['Product'].value_counts()
percentages = df['Product'].value_counts(normalize=True) * 100

summary = pd.DataFrame({
    'Count': counts,
    'Percentage': percentages.round(1)
})
print(summary)

Output:

         Count  Percentage
Product
A            5        50.0
B            3        30.0
C            2        20.0
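The same summary can be built from a single value_counts() call, deriving the percentages from the counts instead of counting twice. A sketch, assuming the same product data:

```python
import pandas as pd

s = pd.Series(['A', 'B', 'A', 'C', 'B', 'A', 'A', 'C', 'B', 'A'],
              name='Product')

counts = s.value_counts()

# Derive percentages from the counts already computed
summary = counts.to_frame('Count')
summary['Percentage'] = (counts / counts.sum() * 100).round(1)
print(summary)
```

For small data the difference is negligible, but counting once keeps the two columns guaranteed-consistent.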

Cross-Tabulation for Multiple Columns

Compare frequencies across two categorical variables:

import pandas as pd

df = pd.DataFrame({
    'Gender': ['M', 'F', 'M', 'F', 'M', 'F', 'M', 'M'],
    'Status': ['Pass', 'Pass', 'Fail', 'Pass', 'Pass', 'Fail', 'Pass', 'Fail']
})

# Count combinations
print(pd.crosstab(df['Gender'], df['Status']))
print()

# With percentages (normalize by row)
print(pd.crosstab(df['Gender'], df['Status'], normalize='index').round(2))

Output:

Status  Fail  Pass
Gender
F          1     2
M          2     3

Status  Fail  Pass
Gender
F       0.33  0.67
M       0.40  0.60
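crosstab can also append row and column totals with margins=True, which is handy for checking the table against the overall sample size. A sketch using the same data:

```python
import pandas as pd

df = pd.DataFrame({
    'Gender': ['M', 'F', 'M', 'F', 'M', 'F', 'M', 'M'],
    'Status': ['Pass', 'Pass', 'Fail', 'Pass', 'Pass', 'Fail', 'Pass', 'Fail']
})

# margins=True adds an "All" row and column holding the totals
table = pd.crosstab(df['Gender'], df['Status'], margins=True)
print(table)
```

The bottom-right "All" cell equals the total number of rows, a quick sanity check.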

Practical Example: Survey Analysis

import pandas as pd

# Survey responses
survey = pd.DataFrame({
    'Satisfaction': ['Very Happy', 'Happy', 'Neutral', 'Happy',
                     'Unhappy', 'Very Happy', 'Happy', 'Neutral',
                     'Happy', 'Very Happy']
})

# Order matters for satisfaction levels
order = ['Very Happy', 'Happy', 'Neutral', 'Unhappy', 'Very Unhappy']

counts = survey['Satisfaction'].value_counts()
# Reindex to enforce order
counts = counts.reindex(order).fillna(0).astype(int)

print("Survey Results:")
print(counts)

# Cumulative percentage
cumulative = (counts.cumsum() / counts.sum() * 100).round(1)
print(f"\n{cumulative['Happy']}% are Happy or better")

Output:

Survey Results:
Satisfaction
Very Happy      3
Happy           4
Neutral         2
Unhappy         1
Very Unhappy    0
Name: count, dtype: int64

70.0% are Happy or better
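An alternative to reindexing after the fact is to declare the ordering up front with pandas' CategoricalDtype. With a categorical Series, value_counts() includes unused categories at count 0, and sort_index() follows the declared order rather than alphabetical order. A sketch with the same responses:

```python
import pandas as pd

order = ['Very Happy', 'Happy', 'Neutral', 'Unhappy', 'Very Unhappy']

satisfaction = pd.Series(
    ['Very Happy', 'Happy', 'Neutral', 'Happy', 'Unhappy',
     'Very Happy', 'Happy', 'Neutral', 'Happy', 'Very Happy'],
    dtype=pd.CategoricalDtype(categories=order, ordered=True),
)

# Unused categories ('Very Unhappy') appear with count 0,
# and sort_index() respects the category order
counts = satisfaction.value_counts().sort_index()
print(counts)
```

This keeps the ordering attached to the data itself, so every downstream groupby or plot respects it automatically.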

Quick Reference

Goal                Code
Count values        df['col'].value_counts()
Get percentages     df['col'].value_counts(normalize=True)
Include missing     df['col'].value_counts(dropna=False)
Sort by index       df['col'].value_counts().sort_index()
Ascending order     df['col'].value_counts(ascending=True)
Bin numeric data    df['col'].value_counts(bins=5)
Cross-tabulation    pd.crosstab(df['col1'], df['col2'])

Summary

  • Use value_counts() for absolute frequencies and add normalize=True for relative frequencies.
  • Remember to set dropna=False when missing values are analytically important.
  • For numeric data, the bins parameter creates histogram-style frequency distributions without needing to pre-categorize your data.