How to Count Frequency of Column Values in Pandas

Understanding how values are distributed across a column is one of the first steps in any data exploration workflow. Whether you need to know which product sells most frequently, what percentage of users are active, or how values are spread across categories, frequency counting gives you an immediate picture of your data's composition. Pandas provides several methods for this purpose, each suited to different levels of detail and complexity.

In this guide, you will learn how to count value occurrences, calculate proportions, compare frequencies across multiple columns, and handle common scenarios like missing values and rare categories.

Basic Frequency Count with value_counts()

The value_counts() method is the standard tool for counting how often each unique value appears in a column. By default, results are sorted in descending order by frequency:

import pandas as pd

df = pd.DataFrame({
    'Fruit': ['Apple', 'Banana', 'Apple', 'Orange', 'Banana', 'Apple']
})

print(df['Fruit'].value_counts())

Output:

Fruit
Apple     3
Banana    2
Orange    1
Name: count, dtype: int64

Apple appears 3 times, Banana twice, and Orange once. The most frequent value appears first.
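If you only need the single most frequent value rather than the full table, you can read it straight off the counts. A small sketch: idxmax() returns the index label of the largest count (and since value_counts() sorts descending, .index[0] would work too):

```python
import pandas as pd

df = pd.DataFrame({
    'Fruit': ['Apple', 'Banana', 'Apple', 'Orange', 'Banana', 'Apple']
})

# idxmax() returns the index label with the highest count
most_common = df['Fruit'].value_counts().idxmax()
print(most_common)  # Apple
```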

Getting Proportions Instead of Counts

Add normalize=True to see what fraction of the total each value represents:

import pandas as pd

df = pd.DataFrame({
    'Fruit': ['Apple', 'Banana', 'Apple', 'Orange', 'Banana', 'Apple']
})

print(df['Fruit'].value_counts(normalize=True))

Output:

Fruit
Apple     0.500000
Banana    0.333333
Orange    0.166667
Name: proportion, dtype: float64

Apple accounts for 50% of all entries. Multiply by 100 if you prefer percentage format:

import pandas as pd

df = pd.DataFrame({
    'Fruit': ['Apple', 'Banana', 'Apple', 'Orange', 'Banana', 'Apple']
})

print((df['Fruit'].value_counts(normalize=True) * 100).round(1))

Output:

Fruit
Apple     50.0
Banana    33.3
Orange    16.7
Name: proportion, dtype: float64

Including Missing Values

By default, value_counts() excludes NaN values. Use dropna=False to include them in the count:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Status': ['Active', 'Active', np.nan, 'Inactive', np.nan]
})

print("Excluding NaN (default):")
print(df['Status'].value_counts())
print()

print("Including NaN:")
print(df['Status'].value_counts(dropna=False))

Output:

Excluding NaN (default):
Status
Active      2
Inactive    1
Name: count, dtype: int64

Including NaN:
Status
Active      2
NaN         2
Inactive    1
Name: count, dtype: int64

With dropna=False, you can see that 2 out of 5 records are missing a status, which is valuable information for data quality assessment.
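If the missing share is all you need for that quality check, you can compute it directly without a frequency table. A quick sketch: isna() returns booleans, and taking their mean gives the fraction of missing values:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Status': ['Active', 'Active', np.nan, 'Inactive', np.nan]
})

# isna() yields True/False per row; the mean of booleans is the missing fraction
missing_pct = df['Status'].isna().mean() * 100
print(f"{missing_pct:.0f}% of Status values are missing")  # 40%
```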

Controlling Sort Order

By default, value_counts() sorts results by frequency in descending order. You can sort by the values themselves with sort_index(), or flip to ascending counts with ascending=True:

import pandas as pd

df = pd.DataFrame({'Grade': ['B', 'A', 'C', 'A', 'B', 'A']})

# Sort alphabetically by the value (not by count)
print("Sorted by value:")
print(df['Grade'].value_counts().sort_index())
print()

# Sort by count in ascending order (least common first)
print("Ascending count:")
print(df['Grade'].value_counts(ascending=True))

Output:

Sorted by value:
Grade
A    3
B    2
C    1
Name: count, dtype: int64

Ascending count:
Grade
C    1
B    2
A    3
Name: count, dtype: int64
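value_counts() also accepts sort=False, which skips the frequency sort entirely; in recent pandas versions this preserves the order in which values first appear in the data. A brief sketch:

```python
import pandas as pd

df = pd.DataFrame({'Grade': ['B', 'A', 'C', 'A', 'B', 'A']})

# sort=False skips the descending-frequency sort
counts = df['Grade'].value_counts(sort=False)
print(counts)
```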

Cross-Tabulation with pd.crosstab()

When you need to compare frequencies across two categorical columns, pd.crosstab() produces a contingency table:

import pandas as pd

df = pd.DataFrame({
    'Fruit': ['Apple', 'Banana', 'Apple', 'Orange', 'Banana'],
    'Color': ['Red', 'Yellow', 'Green', 'Orange', 'Yellow']
})

table = pd.crosstab(df['Fruit'], df['Color'])

print(table)

Output:

Color   Green  Orange  Red  Yellow
Fruit
Apple       1       0    1       0
Banana      0       0    0       2
Orange      0       1    0       0

Adding Row and Column Totals

The margins=True parameter adds summary totals:

import pandas as pd

df = pd.DataFrame({
    'Fruit': ['Apple', 'Banana', 'Apple', 'Orange', 'Banana'],
    'Color': ['Red', 'Yellow', 'Green', 'Orange', 'Yellow']
})

table = pd.crosstab(df['Fruit'], df['Color'], margins=True)

print(table)

Output:

Color   Green  Orange  Red  Yellow  All
Fruit
Apple       1       0    1       0    2
Banana      0       0    0       2    2
Orange      0       1    0       0    1
All         1       1    1       2    5

The "All" row and column show the marginal totals, giving you a complete picture of the distribution.
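pd.crosstab() also supports a normalize parameter when you want proportions rather than raw counts. A short sketch, using normalize='index' so each row sums to 1 (normalize='columns' and normalize='all' work analogously):

```python
import pandas as pd

df = pd.DataFrame({
    'Fruit': ['Apple', 'Banana', 'Apple', 'Orange', 'Banana'],
    'Color': ['Red', 'Yellow', 'Green', 'Orange', 'Yellow']
})

# normalize='index' divides each row by its row total,
# so values read as "share of this fruit that has this color"
row_props = pd.crosstab(df['Fruit'], df['Color'], normalize='index')
print(row_props)
```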

Multi-Column Frequency with groupby().size()

When you need to count combinations of values across multiple columns, groupby().size() provides a flexible approach:

import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Status': ['Active', 'Active', 'Inactive', 'Active', 'Active']
})

counts = df.groupby(['Category', 'Status']).size()

print(counts)

Output:

Category  Status
A         Active      2
          Inactive    1
B         Active      2
dtype: int64

Converting to a Flat DataFrame

For export or further processing, use .reset_index() to flatten the result:

import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Status': ['Active', 'Active', 'Inactive', 'Active', 'Active']
})

counts_df = df.groupby(['Category', 'Status']).size().reset_index(name='Count')

print(counts_df)

Output:

  Category    Status  Count
0        A    Active      2
1        A  Inactive      1
2        B    Active      2

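If you prefer the combinations as a wide table (one column per status) rather than a flat list, unstack() pivots the inner index level into columns. A sketch, with fill_value=0 so absent combinations show as zero instead of NaN:

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Status': ['Active', 'Active', 'Inactive', 'Active', 'Active']
})

# unstack() moves the inner index level (Status) into columns;
# fill_value=0 replaces missing combinations with zero
wide = df.groupby(['Category', 'Status']).size().unstack(fill_value=0)
print(wide)
```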
Binning Numeric Data into Frequency Groups

For continuous numeric columns, pd.cut() creates bins that you can then count:

import pandas as pd

df = pd.DataFrame({'Age': [22, 35, 45, 23, 38, 52, 28]})

df['Age_Group'] = pd.cut(
    df['Age'],
    bins=[0, 30, 50, 100],
    labels=['Young', 'Middle', 'Senior']
)

print(df['Age_Group'].value_counts())

Output:

Age_Group
Young     3
Middle    3
Senior    1
Name: count, dtype: int64

This is useful for understanding distributions without looking at every individual value.
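A related option is pd.qcut(), which splits on quantiles so each bin holds roughly the same number of rows, rather than using fixed edges. A small sketch with two quantile bins (the labels 'Younger' and 'Older' and the Age_Quantile column are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({'Age': [22, 35, 45, 23, 38, 52, 28]})

# q=2 splits at the median, so the two bins are as equal-sized as possible
df['Age_Quantile'] = pd.qcut(df['Age'], q=2, labels=['Younger', 'Older'])
print(df['Age_Quantile'].value_counts())
```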

Practical Examples

Finding the Top N Most Common Values

import pandas as pd

df = pd.DataFrame({
    'Product': ['A', 'B', 'A', 'C', 'A', 'B', 'D', 'A']
})

# Top 3 most common products
top_3 = df['Product'].value_counts().head(3)

print(top_3)

Output:

Product
A    4
B    2
C    1
Name: count, dtype: int64

Filtering Out Rare Values

Sometimes you want to keep only categories that appear frequently enough to be meaningful:

import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'A', 'A', 'B', 'B', 'C', 'D']
})

counts = df['Category'].value_counts()

# Keep only categories appearing more than once
common = counts[counts > 1].index
df_filtered = df[df['Category'].isin(common)]

print("Original categories:", df['Category'].unique().tolist())
print("After filtering:", df_filtered['Category'].unique().tolist())

Output:

Original categories: ['A', 'B', 'C', 'D']
After filtering: ['A', 'B']

Categories C and D each appear only once and are filtered out.
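Instead of dropping rare categories, you can also lump them into a catch-all label. A sketch using where(), which keeps values that satisfy the condition and replaces the rest (the 'Other' label and Category_Grouped column are just illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'A', 'A', 'B', 'B', 'C', 'D']
})

counts = df['Category'].value_counts()
common = counts[counts > 1].index

# where() keeps rows whose category is common; everything else becomes 'Other'
df['Category_Grouped'] = df['Category'].where(df['Category'].isin(common), 'Other')
print(df['Category_Grouped'].value_counts())
```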

Building a Complete Frequency Table

import pandas as pd

df = pd.DataFrame({
    'Status': ['Active', 'Active', 'Active', 'Inactive', 'Pending']
})

counts = df['Status'].value_counts()
percentages = df['Status'].value_counts(normalize=True) * 100

freq_table = pd.DataFrame({
    'Count': counts,
    'Percent': percentages.round(1),
    'Cumulative': percentages.cumsum().round(1)
})

print(freq_table)

Output:

          Count  Percent  Cumulative
Status
Active        3     60.0        60.0
Inactive      1     20.0        80.0
Pending       1     20.0       100.0

Quick Reference

Method                           Best For                       Output
.value_counts()                  Single column frequency        Series
.value_counts(normalize=True)    Proportions and percentages    Series
.value_counts(dropna=False)      Including missing values       Series
pd.crosstab()                    Two-column comparison          DataFrame
.groupby().size()                Multi-column combinations      Series
pd.cut() + .value_counts()       Binning numeric data           Series
  • Use value_counts() for quick single-column frequency analysis.
  • Add normalize=True for proportions or dropna=False to include missing values in the count.
  • Use pd.crosstab() to compare two categorical columns side by side.
  • Use groupby().size() when counting multi-column combinations or when you need to chain the result with other aggregation operations.