How to Count Frequency of Column Values in Pandas
Understanding how values are distributed across a column is one of the first steps in any data exploration workflow. Whether you need to know which product sells most frequently, what percentage of users are active, or how values are spread across categories, frequency counting gives you an immediate picture of your data's composition. Pandas provides several methods for this purpose, each suited to different levels of detail and complexity.
In this guide, you will learn how to count value occurrences, calculate proportions, compare frequencies across multiple columns, and handle common scenarios like missing values and rare categories.
Basic Frequency Count with value_counts()
The value_counts() method is the standard tool for counting how often each unique value appears in a column. By default, results are sorted in descending order by frequency:
import pandas as pd
df = pd.DataFrame({
'Fruit': ['Apple', 'Banana', 'Apple', 'Orange', 'Banana', 'Apple']
})
print(df['Fruit'].value_counts())
Output:
Fruit
Apple 3
Banana 2
Orange 1
Name: count, dtype: int64
Apple appears 3 times, Banana twice, and Orange once. The most frequent value appears first.
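If all you need is the single most frequent value, you can read it off the counts directly. A minimal sketch using the same DataFrame (idxmax() returns the index label with the largest count):
import pandas as pd
df = pd.DataFrame({
    'Fruit': ['Apple', 'Banana', 'Apple', 'Orange', 'Banana', 'Apple']
})
counts = df['Fruit'].value_counts()
# idxmax() gives the most frequent label; max() gives its count
print(counts.idxmax())  # Apple
print(counts.max())     # 3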
Getting Proportions Instead of Counts
Add normalize=True to see what fraction of the total each value represents:
import pandas as pd
df = pd.DataFrame({
'Fruit': ['Apple', 'Banana', 'Apple', 'Orange', 'Banana', 'Apple']
})
print(df['Fruit'].value_counts(normalize=True))
Output:
Fruit
Apple 0.500000
Banana 0.333333
Orange 0.166667
Name: proportion, dtype: float64
Apple accounts for 50% of all entries. Multiply by 100 if you prefer percentage format:
import pandas as pd
df = pd.DataFrame({
'Fruit': ['Apple', 'Banana', 'Apple', 'Orange', 'Banana', 'Apple']
})
print((df['Fruit'].value_counts(normalize=True) * 100).round(1))
Output:
Fruit
Apple 50.0
Banana 33.3
Orange 16.7
Name: proportion, dtype: float64
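If you need display-ready strings (for a report, say), one option is to map a format string over the percentages. A small sketch, again with the same Fruit column:
import pandas as pd
df = pd.DataFrame({
    'Fruit': ['Apple', 'Banana', 'Apple', 'Orange', 'Banana', 'Apple']
})
pct = df['Fruit'].value_counts(normalize=True) * 100
# map() applies the format string to each value, yielding strings like '50.0%'
print(pct.map('{:.1f}%'.format))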
Including Missing Values
By default, value_counts() excludes NaN values. Use dropna=False to include them in the count:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'Status': ['Active', 'Active', np.nan, 'Inactive', np.nan]
})
print("Excluding NaN (default):")
print(df['Status'].value_counts())
print()
print("Including NaN:")
print(df['Status'].value_counts(dropna=False))
Output:
Excluding NaN (default):
Status
Active 2
Inactive 1
Name: count, dtype: int64
Including NaN:
Status
Active 2
NaN 2
Inactive 1
Name: count, dtype: int64
With dropna=False, you can see that 2 out of 5 records are missing a status, which is valuable information for data quality assessment.
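If you would rather label the missing values explicitly than see NaN in the index, filling them before counting is a common alternative. A minimal sketch (the 'Unknown' label here is an arbitrary choice):
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'Status': ['Active', 'Active', np.nan, 'Inactive', np.nan]
})
# Replace NaN with a placeholder label, then count as usual
print(df['Status'].fillna('Unknown').value_counts())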
Controlling Sort Order
value_counts() sorts by descending frequency by default, but you can sort alphabetically by the values themselves or flip the order:
import pandas as pd
df = pd.DataFrame({'Grade': ['B', 'A', 'C', 'A', 'B', 'A']})
# Sort alphabetically by the value (not by count)
print("Sorted by value:")
print(df['Grade'].value_counts().sort_index())
print()
# Sort by count in ascending order (least common first)
print("Ascending count:")
print(df['Grade'].value_counts(ascending=True))
Output:
Sorted by value:
Grade
A 3
B 2
C 1
Name: count, dtype: int64
Ascending count:
Grade
C 1
B 2
A 3
Name: count, dtype: int64
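value_counts() also accepts sort=False, which skips the frequency sort entirely. In recent pandas versions this typically yields first-appearance order for object columns, though the documentation only promises that no sorting is applied:
import pandas as pd
df = pd.DataFrame({'Grade': ['B', 'A', 'C', 'A', 'B', 'A']})
# sort=False returns the counts without sorting by frequency
print(df['Grade'].value_counts(sort=False))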
Cross-Tabulation with pd.crosstab()
When you need to compare frequencies across two categorical columns, pd.crosstab() produces a contingency table:
import pandas as pd
df = pd.DataFrame({
'Fruit': ['Apple', 'Banana', 'Apple', 'Orange', 'Banana'],
'Color': ['Red', 'Yellow', 'Green', 'Orange', 'Yellow']
})
table = pd.crosstab(df['Fruit'], df['Color'])
print(table)
Output:
Color Green Orange Red Yellow
Fruit
Apple 1 0 1 0
Banana 0 0 0 2
Orange 0 1 0 0
Adding Row and Column Totals
The margins=True parameter adds summary totals:
import pandas as pd
df = pd.DataFrame({
'Fruit': ['Apple', 'Banana', 'Apple', 'Orange', 'Banana'],
'Color': ['Red', 'Yellow', 'Green', 'Orange', 'Yellow']
})
table = pd.crosstab(df['Fruit'], df['Color'], margins=True)
print(table)
Output:
Color Green Orange Red Yellow All
Fruit
Apple 1 0 1 0 2
Banana 0 0 0 2 2
Orange 0 1 0 0 1
All 1 1 1 2 5
The "All" row and column show the marginal totals, giving you a complete picture of the distribution.
Multi-Column Frequency with groupby().size()
When you need to count combinations of values across multiple columns, groupby().size() provides a flexible approach:
import pandas as pd
df = pd.DataFrame({
'Category': ['A', 'B', 'A', 'B', 'A'],
'Status': ['Active', 'Active', 'Inactive', 'Active', 'Active']
})
counts = df.groupby(['Category', 'Status']).size()
print(counts)
Output:
Category Status
A Active 2
Inactive 1
B Active 2
dtype: int64
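Since pandas 1.1, you can also call value_counts() directly on a DataFrame, which counts unique rows and sorts the combinations by frequency rather than by index:
import pandas as pd
df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Status': ['Active', 'Active', 'Inactive', 'Active', 'Active']
})
# DataFrame.value_counts() counts each unique (Category, Status) row
print(df[['Category', 'Status']].value_counts())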
Converting to a Flat DataFrame
For export or further processing, use .reset_index() to flatten the result:
import pandas as pd
df = pd.DataFrame({
'Category': ['A', 'B', 'A', 'B', 'A'],
'Status': ['Active', 'Active', 'Inactive', 'Active', 'Active']
})
counts_df = df.groupby(['Category', 'Status']).size().reset_index(name='Count')
print(counts_df)
Output:
Category Status Count
0 A Active 2
1 A Inactive 1
2 B Active 2
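Once flattened, the result is an ordinary DataFrame, so the usual tools apply; for instance, you can sort it so the most frequent combination comes first:
import pandas as pd
df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Status': ['Active', 'Active', 'Inactive', 'Active', 'Active']
})
counts_df = df.groupby(['Category', 'Status']).size().reset_index(name='Count')
# Sort the flat table by frequency, most common combination first
print(counts_df.sort_values('Count', ascending=False))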
Binning Numeric Data into Frequency Groups
For continuous numeric columns, pd.cut() creates bins that you can then count:
import pandas as pd
df = pd.DataFrame({'Age': [22, 35, 45, 23, 38, 52, 28]})
df['Age_Group'] = pd.cut(
df['Age'],
bins=[0, 30, 50, 100],
labels=['Young', 'Middle', 'Senior']
)
print(df['Age_Group'].value_counts())
Output:
Age_Group
Young 3
Middle 3
Senior 1
Name: count, dtype: int64
This is useful for understanding distributions without looking at every individual value.
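If you want bins holding roughly equal numbers of rows rather than fixed edges, pd.qcut() splits at quantile boundaries instead. A quick sketch with the same ages (the three labels are arbitrary):
import pandas as pd
df = pd.DataFrame({'Age': [22, 35, 45, 23, 38, 52, 28]})
# qcut chooses bin edges so each bin holds about the same number of rows
df['Age_Band'] = pd.qcut(df['Age'], q=3, labels=['Low', 'Mid', 'High'])
print(df['Age_Band'].value_counts())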
Practical Examples
Finding the Top N Most Common Values
import pandas as pd
df = pd.DataFrame({
'Product': ['A', 'B', 'A', 'C', 'A', 'B', 'D', 'A']
})
# Top 3 most common products
top_3 = df['Product'].value_counts().head(3)
print(top_3)
Output:
Product
A 4
B 2
C 1
Name: count, dtype: int64
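The same pattern works from the other end: since the counts are sorted descending, tail() surfaces the least common values:
import pandas as pd
df = pd.DataFrame({
    'Product': ['A', 'B', 'A', 'C', 'A', 'B', 'D', 'A']
})
# tail(2) on the descending counts gives the 2 least common products
print(df['Product'].value_counts().tail(2))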
Filtering Out Rare Values
Sometimes you want to keep only categories that appear frequently enough to be meaningful:
import pandas as pd
df = pd.DataFrame({
'Category': ['A', 'A', 'A', 'B', 'B', 'C', 'D']
})
counts = df['Category'].value_counts()
# Keep only categories appearing more than once
common = counts[counts > 1].index
df_filtered = df[df['Category'].isin(common)]
print("Original categories:", df['Category'].unique().tolist())
print("After filtering:", df_filtered['Category'].unique().tolist())
Output:
Original categories: ['A', 'B', 'C', 'D']
After filtering: ['A', 'B']
Categories C and D each appear only once and are filtered out.
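Instead of dropping rare categories, you can also collapse them into a catch-all bucket. This sketch uses Series.where(), with 'Other' as an arbitrary label:
import pandas as pd
df = pd.DataFrame({
    'Category': ['A', 'A', 'A', 'B', 'B', 'C', 'D']
})
counts = df['Category'].value_counts()
common = counts[counts > 1].index
# where() keeps values meeting the condition and replaces the rest with 'Other'
df['Category'] = df['Category'].where(df['Category'].isin(common), 'Other')
print(df['Category'].value_counts())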
Building a Complete Frequency Table
import pandas as pd
df = pd.DataFrame({
'Status': ['Active', 'Active', 'Active', 'Inactive', 'Pending']
})
counts = df['Status'].value_counts()
percentages = df['Status'].value_counts(normalize=True) * 100
freq_table = pd.DataFrame({
'Count': counts,
'Percent': percentages.round(1),
'Cumulative': percentages.cumsum().round(1)
})
print(freq_table)
Output:
Count Percent Cumulative
Status
Active 3 60.0 60.0
Inactive 1 20.0 80.0
Pending 1 20.0 100.0
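The cumulative column makes Pareto-style questions straightforward, e.g., which statuses account for the first 80% of records. A short sketch building on the same table (the 80% threshold is an arbitrary choice):
import pandas as pd
df = pd.DataFrame({
    'Status': ['Active', 'Active', 'Active', 'Inactive', 'Pending']
})
pct = df['Status'].value_counts(normalize=True) * 100
freq_table = pd.DataFrame({
    'Count': df['Status'].value_counts(),
    'Percent': pct.round(1),
    'Cumulative': pct.cumsum().round(1)
})
# Keep only the statuses needed to cover the first 80% of records
print(freq_table[freq_table['Cumulative'] <= 80])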
Quick Reference
| Method | Best For | Output |
|---|---|---|
| .value_counts() | Single-column frequency | Series |
| .value_counts(normalize=True) | Proportions and percentages | Series |
| .value_counts(dropna=False) | Including missing values | Series |
| pd.crosstab() | Two-column comparison | DataFrame |
| .groupby().size() | Multi-column combinations | Series |
| pd.cut() + .value_counts() | Binning numeric data | Series |
- Use value_counts() for quick single-column frequency analysis.
- Add normalize=True for proportions or dropna=False to include missing values in the count.
- Use pd.crosstab() to compare two categorical columns side by side.
- Use groupby().size() when counting multi-column combinations or when you need to chain the result with other aggregation operations.