Python Pandas: How to Count Unique Value Frequencies
Understanding how values are distributed across a column is one of the first steps in data exploration. Before diving into modeling or detailed analysis, you need to know which categories dominate your dataset, how balanced the distribution is, and whether any values are unexpectedly rare or frequent. The value_counts() method provides fast, sorted frequency analysis with several options for customizing the output.
In this guide, you will learn how to count value frequencies, calculate proportions, handle missing values, bin numeric data into ranges, and add frequency counts back to your DataFrame as a feature column.
Basic Frequency Count with value_counts()
The value_counts() method counts how many times each unique value appears and returns the results sorted in descending order by default:
import pandas as pd
cities = pd.Series(['NY', 'LA', 'NY', 'CHI', 'LA', 'LA', 'NY', 'NY'])
print(cities.value_counts())
Output:
NY 4
LA 3
CHI 1
Name: count, dtype: int64
NY appears 4 times, LA appears 3 times, and CHI appears once. The most frequent value is listed first, giving you an immediate sense of the distribution.
Getting Proportions and Percentages
Add normalize=True to see what fraction of the total each value represents:
import pandas as pd
cities = pd.Series(['NY', 'LA', 'NY', 'CHI', 'LA', 'LA'])
# Proportions (0 to 1)
print(cities.value_counts(normalize=True))
print()
# As percentages
print((cities.value_counts(normalize=True) * 100).round(1))
Output:
LA 0.500000
NY 0.333333
CHI 0.166667
Name: proportion, dtype: float64
LA 50.0
NY 33.3
CHI 16.7
Name: proportion, dtype: float64
LA accounts for 50% of all entries, NY for 33.3%, and CHI for 16.7%. Proportions are especially useful when comparing distributions across datasets of different sizes.
Including Missing Values
By default, value_counts() excludes NaN values from the count. Use dropna=False to include them:
import pandas as pd
import numpy as np
data = pd.Series(['A', 'B', np.nan, 'A', np.nan])
print("Excluding NaN (default):")
print(data.value_counts())
print()
print("Including NaN:")
print(data.value_counts(dropna=False))
Output:
Excluding NaN (default):
A 2
B 1
Name: count, dtype: int64
Including NaN:
A 2
NaN 2
B 1
Name: count, dtype: int64
With dropna=False, you can see that NaN appears twice, which is valuable information for data quality assessment. Without it, you might not realize that 40% of the data is missing.
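To quantify that missing share directly, you can combine dropna=False with normalize=True, or use isna().mean() as a shortcut. A minimal sketch (using the same example data as above):

```python
import pandas as pd
import numpy as np

data = pd.Series(['A', 'B', np.nan, 'A', np.nan])

# Fraction of each value, missing entries included
print(data.value_counts(normalize=True, dropna=False))

# Equivalent shortcut for just the missing fraction
print(data.isna().mean())  # 0.4
```

Both approaches confirm that 40% of the entries are missing.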
Binning Numeric Data into Ranges
For continuous numeric values, raw frequency counts are often not useful since most values may be unique. The bins parameter groups values into ranges and counts how many fall into each:
import pandas as pd
prices = pd.Series([10, 15, 20, 25, 40, 50, 55, 100])
# Group into 3 equal-width bins
print(prices.value_counts(bins=3))
Output:
(9.909, 40.0] 5
(40.0, 70.0] 2
(70.0, 100.0] 1
Name: count, dtype: int64
Five prices fall in the lowest range (roughly 10 to 40), two in the middle range (40 to 70), and one in the highest range (70 to 100). This is a quick way to understand the shape of a numeric distribution without creating a full histogram.
For more control over bin edges and labels, use pd.cut() separately and then apply value_counts() on the result. The bins parameter in value_counts() creates equal-width bins automatically, which may not always match your analytical needs.
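As a sketch of that approach, here is pd.cut() with custom edges and labels applied to the same prices series (the specific edges and label names are arbitrary choices for illustration):

```python
import pandas as pd

prices = pd.Series([10, 15, 20, 25, 40, 50, 55, 100])

# Custom bin edges and labels (illustrative values, not a recommendation)
bins = [0, 25, 60, 150]
labels = ['budget', 'mid', 'premium']

# pd.cut assigns each price to a bin; value_counts then tallies the bins
binned = pd.cut(prices, bins=bins, labels=labels)
print(binned.value_counts())
```

Because pd.cut() includes the right edge by default, a price of exactly 25 falls into the 'budget' bin here.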
Controlling Sort Order
import pandas as pd
data = pd.Series(['B', 'A', 'C', 'A', 'B', 'A'])
# Default: sorted by count, most frequent first
print("By count (descending):")
print(data.value_counts())
print()
# Sorted alphabetically by value
print("By value (alphabetical):")
print(data.value_counts().sort_index())
print()
# Sorted by count, least frequent first
print("By count (ascending):")
print(data.value_counts(ascending=True))
Output:
By count (descending):
A 3
B 2
C 1
Name: count, dtype: int64
By value (alphabetical):
A 3
B 2
C 1
Name: count, dtype: int64
By count (ascending):
C 1
B 2
A 3
Name: count, dtype: int64
Selecting the Top N Most Frequent Values
When a column has many unique values, you often only care about the most common ones:
import pandas as pd
data = pd.Series(['A', 'B', 'C', 'A', 'B', 'A', 'D', 'E'])
# Top 3 most frequent
print(data.value_counts().head(3))
Output:
A 3
B 2
C 1
Name: count, dtype: int64
Adding Frequency as a DataFrame Column
Sometimes you need the frequency count for each row's value as a new column in the DataFrame. Use groupby with transform('count') to map the group-level count back to each individual row:
import pandas as pd
df = pd.DataFrame({
'City': ['NY', 'LA', 'NY', 'CHI', 'LA', 'LA'],
'Sales': [100, 200, 150, 300, 250, 175]
})
df['City_Freq'] = df.groupby('City')['City'].transform('count')
print(df)
Output:
City Sales City_Freq
0 NY 100 2
1 LA 200 3
2 NY 150 2
3 CHI 300 1
4 LA 250 3
5 LA 175 3
Each row now shows how many times its city appears in the dataset. This is useful for feature engineering in machine learning or for identifying rare categories that might need special handling.
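Building on that, the frequency column makes it easy to flag rare categories. A minimal sketch, where the threshold of 2 is an arbitrary choice:

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['NY', 'LA', 'NY', 'CHI', 'LA', 'LA'],
    'Sales': [100, 200, 150, 300, 250, 175]
})

# Map each city's total count back onto its rows
df['City_Freq'] = df.groupby('City')['City'].transform('count')

# Flag categories appearing fewer than 2 times (threshold is illustrative)
df['Is_Rare'] = df['City_Freq'] < 2
print(df[df['Is_Rare']])
```

In this data only CHI is flagged; rows like these might be grouped into an "Other" category before modeling.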
Building a Complete Frequency Table
Combine counts and percentages into a single summary table:
import pandas as pd
data = pd.Series(['A', 'A', 'A', 'B', 'B', 'C'], name='Category')
counts = data.value_counts()
pct = data.value_counts(normalize=True) * 100
freq_table = pd.DataFrame({
'Count': counts,
'Percent': pct.round(1),
'Cumulative': pct.cumsum().round(1)
})
print(freq_table)
Output:
Count Percent Cumulative
Category
A 3 50.0 50.0
B 2 33.3 83.3
C 1 16.7 100.0
The cumulative column shows that A and B together account for 83.3% of all values.
Quick Reference
| Goal | Method |
|---|---|
| Raw counts | s.value_counts() |
| Proportions | s.value_counts(normalize=True) |
| Percentages | s.value_counts(normalize=True) * 100 |
| Include NaN | s.value_counts(dropna=False) |
| Numeric bins | s.value_counts(bins=N) |
| Sort by value | s.value_counts().sort_index() |
| Top N values | s.value_counts().head(N) |
| Row-level frequency | df.groupby('col')['col'].transform('count') |
- Use value_counts() for quick frequency analysis. It handles sorting and counting efficiently out of the box.
- Add normalize=True for proportions, dropna=False to include missing values, and bins to categorize continuous numeric data.
- Use transform('count') when you need to add frequency counts back as a column in your DataFrame.