Python Pandas: How to Count Unique Value Frequencies

Understanding how values are distributed across a column is one of the first steps in data exploration. Before diving into modeling or detailed analysis, you need to know which categories dominate your dataset, how balanced the distribution is, and whether any values are unexpectedly rare or frequent. The value_counts() method provides fast, sorted frequency analysis with several options for customizing the output.

In this guide, you will learn how to count value frequencies, calculate proportions, handle missing values, bin numeric data into ranges, and add frequency counts back to your DataFrame as a feature column.

Basic Frequency Count with value_counts()

The value_counts() method counts how many times each unique value appears and returns the results sorted in descending order by default:

import pandas as pd

cities = pd.Series(['NY', 'LA', 'NY', 'CHI', 'LA', 'LA', 'NY', 'NY'])

print(cities.value_counts())

Output:

NY     4
LA     3
CHI    1
Name: count, dtype: int64

NY appears 4 times, LA appears 3 times, and CHI appears once. The most frequent value is listed first, giving you an immediate sense of the distribution.
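Because value_counts() returns a regular Series indexed by the unique values, you can look up individual counts or grab the most frequent value directly. A quick sketch:

```python
import pandas as pd

cities = pd.Series(['NY', 'LA', 'NY', 'CHI', 'LA', 'LA', 'NY', 'NY'])
counts = cities.value_counts()

# The result is an ordinary Series: index entries are the unique values
print(counts['NY'])       # count for a specific value: 4
print(counts.idxmax())    # most frequent value: 'NY'
print(counts.index[0])    # same thing, since results are sorted descending
```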

Getting Proportions and Percentages

Add normalize=True to see what fraction of the total each value represents:

import pandas as pd

cities = pd.Series(['NY', 'LA', 'NY', 'CHI', 'LA', 'LA'])

# Proportions (0 to 1)
print(cities.value_counts(normalize=True))
print()

# As percentages
print((cities.value_counts(normalize=True) * 100).round(1))

Output:

LA     0.500000
NY     0.333333
CHI    0.166667
Name: proportion, dtype: float64

LA     50.0
NY     33.3
CHI    16.7
Name: proportion, dtype: float64

LA accounts for 50% of all entries, NY for 33.3%, and CHI for 16.7%. Proportions are especially useful when comparing distributions across datasets of different sizes.
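To illustrate that last point, here is a small sketch with two made-up series of different lengths; raw counts are not directly comparable, but the normalized proportions are:

```python
import pandas as pd

small = pd.Series(['A', 'A', 'B'])                 # 3 rows
large = pd.Series(['A'] * 60 + ['B'] * 40)          # 100 rows

# Proportions put both distributions on the same 0-to-1 scale
print(small.value_counts(normalize=True))  # A ~0.67, B ~0.33
print(large.value_counts(normalize=True))  # A 0.6, B 0.4
```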

Including Missing Values

By default, value_counts() excludes NaN values from the count. Use dropna=False to include them:

import pandas as pd
import numpy as np

data = pd.Series(['A', 'B', np.nan, 'A', np.nan])

print("Excluding NaN (default):")
print(data.value_counts())
print()

print("Including NaN:")
print(data.value_counts(dropna=False))

Output:

Excluding NaN (default):
A    2
B    1
Name: count, dtype: int64

Including NaN:
A      2
NaN    2
B      1
Name: count, dtype: int64

With dropna=False, you can see that NaN appears twice, which is valuable information for data quality assessment. Without it, you might not realize that 40% of the data is missing.
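If you only need the missing fraction for a quick data quality check, isna().mean() gives the same number you would read off the NaN row of value_counts(dropna=False, normalize=True):

```python
import pandas as pd
import numpy as np

data = pd.Series(['A', 'B', np.nan, 'A', np.nan])

# Mean of a boolean mask = fraction of True values, i.e. fraction missing
missing_frac = data.isna().mean()
print(f"{missing_frac:.0%} missing")  # 40% missing
```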

Binning Numeric Data into Ranges

For continuous numeric values, raw frequency counts are often not useful since most values may be unique. The bins parameter groups values into ranges and counts how many fall into each:

import pandas as pd

prices = pd.Series([10, 15, 20, 25, 40, 50, 55, 100])

# Group into 3 equal-width bins
print(prices.value_counts(bins=3))

Output:

(9.909, 40.0]    5
(40.0, 70.0]     2
(70.0, 100.0]    1
Name: count, dtype: int64

Five prices fall in the lowest range (roughly 10 to 40), two in the middle range (40 to 70), and one in the highest range (70 to 100). This is a quick way to understand the shape of a numeric distribution without creating a full histogram.

tip

For more control over bin edges and labels, use pd.cut() separately and then apply value_counts() on the result. The bins parameter in value_counts() creates equal-width bins automatically, which may not always match your analytical needs.
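For example, here is a sketch using pd.cut() with hand-picked edges and labels (the tier names and cutoffs below are illustrative, not from the original example):

```python
import pandas as pd

prices = pd.Series([10, 15, 20, 25, 40, 50, 55, 100])

# Custom bin edges and labels, chosen for illustration
tiers = pd.cut(prices, bins=[0, 25, 75, 150],
               labels=['budget', 'mid', 'premium'])

# sort_index() lists the tiers in category order rather than by count
print(tiers.value_counts().sort_index())
```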

Controlling Sort Order

The results are sorted by count in descending order by default, but you can sort alphabetically with sort_index() or flip the order with ascending=True:

import pandas as pd

data = pd.Series(['B', 'A', 'C', 'A', 'B', 'A'])

# Default: sorted by count, most frequent first
print("By count (descending):")
print(data.value_counts())
print()

# Sorted alphabetically by value
print("By value (alphabetical):")
print(data.value_counts().sort_index())
print()

# Sorted by count, least frequent first
print("By count (ascending):")
print(data.value_counts(ascending=True))

Output:

By count (descending):
A    3
B    2
C    1
Name: count, dtype: int64

By value (alphabetical):
A    3
B    2
C    1
Name: count, dtype: int64

By count (ascending):
C    1
B    2
A    3
Name: count, dtype: int64

Selecting the Top N Most Frequent Values

When a column has many unique values, you often only care about the most common ones:

import pandas as pd

data = pd.Series(['A', 'B', 'C', 'A', 'B', 'A', 'D', 'E'])

# Top 3 most frequent
print(data.value_counts().head(3))

Output:

A    3
B    2
C    1
Name: count, dtype: int64
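A related pattern is filtering by a frequency threshold instead of a fixed N, since the result is an ordinary Series you can mask. A quick sketch:

```python
import pandas as pd

data = pd.Series(['A', 'B', 'C', 'A', 'B', 'A', 'D', 'E'])
counts = data.value_counts()

# Keep only values that appear at least twice
common = counts[counts >= 2]
print(common)  # A 3, B 2

# Or pull out the top-N labels themselves as a plain list
top_labels = counts.head(3).index.tolist()
print(top_labels)  # starts with 'A', 'B'; the third label is one of the ties
```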

Adding Frequency as a DataFrame Column

Sometimes you need the frequency count for each row's value as a new column in the DataFrame. Use groupby with transform('count') to map the group-level count back to each individual row:

import pandas as pd

df = pd.DataFrame({
    'City': ['NY', 'LA', 'NY', 'CHI', 'LA', 'LA'],
    'Sales': [100, 200, 150, 300, 250, 175]
})

df['City_Freq'] = df.groupby('City')['City'].transform('count')

print(df)

Output:

  City  Sales  City_Freq
0   NY    100          2
1   LA    200          3
2   NY    150          2
3  CHI    300          1
4   LA    250          3
5   LA    175          3

Each row now shows how many times its city appears in the dataset. This is useful for feature engineering in machine learning or for identifying rare categories that might need special handling.
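An equivalent approach maps each row's value to its count via value_counts() itself, since Series.map() accepts a Series as a lookup table:

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['NY', 'LA', 'NY', 'CHI', 'LA', 'LA'],
    'Sales': [100, 200, 150, 300, 250, 175]
})

# map() looks each city up in the value_counts() result by index
df['City_Freq'] = df['City'].map(df['City'].value_counts())
print(df)
```

Both approaches produce the same column; map() can read slightly more directly when the counts Series is something you already have on hand.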

Building a Complete Frequency Table

Combine counts and percentages into a single summary table:

import pandas as pd

data = pd.Series(['A', 'A', 'A', 'B', 'B', 'C'], name='Category')

counts = data.value_counts()
pct = data.value_counts(normalize=True) * 100

freq_table = pd.DataFrame({
    'Count': counts,
    'Percent': pct.round(1),
    'Cumulative': pct.cumsum().round(1)
})

print(freq_table)

Output:

          Count  Percent  Cumulative
Category
A             3     50.0        50.0
B             2     33.3        83.3
C             1     16.7       100.0

The cumulative column shows that A and B together account for 83.3% of all values.
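The cumulative column also lets you answer questions like "how many top categories cover at least 80% of the rows?" with a mask. A sketch, reusing the series above:

```python
import pandas as pd

data = pd.Series(['A', 'A', 'A', 'B', 'B', 'C'], name='Category')

pct = data.value_counts(normalize=True) * 100
cum = pct.cumsum()

# Count categories strictly below the threshold, then include the one
# that pushes the running total past 80%
n = int((cum < 80).sum()) + 1
top = cum.index[:n].tolist()
print(top)  # ['A', 'B']
```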

Quick Reference

Goal                   Method
Raw counts             s.value_counts()
Proportions            s.value_counts(normalize=True)
Percentages            s.value_counts(normalize=True) * 100
Include NaN            s.value_counts(dropna=False)
Numeric bins           s.value_counts(bins=N)
Sort by value          s.value_counts().sort_index()
Top N values           s.value_counts().head(N)
Row-level frequency    df.groupby('col')['col'].transform('count')

  • Use value_counts() for quick frequency analysis. It handles sorting and counting efficiently out of the box.
  • Add normalize=True for proportions, dropna=False to include missing values, and bins to categorize continuous numeric data.
  • Use transform('count') when you need to add frequency counts back as a column in your DataFrame.