Python Pandas: How to Calculate Itemset Frequency in Pandas

Counting how often specific combinations of values appear together across multiple columns is a fundamental operation in data analysis. This technique is essential for market basket analysis, pattern discovery, understanding customer behavior, and identifying the most common groupings in your data. For example, you might want to know which product-and-region combinations generate the most sales, or which category-and-brand pairings appear most frequently in your inventory.

In this guide, you will learn how to count multi-column value combinations using different Pandas methods, add percentage breakdowns, and filter for the most frequent patterns.

Counting Combinations with value_counts()

The simplest way to count unique value combinations is to select the columns of interest and call .value_counts():

import pandas as pd

df = pd.DataFrame({
    'Category': ['Electronics', 'Clothing', 'Electronics', 'Electronics', 'Clothing'],
    'Brand': ['Apple', 'Nike', 'Apple', 'Samsung', 'Nike'],
    'Color': ['Black', 'Red', 'Black', 'White', 'Blue']
})

# Count unique combinations of Category and Brand
counts = df[['Category', 'Brand']].value_counts()

print(counts)

Output:

Category     Brand
Clothing     Nike       2
Electronics  Apple      2
             Samsung    1
Name: count, dtype: int64

The result is a Series with a MultiIndex, sorted by frequency in descending order by default. This tells you immediately that "Electronics + Apple" and "Clothing + Nike" are the most common pairings, each appearing twice.
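Because the result is a MultiIndex Series, individual combination counts can be looked up with a plain tuple key, and the index can be flattened back into columns when needed. A quick sketch using the same data:

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['Electronics', 'Clothing', 'Electronics', 'Electronics', 'Clothing'],
    'Brand': ['Apple', 'Nike', 'Apple', 'Samsung', 'Nike']
})

counts = df[['Category', 'Brand']].value_counts()

# Look up one combination by its (Category, Brand) tuple
apple_count = counts[('Electronics', 'Apple')]
print(apple_count)  # 2

# Flatten the MultiIndex back into regular columns
flat = counts.reset_index(name='Count')
print(flat.columns.tolist())  # ['Category', 'Brand', 'Count']
```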

Using groupby().size() for Flat Output

When you need the results as a clean, flat DataFrame suitable for exporting or further processing, use groupby().size() combined with .reset_index():

import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'A', 'B', 'A'],
    'Color': ['Red', 'Blue', 'Red', 'Green', 'Blue', 'Red']
})

freq_df = (df.groupby(['Category', 'Color'])
             .size()
             .reset_index(name='Frequency')
             .sort_values('Frequency', ascending=False))

print(freq_df)

Output:

  Category  Color  Frequency
1        A    Red          3
2        B   Blue          2
0        A  Green          1

The .reset_index(name='Frequency') call converts the grouped result into a regular DataFrame with a descriptive column name, making it ready for reporting or visualization.
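The two approaches converge on the same table: value_counts() followed by reset_index() produces an identical flat DataFrame. A sketch verifying the equivalence on the data above:

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'A', 'B', 'A'],
    'Color': ['Red', 'Blue', 'Red', 'Green', 'Blue', 'Red']
})

via_groupby = (df.groupby(['Category', 'Color'])
                 .size()
                 .reset_index(name='Frequency')
                 .sort_values('Frequency', ascending=False)
                 .reset_index(drop=True))

via_value_counts = (df[['Category', 'Color']]
                    .value_counts()
                    .reset_index(name='Frequency'))

# Both routes yield the same flat frequency table
print(via_groupby.equals(via_value_counts))  # True
```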

Counting Three or More Columns

The same approach scales to any number of columns. Simply include all the columns you want in the selection:

import pandas as pd

df = pd.DataFrame({
    'Store': ['NYC', 'NYC', 'LA', 'NYC', 'LA'],
    'Category': ['Tech', 'Tech', 'Fashion', 'Fashion', 'Tech'],
    'Brand': ['Apple', 'Apple', 'Nike', 'Zara', 'Sony']
})

# Three-way combinations
counts = df[['Store', 'Category', 'Brand']].value_counts()

print(counts)

Output:

Store  Category  Brand
NYC    Tech      Apple    2
LA     Fashion   Nike     1
       Tech      Sony     1
NYC    Fashion   Zara     1
Name: count, dtype: int64

The "NYC + Tech + Apple" combination stands out as the only one appearing more than once.
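One caveat worth knowing: rows containing missing values are excluded from the count by default. Passing dropna=False keeps them. A small sketch with an assumed None entry added to the data:

```python
import pandas as pd

df = pd.DataFrame({
    'Store': ['NYC', 'NYC', 'LA', None],
    'Category': ['Tech', 'Tech', 'Fashion', 'Tech']
})

# Default: the row with a missing Store is silently excluded
print(df[['Store', 'Category']].value_counts().sum())  # 3

# dropna=False keeps combinations that include NaN
print(df[['Store', 'Category']].value_counts(dropna=False).sum())  # 4
```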

Adding Percentage Breakdowns

To understand not just how often combinations appear but what proportion of the total they represent, use the normalize=True parameter:

import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'A', 'B'],
    'Color': ['Red', 'Blue', 'Red', 'Green', 'Blue']
})

counts = df[['Category', 'Color']].value_counts()
percentages = df[['Category', 'Color']].value_counts(normalize=True) * 100

result = pd.DataFrame({
    'Count': counts,
    'Percent': percentages.round(1)
})

print(result)

Output:

                Count  Percent
Category Color
A        Red        2     40.0
B        Blue       2     40.0
A        Green      1     20.0

This shows that "A + Red" and "B + Blue" each account for 40% of all records, while "A + Green" makes up the remaining 20%.
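When exactly two columns are involved, pd.crosstab offers a complementary two-way view of the same counts, with one column spread across the rows and the other across the columns. A sketch using the data above:

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'A', 'B'],
    'Color': ['Red', 'Blue', 'Red', 'Green', 'Blue']
})

# Rows are categories, columns are colors, cells are counts
table = pd.crosstab(df['Category'], df['Color'])
print(table)

# normalize='all' converts cells to shares of the grand total
shares = pd.crosstab(df['Category'], df['Color'], normalize='all') * 100
print(shares.round(1))
```

The crosstab layout makes gaps visible too: combinations that never occur show up as explicit zeros rather than being absent from the list.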

Filtering for Top or Frequent Combinations

Selecting the Top N Combinations

Use .head() to retrieve only the most common pairings:

import pandas as pd

df = pd.DataFrame({
    'Product': ['A', 'B', 'A', 'C', 'A', 'B', 'A', 'C'],
    'Region': ['East', 'West', 'East', 'East', 'East', 'West', 'West', 'East']
})

top_3 = df[['Product', 'Region']].value_counts().head(3)

print(top_3)

Output:

Product  Region
A        East      3
B        West      2
C        East      2
Name: count, dtype: int64
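.head() works here only because value_counts() already sorts by frequency in descending order. On a count Series that is sorted by key instead, such as the output of groupby().size(), use .nlargest() to get the same result:

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['A', 'B', 'A', 'C', 'A', 'B', 'A', 'C'],
    'Region': ['East', 'West', 'East', 'East', 'East', 'West', 'West', 'East']
})

# groupby().size() returns counts ordered by group key, not by frequency
counts = df.groupby(['Product', 'Region']).size()

# nlargest picks the top 3 regardless of the Series' current order
top_3 = counts.nlargest(3)
print(top_3)
```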

Filtering by Minimum Frequency

To find only combinations that appear more than a certain number of times, filter the counts directly:

import pandas as pd

df = pd.DataFrame({
    'Product': ['A', 'B', 'A', 'C', 'A', 'B', 'A', 'C'],
    'Region': ['East', 'West', 'East', 'East', 'East', 'West', 'West', 'East']
})

counts = df[['Product', 'Region']].value_counts()
frequent = counts[counts > 1]

print(frequent)

Output:

Product  Region
A        East      3
B        West      2
C        East      2
Name: count, dtype: int64

Only combinations appearing more than once are included, filtering out any rare one-off pairings.
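If you also need the original rows belonging to those frequent combinations, the filtered index can be turned back into a key table and merged against the source DataFrame. A sketch, reusing the column names from the example above:

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['A', 'B', 'A', 'C', 'A', 'B', 'A', 'C'],
    'Region': ['East', 'West', 'East', 'East', 'East', 'West', 'West', 'East']
})

counts = df[['Product', 'Region']].value_counts()
frequent = counts[counts > 1]

# Turn the frequent (Product, Region) pairs into a key table,
# then inner-merge to keep only matching rows
keys = frequent.index.to_frame(index=False)
frequent_rows = df.merge(keys, on=['Product', 'Region'])

print(len(frequent_rows))  # 7 of the original 8 rows remain
```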

Practical Example: Sales Pattern Analysis

Here is a more complete example that combines several techniques into a typical analysis workflow:

import pandas as pd

df = pd.DataFrame({
    'Store': ['NYC', 'NYC', 'LA', 'NYC', 'LA', 'NYC', 'LA', 'NYC'],
    'Category': ['Tech', 'Fashion', 'Tech', 'Tech', 'Fashion', 'Tech', 'Tech', 'Fashion'],
    'Day': ['Mon', 'Mon', 'Tue', 'Mon', 'Tue', 'Tue', 'Mon', 'Mon']
})

# Build a complete frequency table
freq = (df[['Store', 'Category']]
        .value_counts()
        .reset_index(name='Count'))

total = freq['Count'].sum()
freq['Percent'] = (freq['Count'] / total * 100).round(1)
freq['Cum_Percent'] = freq['Percent'].cumsum()

print(freq)

Output:

  Store Category  Count  Percent  Cum_Percent
0   NYC     Tech      3     37.5         37.5
1    LA     Tech      2     25.0         62.5
2   NYC  Fashion     2     25.0         87.5
3    LA  Fashion     1     12.5        100.0

This output shows that "NYC + Tech" is the most common combination at 37.5%, and the top two combinations together account for 62.5% of all records.
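The cumulative column also enables a Pareto-style cut, keeping only the leading combinations that together cover most of the records. A sketch with an assumed 90% cutoff:

```python
import pandas as pd

df = pd.DataFrame({
    'Store': ['NYC', 'NYC', 'LA', 'NYC', 'LA', 'NYC', 'LA', 'NYC'],
    'Category': ['Tech', 'Fashion', 'Tech', 'Tech', 'Fashion', 'Tech', 'Tech', 'Fashion']
})

freq = df[['Store', 'Category']].value_counts().reset_index(name='Count')
freq['Cum_Percent'] = freq['Count'].cumsum() / freq['Count'].sum() * 100

# Keep the leading combinations up to 90% cumulative coverage
core = freq[freq['Cum_Percent'] <= 90]
print(core[['Store', 'Category', 'Count']])
```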

Tip: When working with large datasets, value_counts() is highly optimized and typically faster than the equivalent groupby().size() operation. Use value_counts() for quick exploration and groupby().size().reset_index() when you need the result as a flat DataFrame for downstream processing.
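A rough way to compare the two on your own data is a simple wall-clock measurement (actual timings vary by machine, data shape, and pandas version, so treat this as a sketch rather than a benchmark):

```python
import time

import numpy as np
import pandas as pd

# Synthetic data: 200,000 rows, two low-cardinality key columns
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'a': rng.integers(0, 100, 200_000),
    'b': rng.integers(0, 100, 200_000),
})

start = time.perf_counter()
vc = df[['a', 'b']].value_counts()
t_vc = time.perf_counter() - start

start = time.perf_counter()
gb = df.groupby(['a', 'b']).size()
t_gb = time.perf_counter() - start

print(f"value_counts: {t_vc:.4f}s  groupby.size: {t_gb:.4f}s")
# Whichever is faster, both must account for every row
print(vc.sum() == len(df) == gb.sum())  # True
```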

Quick Reference

Method                        Output Type        Best For
df[cols].value_counts()       MultiIndex Series  Quick exploration and analysis
groupby(cols).size()          Series             Custom aggregations
.reset_index(name='col')      Flat DataFrame     Exporting, reporting, plotting
value_counts(normalize=True)  Proportions        Percentage analysis
  • Use df[['col1', 'col2']].value_counts() for quick frequency counts of column combinations.
  • Use groupby().size().reset_index() when you need a flat DataFrame for exporting or visualization.
  • Add normalize=True to convert raw counts into proportions, and filter with boolean indexing to isolate the most frequent or significant patterns.