Python Pandas: How to Calculate Itemset Frequency Across Columns
Counting how often specific combinations of values appear together across multiple columns is a fundamental operation in data analysis. This technique is essential for market basket analysis, pattern discovery, understanding customer behavior, and identifying the most common groupings in your data. For example, you might want to know which product-and-region combinations generate the most sales, or which category-and-brand pairings appear most frequently in your inventory.
In this guide, you will learn how to count multi-column value combinations using different Pandas methods, add percentage breakdowns, and filter for the most frequent patterns.
Counting Combinations with value_counts()
The simplest way to count unique value combinations is to select the columns of interest and call .value_counts():
import pandas as pd

df = pd.DataFrame({
    'Category': ['Electronics', 'Clothing', 'Electronics', 'Electronics', 'Clothing'],
    'Brand': ['Apple', 'Nike', 'Apple', 'Samsung', 'Nike'],
    'Color': ['Black', 'Red', 'Black', 'White', 'Blue']
})

# Count unique combinations of Category and Brand
counts = df[['Category', 'Brand']].value_counts()
print(counts)
Output:
Category     Brand
Clothing     Nike       2
Electronics  Apple      2
             Samsung    1
Name: count, dtype: int64
The result is a Series with a MultiIndex, sorted by frequency in descending order by default. This tells you immediately that "Electronics + Apple" and "Clothing + Nike" are the most common pairings, each appearing twice.
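Because the combinations themselves form the index, you can look up the count for one specific pairing directly. A small sketch, reusing the DataFrame above; .get() is handy when a combination might not exist:

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['Electronics', 'Clothing', 'Electronics', 'Electronics', 'Clothing'],
    'Brand': ['Apple', 'Nike', 'Apple', 'Samsung', 'Nike'],
})

counts = df[['Category', 'Brand']].value_counts()

# Look up a single combination by its MultiIndex tuple
apple_count = counts.loc[('Electronics', 'Apple')]
print(apple_count)  # 2

# .get() returns a default instead of raising KeyError for absent combinations
missing = counts.get(('Clothing', 'Apple'), 0)
print(missing)  # 0
```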
Using groupby().size() for Flat Output
When you need the results as a clean, flat DataFrame suitable for exporting or further processing, use groupby().size() combined with .reset_index():
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'A', 'B', 'A'],
    'Color': ['Red', 'Blue', 'Red', 'Green', 'Blue', 'Red']
})

freq_df = (df.groupby(['Category', 'Color'])
             .size()
             .reset_index(name='Frequency')
             .sort_values('Frequency', ascending=False))
print(freq_df)
Output:
  Category  Color  Frequency
1        A    Red          3
2        B   Blue          2
0        A  Green          1
The .reset_index(name='Frequency') call converts the grouped result into a regular DataFrame with a descriptive column name, making it ready for reporting or visualization.
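You can reach the same flat shape starting from value_counts() as well: since it already sorts by frequency, chaining .reset_index() alone is enough. A brief sketch (the name= parameter renames the count column, which recent pandas versions would otherwise call 'count'):

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'A', 'B', 'A'],
    'Color': ['Red', 'Blue', 'Red', 'Green', 'Blue', 'Red']
})

# value_counts already sorts descending, so no sort_values step is needed
flat = (df[['Category', 'Color']]
        .value_counts()
        .reset_index(name='Frequency'))
print(flat)
```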
Counting Three or More Columns
The same approach scales to any number of columns. Simply include all the columns you want in the selection:
import pandas as pd

df = pd.DataFrame({
    'Store': ['NYC', 'NYC', 'LA', 'NYC', 'LA'],
    'Category': ['Tech', 'Tech', 'Fashion', 'Fashion', 'Tech'],
    'Brand': ['Apple', 'Apple', 'Nike', 'Zara', 'Sony']
})

# Three-way combinations
counts = df[['Store', 'Category', 'Brand']].value_counts()
print(counts)
Output:
Store  Category  Brand
NYC    Tech      Apple    2
LA     Fashion   Nike     1
       Tech      Sony     1
NYC    Fashion   Zara     1
Name: count, dtype: int64
The "NYC + Tech + Apple" combination stands out as the only one appearing more than once.
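One caveat as column counts grow: by default, value_counts() silently drops any row with a missing value in one of the selected columns. In recent pandas versions you can pass dropna=False to count those combinations too. A sketch with hypothetical data containing gaps:

```python
import pandas as pd

df = pd.DataFrame({
    'Store': ['NYC', 'NYC', None, 'LA'],
    'Category': ['Tech', 'Tech', 'Tech', None]
})

# Default: rows with a missing value in either column are excluded
strict = df[['Store', 'Category']].value_counts()
print(strict)

# dropna=False keeps NaN as a countable value in the combinations
with_nan = df[['Store', 'Category']].value_counts(dropna=False)
print(with_nan)
```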
Adding Percentage Breakdowns
To understand not just how often combinations appear but what proportion of the total they represent, use the normalize=True parameter:
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'A', 'B'],
    'Color': ['Red', 'Blue', 'Red', 'Green', 'Blue']
})

counts = df[['Category', 'Color']].value_counts()
percentages = df[['Category', 'Color']].value_counts(normalize=True) * 100

result = pd.DataFrame({
    'Count': counts,
    'Percent': percentages.round(1)
})
print(result)
Output:
                Count  Percent
Category Color
A        Red        2     40.0
B        Blue       2     40.0
A        Green      1     20.0
This shows that "A + Red" and "B + Blue" each account for 40% of all records, while "A + Green" makes up the remaining 20%.
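For exactly two columns, pd.crosstab offers an alternative view of the same proportions as a contingency table; normalize='all' divides every cell by the grand total. A small sketch with the same data:

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'A', 'B'],
    'Color': ['Red', 'Blue', 'Red', 'Green', 'Blue']
})

# One row per Category, one column per Color, cells as percent of all records
table = (pd.crosstab(df['Category'], df['Color'], normalize='all') * 100).round(1)
print(table)
```

The grid form makes it easy to spot combinations that never occur (their cells are 0.0), which the value_counts() listing omits entirely.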
Filtering for Top or Frequent Combinations
Selecting the Top N Combinations
Use .head() to retrieve only the most common pairings:
import pandas as pd

df = pd.DataFrame({
    'Product': ['A', 'B', 'A', 'C', 'A', 'B', 'A', 'C'],
    'Region': ['East', 'West', 'East', 'East', 'East', 'West', 'West', 'East']
})

top_3 = df[['Product', 'Region']].value_counts().head(3)
print(top_3)
Output:
Product  Region
A        East      3
B        West      2
C        East      2
Name: count, dtype: int64
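Note that .head() only works here because value_counts() pre-sorts by frequency. If you build the counts with groupby().size() instead, the Series is ordered by group keys, so use .nlargest() to pick the top N. A sketch with the same data:

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['A', 'B', 'A', 'C', 'A', 'B', 'A', 'C'],
    'Region': ['East', 'West', 'East', 'East', 'East', 'West', 'West', 'East']
})

# groupby().size() sorts by group keys, so select the largest counts explicitly
top3 = df.groupby(['Product', 'Region']).size().nlargest(3)
print(top3)
```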
Filtering by Minimum Frequency
To find only combinations that appear more than a certain number of times, filter the counts directly:
import pandas as pd

df = pd.DataFrame({
    'Product': ['A', 'B', 'A', 'C', 'A', 'B', 'A', 'C'],
    'Region': ['East', 'West', 'East', 'East', 'East', 'West', 'West', 'East']
})

counts = df[['Product', 'Region']].value_counts()
frequent = counts[counts > 1]
print(frequent)
Output:
Product  Region
A        East      3
B        West      2
C        East      2
Name: count, dtype: int64
Only combinations appearing more than once are included, filtering out any rare one-off pairings.
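A related task is keeping only the original rows that belong to a frequent combination, rather than the counts themselves. groupby().transform('size') attaches each group's size to every row, so you can filter the DataFrame directly. A sketch with the same data:

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['A', 'B', 'A', 'C', 'A', 'B', 'A', 'C'],
    'Region': ['East', 'West', 'East', 'East', 'East', 'West', 'West', 'East']
})

# Per-row count of that row's (Product, Region) combination
combo_size = df.groupby(['Product', 'Region'])['Product'].transform('size')

# Keep rows whose combination appears more than once
frequent_rows = df[combo_size > 1]
print(frequent_rows)
```

Here only the single "A + West" row is dropped; every remaining row belongs to a combination seen at least twice.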
Practical Example: Sales Pattern Analysis
Here is a more complete example that combines several techniques into a typical analysis workflow:
import pandas as pd

df = pd.DataFrame({
    'Store': ['NYC', 'NYC', 'LA', 'NYC', 'LA', 'NYC', 'LA', 'NYC'],
    'Category': ['Tech', 'Fashion', 'Tech', 'Tech', 'Fashion', 'Tech', 'Tech', 'Fashion'],
    'Day': ['Mon', 'Mon', 'Tue', 'Mon', 'Tue', 'Tue', 'Mon', 'Mon']
})

# Build a complete frequency table
freq = (df[['Store', 'Category']]
        .value_counts()
        .reset_index(name='Count'))
total = freq['Count'].sum()
freq['Percent'] = (freq['Count'] / total * 100).round(1)
freq['Cum_Percent'] = freq['Percent'].cumsum()
print(freq)
Output:
  Store Category  Count  Percent  Cum_Percent
0   NYC     Tech      3     37.5         37.5
1    LA     Tech      2     25.0         62.5
2   NYC  Fashion      2     25.0         87.5
3    LA  Fashion      1     12.5        100.0
This output shows that "NYC + Tech" is the most common combination at 37.5%, and the top two combinations together account for 62.5% of all records.
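The cumulative column also supports Pareto-style questions such as "which combinations cover the first 80% of records?". A sketch continuing the workflow above (the Day column is omitted for brevity):

```python
import pandas as pd

df = pd.DataFrame({
    'Store': ['NYC', 'NYC', 'LA', 'NYC', 'LA', 'NYC', 'LA', 'NYC'],
    'Category': ['Tech', 'Fashion', 'Tech', 'Tech', 'Fashion', 'Tech', 'Tech', 'Fashion']
})

freq = df[['Store', 'Category']].value_counts().reset_index(name='Count')
freq['Cum_Percent'] = (freq['Count'].cumsum() / freq['Count'].sum() * 100).round(1)

# Keep every combination needed to reach at least 80% coverage:
# a row is included while the rows BEFORE it cover less than 80%
core = freq[freq['Cum_Percent'].shift(fill_value=0) < 80]
print(core)
```

The shift(fill_value=0) trick includes the row that crosses the threshold, so the selection always reaches the target coverage.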
On large datasets the two approaches perform comparably, since DataFrame.value_counts() is itself built on a groupby internally. Choose based on the output you need: value_counts() for quick exploration, and groupby().size().reset_index() when you want the result as a flat DataFrame for downstream processing.
Quick Reference
| Method | Output Type | Best For |
|---|---|---|
| df[cols].value_counts() | MultiIndex Series | Quick exploration and analysis |
| groupby(cols).size() | Series | Custom aggregations |
| .reset_index(name='col') | Flat DataFrame | Exporting, reporting, plotting |
| value_counts(normalize=True) | Proportions | Percentage analysis |
- Use df[['col1', 'col2']].value_counts() for quick frequency counts of column combinations.
- Use groupby().size().reset_index() when you need a flat DataFrame for exporting or visualization.
- Add normalize=True to convert raw counts into proportions, and filter with boolean indexing to isolate the most frequent or significant patterns.