Skip to main content

Python Pandas: How to Convert a Pandas Crosstab into a Stacked DataFrame

Crosstabs produce wide-format frequency tables that are great for quick visual inspection, but many analysis tools, visualization libraries like Seaborn, and machine learning pipelines require data in "long" (tidy) format instead. Converting between these formats is a common reshaping operation in data analysis workflows.

In this guide, you will learn how to transform a wide crosstab into a long stacked DataFrame using the .stack() method, flatten the result into a clean table, and reverse the operation when needed.

Creating a Crosstab

A crosstab counts the frequency of value combinations across two or more categorical variables, producing a wide-format table:

import pandas as pd

data = pd.DataFrame({
'Brand': ['BMW', 'BMW', 'Benz', 'BMW', 'Benz'],
'Fuel': ['Petrol', 'Diesel', 'Diesel', 'Petrol', 'Petrol']
})

ct = pd.crosstab(index=data['Brand'], columns=data['Fuel'])

print(ct)

Output:

Fuel   Diesel  Petrol
Brand
BMW 1 2
Benz 1 1

This wide format shows one row per brand and one column per fuel type. While readable, many downstream tools expect each observation to occupy its own row.

Stacking to Long Format

The .stack() method moves column headers into row index values, converting the wide table into a long MultiIndex Series:

import pandas as pd

data = pd.DataFrame({
'Brand': ['BMW', 'BMW', 'Benz', 'BMW', 'Benz'],
'Fuel': ['Petrol', 'Diesel', 'Diesel', 'Petrol', 'Petrol']
})

ct = pd.crosstab(data['Brand'], data['Fuel'])

stacked = ct.stack()

print(stacked)

Output:

Brand  Fuel  
BMW Diesel 1
Petrol 2
Benz Diesel 1
Petrol 1
dtype: int64

Each brand-fuel combination now occupies its own row. The result is a Series with a MultiIndex, where the former column headers (Diesel, Petrol) have become part of the index.

Resetting the Index for a Flat DataFrame

The stacked result is a MultiIndex Series, which is not always convenient for further processing. Use .reset_index() to convert it into a regular flat DataFrame:

import pandas as pd

data = pd.DataFrame({
'Brand': ['BMW', 'BMW', 'Benz', 'BMW', 'Benz'],
'Fuel': ['Petrol', 'Diesel', 'Diesel', 'Petrol', 'Petrol']
})

ct = pd.crosstab(data['Brand'], data['Fuel'])
stacked = ct.stack()

# Flatten to a regular DataFrame with a named value column
df_long = stacked.reset_index(name='Count')

print(df_long)

Output:

  Brand    Fuel  Count
0 BMW Diesel 1
1 BMW Petrol 2
2 Benz Diesel 1
3 Benz Petrol 1

The name='Count' parameter gives the value column a descriptive name. This flat structure is ready for export, visualization, or further analysis.

tip

The complete transformation from crosstab to flat long-format DataFrame can be written as a single chained expression:

df_long = pd.crosstab(data['Brand'], data['Fuel']).stack().reset_index(name='Count')

Handling Multi-Level Crosstabs

When a crosstab has multiple column levels, .stack() operates on the innermost level by default. You can specify which level to stack using the level parameter:

import pandas as pd

data = pd.DataFrame({
'Brand': ['BMW', 'BMW', 'Benz', 'Benz'],
'Year': [2022, 2023, 2022, 2023],
'Fuel': ['Petrol', 'Diesel', 'Diesel', 'Petrol']
})

ct = pd.crosstab(
index=data['Brand'],
columns=[data['Year'], data['Fuel']]
)

print("Crosstab:")
print(ct)
print()

# Stack the innermost level (Fuel) - default behavior
stacked_fuel = ct.stack()
print("Stacked (innermost level - Fuel):")
print(stacked_fuel)
print()

# Stack a specific level (Year)
stacked_year = ct.stack(level=0)
print("Stacked (level 0 - Year):")
print(stacked_year)

Output:

Crosstab:
Year 2022 2023
Fuel Diesel Petrol Diesel Petrol
Brand
BMW 0 1 1 0
Benz 1 0 0 1

Stacked (innermost level - Fuel):
Year 2022 2023
Brand Fuel
BMW Diesel 0 1
Petrol 1 0
Benz Diesel 1 0
Petrol 0 1

Stacked (level 0 - Year):
Fuel Diesel Petrol
Brand Year
BMW 2022 0 1
2023 1 0
Benz 2022 1 0
2023 0 1

Choosing which level to stack gives you control over the final shape of the data.

Filtering Zero Counts After Stacking

Crosstabs include zero counts for combinations that do not appear in the original data. After stacking, you may want to remove these empty combinations:

import pandas as pd

data = pd.DataFrame({
'Brand': ['BMW', 'BMW', 'Benz'],
'Fuel': ['Petrol', 'Petrol', 'Diesel']
})

ct = pd.crosstab(data['Brand'], data['Fuel'])
df_long = ct.stack().reset_index(name='Count')

print("With zero counts:")
print(df_long)
print()

# Remove rows where Count is 0
df_long = df_long[df_long['Count'] > 0]
print("Without zero counts:")
print(df_long)

Output:

With zero counts:
Brand Fuel Count
0 BMW Diesel 0
1 BMW Petrol 2
2 Benz Diesel 1
3 Benz Petrol 0

Without zero counts:
Brand Fuel Count
1 BMW Petrol 2
2 Benz Diesel 1

Unstacking: The Reverse Operation

The .unstack() method performs the opposite transformation, converting long format back to wide format:

import pandas as pd

# Start with long format data
df_long = pd.DataFrame({
'Brand': ['BMW', 'BMW', 'Benz', 'Benz'],
'Fuel': ['Diesel', 'Petrol', 'Diesel', 'Petrol'],
'Count': [1, 2, 1, 1]
})

# Set the index and unstack
wide = df_long.set_index(['Brand', 'Fuel'])['Count'].unstack()

print(wide)

Output:

Fuel   Diesel  Petrol
Brand
BMW 1 2
Benz 1 1

This produces the same wide format as the original crosstab, completing the round trip between wide and long formats.

Quick Reference

MethodDirectionUse Case
.stack()Wide to longPrepare data for plotting or ML
.unstack()Long to wideCreate pivot-style summary tables
.reset_index(name='col')Flatten MultiIndexExport to CSV, Excel, or databases
.stack(level=n)Stack specific levelMulti-level column control
  • Use .stack() to transform wide crosstab data into long format, which is the structure expected by most visualization libraries and machine learning pipelines.
  • Follow with .reset_index(name='column') to produce a clean, flat DataFrame ready for export or further processing.
  • Use .unstack() when you need to reverse the transformation and return to wide format.