Python Pandas: How to Concatenate MultiIndex into Single Index in Pandas
Hierarchical (multi-level) indices in Pandas are powerful for representing structured data, but they can complicate display, export, and downstream processing. Flattening a MultiIndex into a single descriptive string index simplifies these tasks by combining the levels into readable labels like "2023_Q1" or "Sales_Revenue". This operation is common after groupby aggregations, pivot tables, or any operation that produces a MultiIndex.
In this guide, you will learn how to flatten both row and column MultiIndexes using different techniques, handle non-string data types, and choose the right approach for your formatting needs.
Using map() with join() for Fast Flattening
The most efficient method for MultiIndexes where all levels contain strings is to use .map() with a join function:
import pandas as pd
# Create a DataFrame with a MultiIndex
index = pd.MultiIndex.from_tuples([
('2023', 'Q1'),
('2023', 'Q2'),
('2024', 'Q1')
])
df = pd.DataFrame({'Sales': [100, 150, 200]}, index=index)
print("Before:")
print(df)
print()
# Flatten with underscore separator
df.index = df.index.map('_'.join)
print("After:")
print(df)
Output:
Before:
Sales
2023 Q1 100
Q2 150
2024 Q1 200
After:
Sales
2023_Q1 100
2023_Q2 150
2024_Q1 200
The '_'.join function is called on each tuple in the MultiIndex, joining the level values with an underscore. You can use any separator string.
Different Separator Options
import pandas as pd
index = pd.MultiIndex.from_tuples([('2023', 'Q1'), ('2023', 'Q2')])
df = pd.DataFrame({'Value': [100, 200]}, index=index)
# Various separators
print(index.map(' - '.join)) # ['2023 - Q1', '2023 - Q2']
print(index.map('/'.join)) # ['2023/Q1', '2023/Q2']
print(index.map('.'.join)) # ['2023.Q1', '2023.Q2']
Output:
Index(['2023 - Q1', '2023 - Q2'], dtype='object')
Index(['2023/Q1', '2023/Q2'], dtype='object')
Index(['2023.Q1', '2023.Q2'], dtype='object')
Handling Non-String Data Types
When MultiIndex levels contain numbers or other non-string types, the join() method will raise a TypeError. You need to convert the values to strings first:
import pandas as pd
index = pd.MultiIndex.from_tuples([
('Category', 101),
('Category', 102),
('Category', 103)
])
df = pd.DataFrame({'Value': [10, 20, 30]}, index=index)
# This fails because 101, 102, 103 are integers
try:
df.index = df.index.map('_'.join)
except TypeError as e:
print(f"Error: {e}")
Output:
Error: sequence item 1: expected str instance, int found
The fix is to convert each element to a string before joining:
import pandas as pd
index = pd.MultiIndex.from_tuples([
('Category', 101),
('Category', 102),
('Category', 103)
])
df = pd.DataFrame({'Value': [10, 20, 30]}, index=index)
df.index = df.index.map(lambda x: '_'.join(map(str, x)))
print(df)
Output:
Value
Category_101 10
Category_102 20
Category_103 30
Using lambda x: '_'.join(map(str, x)) is a safe universal approach that works regardless of the data types in your MultiIndex levels. If you know all levels are strings, the simpler '_'.join is faster.
Custom Formatting with f-strings
For more complex formatting where you need different treatment for each level, use a lambda with an f-string:
import pandas as pd
index = pd.MultiIndex.from_tuples([
('Alice', 'NYC'),
('Bob', 'LA'),
('Charlie', 'Chicago')
])
df = pd.DataFrame({'Score': [85, 92, 78]}, index=index)
# Custom formatted index
df.index = df.index.map(lambda x: f"{x[0]} ({x[1]})")
print(df)
Output:
Score
Alice (NYC) 85
Bob (LA) 92
Charlie (Chicago) 78
This approach gives you complete control over how each level is incorporated into the final string. You can reference levels by position (x[0], x[1], etc.) and apply any formatting you need.
Flattening Column MultiIndex
The same techniques work for hierarchical column names, which commonly appear after pivot table operations or multi-level aggregations:
import pandas as pd
# Create a DataFrame with MultiIndex columns
columns = pd.MultiIndex.from_tuples([
('Sales', 'Q1'),
('Sales', 'Q2'),
('Profit', 'Q1')
])
df = pd.DataFrame([[100, 150, 20], [200, 250, 40]], columns=columns)
print("Before:")
print(df)
print()
# Flatten column names
df.columns = df.columns.map('_'.join)
print("After:")
print(df)
Output:
Before:
Sales Profit
Q1 Q2 Q1
0 100 150 20
1 200 250 40
After:
Sales_Q1 Sales_Q2 Profit_Q1
0 100 150 20
1 200 250 40
Flattened column names are much easier to reference in subsequent code (df['Sales_Q1'] instead of df[('Sales', 'Q1')]) and export cleanly to CSV files.
Selecting Specific Levels with get_level_values()
When your MultiIndex has three or more levels and you only want to combine some of them, use get_level_values() to extract individual levels:
import pandas as pd
index = pd.MultiIndex.from_tuples([
('2023', 'Jan', 'Week1'),
('2023', 'Jan', 'Week2'),
('2023', 'Feb', 'Week1')
])
df = pd.DataFrame({'Sales': [100, 120, 150]}, index=index)
# Combine only the first two levels, ignoring the third
level0 = df.index.get_level_values(0)
level1 = df.index.get_level_values(1)
df.index = level0 + '_' + level1
print(df)
Output:
Sales
2023_Jan 100
2023_Jan 120
2023_Feb 150
This gives you precise control over which levels are included in the flattened index and which are discarded.
Practical Example: Flattening After GroupBy
A common scenario where MultiIndex flattening is needed is after a groupby aggregation with multiple levels:
import pandas as pd
df = pd.DataFrame({
'Region': ['East', 'East', 'West', 'West'],
'Product': ['A', 'B', 'A', 'B'],
'Sales': [100, 150, 200, 250],
'Profit': [10, 20, 30, 40]
})
# GroupBy produces a MultiIndex
grouped = df.groupby(['Region', 'Product']).sum()
print("After groupby (MultiIndex):")
print(grouped)
print()
# Flatten for easier access
grouped.index = grouped.index.map(lambda x: '_'.join(x))
print("After flattening:")
print(grouped)
Output:
After groupby (MultiIndex):
Sales Profit
Region Product
East A 100 10
B 150 20
West A 200 30
B 250 40
After flattening:
Sales Profit
East_A 100 10
East_B 150 20
West_A 200 30
West_B 250 40
An alternative to flattening after groupby is to use as_index=False in the groupby() call itself, which returns a regular DataFrame with the grouping columns as regular columns instead of an index. Choose flattening when you want a descriptive single index; choose as_index=False when you want the grouping values as separate columns.
Quick Reference
| Method | Best For | Handles Mixed Types |
|---|---|---|
index.map('_'.join) | Simple string joins | No |
index.map(lambda x: '_'.join(map(str, x))) | Mixed data types | Yes |
| f-string lambda | Custom formatting | Yes |
get_level_values() | Selecting specific levels | Yes |
- Use
index.map('_'.join)for fast flattening when all levels are strings. - For mixed data types, wrap with
map(str, x)to convert all values to strings first. - Use f-string lambdas when you need custom formatting beyond simple concatenation.
- All of these techniques apply equally to column MultiIndexes by operating on
df.columnsinstead ofdf.index.