Python Pandas: How to Use Hierarchical Indexes with Pandas

Hierarchical indexing (also called multi-indexing) allows you to set multiple columns as the index of a DataFrame, creating a tree-like structure of labels. This is powerful for organizing complex datasets, such as data grouped by region and state, or by year and quarter, enabling efficient slicing, aggregation, and restructuring.

This guide explains how to create, navigate, and manipulate hierarchical indexes in Pandas.

What Is a Hierarchical Index?

A standard DataFrame has a single-level index. A hierarchical index has two or more levels, where each level represents a different category:

Standard Index:          Hierarchical Index:
                         Level 0    Level 1
Index  Value             Region     State      Value
0      100               East       New York   100
1      200               East       New Jersey 200
2      300               West       California 300
3      400               West       Oregon     400

The outer level (Region) groups data, and the inner level (State) provides specifics within each group.

Creating a Hierarchical Index

Using `set_index()` with Multiple Columns

Pass a list of column names to set_index() to create a multi-level index:

import pandas as pd

df = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'West', 'South', 'South'],
    'State': ['New York', 'New Jersey', 'California', 'Oregon', 'Texas', 'Florida'],
    'Population': [19500000, 9300000, 39500000, 4200000, 29100000, 21500000],
    'GDP': [1900, 650, 3600, 270, 1900, 1100]
})

print("Original DataFrame:")
print(df)

# Set Region and State as hierarchical index
df_multi = df.set_index(['Region', 'State'])
print("\nWith Hierarchical Index:")
print(df_multi)

Output:

Original DataFrame:
  Region       State  Population   GDP
0   East    New York    19500000  1900
1   East  New Jersey     9300000   650
2   West  California    39500000  3600
3   West      Oregon     4200000   270
4  South       Texas    29100000  1900
5  South     Florida    21500000  1100

With Hierarchical Index:
                   Population   GDP
Region State                       
East   New York      19500000  1900
       New Jersey     9300000   650
West   California    39500000  3600
       Oregon         4200000   270
South  Texas         29100000  1900
       Florida       21500000  1100

The Region column becomes level 0 and State becomes level 1 of the index.

Sorting the Hierarchical Index

For efficient slicing, always sort a multi-index after creating it:

df_multi = df_multi.sort_index()
print(df_multi)

Output:

                   Population   GDP
Region State                       
East   New Jersey     9300000   650
       New York      19500000  1900
South  Florida       21500000  1100
       Texas         29100000  1900
West   California    39500000  3600
       Oregon         4200000   270

tip

Always call sort_index() after setting a hierarchical index. Many loc[] operations require a sorted index to work correctly and perform efficiently. An unsorted multi-index may raise a KeyError or produce unexpected results.

Selecting Data with a Hierarchical Index

Selecting by the Outer Level (Level 0)

Use loc[] with the outer level value to get all rows in that group:

# Select all states in the 'East' region
east = df_multi.loc['East']
print(east)

Output:

            Population   GDP
State                       
New Jersey     9300000   650
New York      19500000  1900

The result drops the outer index level and uses the inner level (State) as the index.

Selecting Multiple Outer Level Values

Pass a list to select multiple groups:

# Select East and West regions
east_west = df_multi.loc[['East', 'West']]
print(east_west)

Output:

                   Population   GDP
Region State                       
East   New Jersey     9300000   650
       New York      19500000  1900
West   California    39500000  3600
       Oregon         4200000   270

Selecting by Both Levels Using Tuples

To select specific combinations of outer and inner index values, use tuples:

# Select specific Region-State combinations
selected = df_multi.loc[('East', 'New York')]
print(selected)

Output:

Population    19500000
GDP               1900
Name: (East, New York), dtype: int64

For multiple specific combinations, pass a list of tuples:

selected = df_multi.loc[[('East', 'New York'), ('West', 'California')]]
print(selected)

Output:

Region State                       
East   New York      19500000  1900
West   California    39500000  3600

Common Mistake: Selecting by Inner Level Only

You cannot directly use loc[] with only the inner level value:

# WRONG: selecting by inner level directly raises KeyError
try:
    result = df_multi.loc['New York']
except KeyError as e:
    print(f"KeyError: {e}")

Output:

KeyError: 'New York'

The correct approaches:

# Option 1: Use xs() to select by an inner level
result = df_multi.xs('New York', level='State')
print("Using xs():")
print(result)

# Option 2: Use a tuple with the outer level
result = df_multi.loc[('East', 'New York')]
print("\nUsing tuple:")
print(result)

Output:

Using xs():
        Population   GDP
Region                  
East      19500000  1900

Using tuple:
Population    19500000
GDP               1900
Name: (East, New York), dtype: int64

warning

You cannot use loc[] with only an inner-level value. Use xs(value, level='level_name') to select data from any specific level, or provide values for all outer levels using tuples.

Using `xs()` for Cross-Section Selection

The xs() method is designed for selecting data at a specific level of a multi-index:

import pandas as pd

df = pd.DataFrame({
    'Year': [2023, 2023, 2024, 2024],
    'Quarter': ['Q1', 'Q2', 'Q1', 'Q2'],
    'Revenue': [100, 150, 120, 180],
    'Profit': [20, 35, 25, 45]
})

df_multi = df.set_index(['Year', 'Quarter']).sort_index()
print(df_multi)

# Select all Q1 data across years
q1_data = df_multi.xs('Q1', level='Quarter')
print("\nAll Q1 data:")
print(q1_data)

Output:

              Revenue  Profit
Year Quarter                 
2023 Q1           100      20
     Q2           150      35
2024 Q1           120      25
     Q2           180      45

All Q1 data:
      Revenue  Profit
Year                 
2023      100      20
2024      120      25

Resetting the Index

To convert the hierarchical index back to regular columns, use reset_index():

# Reset all index levels to columns
df_flat = df_multi.reset_index()
print(df_flat)

Output:

   Year Quarter  Revenue  Profit
2023      Q1      100      20
2023      Q2      150      35
2024      Q1      120      25
2024      Q2      180      45

To reset only a specific level:

# Reset only the 'Quarter' level
df_partial = df_multi.reset_index(level='Quarter')
print(df_partial)

Output:

     Quarter  Revenue  Profit
Year                         
2023      Q1      100      20
2023      Q2      150      35
2024      Q1      120      25
2024      Q2      180      45

Aggregating with Hierarchical Indexes

Multi-indexes work naturally with groupby() for hierarchical aggregation:

import pandas as pd

df = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'West', 'East', 'West'],
    'State': ['NY', 'NJ', 'CA', 'OR', 'NY', 'CA'],
    'Sales': [100, 80, 200, 50, 120, 180]
})

# Group by multiple levels and aggregate
summary = df.groupby(['Region', 'State'])['Sales'].agg(['sum', 'mean', 'count'])
print(summary)

Output:

              sum   mean  count
Region State                   
East   NJ      80   80.0      1
       NY     220  110.0      2
West   CA     380  190.0      2
       OR      50   50.0      1

The result automatically has a hierarchical index.

Swapping Index Levels

Use swaplevel() to change the order of index levels:

import pandas as pd

df = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'West'],
    'State': ['NY', 'NJ', 'CA', 'OR'],
    'Population': [19500000, 9300000, 39500000, 4200000]
})

df_multi = df.set_index(['Region', 'State']).sort_index()
print("Original order:")
print(df_multi)

# Swap levels: State becomes level 0, Region becomes level 1
swapped = df_multi.swaplevel().sort_index()
print("\nSwapped levels:")
print(swapped)

Output:

Original order:
              Population
Region State            
East   NJ        9300000
       NY       19500000
West   CA       39500000
       OR        4200000

Swapped levels:
              Population
State Region            
CA    West      39500000
NJ    East       9300000
NY    East      19500000
OR    West       4200000

Quick Reference

Operation	Code
Create hierarchical index	`df.set_index(['col1', 'col2'])`
Sort the multi-index	`df.sort_index()`
Select by outer level	`df.loc['value']`
Select by both levels (tuple)	`df.loc[('outer', 'inner')]`
Select by inner level	`df.xs('value', level='level_name')`
Reset all levels to columns	`df.reset_index()`
Reset one level	`df.reset_index(level='level_name')`
Swap index levels	`df.swaplevel()`
Get index level names	`df.index.names`
Get index level values	`df.index.get_level_values('level_name')`

Hierarchical indexing transforms flat DataFrames into structured, multi-dimensional data that mirrors real-world hierarchies. By mastering set_index(), loc[] with tuples, and xs(), you can efficiently organize, slice, and aggregate complex datasets in Pandas.

What Is a Hierarchical Index?​

Creating a Hierarchical Index​

Using set_index() with Multiple Columns​

Sorting the Hierarchical Index​

Selecting Data with a Hierarchical Index​

Selecting by the Outer Level (Level 0)​

Selecting Multiple Outer Level Values​

Selecting by Both Levels Using Tuples​

Common Mistake: Selecting by Inner Level Only​

The correct approaches:​

Using xs() for Cross-Section Selection​

Resetting the Index​

Aggregating with Hierarchical Indexes​

Swapping Index Levels​

Quick Reference​

Table of Contents

What Is a Hierarchical Index?

Creating a Hierarchical Index

Using `set_index()` with Multiple Columns

Sorting the Hierarchical Index

Selecting Data with a Hierarchical Index

Selecting by the Outer Level (Level 0)

Selecting Multiple Outer Level Values

Selecting by Both Levels Using Tuples

Common Mistake: Selecting by Inner Level Only

The correct approaches:

Using `xs()` for Cross-Section Selection

Resetting the Index

Aggregating with Hierarchical Indexes

Swapping Index Levels

Quick Reference