Skip to main content

Python Pandas: How to Sort a Pandas DataFrame by Multiple Columns

Sorting a DataFrame by multiple columns is a fundamental operation in data analysis. It allows you to organize data hierarchically - for example, sorting employees first by department and then by salary within each department. Pandas offers several methods to accomplish this, from the versatile sort_values() to specialized functions like nlargest() and nsmallest().

This guide covers each approach with practical examples, outputs, and guidance on when to use which method.

Sample DataFrame

All examples use the following DataFrame:

import pandas as pd
import numpy as np

df = pd.DataFrame({
'Name': ['Ryan', 'Aiden', 'Noah', 'Caleb', 'Emma', 'Chloe'],
'Age': [20, 22, 21, 19, 17, 23],
'Rank': [1, np.nan, 8, 9, 4, np.nan]
})

print(df)

Output:

    Name  Age  Rank
0 Ryan 20 1.0
1 Aiden 22 NaN
2 Noah 21 8.0
3 Caleb 19 9.0
4 Emma 17 4.0
5 Chloe 23 NaN

Using sort_values() - The Most Flexible Method

The sort_values() method is the standard way to sort a DataFrame by one or more columns. It supports independent ascending/descending order per column and configurable NaN placement.

Sorting by Multiple Columns

import pandas as pd
import numpy as np

df = pd.DataFrame({
'Name': ['Ryan', 'Aiden', 'Noah', 'Caleb', 'Emma', 'Chloe'],
'Age': [20, 22, 21, 19, 17, 23],
'Rank': [1, np.nan, 8, 9, 4, np.nan]
})

# Sort by 'Rank' ascending, then by 'Age' descending for ties
result = df.sort_values(
by=['Rank', 'Age'],
ascending=[True, False],
na_position='last'
)
print(result)

Output:

    Name  Age  Rank
0 Ryan 20 1.0
4 Emma 17 4.0
2 Noah 21 8.0
3 Caleb 19 9.0
5 Chloe 23 NaN
1 Aiden 22 NaN

Key parameters:

ParameterDescription
byList of column names to sort by, in priority order
ascendingList of booleans - True for ascending, False for descending - one per column
na_position'last' (default) or 'first' - where to place NaN values
ignore_indexIf True, resets the index to 0, 1, 2, ... after sorting

Sorting with Reset Index

When you want a clean index after sorting:

import pandas as pd
import numpy as np

df = pd.DataFrame({
'Name': ['Ryan', 'Aiden', 'Noah', 'Caleb', 'Emma', 'Chloe'],
'Age': [20, 22, 21, 19, 17, 23],
'Rank': [1, np.nan, 8, 9, 4, np.nan]
})

result = df.sort_values(
by=['Rank', 'Age'],
ascending=[True, False],
na_position='last',
ignore_index=True
)
print(result)

Output:

    Name  Age  Rank
0 Ryan 20 1.0
1 Emma 17 4.0
2 Noah 21 8.0
3 Caleb 19 9.0
4 Chloe 23 NaN
5 Aiden 22 NaN
tip

sort_values() is the go-to method for most sorting tasks. It handles multiple columns, mixed sort directions, and NaN values cleanly in a single call.

Using nlargest() - Top N Rows by Value

The nlargest() method is optimized to retrieve the top N rows with the highest values in specified columns. It is faster than sorting the entire DataFrame when you only need a few top results.

import pandas as pd
import numpy as np

df = pd.DataFrame({
'Name': ['Ryan', 'Aiden', 'Noah', 'Caleb', 'Emma', 'Chloe'],
'Age': [20, 22, 21, 19, 17, 23],
'Rank': [1, np.nan, 8, 9, 4, np.nan]
})

# Get the 3 rows with the highest 'Rank' values
top_3 = df.nlargest(3, 'Rank')
print(top_3)

Output:

    Name  Age  Rank
3 Caleb 19 9.0
2 Noah 21 8.0
4 Emma 17 4.0

NaN values are automatically ignored. You can also sort by multiple columns - nlargest uses the second column as a tiebreaker:

import pandas as pd
import numpy as np

df = pd.DataFrame({
'Name': ['Ryan', 'Aiden', 'Noah', 'Caleb', 'Emma', 'Chloe'],
'Age': [20, 22, 21, 19, 17, 23],
'Rank': [1, np.nan, 8, 9, 4, np.nan]
})

# Top 3 by Rank, then by Age for ties
top_3 = df.nlargest(3, ['Rank', 'Age'])
print(top_3)

Output:

    Name  Age  Rank
3 Caleb 19 9.0
2 Noah 21 8.0
4 Emma 17 4.0

Using nsmallest() - Bottom N Rows by Value

The counterpart to nlargest(), this method retrieves the lowest N values:

import pandas as pd
import numpy as np

df = pd.DataFrame({
'Name': ['Ryan', 'Aiden', 'Noah', 'Caleb', 'Emma', 'Chloe'],
'Age': [20, 22, 21, 19, 17, 23],
'Rank': [1, np.nan, 8, 9, 4, np.nan]
})

# Get the 3 rows with the lowest 'Rank' values
bottom_3 = df.nsmallest(3, 'Rank')
print(bottom_3)

Output:

   Name  Age  Rank
0 Ryan 20 1.0
4 Emma 17 4.0
2 Noah 21 8.0
info

nlargest() and nsmallest() are particularly efficient for large DataFrames when you only need a small subset of top/bottom rows. They use a partial sort algorithm that avoids sorting the entire dataset.

Using sort_index()

The sort_index() method sorts by the DataFrame's index rather than column values. This is useful after operations that disorder the index, such as filtering, grouping, or setting a custom index.

import pandas as pd
import numpy as np

df = pd.DataFrame({
'Name': ['Ryan', 'Aiden', 'Noah', 'Caleb', 'Emma', 'Chloe'],
'Age': [20, 22, 21, 19, 17, 23],
'Rank': [1, np.nan, 8, 9, 4, np.nan]
})

# Sort by index in descending order
result = df.sort_index(ascending=False)
print(result)

Output:

    Name  Age  Rank
5 Chloe 23 NaN
4 Emma 17 4.0
3 Caleb 19 9.0
2 Noah 21 8.0
1 Aiden 22 NaN
0 Ryan 20 1.0

Sorting by a MultiIndex

sort_index() becomes powerful with a MultiIndex:

import pandas as pd

df = pd.DataFrame({
'Score': [88, 92, 75, 81, 95, 70]
}, index=pd.MultiIndex.from_tuples([
('Math', 'Alice'), ('Math', 'Bob'),
('Science', 'Alice'), ('Science', 'Bob'),
('English', 'Alice'), ('English', 'Bob')
], names=['Subject', 'Student']))

result = df.sort_index(level=['Subject', 'Student'], ascending=[True, False])
print(result)

Output:

                 Score
Subject Student
English Bob 70
Alice 95
Math Bob 92
Alice 88
Science Bob 81
Alice 75

Using argsort() with NumPy

For performance-critical scenarios, np.argsort() computes sorted indices directly on the underlying NumPy array, which can be faster than Pandas sorting for very large datasets:

import pandas as pd
import numpy as np

df = pd.DataFrame({
'Name': ['Ryan', 'Aiden', 'Noah', 'Caleb', 'Emma', 'Chloe'],
'Age': [20, 22, 21, 19, 17, 23],
'Rank': [1, np.nan, 8, 9, 4, np.nan]
})

# Get sorted indices for 'Rank' column
sorted_idx = np.argsort(df['Rank'].values)
result = df.iloc[sorted_idx]
print(result)

Output:

    Name  Age  Rank
0 Ryan 20 1.0
4 Emma 17 4.0
2 Noah 21 8.0
3 Caleb 19 9.0
1 Aiden 22 NaN
5 Chloe 23 NaN
warning

np.argsort() sorts by a single column only. For multi-column sorting, sort_values() is far more practical. Also, np.argsort() places NaN values at the end by default, but this behavior is not configurable - use sort_values(na_position=...) if you need control over NaN placement.

Common Mistake: Mismatched ascending List Length

When sorting by multiple columns, the ascending parameter must have the same length as the by parameter. Passing a single boolean applies it to all columns, which may not be intended:

import pandas as pd

df = pd.DataFrame({
'Department': ['Sales', 'Sales', 'Engineering', 'Engineering'],
'Salary': [50000, 60000, 70000, 55000]
})

# WRONG: single boolean applies to ALL columns
# This sorts both Department and Salary in descending order
result = df.sort_values(by=['Department', 'Salary'], ascending=False)
print(result)

Output:

    Department  Salary
1 Sales 60000
0 Sales 50000
2 Engineering 70000
3 Engineering 55000

If the intent was to sort by Department ascending and Salary descending, the result above is incorrect.

The correct approach:

import pandas as pd

df = pd.DataFrame({
'Department': ['Sales', 'Sales', 'Engineering', 'Engineering'],
'Salary': [50000, 60000, 70000, 55000]
})

# CORRECT: specify ascending/descending for each column individually
result = df.sort_values(by=['Department', 'Salary'], ascending=[True, False])
print(result)

Output:

    Department  Salary
2 Engineering 70000
3 Engineering 55000
1 Sales 60000
0 Sales 50000

Method Comparison

MethodSorts ByMulti-ColumnNaN HandlingBest For
sort_values()Column valuesYesConfigurable (na_position)General-purpose sorting - recommended
nlargest()Column values (descending)Yes (tiebreaker)Ignores NaNGetting top N rows efficiently
nsmallest()Column values (ascending)Yes (tiebreaker)Ignores NaNGetting bottom N rows efficiently
sort_index()Index valuesYes (MultiIndex)N/AReordering by row index
np.argsort()Single column (NumPy)NoPushes to endPerformance-critical, single-column sorting

For most multi-column sorting tasks, sort_values() is the recommended method. It offers the most flexibility with independent sort directions, configurable NaN placement, and clean syntax for any number of columns.