Python Pandas: How to Sort a Pandas DataFrame by Multiple Columns

Sorting a DataFrame by multiple columns is a fundamental operation in data analysis. It allows you to organize data hierarchically - for example, sorting employees first by department and then by salary within each department. Pandas offers several methods to accomplish this, from the versatile sort_values() to specialized functions like nlargest() and nsmallest().

This guide covers each approach with practical examples, outputs, and guidance on when to use which method.

Sample DataFrame

All examples use the following DataFrame:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Name': ['Ryan', 'Aiden', 'Noah', 'Caleb', 'Emma', 'Chloe'],
    'Age': [20, 22, 21, 19, 17, 23],
    'Rank': [1, np.nan, 8, 9, 4, np.nan]
})

print(df)

Output:

    Name  Age  Rank
 Ryan   20   1.0
Aiden   22   NaN
 Noah   21   8.0
Caleb   19   9.0
 Emma   17   4.0
Chloe   23   NaN

Using `sort_values()` - The Most Flexible Method

The sort_values() method is the standard way to sort a DataFrame by one or more columns. It supports independent ascending/descending order per column and configurable NaN placement.

Sorting by Multiple Columns

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Name': ['Ryan', 'Aiden', 'Noah', 'Caleb', 'Emma', 'Chloe'],
    'Age': [20, 22, 21, 19, 17, 23],
    'Rank': [1, np.nan, 8, 9, 4, np.nan]
})

# Sort by 'Rank' ascending, then by 'Age' descending for ties
result = df.sort_values(
    by=['Rank', 'Age'],
    ascending=[True, False],
    na_position='last'
)
print(result)

Output:

    Name  Age  Rank
 Ryan   20   1.0
 Emma   17   4.0
 Noah   21   8.0
Caleb   19   9.0
Chloe   23   NaN
Aiden   22   NaN

Key parameters:

Parameter	Description
`by`	List of column names to sort by, in priority order
`ascending`	List of booleans - `True` for ascending, `False` for descending - one per column
`na_position`	`'last'` (default) or `'first'` - where to place `NaN` values
`ignore_index`	If `True`, resets the index to 0, 1, 2, ... after sorting

Sorting with Reset Index

When you want a clean index after sorting:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Name': ['Ryan', 'Aiden', 'Noah', 'Caleb', 'Emma', 'Chloe'],
    'Age': [20, 22, 21, 19, 17, 23],
    'Rank': [1, np.nan, 8, 9, 4, np.nan]
})

result = df.sort_values(
    by=['Rank', 'Age'],
    ascending=[True, False],
    na_position='last',
    ignore_index=True
)
print(result)

Output:

    Name  Age  Rank
 Ryan   20   1.0
 Emma   17   4.0
 Noah   21   8.0
Caleb   19   9.0
Chloe   23   NaN
Aiden   22   NaN

tip

sort_values() is the go-to method for most sorting tasks. It handles multiple columns, mixed sort directions, and NaN values cleanly in a single call.

Using `nlargest()` - Top N Rows by Value

The nlargest() method is optimized to retrieve the top N rows with the highest values in specified columns. It is faster than sorting the entire DataFrame when you only need a few top results.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Name': ['Ryan', 'Aiden', 'Noah', 'Caleb', 'Emma', 'Chloe'],
    'Age': [20, 22, 21, 19, 17, 23],
    'Rank': [1, np.nan, 8, 9, 4, np.nan]
})

# Get the 3 rows with the highest 'Rank' values
top_3 = df.nlargest(3, 'Rank')
print(top_3)

Output:

    Name  Age  Rank
Caleb   19   9.0
 Noah   21   8.0
 Emma   17   4.0

NaN values are automatically ignored. You can also sort by multiple columns - nlargest uses the second column as a tiebreaker:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Name': ['Ryan', 'Aiden', 'Noah', 'Caleb', 'Emma', 'Chloe'],
    'Age': [20, 22, 21, 19, 17, 23],
    'Rank': [1, np.nan, 8, 9, 4, np.nan]
})

# Top 3 by Rank, then by Age for ties
top_3 = df.nlargest(3, ['Rank', 'Age'])
print(top_3)

Output:

    Name  Age  Rank
Caleb   19   9.0
 Noah   21   8.0
 Emma   17   4.0

Using `nsmallest()` - Bottom N Rows by Value

The counterpart to nlargest(), this method retrieves the lowest N values:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Name': ['Ryan', 'Aiden', 'Noah', 'Caleb', 'Emma', 'Chloe'],
    'Age': [20, 22, 21, 19, 17, 23],
    'Rank': [1, np.nan, 8, 9, 4, np.nan]
})

# Get the 3 rows with the lowest 'Rank' values
bottom_3 = df.nsmallest(3, 'Rank')
print(bottom_3)

Output:

   Name  Age  Rank
Ryan   20   1.0
Emma   17   4.0
Noah   21   8.0

info

nlargest() and nsmallest() are particularly efficient for large DataFrames when you only need a small subset of top/bottom rows. They use a partial sort algorithm that avoids sorting the entire dataset.

Using `sort_index()`

The sort_index() method sorts by the DataFrame's index rather than column values. This is useful after operations that disorder the index, such as filtering, grouping, or setting a custom index.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Name': ['Ryan', 'Aiden', 'Noah', 'Caleb', 'Emma', 'Chloe'],
    'Age': [20, 22, 21, 19, 17, 23],
    'Rank': [1, np.nan, 8, 9, 4, np.nan]
})

# Sort by index in descending order
result = df.sort_index(ascending=False)
print(result)

Output:

    Name  Age  Rank
Chloe   23   NaN
 Emma   17   4.0
Caleb   19   9.0
 Noah   21   8.0
Aiden   22   NaN
 Ryan   20   1.0

Sorting by a MultiIndex

sort_index() becomes powerful with a MultiIndex:

import pandas as pd

df = pd.DataFrame({
    'Score': [88, 92, 75, 81, 95, 70]
}, index=pd.MultiIndex.from_tuples([
    ('Math', 'Alice'), ('Math', 'Bob'),
    ('Science', 'Alice'), ('Science', 'Bob'),
    ('English', 'Alice'), ('English', 'Bob')
], names=['Subject', 'Student']))

result = df.sort_index(level=['Subject', 'Student'], ascending=[True, False])
print(result)

Output:

                 Score
Subject Student       
English Bob         70
        Alice       95
Math    Bob         92
        Alice       88
Science Bob         81
        Alice       75

Using `argsort()` with NumPy

For performance-critical scenarios, np.argsort() computes sorted indices directly on the underlying NumPy array, which can be faster than Pandas sorting for very large datasets:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Name': ['Ryan', 'Aiden', 'Noah', 'Caleb', 'Emma', 'Chloe'],
    'Age': [20, 22, 21, 19, 17, 23],
    'Rank': [1, np.nan, 8, 9, 4, np.nan]
})

# Get sorted indices for 'Rank' column
sorted_idx = np.argsort(df['Rank'].values)
result = df.iloc[sorted_idx]
print(result)

Output:

    Name  Age  Rank
 Ryan   20   1.0
 Emma   17   4.0
 Noah   21   8.0
Caleb   19   9.0
Aiden   22   NaN
Chloe   23   NaN

warning

np.argsort() sorts by a single column only. For multi-column sorting, sort_values() is far more practical. Also, np.argsort() places NaN values at the end by default, but this behavior is not configurable - use sort_values(na_position=...) if you need control over NaN placement.

Common Mistake: Mismatched `ascending` List Length

When sorting by multiple columns, the ascending parameter must have the same length as the by parameter. Passing a single boolean applies it to all columns, which may not be intended:

import pandas as pd

df = pd.DataFrame({
    'Department': ['Sales', 'Sales', 'Engineering', 'Engineering'],
    'Salary': [50000, 60000, 70000, 55000]
})

# WRONG: single boolean applies to ALL columns
# This sorts both Department and Salary in descending order
result = df.sort_values(by=['Department', 'Salary'], ascending=False)
print(result)

Output:

    Department  Salary
      Sales   60000
      Sales   50000
Engineering   70000
Engineering   55000

If the intent was to sort by Department ascending and Salary descending, the result above is incorrect.

The correct approach:

import pandas as pd

df = pd.DataFrame({
    'Department': ['Sales', 'Sales', 'Engineering', 'Engineering'],
    'Salary': [50000, 60000, 70000, 55000]
})

# CORRECT: specify ascending/descending for each column individually
result = df.sort_values(by=['Department', 'Salary'], ascending=[True, False])
print(result)

Output:

    Department  Salary
Engineering   70000
Engineering   55000
      Sales   60000
      Sales   50000

Method Comparison

Method	Sorts By	Multi-Column	NaN Handling	Best For
`sort_values()`	Column values	Yes	Configurable (`na_position`)	General-purpose sorting - recommended
`nlargest()`	Column values (descending)	Yes (tiebreaker)	Ignores `NaN`	Getting top N rows efficiently
`nsmallest()`	Column values (ascending)	Yes (tiebreaker)	Ignores `NaN`	Getting bottom N rows efficiently
`sort_index()`	Index values	Yes (MultiIndex)	N/A	Reordering by row index
`np.argsort()`	Single column (NumPy)	No	Pushes to end	Performance-critical, single-column sorting

For most multi-column sorting tasks, sort_values() is the recommended method. It offers the most flexibility with independent sort directions, configurable NaN placement, and clean syntax for any number of columns.

Sample DataFrame​

Using sort_values() - The Most Flexible Method​

Sorting by Multiple Columns​

Sorting with Reset Index​

Using nlargest() - Top N Rows by Value​

Using nsmallest() - Bottom N Rows by Value​

Using sort_index()​

Sorting by a MultiIndex​

Using argsort() with NumPy​

Common Mistake: Mismatched ascending List Length​

Method Comparison​

Table of Contents

Sample DataFrame

Using `sort_values()` - The Most Flexible Method

Sorting by Multiple Columns

Sorting with Reset Index

Using `nlargest()` - Top N Rows by Value

Using `nsmallest()` - Bottom N Rows by Value

Using `sort_index()`

Sorting by a MultiIndex

Using `argsort()` with NumPy

Common Mistake: Mismatched `ascending` List Length

Method Comparison