How to Sort a CSV File by Multiple Columns in Python
Sorting CSV data by multiple columns is a common task in data analysis: for example, sorting sales records first by region and then by revenue within each region, or ordering student data by grade and then by name. Python's Pandas library makes this straightforward by loading the CSV into a DataFrame and using the sort_values() method.
This guide covers sorting by one or more columns with mixed ascending and descending orders.
Understanding sort_values() Syntax
DataFrame.sort_values(
by,
axis=0,
ascending=True,
inplace=False,
kind='quicksort',
na_position='last',
ignore_index=False
)
| Parameter | Description | Default |
|---|---|---|
by | Column name or list of column names to sort by | Required |
axis | 0 to sort rows | 0 |
ascending | True for ascending, False for descending. Pass a list for mixed orders | True |
inplace | If True, modifies the DataFrame directly | False |
na_position | Where to place NaN values: 'first' or 'last' | 'last' |
ignore_index | If True, resets the index after sorting | False |
Sample CSV File
The examples use a CSV file called products.csv with the following content:
Product,Category,Price,Rating
Laptop,Electronics,999.99,4.5
Mouse,Electronics,29.99,4.2
Desk,Furniture,249.99,4.7
Chair,Furniture,199.99,4.3
Keyboard,Electronics,79.99,4.6
Bookshelf,Furniture,149.99,4.1
Monitor,Electronics,399.99,4.8
Lamp,Furniture,49.99,3.9
Sorting by a Single Column
Start by loading the CSV and sorting by one column:
import pandas as pd
data = pd.read_csv('products.csv')
# Sort by Price in ascending order
sorted_data = data.sort_values(by='Price')
print(sorted_data)
Output:
Product Category Price Rating
1 Mouse Electronics 29.99 4.2
7 Lamp Furniture 49.99 3.9
4 Keyboard Electronics 79.99 4.6
5 Bookshelf Furniture 149.99 4.1
3 Chair Furniture 199.99 4.3
2 Desk Furniture 249.99 4.7
6 Monitor Electronics 399.99 4.8
0 Laptop Electronics 999.99 4.5
When sorting by a single column, you can pass the column name directly as a string: sort_values(by='Price'). For multiple columns, pass a list: sort_values(by=['col1', 'col2']).
Sorting by Multiple Columns (Ascending)
To sort by more than one column, pass a list of column names. The DataFrame is sorted by the first column in the list, and ties are broken by the second column, and so on:
import pandas as pd
data = pd.read_csv('products.csv')
# Sort by Category first, then by Price within each category
sorted_data = data.sort_values(by=['Category', 'Price'], ignore_index=True)
print(sorted_data)
Output:
Product Category Price Rating
0 Mouse Electronics 29.99 4.2
1 Keyboard Electronics 79.99 4.6
2 Monitor Electronics 399.99 4.8
3 Laptop Electronics 999.99 4.5
4 Lamp Furniture 49.99 3.9
5 Bookshelf Furniture 149.99 4.1
6 Chair Furniture 199.99 4.3
7 Desk Furniture 249.99 4.7
Products are grouped by Category (Electronics first, then Furniture), and within each category, they are sorted by Price from lowest to highest.
Sorting with Mixed Ascending and Descending Orders
Pass a list of booleans to the ascending parameter to control the direction for each column independently:
import pandas as pd
data = pd.read_csv('products.csv')
# Sort by Category ascending, then by Price descending within each category
sorted_data = data.sort_values(
by=['Category', 'Price'],
ascending=[True, False],
ignore_index=True
)
print(sorted_data)
Output:
Product Category Price Rating
0 Laptop Electronics 999.99 4.5
1 Monitor Electronics 399.99 4.8
2 Keyboard Electronics 79.99 4.6
3 Mouse Electronics 29.99 4.2
4 Desk Furniture 249.99 4.7
5 Chair Furniture 199.99 4.3
6 Bookshelf Furniture 149.99 4.1
7 Lamp Furniture 49.99 3.9
Within each category, products are now sorted from most expensive to least expensive.
Sorting by Three or More Columns
You can sort by as many columns as needed. Each additional column acts as a further tiebreaker:
import pandas as pd
data = pd.read_csv('products.csv')
# Sort by Category (asc), Rating (desc), then Price (asc)
sorted_data = data.sort_values(
by=['Category', 'Rating', 'Price'],
ascending=[True, False, True],
ignore_index=True
)
print(sorted_data)
Output:
Product Category Price Rating
0 Monitor Electronics 399.99 4.8
1 Keyboard Electronics 79.99 4.6
2 Laptop Electronics 999.99 4.5
3 Mouse Electronics 29.99 4.2
4 Desk Furniture 249.99 4.7
5 Chair Furniture 199.99 4.3
6 Bookshelf Furniture 149.99 4.1
7 Lamp Furniture 49.99 3.9
Within Electronics, products are sorted by highest rating first. If two products had the same rating, the cheaper one would appear first.
Saving the Sorted Data Back to CSV
After sorting, you can write the result to a new CSV file:
import pandas as pd
data = pd.read_csv('products.csv')
sorted_data = data.sort_values(
by=['Category', 'Price'],
ascending=[True, True],
ignore_index=True
)
# Save to a new CSV file
sorted_data.to_csv('products_sorted.csv', index=False)
print("Sorted CSV saved successfully.")
Setting index=False prevents the DataFrame index from being written as an extra column in the CSV.
Handling NaN Values During Sorting
CSV files often contain missing data. The na_position parameter controls where NaN values are placed:
import pandas as pd
import numpy as np
data = pd.DataFrame({
'Product': ['Laptop', 'Mouse', 'Desk', 'Chair', 'Keyboard'],
'Category': ['Electronics', 'Electronics', 'Furniture', None, 'Electronics'],
'Price': [999.99, 29.99, np.nan, 199.99, 79.99]
})
# NaN values placed at the beginning
sorted_first = data.sort_values(by='Price', na_position='first')
print("NaN first:")
print(sorted_first)
# NaN values placed at the end (default)
sorted_last = data.sort_values(by='Price', na_position='last')
print("\nNaN last:")
print(sorted_last)
Output:
NaN first:
Product Category Price
2 Desk Furniture NaN
1 Mouse Electronics 29.99
4 Keyboard Electronics 79.99
3 Chair None 199.99
0 Laptop Electronics 999.99
NaN last:
Product Category Price
1 Mouse Electronics 29.99
4 Keyboard Electronics 79.99
3 Chair None 199.99
0 Laptop Electronics 999.99
2 Desk Furniture NaN
Common Mistake: Mismatched ascending List Length
When sorting by multiple columns, the ascending parameter must have the same number of elements as the by parameter:
import pandas as pd
data = pd.read_csv('products.csv')
# WRONG: 2 columns but only 1 ascending value (as a list)
try:
sorted_data = data.sort_values(
by=['Category', 'Price'],
ascending=[True] # Should have 2 values
)
except ValueError as e:
print(f"ValueError: {e}")
Output:
ValueError: Length of ascending (1) != length of by (2)
The correct approach:
# CORRECT: match the number of ascending values to the number of columns
sorted_data = data.sort_values(
by=['Category', 'Price'],
ascending=[True, False] # One value per column
)
If you want the same direction for all columns, pass a single boolean (not a list): ascending=True or ascending=False. Use a list only when columns need different sort directions.
Complete Workflow: Load, Sort, and Save
Here is a complete example combining all steps:
import pandas as pd
# Step 1: Load the CSV
data = pd.read_csv('products.csv')
print("Original data:")
print(data)
# Step 2: Sort by multiple columns
sorted_data = data.sort_values(
by=['Category', 'Rating', 'Price'],
ascending=[True, False, True],
ignore_index=True
)
print("\nSorted data:")
print(sorted_data)
# Step 3: Save the sorted data
sorted_data.to_csv('products_sorted.csv', index=False)
print("\nSaved to products_sorted.csv")
Output:
Original data:
Product Category Price Rating
0 Laptop Electronics 999.99 4.5
1 Mouse Electronics 29.99 4.2
2 Desk Furniture 249.99 4.7
3 Chair Furniture 199.99 4.3
4 Keyboard Electronics 79.99 4.6
5 Bookshelf Furniture 149.99 4.1
6 Monitor Electronics 399.99 4.8
7 Lamp Furniture 49.99 3.9
Sorted data:
Product Category Price Rating
0 Monitor Electronics 399.99 4.8
1 Keyboard Electronics 79.99 4.6
2 Laptop Electronics 999.99 4.5
3 Mouse Electronics 29.99 4.2
4 Desk Furniture 249.99 4.7
5 Chair Furniture 199.99 4.3
6 Bookshelf Furniture 149.99 4.1
7 Lamp Furniture 49.99 3.9
Saved to products_sorted.csv
Quick Reference
| Goal | Code |
|---|---|
| Sort by one column (ascending) | df.sort_values(by='col') |
| Sort by one column (descending) | df.sort_values(by='col', ascending=False) |
| Sort by multiple columns (same direction) | df.sort_values(by=['col1', 'col2']) |
| Sort by multiple columns (mixed directions) | df.sort_values(by=['col1', 'col2'], ascending=[True, False]) |
| Sort and reset index | df.sort_values(by='col', ignore_index=True) |
| Sort and modify original DataFrame | df.sort_values(by='col', inplace=True) |
| Control NaN placement | df.sort_values(by='col', na_position='first') |
| Save sorted DataFrame to CSV | df.to_csv('output.csv', index=False) |
Sorting a CSV file by multiple columns in Python is a three-step process: load with read_csv(), sort with sort_values(), and optionally save with to_csv(). The key to multi-column sorting is providing a list of column names in priority order and matching ascending values for each column when mixed directions are needed.