How to Convert a List of Strings to a Sorted List of Integers in Python
When reading data from files, CSVs, user input, or APIs, numbers frequently arrive as strings. Sorting these strings directly produces incorrect results because string comparison uses lexicographical (character-by-character) order rather than numerical order. Converting to integers before sorting is essential for correct results.
In this guide, you will learn how to properly convert and sort string lists into integer lists, handle invalid data gracefully, and choose the most efficient approach for your use case.
The Problem: Lexicographical vs Numerical Sorting
Sorting strings that represent numbers produces surprising and incorrect results:
data = ["10", "2", "1", "20", "100"]
# String sorting (WRONG for numbers)
print(sorted(data))
# Numeric sorting (CORRECT)
print(sorted(map(int, data)))
Output:
['1', '10', '100', '2', '20']
[1, 2, 10, 20, 100]
String comparison works character by character using Unicode code points (which match ASCII values for the digits 0-9). The string "10" comes before "2" because its first character, '1', has a lower code point than '2'; the remaining characters are never examined. You must convert to integers for numerically correct sorting.
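To make the character-by-character rule concrete, the comparison can be inspected directly. The short sketch below uses the built-in ord() to show the code points involved; the variable names are illustrative only.

```python
# Strings are compared one character at a time by code point.
first_chars = (ord("1"), ord("2"))   # (49, 50)

# "10" vs "2": the first characters differ, so the comparison stops there.
string_result = "10" < "2"           # True, because 49 < 50

# After conversion, the comparison is numeric.
int_result = int("10") < int("2")    # False, because 10 > 2

print(first_chars, string_result, int_result)
```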
Using map() and sorted() (Recommended)
A concise and efficient approach combines map() for type conversion with sorted() for ordering:
raw_data = ["10", "2", "1", "20"]
sorted_ints = sorted(map(int, raw_data))
print(sorted_ints)
Output:
[1, 2, 10, 20]
map(int, raw_data) returns a lazy iterator that converts each string to an integer on demand, and sorted() consumes that iterator and returns a new sorted list. You do not need to build a separate list of integers before sorting.
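The lazy behaviour of map() can be observed with a small experiment. This is only a sketch to illustrate the point: the names converted, is_list, and leftover are made up for the example.

```python
raw_data = ["10", "2", "1", "20"]

converted = map(int, raw_data)          # nothing has been converted yet
is_list = isinstance(converted, list)   # False: map() returns an iterator

sorted_ints = sorted(converted)         # sorted() pulls every value from the iterator
leftover = list(converted)              # the iterator is now exhausted

print(is_list, sorted_ints, leftover)
```

Because the iterator can be consumed only once, store the result of sorted() rather than the map object if you need the values again.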
Sorting in Descending Order
raw_data = ["10", "2", "1", "20"]
sorted_desc = sorted(map(int, raw_data), reverse=True)
print(sorted_desc)
Output:
[20, 10, 2, 1]
Using a List Comprehension
A list comprehension makes the conversion step more explicit and visible:
raw_data = ["100", "25", "5", "50"]
# Step 1: Convert to integers
integers = [int(x) for x in raw_data]
# Step 2: Sort
sorted_ints = sorted(integers)
print(sorted_ints)
Output:
[5, 25, 50, 100]
This can also be written as a one-liner:
raw_data = ["100", "25", "5", "50"]
sorted_ints = sorted([int(x) for x in raw_data])
print(sorted_ints)
Output:
[5, 25, 50, 100]
In-Place Sorting for Memory Efficiency
For large datasets where memory matters, convert first and then sort in place to avoid creating an additional sorted copy:
raw_data = ["500", "5", "50", "5000"]
int_list = [int(x) for x in raw_data]
# .sort() modifies the list in place and returns None
int_list.sort()
print(int_list)
Output:
[5, 50, 500, 5000]
The difference between sorted() and .sort() is that sorted() creates and returns a new list, while .sort() modifies the existing list in place and uses no additional memory for a second list.
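The difference is easy to verify side by side. The sketch below also shows a common pitfall: assigning the result of .sort() to a variable yields None.

```python
int_list = [500, 5, 50, 5000]

new_list = sorted(int_list)                   # returns a new list
unchanged = int_list == [500, 5, 50, 5000]    # the original is untouched

result = int_list.sort()                      # sorts in place and returns None

print(new_list)    # new sorted list
print(unchanged)   # True
print(result)      # None
print(int_list)    # now sorted in place
```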
Sorting Strings Numerically Without Converting
If you need the output to remain as strings but want them sorted in numerical order, use the key parameter:
raw_data = ["10", "2", "1", "20"]
sorted_strings = sorted(raw_data, key=int)
print(sorted_strings)
Output:
['1', '2', '10', '20']
The elements remain as strings, but they are ordered numerically. This is useful when you need to preserve the original string format, including features like leading zeros.
Handling Invalid Data
Filtering Non-Numeric Strings with isdigit()
dirty_data = ["100", "2", "N/A", "50", "error", "25"]
clean_ints = sorted([int(x) for x in dirty_data if x.isdigit()])
print(clean_ints)
Output:
[2, 25, 50, 100]
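The isdigit() filter has a few edge cases worth knowing before relying on it. The sketch below checks a handful of sample strings chosen for illustration. Note that a character such as the superscript "²" passes isdigit() even though int() cannot parse it, while isdecimal() is a closer match to the unsigned digits int() accepts.

```python
candidates = ["42", "-5", "3.14", " 7", "", "²"]

# isdigit() is True only for non-empty strings made up entirely of digit characters.
digit_check = {s: s.isdigit() for s in candidates}

# "²" (superscript two) passes isdigit(), but int() raises ValueError for it.
try:
    int("²")
    superscript_parses = True
except ValueError:
    superscript_parses = False

# isdecimal() is stricter and rejects the superscript.
decimal_check = "²".isdecimal()

print(digit_check)
print(superscript_parses, decimal_check)
```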
Handling Negative Numbers and Decimals
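The isdigit() method returns False for any string containing a minus sign or a decimal point, so the filter above silently drops values such as "-5". To accept negative integers, use a try/except approach instead. Note that int() still rejects decimal strings such as "3.14"; floats are covered in the next section.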
def safe_int(value):
    """Safely convert a string to an integer, returning None on failure."""
    try:
        return int(value)
    except ValueError:
        return None
raw_data = ["10", "-5", "N/A", "20", "bad", "-15"]
sorted_ints = sorted([x for x in map(safe_int, raw_data) if x is not None])
print(sorted_ints)
Output:
[-15, -5, 10, 20]
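It helps to know exactly which strings int() accepts, since that determines what a try/except helper keeps or discards. The sample strings below are illustrative; the behaviour shown is that of the built-in int() in Python 3.6 and later (underscore separators were added in 3.6).

```python
# Strings that int() parses: surrounding whitespace, signs, leading zeros, underscores.
accepted = {s: int(s) for s in [" 42 ", "+7", "-15", "007", "1_000"]}

# Strings that make int() raise ValueError: decimals, exponents, empty or non-numeric text.
rejected = []
for s in ["3.14", "1e3", "", "N/A"]:
    try:
        int(s)
    except ValueError:
        rejected.append(s)

print(accepted)
print(rejected)
```

Decimal strings are rejected rather than truncated, so a helper built on int() drops "3.14" instead of returning 3.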
Supporting Floating-Point Numbers
def safe_number(value):
    """Convert a string to a float, returning None on failure."""
    try:
        return float(value)
    except ValueError:
        return None
raw_data = ["10.5", "2", "-3.14", "invalid", "100"]
numbers = [x for x in map(safe_number, raw_data) if x is not None]
sorted_nums = sorted(numbers)
print(sorted_nums)
Output:
[-3.14, 2.0, 10.5, 100.0]
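One caveat with float(): it also parses the strings "nan" and "inf". A NaN value compares as neither smaller nor larger than any number, so it can leave a sorted list in an inconsistent order. The sketch below is one possible way to guard against this, using a hypothetical helper named safe_finite that is not part of the examples above.

```python
import math

def safe_finite(value):
    """Hypothetical helper: convert to float, rejecting invalid, NaN, and infinite values."""
    try:
        number = float(value)
    except ValueError:
        return None
    return number if math.isfinite(number) else None

raw_data = ["10.5", "nan", "2", "inf", "-3.14", "invalid"]
sorted_nums = sorted(x for x in map(safe_finite, raw_data) if x is not None)
print(sorted_nums)
```

Whether infinities should be dropped or kept depends on your data; math.isnan() alone would filter only the NaN entries.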
Sorting Mixed Valid and Invalid Data with a Key Function
To sort a list that contains both valid numbers and non-numeric strings while placing the invalid entries at the end, use a key function that returns a tuple:
data = ["10", "2", "N/A", "20", "5"]
def safe_sort_key(x):
    try:
        return (0, int(x))  # Valid numbers get priority (0 sorts first)
    except ValueError:
        return (1, x)  # Invalid strings sort last
sorted_data = sorted(data, key=safe_sort_key)
print(sorted_data)
Output:
['2', '5', '10', '20', 'N/A']
The tuple (0, value) ensures all valid numbers sort before (1, string) entries, and within each group the values are sorted normally.
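The tuple trick relies on how Python compares tuples: element by element, stopping at the first position where they differ. The sketch below, with illustrative variable names, shows why an integer and a string in the second position never get compared with each other.

```python
valid_key = (0, 20)
invalid_key = (1, "N/A")

# The first elements differ (0 < 1), so the second elements are never compared.
valid_first = valid_key < invalid_key       # True

# With equal first elements, the second elements decide the order.
numeric_order = (0, 2) < (0, 10)            # True: 2 < 10
text_order = (1, "N/A") < (1, "error")      # True: "N" has a lower code point than "e"

print(valid_first, numeric_order, text_order)
```

Without the leading 0 or 1, comparing 20 with "N/A" would raise a TypeError in Python 3.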
Handling Leading Zeros
Leading zeros in strings do not affect integer conversion, so they are handled correctly without any special logic:
raw_data = ["007", "10", "002", "100"]
sorted_ints = sorted(map(int, raw_data))
print(sorted_ints)
Output:
[2, 7, 10, 100]
If you need to preserve the leading zeros in the output, use sorted(raw_data, key=int) to keep the original string format.
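As a brief illustration of that option, the sketch below sorts the same data with key=int. It also shows an alternative, assuming a fixed width of three characters, that converts to integers and re-pads the results with str.zfill().

```python
raw_data = ["007", "10", "002", "100"]

# Option 1: keep the original strings, ordered by their integer value.
sorted_strings = sorted(raw_data, key=int)
print(sorted_strings)

# Option 2: convert, sort, then pad back to a fixed width (3 is an assumption here).
padded = [str(n).zfill(3) for n in sorted(map(int, raw_data))]
print(padded)
```

The first option leaves "10" as it was, while the second normalizes every value to the same width.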
Preserving Original Positions
When you need to know where each value came from after sorting, use enumerate() to track original indices:
raw_data = ["10", "2", "1", "20"]
indexed = sorted(enumerate(raw_data), key=lambda x: int(x[1]))
print("Sorted with original indices:")
for original_index, value in indexed:
    print(f" Value: {int(value):>3}, originally at index {original_index}")
Output:
Sorted with original indices:
 Value:   1, originally at index 2
 Value:   2, originally at index 1
 Value:  10, originally at index 0
 Value:  20, originally at index 3
Practical Example: CSV Column Processing
A common real-world scenario is parsing and analyzing a column of data from a CSV file that contains both valid numbers and missing or invalid entries:
ages_column = ["25", "18", "42", "", "31", "N/A", "55"]
def parse_age(value):
    value = value.strip()
    if value.isdigit():
        return int(value)
    return None
# Parse, filter, and sort
valid_ages = sorted([age for age in map(parse_age, ages_column) if age is not None])
print(f"Ages: {valid_ages}")
print(f"Min: {min(valid_ages)}, Max: {max(valid_ages)}")
print(f"Count: {len(valid_ages)}")
print(f"Average: {sum(valid_ages) / len(valid_ages):.1f}")
Output:
Ages: [18, 25, 31, 42, 55]
Min: 18, Max: 55
Count: 5
Average: 34.2
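If the column comes from an actual CSV file, the standard-library csv module can supply the values. The following is a minimal sketch under stated assumptions: the data is held in an in-memory string instead of a file, and the column names "name" and "age" are hypothetical.

```python
import csv
import io

# In-memory stand-in for a real file; the "name" and "age" columns are hypothetical.
csv_text = "name,age\nAna,25\nBen,\nCho,42\nDee,N/A\nEli, 18 \n"

def parse_age(value):
    """Return the age as an int, or None for blank or non-numeric cells."""
    value = value.strip()
    return int(value) if value.isdigit() else None

reader = csv.DictReader(io.StringIO(csv_text))
valid_ages = sorted(
    age for age in (parse_age(row["age"]) for row in reader) if age is not None
)
print(valid_ages)
```

For a file on disk, replacing io.StringIO(csv_text) with an open file object (opened with newline="") should work the same way.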
Performance Comparison
import timeit
data = [str(i) for i in range(10000)]
def with_map():
    return sorted(map(int, data))

def with_comprehension():
    return sorted([int(x) for x in data])

def with_key():
    return sorted(data, key=int)
print(f"map() + sorted(): {timeit.timeit(with_map, number=1000):.4f}s")
print(f"Comprehension + sorted(): {timeit.timeit(with_comprehension, number=1000):.4f}s")
print(f"sorted() with key=int: {timeit.timeit(with_key, number=1000):.4f}s")
Typical output:
map() + sorted(): 0.8234s
Comprehension + sorted(): 0.9456s
sorted() with key=int: 0.7123s
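In this run the key=int approach was slightly fastest, but the differences are small and will vary with your machine, Python version, and data. All three approaches call int() once per element. Note that key=int returns strings rather than integers, so it is not a drop-in replacement for the other two.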
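One detail relevant to these timings is that sorted() calls the key function once per element, not once per comparison. The sketch below confirms this with a hypothetical wrapper, counting_int, that records each call.

```python
data = ["10", "2", "1", "20", "100"]
calls = []

def counting_int(value):
    """int() wrapper that records each call (for illustration only)."""
    calls.append(value)
    return int(value)

sorted_strings = sorted(data, key=counting_int)
print(sorted_strings)
print(len(calls))  # one call per element
```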
Quick Reference
| Goal | Code |
|---|---|
| Standard sort | sorted(map(int, data)) |
| Descending order | sorted(map(int, data), reverse=True) |
| Keep as strings | sorted(data, key=int) |
| Filter invalid entries | sorted(int(x) for x in data if x.isdigit()) |
| Handle negatives | Use a try/except helper function |
| In-place sort | int_list.sort() |
Conclusion
Sorting numeric strings correctly in Python requires comparing them as integers or floats, either by converting them first or by passing a numeric key function. The most common and cleanest approach is sorted(map(int, data)), which handles the conversion and sorting in a single, readable expression. When the data may contain invalid entries, wrap the conversion in a helper function with error handling to filter out or handle non-numeric values gracefully. If you need the output to remain as strings while being sorted numerically, use sorted(data, key=int) instead.
Never sort numeric strings directly. Always convert to int or float first for mathematically correct ordering. Use sorted(map(int, data)) as your default approach, and add error handling when working with data from external sources that may contain invalid entries.