How to Split a Heterogeneous Type List in Python

Python lists can hold elements of different data types: integers, strings, floats, and more, all in the same list. When processing such mixed-type lists, you often need to separate elements by their type into individual homogeneous lists for type-specific operations like arithmetic on numbers or string manipulation on text.

In this guide, you will learn multiple methods to split a heterogeneous list by data type, from clean Pythonic approaches to flexible solutions that handle any number of types.

Understanding the Problem

Given a list containing mixed data types:

data = ["hello", 1, 2.5, "world", 3, True, "python", 4.0]

Split it into separate lists based on type:

strings: ["hello", "world", "python"]
integers: [1, 3]
floats: [2.5, 4.0]
booleans: [True]

Method 1: Using List Comprehensions with isinstance()

The most direct approach uses one list comprehension per type, filtering elements with isinstance():

data = ["tutorialreference", 1, 2, "is", "best", 3.14]

str_list = [x for x in data if isinstance(x, str)]
int_list = [x for x in data if isinstance(x, int) and not isinstance(x, bool)]
float_list = [x for x in data if isinstance(x, float)]

print(f"Strings: {str_list}")
print(f"Integers: {int_list}")
print(f"Floats: {float_list}")

Output:

Strings: ['tutorialreference', 'is', 'best']
Integers: [1, 2]
Floats: [3.14]

Caution: In Python, bool is a subclass of int, so isinstance(True, int) returns True. If your list contains booleans and you want to separate them from integers, add an explicit check:

# Without the bool check:
data = [1, True, 2, False]
ints = [x for x in data if isinstance(x, int)]
print(ints) # [1, True, 2, False]: booleans are included!

# With the bool check:
ints = [x for x in data if isinstance(x, int) and not isinstance(x, bool)]
print(ints) # [1, 2]: booleans are excluded

Output:

[1, True, 2, False]
[1, 2]
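
When booleans need a bucket of their own, the order of the checks matters: testing for bool before int keeps True and False out of the integer list. A minimal sketch of that idea in a single pass, where the helper name split_numbers is hypothetical and not part of the original examples:

```python
def split_numbers(data):
    """Separate booleans, integers, and floats in one pass.

    Hypothetical helper for illustration: bool is tested first because
    every bool is also an int.
    """
    bools, ints, floats = [], [], []
    for x in data:
        if isinstance(x, bool):
            bools.append(x)
        elif isinstance(x, int):
            ints.append(x)
        elif isinstance(x, float):
            floats.append(x)
    return bools, ints, floats


bools, ints, floats = split_numbers([1, True, 2.5, False, 2, "skip"])
print(bools)   # [True, False]
print(ints)    # [1, 2]
print(floats)  # [2.5]
```

Elements that match none of the branches, such as the string here, are simply skipped.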

Method 2: Using defaultdict for Automatic Type Grouping

The defaultdict approach is highly flexible; it automatically groups elements by their type without requiring you to know the types in advance:

from collections import defaultdict

data = ["tutorialreference", 1, 2, "is", "best", 3.14, True, (1, 2)]

grouped = defaultdict(list)
for item in data:
    grouped[type(item)].append(item)

# Access each type group
for dtype, items in grouped.items():
    print(f"{dtype.__name__:>10}: {items}")

Output:

       str: ['tutorialreference', 'is', 'best']
       int: [1, 2]
     float: [3.14]
      bool: [True]
     tuple: [(1, 2)]

How it works:

  1. defaultdict(list) creates a dictionary where each missing key automatically initializes with an empty list.
  2. type(item) returns the exact type of each element (e.g., <class 'str'>).
  3. Each element is appended to the list corresponding to its type.
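
Because the grouping key is type(item), each element lands under its exact class: booleans stay separate from integers, and an instance of a subclass gets a group of its own. A small sketch of that behavior, using a hypothetical Celsius subclass of float purely for illustration:

```python
from collections import defaultdict


class Celsius(float):
    """Hypothetical float subclass, used only to show exact-type keys."""


data = [1, True, 2.5, Celsius(21.0)]

grouped = defaultdict(list)
for item in data:
    grouped[type(item)].append(item)

print([t.__name__ for t in grouped])  # ['int', 'bool', 'float', 'Celsius']
print(grouped[float])                 # [2.5]
print(grouped[Celsius])               # [21.0]
```

If subclass instances should be grouped with their parent type, an isinstance()-based check is the better fit.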

Tip: This is the most scalable approach: it handles any number of types without modification. You don't need to add new list comprehensions when new types appear in the data.

Accessing Specific Type Groups

from collections import defaultdict

data = ["hello", 1, 2.5, "world", 3, 4.0]

grouped = defaultdict(list)
for item in data:
    grouped[type(item)].append(item)

# Access by type
print("Strings:", grouped[str])
print("Integers:", grouped[int])
print("Floats:", grouped[float])
print("Missing type:", grouped[complex]) # Returns empty list (no error)

Output:

Strings: ['hello', 'world']
Integers: [1, 3]
Floats: [2.5, 4.0]
Missing type: []
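
One side effect is worth knowing: looking up a missing key with square brackets on a defaultdict also inserts that key. If empty groups should not appear in the result, .get() or a membership test avoids this. A brief sketch:

```python
from collections import defaultdict

grouped = defaultdict(list)
for item in ["hello", 1, 2.5]:
    grouped[type(item)].append(item)

print(len(grouped))              # 3
print(grouped.get(complex, []))  # []: .get() does not add the key
print(complex in grouped)        # False

grouped[complex]                 # square-bracket lookup creates the key
print(complex in grouped)        # True
print(len(grouped))              # 4
```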

Method 3: Using filter() with lambda

The filter() function provides a functional programming approach:

data = ["tutorialreference", 1, 2, "is", "best"]

int_list = list(filter(lambda x: isinstance(x, int), data))
str_list = list(filter(lambda x: isinstance(x, str), data))

print(f"Integers: {int_list}")
print(f"Strings: {str_list}")

Output:

Integers: [1, 2]
Strings: ['tutorialreference', 'is', 'best']
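
Note that filter() returns a lazy iterator, which is why each call above is wrapped in list(). The lambda can also be swapped for a small named predicate when the condition grows, for instance to exclude booleans. A sketch, with is_strict_int as a hypothetical helper name:

```python
def is_strict_int(x):
    """Return True for int values that are not bool (hypothetical helper)."""
    return isinstance(x, int) and not isinstance(x, bool)


data = ["tutorialreference", 1, True, 2, "is", False]

lazy = filter(is_strict_int, data)  # nothing is evaluated yet
int_list = list(lazy)               # consuming the iterator runs the predicate

print(int_list)    # [1, 2]
print(list(lazy))  # []: the iterator is already exhausted
```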

Method 4: Creating a Reusable Splitter Function

For production code, encapsulate the logic in a reusable function:

from collections import defaultdict

def split_by_type(data):
    """Split a heterogeneous list into groups by data type.

    Args:
        data: A list containing elements of mixed types.

    Returns:
        A dictionary mapping type names to lists of elements.
    """
    grouped = defaultdict(list)
    for item in data:
        grouped[type(item).__name__].append(item)
    return dict(grouped)


data = ["hello", 1, 2.5, "world", 3, True, None, (1, 2)]

result = split_by_type(data)

for type_name, items in result.items():
    print(f"{type_name:>10}: {items}")

Output:

       str: ['hello', 'world']
       int: [1, 3]
     float: [2.5]
      bool: [True]
  NoneType: [None]
     tuple: [(1, 2)]
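
Once the groups exist, type-specific work becomes straightforward. The sketch below repeats split_by_type so that it runs on its own, then sums the numeric groups and upper-cases the strings. The sample data is made up for illustration:

```python
from collections import defaultdict


def split_by_type(data):
    """Group elements by the name of their exact type."""
    grouped = defaultdict(list)
    for item in data:
        grouped[type(item).__name__].append(item)
    return dict(grouped)


result = split_by_type(["hello", 1, 2.5, "world", 3])

# .get() with a default keeps this safe when a type is absent,
# because the function returns a plain dict, not a defaultdict.
total = sum(result.get("int", [])) + sum(result.get("float", []))
shouted = [s.upper() for s in result.get("str", [])]

print(total)    # 6.5
print(shouted)  # ['HELLO', 'WORLD']
```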

Splitting into Specific Types Only

def split_into(data, *types):
    """Split a list into groups for specified types only.

    Args:
        data: Input list with mixed types.
        *types: Type classes to filter for.

    Returns:
        Tuple of lists, one per requested type, plus an 'other' list.
    """
    results = {t: [] for t in types}
    other = []

    for item in data:
        placed = False
        for t in types:
            if isinstance(item, t):
                results[t].append(item)
                placed = True
                break
        if not placed:
            other.append(item)

    return tuple(results[t] for t in types) + (other,)


data = ["hello", 1, 2.5, "world", 3, None, True]

strings, numbers, others = split_into(data, str, (int, float))

print(f"Strings: {strings}")
print(f"Numbers: {numbers}")
print(f"Others: {others}")

Output:

Strings: ['hello', 'world']
Numbers: [1, 2.5, 3, True]
Others: [None]

Info: isinstance() accepts a tuple of types as the second argument, letting you group related types together. For example, isinstance(x, (int, float)) matches both integers and floats.
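
As a quick illustration of the tuple form, the sketch below collects every int or float while still leaving booleans out. The standard-library abstract base class numbers.Number is shown as a broader alternative. Whether that wider match suits a given dataset is an assumption to check:

```python
import numbers
from fractions import Fraction

data = [1, 2.5, True, "3", Fraction(1, 2), 4]

# Tuple form: matches int or float; bool is excluded explicitly.
plain = [x for x in data
         if isinstance(x, (int, float)) and not isinstance(x, bool)]
print(plain)  # [1, 2.5, 4]

# numbers.Number also matches Fraction, complex, Decimal, and bool.
numeric = [x for x in data if isinstance(x, numbers.Number)]
print(numeric)  # [1, 2.5, True, Fraction(1, 2), 4]
```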

Common Mistake: Using type() Instead of isinstance()

A subtle but important distinction:

type() checks the exact type:

class MyInt(int):
    pass

val = MyInt(5)
print(type(val) == int) # False: MyInt is not exactly int
print(isinstance(val, int)) # True: MyInt inherits from int

isinstance() respects inheritance, making it the better choice for type checking in most cases:

data = [1, True, 2, False]

# type(): separates bool and int
by_type = [x for x in data if type(x) == int]
print(f"type() == int: {by_type}") # [1, 2]: booleans are excluded

# isinstance(): bool is a subclass of int
by_isinstance = [x for x in data if isinstance(x, int)]
print(f"isinstance(int): {by_isinstance}") # [1, True, 2, False]: booleans are included

Output:

type() == int: [1, 2]
isinstance(int): [1, True, 2, False]

Choose based on your needs:

  • Use isinstance() when you want inclusive type checking (recommended for most cases).
  • Use type(x) == SomeType (or type(x) is SomeType) when you need exact type matching, for example to keep booleans out of an integer list.
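
Both styles can sit behind one small helper. The function below, pick, is a hypothetical name and its exact flag is an assumption made for this sketch. It switches between inheritance-aware and exact matching:

```python
def pick(data, wanted, exact=False):
    """Return the elements of data that match the type `wanted`.

    Hypothetical helper: exact=True compares with `type(x) is wanted`,
    otherwise isinstance() is used and subclasses match too.
    """
    if exact:
        return [x for x in data if type(x) is wanted]
    return [x for x in data if isinstance(x, wanted)]


data = [1, True, 2, False, "a"]

print(pick(data, int))              # [1, True, 2, False]
print(pick(data, int, exact=True))  # [1, 2]
print(pick(data, bool))             # [True, False]
```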

Performance Comparison

| Method | Time Complexity | Handles Any Type | Scales to N Types |
|---|---|---|---|
| List comprehension | O(n × k) | ✅ Yes | ❌ One comprehension per type |
| defaultdict | O(n) | ✅ Yes | ✅ Automatic |
| filter() + lambda | O(n × k) | ✅ Yes | ❌ One filter per type |
| Reusable function | O(n) | ✅ Yes | ✅ Configurable |

Where n = list length, k = number of types to check.

Tip: The defaultdict approach is the most efficient for multiple types because it processes the list in a single pass (O(n)), while using separate list comprehensions requires one pass per type (O(n × k)).
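
Asymptotic counts do not settle which version is faster for a small number of types, since list comprehensions are heavily optimized. One way to check on your own data is timeit. The sketch below uses synthetic data and hypothetical function names. Timings vary by machine and Python version, so no expected numbers are shown:

```python
import timeit
from collections import defaultdict

# Synthetic data, made up for this measurement only.
data = ["a", 1, 2.5, (1, 2), None] * 20_000


def with_comprehensions(items):
    """One pass over the list per type."""
    return (
        [x for x in items if isinstance(x, str)],
        [x for x in items if isinstance(x, int)],
        [x for x in items if isinstance(x, float)],
    )


def with_defaultdict(items):
    """A single pass that buckets every element by exact type."""
    grouped = defaultdict(list)
    for x in items:
        grouped[type(x)].append(x)
    return grouped


t_comp = timeit.timeit(lambda: with_comprehensions(data), number=5)
t_dict = timeit.timeit(lambda: with_defaultdict(data), number=5)
print(f"comprehensions: {t_comp:.3f}s")
print(f"defaultdict:    {t_dict:.3f}s")
```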

Summary

Splitting a heterogeneous list by data type is a common task in Python data processing. Key takeaways:

  • Use list comprehension with isinstance() for splitting into 2-3 known types: clean and readable.
  • Use defaultdict(list) for automatic grouping by any number of types in a single pass: the most flexible and efficient approach.
  • Use filter() with lambda for a functional programming style.
  • Remember that bool is a subclass of int: add explicit bool checks if you need to separate them.
  • Prefer isinstance() over type() for general type checking, as it respects inheritance; reach for type() when you want exact-type grouping, as in the defaultdict approach.
  • Create a reusable function when you need this pattern frequently across your codebase.