How to Resolve "TypeError: cannot perform reduce with flexible type" in Python
When working with NumPy arrays that contain mixed data types (like strings and numbers together), you may encounter the error TypeError: cannot perform reduce with flexible type. This error occurs when you try to perform a mathematical operation (like mean(), sum(), or std()) on an array that contains non-numeric data. In this guide, we'll explain why this happens, show multiple ways to fix it, and help you choose the right approach for your data.
Why Does This Error Occur?
NumPy arrays are designed to hold elements of a single data type. When you create an array that contains both strings and numbers, NumPy upcasts everything to a common type, which for mixed strings and numbers is a fixed-width Unicode string (a <U dtype). When you then try to compute the mean or sum, NumPy can't perform arithmetic on strings.
❌ Example that triggers the error:
import numpy as np
data = np.array([
['Alice', 'CSE', 87],
['Bob', 'ECE', 88],
['Carol', 'CSE', 72]
])
# Check the dtype: everything is a string!
print(f"Array dtype: {data.dtype}")
# Try to compute the mean of the "marks" column
print(data[:, 2].mean())
Output:
Array dtype: <U21
TypeError: the resolved dtypes are not compatible with add.reduce. Resolved (dtype('<U21'), dtype('<U21'), dtype('<U42'))
Note: the exact wording varies by NumPy version. Older releases raise TypeError: cannot perform reduce with flexible type, while newer ones report that the resolved dtypes are not compatible with add.reduce; both point to the same underlying problem.
Even though column 2 looks like numbers (87, 88, 72), they've been converted to strings ('87', '88', '72') because the array also contains string values like 'Alice' and 'CSE'. You can't compute the mean of strings.
In NumPy, "flexible types" include strings (<U), bytes (S), and void (V) types: data types whose item size isn't fixed by the type itself but is set when the array is created. Mathematical operations like mean(), sum(), and std() don't work on these types because arithmetic on strings isn't defined.
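You can detect a flexible dtype programmatically before attempting a reduction: an array's dtype.kind is 'U' for Unicode strings, 'S' for bytes, and 'V' for void. A minimal sketch (the is_flexible helper name is just for illustration):

```python
import numpy as np

def is_flexible(arr):
    # Flexible dtype kinds: 'U' (unicode), 'S' (bytes), 'V' (void)
    return arr.dtype.kind in ('U', 'S', 'V')

mixed = np.array([['Alice', 87], ['Bob', 88]])  # everything upcast to strings
numeric = np.array([87, 88])

print(is_flexible(mixed))    # True -> reductions like mean() will fail
print(is_flexible(numeric))  # False -> safe to reduce
```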
Solutions
Solution 1: Use Pandas DataFrame (Recommended for Mixed Data)
The best approach for tabular data with mixed types is to use a Pandas DataFrame instead of a NumPy array. DataFrames handle multiple data types naturally, with each column maintaining its own type:
import pandas as pd
students = pd.DataFrame({
'name': ['Alice', 'Bob', 'Carol'],
'branch': ['CSE', 'ECE', 'CSE'],
'marks': [87, 88, 72]
})
print(students)
print(f"\nColumn dtypes:\n{students.dtypes}")
print(f"\nMean marks: {students['marks'].mean()}")
Output:
name branch marks
0 Alice CSE 87
1 Bob ECE 88
2 Carol CSE 72
Column dtypes:
name object
branch object
marks int64
dtype: object
Mean marks: 82.33333333333333
Pandas automatically keeps numeric columns as integers/floats and string columns as objects, so mathematical operations work seamlessly on numeric columns.
Computing statistics on all numeric columns at once:
print(students.describe())
Output:
marks
count 3.000000
mean 82.333333
std 8.962886
min 72.000000
25% 79.500000
50% 87.000000
75% 87.500000
max 88.000000
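Because each column keeps its own dtype, you can also mix string and numeric columns in a single operation, for example averaging the numeric column per value of the string column with groupby. A short sketch extending the same DataFrame:

```python
import pandas as pd

students = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Carol'],
    'branch': ['CSE', 'ECE', 'CSE'],
    'marks': [87, 88, 72]
})

# Average the numeric column per group of the string column
print(students.groupby('branch')['marks'].mean())
# branch
# CSE    79.5
# ECE    88.0
```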
Solution 2: Convert the Column to a Numeric Type
If you need to stay with NumPy, extract the numeric column and convert it explicitly using .astype():
import numpy as np
data = np.array([
['Alice', 'CSE', 87],
['Bob', 'ECE', 88],
['Carol', 'CSE', 72]
])
# Extract the marks column and convert from string to float
marks = data[:, 2].astype(float)
print(f"Dtype after conversion: {marks.dtype}")
print(f"Mean marks: {marks.mean()}")
print(f"Sum: {marks.sum()}")
print(f"Std Dev: {marks.std():.2f}")
Output:
Dtype after conversion: float64
Mean marks: 82.33333333333333
Sum: 247.0
Std Dev: 7.32
.astype(float) will raise a ValueError if the column contains values that can't be converted to numbers (like actual text), so always make sure you're converting the right column.
Also note that NumPy's std() uses the population formula (ddof=0) by default, which is why it reports 7.32 above while Pandas' describe() in Solution 1 (which uses the sample formula, ddof=1) shows 8.96 for the same three marks. Neither is wrong; they are different estimators.
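If you aren't sure the column is clean, you can wrap the conversion in a try/except and list the offending entries before deciding how to handle them. A minimal sketch (the rough isdigit-based check is illustrative only and won't handle negatives or scientific notation):

```python
import numpy as np

data = np.array([
    ['Alice', 'CSE', '87'],
    ['Bob', 'ECE', 'absent'],  # non-numeric entry
])

try:
    marks = data[:, 2].astype(float)
except ValueError:
    # Rough check to surface unparseable entries (illustrative only)
    bad = [v for v in data[:, 2] if not v.replace('.', '', 1).isdigit()]
    print(f"Non-numeric entries: {bad}")  # Non-numeric entries: ['absent']
```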
Solution 3: Store Numeric Data Separately
If you're working primarily with NumPy, keep numeric and non-numeric data in separate arrays:
❌ Wrong: Mixing types in one array:
import numpy as np
# Everything becomes a string
data = np.array([
['Alice', 87, 3.9],
['Bob', 88, 3.7],
['Carol', 72, 3.5]
])
print(data.dtype) # <U32: all strings!
print(data[:, 1].mean()) # TypeError!
✅ Correct: Separate arrays for different types:
import numpy as np
names = np.array(['Alice', 'Bob', 'Carol'])
marks = np.array([87, 88, 72])
gpa = np.array([3.9, 3.7, 3.5])
print(f"Mean marks: {marks.mean()}")
print(f"Mean GPA: {gpa.mean():.2f}")
print(f"Highest marks: {names[marks.argmax()]} with {marks.max()}")
Output:
Mean marks: 82.33333333333333
Mean GPA: 3.70
Highest marks: Bob with 88
Solution 4: Use NumPy Structured Arrays
If you need a single NumPy array with mixed types, use a structured array that defines a dtype for each field:
import numpy as np
dt = np.dtype([
('name', 'U10'), # String up to 10 characters
('branch', 'U5'), # String up to 5 characters
('marks', 'i4') # 32-bit integer
])
students = np.array([
('Alice', 'CSE', 87),
('Bob', 'ECE', 88),
('Carol', 'CSE', 72)
], dtype=dt)
# Access the numeric field by name
print(f"Mean marks: {students['marks'].mean()}")
print(f"Max marks: {students['marks'].max()}")
print(f"All names: {students['name']}")
Output:
Mean marks: 82.33333333333333
Max marks: 88
All names: ['Alice' 'Bob' 'Carol']
Each field maintains its own data type, so mathematical operations work on numeric fields without issues.
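Field names also unlock other NumPy operations on mixed data, for example sorting the whole array by a numeric field with np.sort(..., order=...). A short sketch using a pared-down dtype:

```python
import numpy as np

dt = np.dtype([('name', 'U10'), ('marks', 'i4')])
students = np.array([('Alice', 87), ('Bob', 88), ('Carol', 72)], dtype=dt)

# Sort the whole structured array by the numeric field, highest first
ranked = np.sort(students, order='marks')[::-1]
print(ranked['name'])  # ['Bob' 'Alice' 'Carol']
```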
Solution 5: Use pd.to_numeric() for Mixed or Messy Data
When your data comes from a CSV file or external source and might contain mixed types within a column, pd.to_numeric() with errors='coerce' safely converts what it can and replaces the rest with NaN:
import pandas as pd
import numpy as np
# Messy data: some "marks" are not numbers
messy_data = pd.DataFrame({
'name': ['Alice', 'Bob', 'Carol', 'Dave'],
'marks': ['87', '88', 'absent', '72'] # 'absent' can't be a number
})
# Convert to numeric, replacing non-numeric values with NaN
messy_data['marks'] = pd.to_numeric(messy_data['marks'], errors='coerce')
print(messy_data)
print(f"\nMean marks (excluding NaN): {messy_data['marks'].mean()}")
Output:
name marks
0 Alice 87.0
1 Bob 88.0
2 Carol NaN
3 Dave 72.0
Mean marks (excluding NaN): 82.33333333333333
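After coercion, it's worth deciding explicitly how the NaN values should count, since .mean() silently skips them by default. A small sketch of the two common choices:

```python
import pandas as pd

marks = pd.to_numeric(pd.Series(['87', '88', 'absent', '72']), errors='coerce')

print(marks.mean())            # 82.33...  NaN excluded from the mean
print(marks.fillna(0).mean())  # 61.75     NaN counted as zero instead
```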
Diagnosing the Problem
When you encounter this error, the first step is to check what dtype your array actually has:
import numpy as np
data = np.array([['Alice', 87], ['Bob', 88]])
print(f"Array dtype: {data.dtype}")
print(f"Column 1 dtype: {data[:, 1].dtype}")
Output:
Array dtype: <U21
Column 1 dtype: <U21
If the dtype shows <U (Unicode string) or object, your numeric data has been converted to strings and you need one of the fixes above.
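The dtype check and the fix can be combined into one small helper that converts only when needed. A sketch under the assumption that the column holds either numbers or numeric-looking strings (the safe_mean name is illustrative):

```python
import numpy as np

def safe_mean(column):
    """Convert a string/object column to numbers before reducing, if needed."""
    arr = np.asarray(column)
    if arr.dtype.kind in ('U', 'S', 'O'):  # flexible or object dtype
        arr = arr.astype(float)            # raises ValueError on bad data
    return arr.mean()

data = np.array([['Alice', 87], ['Bob', 88]])
print(safe_mean(data[:, 1]))  # 87.5
```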
Quick Reference: Choosing the Right Solution
| Situation | Best Solution |
|---|---|
| Tabular data with mixed types | Use Pandas DataFrame |
| Need NumPy but have one numeric column | Extract and convert with .astype(float) |
| Multiple separate numeric datasets | Use separate NumPy arrays |
| Must use single NumPy array with mixed types | Use structured arrays |
| Data from CSV with messy entries | Use pd.to_numeric(errors='coerce') |
Conclusion
The TypeError: cannot perform reduce with flexible type error occurs because NumPy converts all elements in an array to the same type, and when strings are present, everything becomes a string.
Mathematical operations like mean() and sum() can't operate on strings. The best fix for most use cases is to switch to a Pandas DataFrame, which handles mixed types naturally with each column maintaining its own dtype. If you need to stay with NumPy, either convert the specific column to a numeric type with .astype(float), keep separate arrays for different data types, or use structured arrays that support per-field dtypes.