Python Pandas: How to Detect and Fix Mixed Data Types in Pandas DataFrames
When working with real-world data in Pandas, you will frequently encounter columns that contain a mixture of data types - for example, a column that holds both integers and strings, or numbers mixed with NaN values. These mixed-type columns can cause subtle bugs, incorrect calculations, and unexpected behavior when performing operations like sorting, aggregation, or mathematical computations.
In this guide, you will learn how to detect mixed data types in a Pandas DataFrame, understand what causes them, and apply multiple methods to fix them.
What Are Mixed Data Types?
A column has mixed data types when it contains values of more than one type. Pandas typically stores such columns with the generic object dtype, which can hold any Python object but loses the performance benefits and type safety of specific dtypes like int64 or float64.
import pandas as pd
df = pd.DataFrame({
'Name': ['Tom', 'Nick', 'Juli'],
'Age': [10, '15', 14.8] # int, str, and float
})
print(df)
print(f"\nAge column dtype: {df['Age'].dtype}")
Output:
Name Age
0 Tom 10
1 Nick 15
2 Juli 14.8
Age column dtype: object
The Age column contains an integer (10), a string ('15'), and a float (14.8). Pandas stores it as object dtype because no single numeric type can represent all three values.
Common Causes of Mixed Data Types
| Cause | Example |
|---|---|
| Data entry errors | Typing "fifteen" instead of 15 in a numeric column |
| Inconsistent formatting | Some cells have "$100" while others have 100 |
| Missing values | NaN mixed with integers forces the column to float64 |
| CSV import issues | A single non-numeric value in a column makes the entire column object |
| Merged datasets | Combining DataFrames where the same column has different types |
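Two of these causes are easy to reproduce directly. The snippet below is a minimal sketch showing how a single missing value promotes an integer column to float64, and how a single non-numeric value demotes it to object:

```python
import pandas as pd

# A missing value forces the integer column to float64
with_nan = pd.Series([1, 2, None])
print(with_nan.dtype)   # float64

# A single non-numeric value makes the entire column object
with_str = pd.Series([1, 2, 'three'])
print(with_str.dtype)   # object
```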
How to Detect Mixed Data Types
Method 1: Using pd.api.types.infer_dtype()
The infer_dtype() function examines the actual values in a column and returns a descriptive string like "string", "integer", "floating", "mixed", or "mixed-integer":
import pandas as pd
df = pd.DataFrame({
'Name': ['Tom', 'Nick', 'Juli'],
'Age': [10, '15', 14.8],
'Score': [85.5, 90.0, 78.3]
})
for column in df.columns:
inferred = pd.api.types.infer_dtype(df[column])
print(f"{column}: {inferred}")
Output:
Name: string
Age: mixed-integer
Score: floating
The Age column is inferred as mixed-integer (integers mixed with other types), confirming that it contains more than one data type.
Method 2: Checking Unique Types Per Column
For a more detailed view, inspect the actual Python types present in each column:
import pandas as pd
df = pd.DataFrame({
'Name': ['Tom', 'Nick', 'Juli'],
'Age': [10, '15', 14.8]
})
# Get unique types in the Age column
types = df['Age'].apply(type).unique()
print("Types in Age column:", types)
Output:
Types in Age column: [<class 'int'> <class 'str'> <class 'float'>]
Method 3: Automated Detection Across All Columns
Create a reusable function that scans the entire DataFrame:
import pandas as pd
def detect_mixed_types(df):
"""Identify columns with mixed data types."""
mixed_columns = {}
for column in df.columns:
inferred = pd.api.types.infer_dtype(df[column])
if 'mixed' in inferred:
types_found = df[column].apply(type).value_counts()
mixed_columns[column] = {
'inferred_type': inferred,
'type_counts': types_found.to_dict()
}
return mixed_columns
df = pd.DataFrame({
'Name': ['Tom', 'Nick', 'Juli'],
'Age': [10, '15', 14.8],
'City': ['NYC', 100, 'LA']
})
mixed = detect_mixed_types(df)
for col, info in mixed.items():
print(f"Column '{col}': {info['inferred_type']}")
for dtype, count in info['type_counts'].items():
print(f" {dtype.__name__}: {count} values")
Output:
Column 'Age': mixed-integer
int: 1 values
str: 1 values
float: 1 values
Column 'City': mixed-integer
str: 2 values
int: 1 values
df.dtypes doesn't catch mixed types

df.dtypes shows the storage dtype (object, int64, float64), not the actual types of individual values. A column with dtype object could hold pure strings, a mix of types, or arbitrary Python objects - dtypes alone cannot distinguish these cases.
print(df.dtypes)
# Name object ← Could be pure strings OR mixed
# Age object ← Could be pure strings OR mixed
Use pd.api.types.infer_dtype() for accurate type detection.
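A quick sketch of the difference: both Series below report object dtype, but infer_dtype() tells them apart:

```python
import pandas as pd

pure = pd.Series(['a', 'b', 'c'])    # all strings
mixed = pd.Series(['a', 1, 2.5])     # string, int, and float

print(pure.dtype, mixed.dtype)                # object object
print(pd.api.types.infer_dtype(pure))         # string
print(pd.api.types.infer_dtype(mixed))        # mixed-integer
```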
How to Fix Mixed Data Types
Fix 1: Using astype() for Direct Type Conversion
Convert the entire column to a specific data type using astype():
import pandas as pd
df = pd.DataFrame({
'Name': ['Tom', 'Nick', 'Juli'],
'Age': [10, '15', 14.8]
})
# Convert Age to integer
df['Age'] = df['Age'].astype(int)
print(df)
print(f"\nAge dtype: {df['Age'].dtype}")
Output:
Name Age
0 Tom 10
1 Nick 15
2 Juli 14
Age dtype: int64
Note that astype(int) truncates 14.8 to 14 rather than rounding - make sure truncation is acceptable before converting floats to integers.

astype() will raise an error if conversion is impossible

If the column contains values that cannot be converted (like actual words), astype() will raise a ValueError:
df = pd.DataFrame({'Age': [10, 'fifteen', 14.8]})
# ❌ 'fifteen' cannot be converted to int
df['Age'] = df['Age'].astype(int)
# ValueError: invalid literal for int() with base 10: 'fifteen'
Use pd.to_numeric() with errors='coerce' instead (see Fix 2).
Fix 2: Using pd.to_numeric() for Safe Numeric Conversion
pd.to_numeric() attempts to convert values to numbers and provides control over how to handle unconvertible values:
import pandas as pd
df = pd.DataFrame({
'Name': ['Tom', 'Nick', 'Juli'],
'Age': [10, '15', 14.8]
})
# Convert to numeric: convertible strings become numbers
df['Age'] = pd.to_numeric(df['Age'], errors='coerce')
print(df)
print(f"\nAge dtype: {df['Age'].dtype}")
Output:
Name Age
0 Tom 10.0
1 Nick 15.0
2 Juli 14.8
Age dtype: float64
The errors parameter controls behavior for unconvertible values:
| errors Value | Behavior | Use When |
|---|---|---|
| 'raise' | Raise an error (default) | You want to catch bad data immediately |
| 'coerce' | Replace unconvertible values with NaN | You want to keep valid numbers and flag bad values |
| 'ignore' | Return the input unchanged (deprecated since pandas 2.2) | You want to skip conversion for problematic columns |
Example with errors='coerce':
import pandas as pd
df = pd.DataFrame({'Age': [10, 'fifteen', 14.8, None]})
df['Age'] = pd.to_numeric(df['Age'], errors='coerce')
print(df)
Output:
Age
0 10.0
1 NaN
2 14.8
3 NaN
The string 'fifteen' is replaced with NaN instead of causing an error.
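Because coerced values become NaN, you can also use the conversion to locate the bad rows before discarding them. A small sketch: values that were present before conversion but NaN afterwards must have been unconvertible.

```python
import pandas as pd

df = pd.DataFrame({'Age': [10, 'fifteen', 14.8, None]})
converted = pd.to_numeric(df['Age'], errors='coerce')

# Present before conversion but NaN after = unconvertible value
bad_mask = converted.isna() & df['Age'].notna()
print(df.loc[bad_mask])   # the row containing 'fifteen'
```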
Fix 3: Using apply() With Custom Logic
For complex cleaning rules, use apply() with a custom function:
import pandas as pd
df = pd.DataFrame({
'Price': ['$100', 200, '$350.50', 'free', 150.75]
})
def clean_price(value):
"""Convert price values to float, handling various formats."""
if isinstance(value, (int, float)):
return float(value)
if isinstance(value, str):
cleaned = value.replace('$', '').replace(',', '').strip()
try:
return float(cleaned)
except ValueError:
return None # Return None for unconvertible values
return None
df['Price'] = df['Price'].apply(clean_price)
print(df)
print(f"\nPrice dtype: {df['Price'].dtype}")
Output:
Price
0 100.00
1 200.00
2 350.50
3 NaN
4 150.75
Price dtype: float64
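Both pd.to_numeric() and the apply() approach above leave you with float64 whenever NaN is present. If the column is conceptually integer, pandas' nullable Int64 dtype (capital I) can hold whole numbers alongside missing values; a brief sketch:

```python
import pandas as pd

s = pd.to_numeric(pd.Series([10, '15', 'bad']), errors='coerce')
print(s.dtype)            # float64, because NaN forces floats

# The nullable Int64 dtype keeps integers while allowing missing values
s_int = s.astype('Int64')
print(s_int.dtype)        # Int64
```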
Fix 4: Enforcing Types During CSV Import
Prevent mixed types from entering your DataFrame in the first place by specifying dtypes during import:
import pandas as pd
# Specify expected types when reading CSV
df = pd.read_csv(
'data.csv',
dtype={'Age': float, 'Name': str},
na_values=['', 'N/A', 'null'] # Treat these as NaN
)
Use the low_memory=False parameter to suppress the DtypeWarning that Pandas raises when it detects mixed types during CSV import:
df = pd.read_csv('large_file.csv', low_memory=False)
This makes Pandas process the entire file at once, rather than in chunks, before inferring column types, which is more accurate but uses more memory.
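A self-contained illustration of typed import, using an in-memory CSV (StringIO stands in for a real file, and the column names are assumptions for the example):

```python
import pandas as pd
from io import StringIO

# Simulated file contents; in practice this would be a path like 'data.csv'
csv_data = StringIO("Name,Age\nTom,10\nNick,15\nJuli,N/A\n")

df = pd.read_csv(csv_data, dtype={'Age': float, 'Name': str},
                 na_values=['N/A'])
print(df['Age'].dtype)    # float64 -- 'N/A' became NaN, not a string
```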
Complete Workflow: Detect and Fix
import pandas as pd
# Create a messy DataFrame
df = pd.DataFrame({
'Name': ['Alice', 'Bob', 42, 'Diana'],
'Age': [25, '30', 35.5, 'unknown'],
'Salary': ['$50000', 60000, '$75,000', None]
})
print("Before cleaning:")
for col in df.columns:
print(f" {col}: {pd.api.types.infer_dtype(df[col])}")
# Fix Name column: convert everything to string
df['Name'] = df['Name'].astype(str)
# Fix Age column: convert to numeric, coerce errors
df['Age'] = pd.to_numeric(df['Age'], errors='coerce')
# Fix Salary column: clean and convert
df['Salary'] = (
df['Salary']
.astype(str)
.str.replace('$', '', regex=False)
.str.replace(',', '', regex=False)
.replace('None', pd.NA)
.pipe(pd.to_numeric, errors='coerce')
)
print("\nAfter cleaning:")
for col in df.columns:
print(f" {col}: {pd.api.types.infer_dtype(df[col])}")
print(f"\n{df}")
print(f"\n{df.dtypes}")
Output:
Before cleaning:
Name: mixed-integer
Age: mixed-integer
Salary: mixed-integer
After cleaning:
Name: string
Age: floating
Salary: floating
Name Age Salary
0 Alice 25.0 50000.0
1 Bob 30.0 60000.0
2 42 35.5 75000.0
3 Diana NaN NaN
Name object
Age float64
Salary float64
dtype: object
Conclusion
Mixed data types in Pandas columns are a common data quality issue caused by inconsistent formatting, data entry errors, or import problems.
Detect them using pd.api.types.infer_dtype() for accurate type inference, then fix them using astype() for straightforward conversions, pd.to_numeric(errors='coerce') for safe numeric conversion that handles bad values gracefully, or apply() with custom logic for complex cleaning rules.
For best results, enforce data types at import time using dtype parameters in read_csv() to prevent mixed types from entering your DataFrame in the first place.