Python Pandas: How to Detect and Fix Mixed Data Types in Pandas DataFrames

When working with real-world data in Pandas, you will frequently encounter columns that contain a mixture of data types, for example a column that holds both integers and strings, or numbers stored alongside text placeholders. These mixed-type columns can cause subtle bugs, incorrect calculations, and errors when sorting, aggregating, or doing arithmetic.

In this guide, you will learn how to detect mixed data types in a Pandas DataFrame, understand what causes them, and apply multiple methods to fix them.

What Are Mixed Data Types?

A column has mixed data types when it contains values of more than one type. Pandas typically stores such columns with the generic object dtype, which can hold any Python object but loses the performance benefits and type safety of specific dtypes like int64 or float64.

import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'Nick', 'Juli'],
    'Age': [10, '15', 14.8]  # int, str, and float
})

print(df)
print(f"\nAge column dtype: {df['Age'].dtype}")

Output:

   Name   Age
0   Tom    10
1  Nick    15
2  Juli  14.8

Age column dtype: object

The Age column contains an integer (10), a string ('15'), and a float (14.8). Pandas does not parse strings into numbers when it builds the column, so it falls back to object dtype and keeps the original Python objects.
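To make the risk concrete, here is a minimal sketch (an illustration added for this guide, with placeholder variable names) that tries to sort and sum the same three values. Both operations are expected to fail because Python cannot compare or add an int and a str.

```python
import pandas as pd

# Same values as the Age column above: an int, a str, and a float
ages = pd.Series([10, '15', 14.8])

# Sorting compares values pairwise; a str cannot be ordered against an int
try:
    ages.sort_values()
    sort_failed = False
except TypeError as exc:
    sort_failed = True
    print(f"sort_values() failed: {exc}")

# Summing adds values pairwise; int + str is not defined
try:
    ages.sum()
    sum_failed = False
except TypeError as exc:
    sum_failed = True
    print(f"sum() failed: {exc}")
```

Errors like these are the easy case. A column that holds only numeric-looking strings will not fail loudly, which is why explicit detection is worth the effort.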

Common Causes of Mixed Data Types

| Cause | Example |
| --- | --- |
| Data entry errors | Typing "fifteen" instead of 15 in a numeric column |
| Inconsistent formatting | Some cells have "$100" while others have 100 |
| Missing values | NaN mixed with integers forces the column to float64 |
| CSV import issues | A single non-numeric value in a column makes the entire column object |
| Merged datasets | Combining DataFrames where the same column has different types |
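Three of these causes are easy to reproduce. The sketch below uses made-up data and an in-memory CSV (via io.StringIO) purely for illustration; the column names are assumptions, not part of any real dataset.

```python
import io
import pandas as pd

# Cause: CSV import. One non-numeric cell turns the whole column into text.
csv_text = "age\n10\nfifteen\n14\n"
from_csv = pd.read_csv(io.StringIO(csv_text))
print(from_csv['age'].map(type).unique())  # every value is a str, including '10'

# Cause: missing values. NaN is a float, so the integers are upcast to float64.
with_nan = pd.Series([1, 2, None])
print(with_nan.dtype)

# Cause: merged datasets. 'id' is int in one frame and str in the other.
left = pd.DataFrame({'id': [1, 2]})
right = pd.DataFrame({'id': ['3', '4']})
combined = pd.concat([left, right], ignore_index=True)
print(pd.api.types.infer_dtype(combined['id']))
```

Note that the CSV case does not produce a column of mixed Python types: every value, including the valid numbers, arrives as a string.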

How to Detect Mixed Data Types

Method 1: Using pd.api.types.infer_dtype()

The infer_dtype() function examines the actual values in a column and returns a descriptive string like "string", "integer", "floating", "mixed", or "mixed-integer":

import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'Nick', 'Juli'],
    'Age': [10, '15', 14.8],
    'Score': [85.5, 90.0, 78.3]
})

for column in df.columns:
    inferred = pd.api.types.infer_dtype(df[column])
    print(f"{column}: {inferred}")

Output:

Name: string
Age: mixed-integer
Score: floating

The Age column is identified as mixed-integer, meaning it holds integers alongside values of other types.

Method 2: Checking Unique Types Per Column

For a more detailed view, inspect the actual Python types present in each column:

import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'Nick', 'Juli'],
    'Age': [10, '15', 14.8]
})

# Get unique types in the Age column
types = df['Age'].apply(type).unique()
print("Types in Age column:", types)

Output:

Types in Age column: [<class 'int'> <class 'str'> <class 'float'>]
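Knowing which types are present is often only half the job; you usually also want the rows responsible. One possible approach, sketched here with the same sample data, builds a boolean mask from isinstance() checks. The names is_text and suspect_rows are placeholders.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'Nick', 'Juli'],
    'Age': [10, '15', 14.8]
})

# Boolean mask: True where the stored value is a Python str
is_text = df['Age'].map(lambda value: isinstance(value, str))

# Rows whose Age value needs attention
suspect_rows = df[is_text]
print(suspect_rows)
```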

Method 3: Automated Detection Across All Columns

Create a reusable function that scans the entire DataFrame:

import pandas as pd

def detect_mixed_types(df):
    """Identify columns with mixed data types."""
    mixed_columns = {}
    for column in df.columns:
        inferred = pd.api.types.infer_dtype(df[column])
        if 'mixed' in inferred:
            # Count how many values of each Python type the column holds
            types_found = df[column].apply(type).value_counts()
            mixed_columns[column] = {
                'inferred_type': inferred,
                'type_counts': types_found.to_dict()
            }
    return mixed_columns


df = pd.DataFrame({
    'Name': ['Tom', 'Nick', 'Juli'],
    'Age': [10, '15', 14.8],
    'City': ['NYC', 100, 'LA']
})

mixed = detect_mixed_types(df)
for col, info in mixed.items():
    print(f"Column '{col}': {info['inferred_type']}")
    for dtype, count in info['type_counts'].items():
        print(f" {dtype.__name__}: {count} values")

Output:

Column 'Age': mixed-integer
 int: 1 values
 str: 1 values
 float: 1 values
Column 'City': mixed-integer
 str: 2 values
 int: 1 values

Why df.dtypes doesn't catch mixed types

df.dtypes shows the storage dtype (object, int64, float64), not the actual types of individual values. A column with dtype object could hold only strings, a mix of types, or arbitrary Python objects; dtypes alone cannot distinguish these cases.

print(df.dtypes)
# Name object ← Could be pure strings OR mixed
# Age object ← Could be pure strings OR mixed

Use pd.api.types.infer_dtype() for accurate type detection.
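A side-by-side comparison makes the difference visible. This is a small sketch using the sample data from earlier; the inferred dictionary is just a convenient place to collect the results.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'Nick', 'Juli'],  # all strings
    'Age': [10, '15', 14.8]           # int, str, and float
})

# Compare the storage dtype with what the values actually are
inferred = {}
for col in df.columns:
    inferred[col] = pd.api.types.infer_dtype(df[col])
    print(f"{col}: dtype={df[col].dtype}, inferred={inferred[col]}")
```

Only the inferred value separates the clean Name column from the mixed Age column.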

How to Fix Mixed Data Types

Fix 1: Using astype() for Direct Type Conversion

Convert the entire column to a specific data type using astype():

import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'Nick', 'Juli'],
    'Age': [10, '15', 14.8]
})

# Convert Age to integer (14.8 is truncated to 14, not rounded)
df['Age'] = df['Age'].astype(int)

print(df)
print(f"\nAge dtype: {df['Age'].dtype}")

Output:

   Name  Age
0   Tom   10
1  Nick   15
2  Juli   14

Age dtype: int64

astype() will raise an error if conversion is impossible

If the column contains values that cannot be converted (like actual words), astype() will raise a ValueError:

df = pd.DataFrame({'Age': [10, 'fifteen', 14.8]})

# ❌ 'fifteen' cannot be converted to int
df['Age'] = df['Age'].astype(int)
# ValueError: invalid literal for int() with base 10: 'fifteen'

Use pd.to_numeric() with errors='coerce' instead (see Fix 2).
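Before converting, it can help to list exactly which values would fail. The following sketch is one way to do that, assuming a value counts as "bad" when it is present in the original data but cannot be parsed as a number; the variable names are illustrative.

```python
import pandas as pd

ages = pd.Series([10, 'fifteen', 14.8, None])

# Coerce a copy: anything that cannot be parsed becomes NaN
as_numbers = pd.to_numeric(ages, errors='coerce')

# Bad values were present originally but are NaN after coercion;
# values that were already missing are excluded
bad_mask = as_numbers.isna() & ages.notna()
bad_values = ages[bad_mask]
print(bad_values)
```

The original None at index 3 is not reported, because it was missing to begin with rather than unparseable.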

Fix 2: Using pd.to_numeric() for Safe Numeric Conversion

pd.to_numeric() attempts to convert values to numbers and provides control over how to handle unconvertible values:

import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'Nick', 'Juli'],
    'Age': [10, '15', 14.8]
})

# Convert to numeric: convertible strings become numbers
df['Age'] = pd.to_numeric(df['Age'], errors='coerce')

print(df)
print(f"\nAge dtype: {df['Age'].dtype}")

Output:

   Name   Age
0   Tom  10.0
1  Nick  15.0
2  Juli  14.8

Age dtype: float64

The errors parameter controls behavior for unconvertible values:

| errors value | Behavior | Use when |
| --- | --- | --- |
| 'raise' | Raise an error (default) | You want to catch bad data immediately |
| 'coerce' | Replace with NaN | You want to keep valid numbers and flag bad values |
| 'ignore' | Return the input unchanged (deprecated since pandas 2.2) | You want to skip conversion for problematic columns |

Example with errors='coerce':

import pandas as pd

df = pd.DataFrame({'Age': [10, 'fifteen', 14.8, None]})

df['Age'] = pd.to_numeric(df['Age'], errors='coerce')
print(df)

Output:

    Age
0  10.0
1   NaN
2  14.8
3   NaN

The string 'fifteen' is replaced with NaN instead of causing an error.
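Coercion always yields float64 when NaN is involved. If the column should hold whole numbers, one option is to round and then cast to the nullable Int64 dtype, which can store missing values alongside integers. This is a sketch of that idea, not the only way to handle it.

```python
import pandas as pd

ages = pd.Series([10, 'fifteen', 14.8])

whole_ages = (
    pd.to_numeric(ages, errors='coerce')  # 'fifteen' becomes NaN
    .round()                              # 14.8 becomes 15.0 (rounded, not truncated)
    .astype('Int64')                      # nullable integer dtype; NaN becomes <NA>
)
print(whole_ages)
```

Rounding first matters: casting a float column with fractional values straight to Int64 raises an error instead of silently dropping the fraction.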

Fix 3: Using apply() With Custom Logic

For complex cleaning rules, use apply() with a custom function:

import pandas as pd

df = pd.DataFrame({
    'Price': ['$100', 200, '$350.50', 'free', 150.75]
})

def clean_price(value):
    """Convert price values to float, handling various formats."""
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        # Strip currency symbols, thousands separators, and whitespace
        cleaned = value.replace('$', '').replace(',', '').strip()
        try:
            return float(cleaned)
        except ValueError:
            return None  # Return None for unconvertible values
    return None

df['Price'] = df['Price'].apply(clean_price)
print(df)
print(f"\nPrice dtype: {df['Price'].dtype}")

Output:

    Price
0  100.00
1  200.00
2  350.50
3     NaN
4  150.75

Price dtype: float64
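For simple rules like this one, a vectorized alternative is usually shorter and faster on large columns. The sketch below assumes the only noise is dollar signs, commas, and surrounding whitespace; anything else is coerced to NaN.

```python
import pandas as pd

prices = pd.Series(['$100', 200, '$350.50', 'free', 150.75])

cleaned = pd.to_numeric(
    prices.astype(str)                         # make every value a string first
    .str.replace(r'[$,]', '', regex=True)      # drop '$' and ',' characters
    .str.strip(),                              # drop surrounding whitespace
    errors='coerce',                           # 'free' becomes NaN
)
print(cleaned)
```

Keep apply() for rules that genuinely need branching logic per value, such as handling several currencies or unit suffixes differently.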

Fix 4: Enforcing Types During CSV Import

Prevent mixed types from entering your DataFrame in the first place by specifying dtypes during import:

import pandas as pd

# Specify expected types when reading CSV
df = pd.read_csv(
    'data.csv',
    dtype={'Age': float, 'Name': str},
    na_values=['', 'N/A', 'null']  # Treat these as NaN
)

Tip

Pass low_memory=False to suppress the DtypeWarning that Pandas raises when it detects mixed types during CSV import:

df = pd.read_csv('large_file.csv', low_memory=False)

This makes Pandas process the file in one pass rather than in chunks, so each column's type is inferred from all of its values. Inference is more consistent, but it uses more memory. It does not clean the data: a column that genuinely mixes numbers and text is still loaded as object.
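To try the dtype approach without a file on disk, the sketch below reads two small in-memory CSV strings (made-up data, loaded through io.StringIO). It also shows the trade-off: a value that is neither a number nor listed in na_values makes the read fail instead of slipping through.

```python
import io
import pandas as pd

# 'null' is listed in na_values, so the Age column can be read as float
clean_csv = "Name,Age\nTom,10\nNick,null\nJuli,14.8\n"
df = pd.read_csv(
    io.StringIO(clean_csv),
    dtype={'Age': float, 'Name': str},
    na_values=['', 'N/A', 'null'],
)
print(df.dtypes)

# 'fifteen' is neither numeric nor a declared NA marker, so the read fails
dirty_csv = "Name,Age\nTom,10\nNick,fifteen\n"
try:
    pd.read_csv(io.StringIO(dirty_csv), dtype={'Age': float, 'Name': str})
    read_failed = False
except ValueError as exc:
    read_failed = True
    print(f"read_csv failed: {exc}")
```

Failing at import time is often what you want, since it surfaces bad data before any analysis runs.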

Complete Workflow: Detect and Fix

import pandas as pd

# Create a messy DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 42, 'Diana'],
    'Age': [25, '30', 35.5, 'unknown'],
    'Salary': ['$50000', 60000, '$75,000', None]
})

print("Before cleaning:")
for col in df.columns:
    print(f" {col}: {pd.api.types.infer_dtype(df[col])}")

# Fix Name column: convert everything to string
df['Name'] = df['Name'].astype(str)

# Fix Age column: convert to numeric, coerce errors
df['Age'] = pd.to_numeric(df['Age'], errors='coerce')

# Fix Salary column: clean and convert
df['Salary'] = (
    df['Salary']
    .astype(str)
    .str.replace('$', '', regex=False)
    .str.replace(',', '', regex=False)
    .replace('None', pd.NA)
    .pipe(pd.to_numeric, errors='coerce')
)

print("\nAfter cleaning:")
for col in df.columns:
    print(f" {col}: {pd.api.types.infer_dtype(df[col])}")

print(f"\n{df}")
print(f"\n{df.dtypes}")

Output:

Before cleaning:
 Name: mixed-integer
 Age: mixed-integer
 Salary: mixed-integer

After cleaning:
 Name: string
 Age: floating
 Salary: floating

    Name   Age   Salary
0  Alice  25.0  50000.0
1    Bob  30.0  60000.0
2     42  35.5  75000.0
3  Diana   NaN      NaN

Name       object
Age       float64
Salary    float64
dtype: object
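A workflow like this can end with a guard that fails loudly if any column is still mixed. The helper below is a hypothetical addition (assert_no_mixed_columns is not a Pandas function) built on the same infer_dtype() check used throughout this guide.

```python
import pandas as pd

def assert_no_mixed_columns(df):
    """Raise ValueError if any column is still inferred as mixed."""
    still_mixed = [
        col for col in df.columns
        if pd.api.types.infer_dtype(df[col]).startswith('mixed')
    ]
    if still_mixed:
        raise ValueError(f"Mixed-type columns remain: {still_mixed}")

# A cleaned frame passes silently
cleaned = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25.0, 30.0]})
assert_no_mixed_columns(cleaned)

# A frame that still mixes int and str triggers the guard
messy = pd.DataFrame({'Age': [25, '30']})
try:
    assert_no_mixed_columns(messy)
    guard_raised = False
except ValueError as exc:
    guard_raised = True
    print(exc)
```

Calling the guard at the end of a cleaning script or in a test keeps regressions from going unnoticed when the input data changes.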

Conclusion

Mixed data types in Pandas columns are a common data quality issue caused by inconsistent formatting, data entry errors, or import problems.

Detect them using pd.api.types.infer_dtype() for accurate type inference, then fix them using astype() for straightforward conversions, pd.to_numeric(errors='coerce') for safe numeric conversion that handles bad values gracefully, or apply() with custom logic for complex cleaning rules.

For best results, enforce data types at import time using dtype parameters in read_csv() to prevent mixed types from entering your DataFrame in the first place.