Python Pandas: How to Detect and Fix Mixed Data Types in Pandas DataFrames
When working with real-world data in Pandas, you will frequently encounter columns that contain a mixture of data types - for example, a column that holds both integers and strings, or numbers mixed with NaN values. These mixed-type columns can cause subtle bugs, incorrect calculations, and unexpected behavior when performing operations like sorting, aggregation, or mathematical computations.
In this guide, you will learn how to detect mixed data types in a Pandas DataFrame, understand what causes them, and apply multiple methods to fix them.
What Are Mixed Data Types?
A column has mixed data types when it contains values of more than one type. Pandas typically stores such columns with the generic object dtype, which can hold any Python object but loses the performance benefits and type safety of specific dtypes like int64 or float64.
import pandas as pd
df = pd.DataFrame({
'Name': ['Tom', 'Nick', 'Juli'],
'Age': [10, '15', 14.8] # int, str, and float
})
print(df)
print(f"\nAge column dtype: {df['Age'].dtype}")
Output:
Name Age
0 Tom 10
1 Nick 15
2 Juli 14.8
Age column dtype: object
The Age column contains an integer (10), a string ('15'), and a float (14.8). Pandas stores it as object dtype because no single numeric type can represent all three values.
Common Causes of Mixed Data Types
| Cause | Example |
|---|---|
| Data entry errors | Typing "fifteen" instead of 15 in a numeric column |
| Inconsistent formatting | Some cells have "$100" while others have 100 |
| Missing values | NaN mixed with integers forces the column to float64 |
| CSV import issues | A single non-numeric value in a column makes the entire column object |
| Merged datasets | Combining DataFrames where the same column has different types |
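Two of these causes are easy to reproduce directly. The snippet below is a minimal sketch showing how a single missing value promotes an integer column to float64, and how a single non-numeric value demotes it to object:

```python
import pandas as pd

# A missing value forces the integer column to float64
with_nan = pd.Series([1, 2, None])
print(with_nan.dtype)   # float64

# A single non-numeric value makes the entire column object
with_str = pd.Series([1, 2, 'three'])
print(with_str.dtype)   # object
```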
How to Detect Mixed Data Types
Method 1: Using pd.api.types.infer_dtype()
The infer_dtype() function examines the actual values in a column and returns a descriptive string like "string", "integer", "floating", "mixed", or "mixed-integer":
import pandas as pd
df = pd.DataFrame({
'Name': ['Tom', 'Nick', 'Juli'],
'Age': [10, '15', 14.8],
'Score': [85.5, 90.0, 78.3]
})
for column in df.columns:
inferred = pd.api.types.infer_dtype(df[column])
print(f"{column}: {inferred}")
Output:
Name: string
Age: mixed-integer
Score: floating
The Age column is inferred as mixed-integer (integers mixed with other types), confirming that it contains more than one data type.
Method 2: Checking Unique Types Per Column
For a more detailed view, inspect the actual Python types present in each column:
import pandas as pd
df = pd.DataFrame({
'Name': ['Tom', 'Nick', 'Juli'],
'Age': [10, '15', 14.8]
})
# Get unique types in the Age column
types = df['Age'].apply(type).unique()
print("Types in Age column:", types)
Output:
Types in Age column: [<class 'int'> <class 'str'> <class 'float'>]
Method 3: Automated Detection Across All Columns
Create a reusable function that scans the entire DataFrame:
import pandas as pd
def detect_mixed_types(df):
"""Identify columns with mixed data types."""
mixed_columns = {}
for column in df.columns:
inferred = pd.api.types.infer_dtype(df[column])
if 'mixed' in inferred:
types_found = df[column].apply(type).value_counts()
mixed_columns[column] = {
'inferred_type': inferred,
'type_counts': types_found.to_dict()
}
return mixed_columns
df = pd.DataFrame({
'Name': ['Tom', 'Nick', 'Juli'],
'Age': [10, '15', 14.8],
'City': ['NYC', 100, 'LA']
})
mixed = detect_mixed_types(df)
for col, info in mixed.items():
print(f"Column '{col}': {info['inferred_type']}")
for dtype, count in info['type_counts'].items():
print(f" {dtype.__name__}: {count} values")
Output:
Column 'Age': mixed-integer
int: 1 values
str: 1 values
float: 1 values
Column 'City': mixed-integer
str: 2 values
int: 1 values
df.dtypes doesn't catch mixed types

df.dtypes shows the storage dtype (object, int64, float64), not the actual types of individual values. A column with dtype object could hold pure strings, a mix of types, or arbitrary Python objects - dtypes alone cannot distinguish these cases.
print(df.dtypes)
# Name object ← Could be pure strings OR mixed
# Age object ← Could be pure strings OR mixed
Use pd.api.types.infer_dtype() for accurate type detection.
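A quick sketch of the difference: both Series below report object dtype, but infer_dtype() tells them apart:

```python
import pandas as pd

pure = pd.Series(['a', 'b', 'c'])    # all strings
mixed = pd.Series(['a', 1, 2.5])     # string, int, and float

print(pure.dtype, mixed.dtype)                # object object
print(pd.api.types.infer_dtype(pure))         # string
print(pd.api.types.infer_dtype(mixed))        # mixed-integer
```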
How to Fix Mixed Data Types
Fix 1: Using astype() for Direct Type Conversion
Convert the entire column to a specific data type using astype():
import pandas as pd
df = pd.DataFrame({
'Name': ['Tom', 'Nick', 'Juli'],
'Age': [10, '15', 14.8]
})
# Convert Age to integer
df['Age'] = df['Age'].astype(int)
print(df)
print(f"\nAge dtype: {df['Age'].dtype}")
Output:
Name Age
0 Tom 10
1 Nick 15
2 Juli 14
Age dtype: int64
Note that astype(int) truncates 14.8 to 14 rather than rounding - make sure truncation is acceptable before converting floats to integers.

astype() will raise an error if conversion is impossible

If the column contains values that cannot be converted (like actual words), astype() will raise a ValueError:
df = pd.DataFrame({'Age': [10, 'fifteen', 14.8]})
# ❌ 'fifteen' cannot be converted to int
df['Age'] = df['Age'].astype(int)
# ValueError: invalid literal for int() with base 10: 'fifteen'
Use pd.to_numeric() with errors='coerce' instead (see Fix 2).
Fix 2: Using pd.to_numeric() for Safe Numeric Conversion
pd.to_numeric() attempts to convert values to numbers and provides control over how to handle unconvertible values:
import pandas as pd
df = pd.DataFrame({
'Name': ['Tom', 'Nick', 'Juli'],
'Age': [10, '15', 14.8]
})
# Convert to numeric: convertible strings become numbers
df['Age'] = pd.to_numeric(df['Age'], errors='coerce')
print(df)
print(f"\nAge dtype: {df['Age'].dtype}")
Output:
Name Age
0 Tom 10.0
1 Nick 15.0
2 Juli 14.8
Age dtype: float64
The errors parameter controls behavior for unconvertible values:
| errors Value | Behavior | Use When |
|---|---|---|
| 'raise' | Raise an error (default) | You want to catch bad data immediately |
| 'coerce' | Replace unconvertible values with NaN | You want to keep valid numbers and flag bad values |
| 'ignore' | Return the input unchanged (deprecated since pandas 2.2) | You want to skip conversion for problematic columns |
Example with errors='coerce':
import pandas as pd
df = pd.DataFrame({'Age': [10, 'fifteen', 14.8, None]})
df['Age'] = pd.to_numeric(df['Age'], errors='coerce')
print(df)
Output:
Age
0 10.0
1 NaN
2 14.8
3 NaN
The string 'fifteen' is replaced with NaN instead of causing an error.
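Because coerced values become NaN, you can also use the conversion to locate the bad rows before discarding them. A small sketch: values that were present before conversion but NaN afterwards must have been unconvertible.

```python
import pandas as pd

df = pd.DataFrame({'Age': [10, 'fifteen', 14.8, None]})
converted = pd.to_numeric(df['Age'], errors='coerce')

# Present before conversion but NaN after = unconvertible value
bad_mask = converted.isna() & df['Age'].notna()
print(df.loc[bad_mask])   # the row containing 'fifteen'
```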
Fix 3: Using apply() With Custom Logic
For complex cleaning rules, use apply() with a custom function:
import pandas as pd
df = pd.DataFrame({
'Price': ['$100', 200, '$350.50', 'free', 150.75]
})
def clean_price(value):
"""Convert price values to float, handling various formats."""
if isinstance(value, (int, float)):
return float(value)
if isinstance(value, str):
cleaned = value.replace('$', '').replace(',', '').strip()
try:
return float(cleaned)
except ValueError:
return None # Return None for unconvertible values
return None
df['Price'] = df['Price'].apply(clean_price)
print(df)
print(f"\nPrice dtype: {df['Price'].dtype}")
Output:
Price
0 100.00
1 200.00
2 350.50
3 NaN
4 150.75
Price dtype: float64
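Both pd.to_numeric() and the apply() approach above leave you with float64 whenever NaN is present. If the column is conceptually integer, pandas' nullable Int64 dtype (capital I) can hold whole numbers alongside missing values; a brief sketch:

```python
import pandas as pd

s = pd.to_numeric(pd.Series([10, '15', 'bad']), errors='coerce')
print(s.dtype)            # float64, because NaN forces floats

# The nullable Int64 dtype keeps integers while allowing missing values
s_int = s.astype('Int64')
print(s_int.dtype)        # Int64
```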
Fix 4: Enforcing Types During CSV Import
Prevent mixed types from entering your DataFrame in the first place by specifying dtypes during import:
import pandas as pd
# Specify expected types when reading CSV
df = pd.read_csv(
'data.csv',
dtype={'Age': float, 'Name': str},
na_values=['', 'N/A', 'null'] # Treat these as NaN
)
Use the low_memory=False parameter to suppress the DtypeWarning that Pandas raises when it detects mixed types during CSV import:
df = pd.read_csv('large_file.csv', low_memory=False)
This makes Pandas process the entire file at once, rather than in chunks, before inferring column types, which is more accurate but uses more memory.
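A self-contained illustration of typed import, using an in-memory CSV (StringIO stands in for a real file, and the column names are assumptions for the example):

```python
import pandas as pd
from io import StringIO

# Simulated file contents; in practice this would be a path like 'data.csv'
csv_data = StringIO("Name,Age\nTom,10\nNick,15\nJuli,N/A\n")

df = pd.read_csv(csv_data, dtype={'Age': float, 'Name': str},
                 na_values=['N/A'])
print(df['Age'].dtype)    # float64 -- 'N/A' became NaN, not a string
```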
Complete Workflow: Detect and Fix
import pandas as pd
# Create a messy DataFrame
df = pd.DataFrame({
'Name': ['Alice', 'Bob', 42, 'Diana'],
'Age': [25, '30', 35.5, 'unknown'],
'Salary': ['$50000', 60000, '$75,000', None]
})
print("Before cleaning:")
for col in df.columns:
print(f" {col}: {pd.api.types.infer_dtype(df[col])}")
# Fix Name column: convert everything to string
df['Name'] = df['Name'].astype(str)
# Fix Age column: convert to numeric, coerce errors
df['Age'] = pd.to_numeric(df['Age'], errors='coerce')
# Fix Salary column: clean and convert
df['Salary'] = (
df['Salary']
.astype(str)
.str.replace('$', '', regex=False)
.str.replace(',', '', regex=False)
.replace('None', pd.NA)
.pipe(pd.to_numeric, errors='coerce')
)
print("\nAfter cleaning:")
for col in df.columns:
print(f" {col}: {pd.api.types.infer_dtype(df[col])}")
print(f"\n{df}")
print(f"\n{df.dtypes}")
Output:
Before cleaning:
Name: mixed-integer
Age: mixed-integer
Salary: mixed-integer
After cleaning:
Name: string
Age: floating
Salary: floating
Name Age Salary
0 Alice 25.0 50000.0
1 Bob 30.0 60000.0
2 42 35.5 75000.0
3 Diana NaN NaN
Name object
Age float64
Salary float64
dtype: object
Conclusion
Mixed data types in Pandas columns are a common data quality issue caused by inconsistent formatting, data entry errors, or import problems.
Detect them using pd.api.types.infer_dtype() for accurate type inference, then fix them using astype() for straightforward conversions, pd.to_numeric(errors='coerce') for safe numeric conversion that handles bad values gracefully, or apply() with custom logic for complex cleaning rules.
For best results, enforce data types at import time using dtype parameters in read_csv() to prevent mixed types from entering your DataFrame in the first place.