
Python Pandas: How to Change Column Data Type in Pandas

Converting columns to the correct data type is essential for proper calculations, memory efficiency, and avoiding unexpected behavior. Pandas provides several methods depending on your data quality and conversion needs.
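As a small illustration of why the type matters for calculations, the sketch below sums a column of numeric-looking strings before and after conversion. The `prices` data is made up for the demo, and `dtype=object` is set explicitly so the strings are stored as plain Python objects.

```python
import pandas as pd

# Hypothetical data: prices that were read in as strings
prices = pd.Series(['10.5', '20.0', '15.75'], dtype=object)

# On strings, sum() concatenates the values instead of adding them
text_total = prices.sum()

# After conversion, sum() performs real arithmetic
numeric_total = prices.astype(float).sum()

print(repr(text_total))   # '10.520.015.75'
print(numeric_total)      # 46.25
```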

Using astype() - Direct Conversion

The most straightforward method for clean data:

import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3],
    'Price': ['10.5', '20.0', '15.75']
})

print(df.dtypes)

print()

# Convert single column
df['Price'] = df['Price'].astype(float)

print(df.dtypes)

Output:

ID        int64
Price    object
dtype: object

ID         int64
Price    float64
dtype: object

Convert Multiple Columns

import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3],
    'Price': ['10.5', '20.0', '15.75'],
    'Active': [1, 0, 1]
})

# Use dictionary to specify types
df = df.astype({
    'ID': str,
    'Price': float,
    'Active': bool
})

print(df.dtypes)

Output:

ID         object
Price     float64
Active       bool
dtype: object

Using pd.to_numeric() - Safe Conversion

Handles messy data with invalid values:

import pandas as pd

df = pd.DataFrame({
    'Price': ['10.5', '20.0', 'N/A', '15.75', 'unknown']
})

# errors='coerce' converts invalid values to NaN
df['Price'] = pd.to_numeric(df['Price'], errors='coerce')

print(df)

Output:

   Price
0  10.50
1  20.00
2    NaN
3  15.75
4    NaN
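Coercion hides which inputs were invalid, so it can help to look at them before moving on. The following is a minimal sketch, not part of the original example: it compares the raw and converted values to list the entries that failed. The names `raw`, `converted`, and `failed` are illustrative.

```python
import pandas as pd

raw = pd.Series(['10.5', '20.0', 'N/A', '15.75', 'unknown'])
converted = pd.to_numeric(raw, errors='coerce')

# Entries that were present in the input but turned into NaN during conversion
failed = raw[converted.isna() & raw.notna()]

print(failed.tolist())        # ['N/A', 'unknown']
print(failed.index.tolist())  # [2, 4]
```

Checking `raw.notna()` as well keeps values that were already missing in the input from being reported as conversion failures.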

Error Handling Options

import pandas as pd

s = pd.Series(['1', '2', 'bad', '4'])

# 'raise' (default): Raises ValueError on invalid input
# pd.to_numeric(s, errors='raise')

# 'coerce': Invalid values become NaN
pd.to_numeric(s, errors='coerce') # [1.0, 2.0, NaN, 4.0]

# 'ignore': Returns the original input unchanged if any value is invalid (deprecated since pandas 2.2)
pd.to_numeric(s, errors='ignore') # ['1', '2', 'bad', '4']

Warning: errors='ignore' is deprecated. Pandas is moving away from silently returning the original data when conversion fails. Instead, you should either:

  • Use errors='coerce', or
  • Use normal conversion and explicitly handle exceptions.

Option 1: Use errors='coerce' (Most Common)

import pandas as pd

s = pd.Series(['1', '2', 'bad', '4'])

converted = pd.to_numeric(s, errors='coerce')
print(converted)

Option 2: Explicit Exception Handling

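import pandas as pd

s = pd.Series(['1', '2', 'bad', '4'])

# Fall back to the original data if any value cannot be converted
try:
    converted = pd.to_numeric(s)
except ValueError:
    converted = s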

Option 3: Convert Only Valid Rows

converted = s.copy()
mask = pd.to_numeric(s, errors='coerce').notna()
converted[mask] = pd.to_numeric(s[mask])

Using pd.to_datetime() - Date Conversion

Robust parsing for date strings:

import pandas as pd

df = pd.DataFrame({
    'Date': ['2024-01-15', '2024/02/20', 'March 10, 2024']
})

# format='mixed' (pandas 2.0+) infers the format of each value individually
df['Date'] = pd.to_datetime(df['Date'], format='mixed')

print(df.dtypes)

Output:

Date    datetime64[ns]
dtype: object

Handling Date Errors

import pandas as pd

df = pd.DataFrame({
    'Date': ['2024-01-15', 'invalid', '2024-03-20']
})

# Invalid dates become NaT (Not a Time)
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')

print(df)

Output:

        Date
0 2024-01-15
1        NaT
2 2024-03-20
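When the expected layout of the dates is known, passing an explicit `format` together with `errors='coerce'` is a stricter alternative: anything that does not match becomes NaT instead of being guessed at. A short sketch with made-up data:

```python
import pandas as pd

dates = pd.Series(['2024-01-15', '15/01/2024', '2024-03-20'])

# Only strings matching YYYY-MM-DD are parsed; everything else becomes NaT
parsed = pd.to_datetime(dates, format='%Y-%m-%d', errors='coerce')

print(parsed.isna().tolist())  # [False, True, False]
```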

Common Type Conversions

Conversion           Target dtype   Method
String → Integer     int            astype(int) or pd.to_numeric()
String → Float       float          astype(float) or pd.to_numeric()
String → DateTime    datetime64     pd.to_datetime()
Integer → String     str            astype(str)
Float → Integer      int            astype(int) (truncates)
Any → Boolean        bool           astype(bool)
Any → Category       category       astype('category')
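Two rows in the table are easy to misread, so here is a small sketch with made-up values. Float → Integer drops the fractional part rather than rounding, and converting strings to Boolean follows Python truthiness, so the text 'False' still becomes True.

```python
import pandas as pd

floats = pd.Series([1.9, -1.9, 2.6])

# astype(int) discards the fractional part (truncation toward zero)
truncated = floats.astype(int)          # [1, -1, 2]

# Round first if nearest-integer behavior is what you want
rounded = floats.round().astype(int)    # [2, -2, 3]

# dtype=object is set explicitly so the strings are plain Python objects
flags = pd.Series(['True', 'False', ''], dtype=object)

# Any non-empty string is truthy, including the text 'False'
as_bool = flags.astype(bool)            # [True, True, False]

print(truncated.tolist(), rounded.tolist(), as_bool.tolist())
```

For text flags like these, mapping the values explicitly (for example with `Series.map` and a dictionary) is usually safer than `astype(bool)`.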

Memory Optimization with Downcasting

import pandas as pd

df = pd.DataFrame({'Count': [1, 2, 3, 4, 5]})

print(f"Before: {df['Count'].dtype}") # int64 (8 bytes)

# Downcast to smallest integer type that fits
df['Count'] = pd.to_numeric(df['Count'], downcast='integer')

print(f"After: {df['Count'].dtype}") # int8 (1 byte)

Output:

Before: int64
After: int8
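To see the effect on memory rather than only on the dtype, the sketch below compares `memory_usage` before and after downcasting a larger, made-up column. The values 0 to 999 do not fit in int8, so the smallest signed type that fits is int16.

```python
import numpy as np
import pandas as pd

# Hypothetical column of 1000 small integers stored as int64
counts = pd.Series(np.arange(1000), dtype='int64')

# Downcast to the smallest signed integer type that can hold every value
small = pd.to_numeric(counts, downcast='integer')

print(counts.dtype, counts.memory_usage(index=False))  # int64 8000
print(small.dtype, small.memory_usage(index=False))    # int16 2000
```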

Handling Conversion Failures

import pandas as pd

df = pd.DataFrame({'Value': ['10', '20', 'thirty']})

# ❌ astype fails on invalid data
try:
    df['Value'].astype(int)
except ValueError as e:
    print(f"Error: {e}")

# ✅ to_numeric handles gracefully
df['Value'] = pd.to_numeric(df['Value'], errors='coerce')
print(df)

Output:

Error: invalid literal for int() with base 10: 'thirty'
   Value
0   10.0
1   20.0
2    NaN
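Note that the coerced column is float64, because NaN cannot be stored in a regular integer column. If whole numbers are wanted, one option is pandas' nullable `Int64` dtype. This is a sketch of that follow-up step, not something the example above requires.

```python
import pandas as pd

values = pd.Series(['10', '20', 'thirty'])

# Step 1: coerce invalid entries to NaN (result is float64)
as_float = pd.to_numeric(values, errors='coerce')

# Step 2: cast to the nullable integer dtype; NaN becomes <NA>
as_nullable = as_float.astype('Int64')

print(as_float.dtype)     # float64
print(as_nullable.dtype)  # Int64
```

The cast in step 2 only works when every non-missing value is a whole number; a value such as 10.5 would raise an error rather than be truncated silently.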

Quick Reference

Method             Best For                        On Invalid Data
astype()           Clean, validated data           Raises ValueError
pd.to_numeric()    Numeric with possible errors    NaN (with coerce)
pd.to_datetime()   Date strings, various formats   NaT (with coerce)

Summary

  • Use astype() for quick conversions on clean data; it's fast and explicit.
  • Use pd.to_numeric() or pd.to_datetime() when handling messy real-world data where some values might be invalid; the errors='coerce' option converts problematic values to NaN/NaT instead of crashing your pipeline.