Python Pandas: How to Change Column Data Type in Pandas
Converting columns to the correct data type is essential for proper calculations, memory efficiency, and avoiding unexpected behavior. Pandas provides several methods depending on your data quality and conversion needs.
Using astype() - Direct Conversion
The most straightforward method for clean data:
import pandas as pd
df = pd.DataFrame({
'ID': [1, 2, 3],
'Price': ['10.5', '20.0', '15.75']
})
print(df.dtypes)
print()
# Convert single column
df['Price'] = df['Price'].astype(float)
print(df.dtypes)
Output:
ID int64
Price object
dtype: object
ID int64
Price float64
dtype: object
Convert Multiple Columns
import pandas as pd
df = pd.DataFrame({
'ID': [1, 2, 3],
'Price': ['10.5', '20.0', '15.75'],
'Active': [1, 0, 1]
})
# Use dictionary to specify types
df = df.astype({
'ID': str,
'Price': float,
'Active': bool
})
print(df.dtypes)
Output:
ID object
Price float64
Active bool
dtype: object
Using pd.to_numeric() - Safe Conversion
Handles messy data with invalid values:
import pandas as pd
df = pd.DataFrame({
'Price': ['10.5', '20.0', 'N/A', '15.75', 'unknown']
})
# errors='coerce' converts invalid values to NaN
df['Price'] = pd.to_numeric(df['Price'], errors='coerce')
print(df)
Output:
Price
0 10.50
1 20.00
2 NaN
3 15.75
4 NaN
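Coercion discards the original text, so it can help to record which entries failed before overwriting the column. The snippet below is a minimal sketch of that check; the names raw, converted, and failed_mask are illustrative and not part of the pandas API.

```python
import pandas as pd

raw = pd.Series(['10.5', '20.0', 'N/A', '15.75', 'unknown'], name='Price')
converted = pd.to_numeric(raw, errors='coerce')

# A value that was present before conversion but is NaN afterwards failed to parse.
failed_mask = raw.notna() & converted.isna()

print(raw[failed_mask].tolist())  # ['N/A', 'unknown']
print(int(failed_mask.sum()))     # 2
```

Reviewing the failed values first makes it easier to decide whether they are genuine missing data or a formatting problem worth fixing upstream.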
Error Handling Options
import pandas as pd
s = pd.Series(['1', '2', 'bad', '4'])
# 'raise' (default): Raises ValueError on invalid input
# pd.to_numeric(s, errors='raise')
# 'coerce': Invalid values become NaN
pd.to_numeric(s, errors='coerce') # [1.0, 2.0, NaN, 4.0]
# 'ignore': Returns the original input unchanged if any value is invalid
# (deprecated since pandas 2.2; avoid it in new code)
pd.to_numeric(s, errors='ignore') # ['1', '2', 'bad', '4']
Warning
Pandas is moving away from silently returning the original data when conversion fails. Instead, you should either:
- Use errors='coerce', or
- Use normal conversion and explicitly handle exceptions.
Option 1: Use errors='coerce' (Most Common)
import pandas as pd
s = pd.Series(['1', '2', 'bad', '4'])
converted = pd.to_numeric(s, errors='coerce')
print(converted)
Option 2: Explicit Exception Handling
try:
    converted = pd.to_numeric(s)
except ValueError:
    # Fall back to the original, unconverted Series
    converted = s
Option 3: Convert Only Valid Rows
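converted = s.copy()
# True where the value parses as a number
mask = pd.to_numeric(s, errors='coerce').notna()
# Replace only the parseable entries; invalid ones stay as strings
converted[mask] = pd.to_numeric(s[mask])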
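One consequence of Option 3 is that the result mixes numbers and strings, so the column stays object dtype rather than becoming numeric. The sketch below repeats the same steps and checks this. It starts from an explicit object copy (astype(object)), which is an assumption made here so that the mixed assignment also works where pandas infers a dedicated string dtype.

```python
import pandas as pd

s = pd.Series(['1', '2', 'bad', '4'])

# Work on an object-dtype copy so numbers and strings can sit side by side.
converted = s.astype(object)
mask = pd.to_numeric(s, errors='coerce').notna()
converted[mask] = pd.to_numeric(s[mask])

print(converted.dtype)  # object

# Arithmetic is only safe on the rows that actually converted.
valid_total = pd.to_numeric(converted[mask]).sum()
print(valid_total)
```

Because the dtype is still object, vectorized math on the whole column will fail or behave unexpectedly; this pattern is mainly useful when the invalid text must be preserved for later inspection.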
Using pd.to_datetime() - Date Conversion
Robust parsing for date strings:
import pandas as pd
df = pd.DataFrame({
'Date': ['2024-01-15', '2024/02/20', 'March 10, 2024']
})
# format='mixed' (pandas 2.0+) infers the format of each value individually
df['Date'] = pd.to_datetime(df['Date'], format='mixed')
print(df.dtypes)
Output:
Date datetime64[ns]
dtype: object
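When every value in a column follows one known layout, passing an explicit format string is stricter than letting pandas infer it, and it removes day/month ambiguity. The example below is a small sketch with made-up day-first dates; the variable names are illustrative.

```python
import pandas as pd

dates = pd.Series(['15/01/2024', '20/02/2024', '10/03/2024'])

# %d/%m/%Y states explicitly that the day comes first.
parsed = pd.to_datetime(dates, format='%d/%m/%Y')

print(parsed.dt.day.tolist())    # [15, 20, 10]
print(parsed.dt.month.tolist())  # [1, 2, 3]
```

With an explicit format, a value that does not match raises an error (or becomes NaT with errors='coerce') instead of being silently reinterpreted.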
Handling Date Errors
import pandas as pd
df = pd.DataFrame({
'Date': ['2024-01-15', 'invalid', '2024-03-20']
})
# Invalid dates become NaT (Not a Time)
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
print(df)
Output:
Date
0 2024-01-15
1 NaT
2 2024-03-20
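After coercion, NaT behaves like any other missing value, so the usual missing-data tools apply. The following sketch counts the unparseable dates and drops those rows; whether dropping is appropriate depends on your data, and the names n_bad and clean are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['2024-01-15', 'invalid', '2024-03-20']})
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')

# isna() treats NaT as missing, just like NaN.
n_bad = int(df['Date'].isna().sum())

# Keep only the rows whose date parsed successfully.
clean = df.dropna(subset=['Date'])

print(n_bad, len(clean))  # 1 2
```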
Common Type Conversions
| From | To | Method |
|---|---|---|
| String → Integer | int | astype(int) or pd.to_numeric() |
| String → Float | float | astype(float) or pd.to_numeric() |
| String → DateTime | datetime64 | pd.to_datetime() |
| Integer → String | str | astype(str) |
| Float → Integer | int | astype(int) (truncates) |
| Any → Boolean | bool | astype(bool) |
| Any → Category | category | astype('category') |
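Two rows in the table deserve a closer look, because the conversion succeeds but may not do what you expect. The sketch below shows that astype(int) truncates toward zero rather than rounding, and that astype(bool) on strings tests for emptiness rather than parsing the text. The mapping dictionary is an illustrative workaround, not a built-in pandas feature.

```python
import pandas as pd

# Float -> Integer: the fractional part is dropped, not rounded.
floats = pd.Series([1.9, -1.9, 2.5])
print(floats.astype(int).tolist())  # [1, -1, 2]

# String -> Boolean: any non-empty string is truthy, including 'False'.
flags = pd.Series(['True', 'False'], dtype=object)
print(flags.astype(bool).tolist())  # [True, True]

# An explicit mapping converts the text by meaning instead.
mapped = flags.map({'True': True, 'False': False})
print(mapped.tolist())  # [True, False]
```

If rounding is what you want, call round() before astype(int).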
Memory Optimization with Downcasting
import pandas as pd
df = pd.DataFrame({'Count': [1, 2, 3, 4, 5]})
print(f"Before: {df['Count'].dtype}") # int64 (8 bytes)
# Downcast to smallest integer type that fits
df['Count'] = pd.to_numeric(df['Count'], downcast='integer')
print(f"After: {df['Count'].dtype}") # int8 (1 byte)
Output:
Before: int64
After: int8
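The saving is easier to see on a larger column. This sketch measures memory before and after downcasting 1,000 small integers, and also shows that the chosen type depends on the value range; the data is synthetic and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# 1,000 values between 0 and 99, stored as int64 (8 bytes each).
df = pd.DataFrame({'Count': np.arange(1000, dtype='int64') % 100})
before = df['Count'].memory_usage(index=False)

df['Count'] = pd.to_numeric(df['Count'], downcast='integer')
after = df['Count'].memory_usage(index=False)

print(df['Count'].dtype, before, after)  # int8 8000 1000

# Values outside the int8 range (-128 to 127) get the next size up.
wider = pd.to_numeric(pd.Series([1, 300]), downcast='integer')
print(wider.dtype)  # int16
```

Keep in mind that a downcast column can overflow if later arithmetic pushes values past the smaller type's range.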
Handling Conversion Failures
import pandas as pd
df = pd.DataFrame({'Value': ['10', '20', 'thirty']})
# ❌ astype fails on invalid data
try:
    df['Value'].astype(int)
except ValueError as e:
    print(f"Error: {e}")
# ✅ to_numeric handles gracefully
df['Value'] = pd.to_numeric(df['Value'], errors='coerce')
print(df)
Output:
Error: invalid literal for int() with base 10: 'thirty'
Value
0 10.0
1 20.0
2 NaN
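Notice that the coerced column is float64, because NaN cannot be stored in a regular integer column. If you need integers afterwards, two common options are the nullable 'Int64' extension dtype (capital I), which keeps the gap as a missing value, or filling the gap before converting. The sketch below shows both; the fill value 0 is only a placeholder assumption and should reflect what a missing value means in your data.

```python
import pandas as pd

df = pd.DataFrame({'Value': ['10', '20', 'thirty']})
numeric = pd.to_numeric(df['Value'], errors='coerce')  # float64 with NaN

# Option A: nullable integer dtype keeps the missing value as <NA>.
as_nullable = numeric.astype('Int64')
print(as_nullable.dtype)  # Int64

# Option B: replace missing values first, then use a regular integer dtype.
as_filled = numeric.fillna(0).astype(int)
print(as_filled.tolist())  # [10, 20, 0]
```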
Quick Reference
| Method | Best For | On Invalid Data |
|---|---|---|
| astype() | Clean, validated data | Raises ValueError |
| pd.to_numeric() | Numeric data with possible errors | NaN (with errors='coerce') |
| pd.to_datetime() | Date strings in various formats | NaT (with errors='coerce') |
Summary
- Use astype() for quick conversions on clean data: it's fast and explicit.
- Use pd.to_numeric() or pd.to_datetime() when handling messy real-world data where some values might be invalid; the errors='coerce' option converts problematic values to NaN/NaT instead of crashing your pipeline.