Python Pandas: How to Get Column Data Types in a DataFrame
When working with data in Python, understanding the data types of each column in a Pandas DataFrame is essential. Data types affect memory usage, performance, and how operations such as filtering, aggregation, and mathematical computations behave. A column stored as object (string) instead of int64 can silently break calculations, inflate memory usage, and cause unexpected results.
In this guide, you'll learn multiple ways to inspect column data types in a Pandas DataFrame, interpret the results, convert types when needed, and avoid common pitfalls.
Why Checking Data Types Matters
Before performing any data analysis, you should always verify column data types because:
- Incorrect types lead to wrong calculations: Summing a column of string numbers produces concatenation ("1" + "2" = "12"), not addition.
- Memory usage increases unnecessarily: Storing integers as object type uses significantly more memory.
- Date operations require proper parsing: String dates like "2025-01-15" can't be used for time-based filtering or grouping.
- Machine learning models require numeric inputs: Most models reject string columns without explicit conversion.
Creating a Sample DataFrame
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
'Age': [27, 24, 22, 32],
'Salary': [55000.50, 62000.75, 48000.00, 71000.25],
'City': ['New York', 'Chicago', 'Dallas', 'San Francisco'],
'Is_Active': [True, False, True, True]
}
df = pd.DataFrame(data)
print(df)
Output:
Name Age Salary City Is_Active
0 Alice 27 55000.50 New York True
1 Bob 24 62000.75 Chicago False
2 Charlie 22 48000.00 Dallas True
3 Diana 32 71000.25 San Francisco True
Method 1: Using the dtypes Attribute
The simplest and most common way to check column data types is the dtypes attribute:
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
'Age': [27, 24, 22, 32],
'Salary': [55000.50, 62000.75, 48000.00, 71000.25],
'City': ['New York', 'Chicago', 'Dallas', 'San Francisco'],
'Is_Active': [True, False, True, True]
}
df = pd.DataFrame(data)
print(df.dtypes)
Output:
Name object
Age int64
Salary float64
City object
Is_Active bool
dtype: object
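On a wide DataFrame, reading the full dtypes listing gets tedious. Since dtypes is itself a Series, value_counts() gives a quick tally of how many columns share each type; a minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'City': ['New York', 'Chicago'],
    'Age': [27, 24],
})

# Count how many columns use each dtype
counts = df.dtypes.value_counts()
print(counts)  # two object columns (Name, City), one int64 (Age)
```

This is a fast sanity check after loading a file: if you expect 20 numeric columns and see only 12, something went wrong during parsing.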
Understanding Common Data Types
| Pandas dtype | Description | Example Values |
|---|---|---|
| object | Strings or mixed types | 'Alice', 'Delhi' |
| int64 | 64-bit integer | 27, 32 |
| float64 | 64-bit floating-point | 55000.50, 62000.75 |
| bool | Boolean | True, False |
| datetime64[ns] | Date and time | 2025-01-15 |
| category | Categorical data | Fixed set of values |
| Int64 (nullable) | Nullable integer | 27, None |
| string | Dedicated string type | 'Alice' |
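The difference between the default int64 and the nullable Int64 shows up as soon as a missing value appears. A small illustration:

```python
import pandas as pd

# A plain integer Series cannot hold missing values,
# so pandas silently falls back to float64
with_nan = pd.Series([27, None, 32])
print(with_nan.dtype)  # float64

# The nullable extension type keeps the integers
# and represents the gap as pd.NA
nullable = pd.Series([27, None, 32], dtype='Int64')
print(nullable.dtype)  # Int64
```

This fallback to float64 is why a column of whole numbers sometimes displays as 27.0 after a merge or reindex introduces gaps.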
Method 2: Using info() for a Complete Overview
The info() method provides a comprehensive summary including data types, non-null counts, and memory usage:
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
'Age': [27, 24, 22, 32],
'Salary': [55000.50, 62000.75, 48000.00, 71000.25],
'City': ['New York', 'Chicago', 'Dallas', 'San Francisco'],
'Is_Active': [True, False, True, True]
}
df = pd.DataFrame(data)
df.info()
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Name 4 non-null object
1 Age 4 non-null int64
2 Salary 4 non-null float64
3 City 4 non-null object
4 Is_Active 4 non-null bool
dtypes: bool(1), float64(1), int64(1), object(2)
memory usage: 264.0+ bytes
info() is especially useful after loading external data (CSV, Excel, JSON) because it shows both data types and missing value counts in a single call. This helps you spot issues immediately, like a numeric column loaded as object because of a stray non-numeric entry or an unrecognized missing-value marker.
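As a sketch of that failure mode, here is a tiny CSV (built in memory with io.StringIO so the example is self-contained) where one stray marker forces an otherwise numeric column to object:

```python
import io
import pandas as pd

# "missing" is not in read_csv's default NA markers,
# so the whole Price column is parsed as object (strings)
csv = "Price\n100\n200\nmissing\n"
df = pd.read_csv(io.StringIO(csv))

df.info()
print(df['Price'].dtype)  # object
```

Passing na_values=['missing'] to read_csv would let pandas treat that entry as NaN and parse the column as numeric instead.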
Method 3: Checking a Single Column's Type
To check the data type of a specific column:
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
'Age': [27, 24, 22, 32],
'Salary': [55000.50, 62000.75, 48000.00, 71000.25],
'City': ['New York', 'Chicago', 'Dallas', 'San Francisco'],
'Is_Active': [True, False, True, True]
}
df = pd.DataFrame(data)
# Using the dtype attribute on the column
print("Age dtype:", df['Age'].dtype)
# Indexing the dtypes Series works too
print("Salary dtype:", df.dtypes['Salary'])
Output:
Age dtype: int64
Salary dtype: float64
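When you need a boolean answer rather than a dtype object, say, inside an if statement, the helper functions in pandas.api.types are cleaner than comparing dtype strings; a sketch:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype, is_object_dtype

df = pd.DataFrame({'Age': [27, 24], 'Name': ['Alice', 'Bob']})

print(is_numeric_dtype(df['Age']))   # True
print(is_object_dtype(df['Name']))   # True
print(is_numeric_dtype(df['Name']))  # False
```

These helpers also cover families of types (is_numeric_dtype is true for int64, float64, and the nullable variants), which a direct string comparison like dtype == 'int64' would miss.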
Method 4: Getting Data Types as a Dictionary
For programmatic use, convert data types to a dictionary:
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
'Age': [27, 24, 22, 32],
'Salary': [55000.50, 62000.75, 48000.00, 71000.25],
'City': ['New York', 'Chicago', 'Dallas', 'San Francisco'],
'Is_Active': [True, False, True, True]
}
df = pd.DataFrame(data)
type_dict = df.dtypes.to_dict()
print(type_dict)
Output:
{'Name': dtype('O'), 'Age': dtype('int64'), 'Salary': dtype('float64'), 'City': dtype('O'), 'Is_Active': dtype('bool')}
This is useful when you need to iterate over columns and apply logic based on their types.
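As a sketch of that kind of iteration, the dictionary can drive per-column handling using each dtype's one-letter kind code:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [27, 24],
    'Salary': [55000.5, 62000.75],
})

numeric, other = [], []
for col, dtype in df.dtypes.to_dict().items():
    # kind codes: 'i' = integer, 'f' = float, 'O' = object, 'b' = bool
    if dtype.kind in 'if':
        numeric.append(col)
    else:
        other.append(col)

print("Numeric:", numeric)
print("Other:", other)
```

For simple numeric/non-numeric splits, select_dtypes() (shown in the next method) is usually more direct; the dictionary approach earns its keep when each type needs different custom logic.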
Method 5: Selecting Columns by Data Type
Use select_dtypes() to filter columns based on their type:
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
'Age': [27, 24, 22, 32],
'Salary': [55000.50, 62000.75, 48000.00, 71000.25],
'City': ['New York', 'Chicago', 'Dallas', 'San Francisco'],
'Is_Active': [True, False, True, True]
}
df = pd.DataFrame(data)
# Select only numeric columns
numeric_cols = df.select_dtypes(include=['number'])
print("Numeric columns:")
print(numeric_cols)
# Select only string (object) columns
string_cols = df.select_dtypes(include=['object'])
print("\nString columns:")
print(string_cols)
# Exclude boolean columns
non_bool = df.select_dtypes(exclude=['bool'])
print("\nNon-boolean columns:")
print(non_bool.columns.tolist())
Output:
Numeric columns:
Age Salary
0 27 55000.50
1 24 62000.75
2 22 48000.00
3 32 71000.25
String columns:
Name City
0 Alice New York
1 Bob Chicago
2 Charlie Dallas
3 Diana San Francisco
Non-boolean columns:
['Name', 'Age', 'Salary', 'City']
Common Mistake: Numeric Values Stored as Strings
One of the most frequent issues after loading data is having numeric columns stored as strings (object type). This happens when a CSV contains mixed values, missing data markers like "N/A", or formatting characters like commas in numbers.
import pandas as pd
data = {'Price': ['100', '200', '300', '400']}
df = pd.DataFrame(data)
print("Data type:", df['Price'].dtype)
print("Sum result:", df['Price'].sum()) # String concatenation!
Output:
Data type: object
Sum result: 100200300400
The sum() produces string concatenation ("100" + "200" + ...) instead of numeric addition.
Fix: Convert to the correct type
import pandas as pd
data = {'Price': ['100', '200', '300', '400']}
df = pd.DataFrame(data)
df['Price'] = pd.to_numeric(df['Price'])
print("Data type:", df['Price'].dtype)
print("Sum result:", df['Price'].sum())
Output:
Data type: int64
Sum result: 1000
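If the strings carry formatting characters such as thousands separators, pd.to_numeric alone will fail on them, so strip the characters first; a minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({'Price': ['1,100', '2,250', '300']})

# Remove the commas, then convert the cleaned strings to numbers
df['Price'] = pd.to_numeric(df['Price'].str.replace(',', '', regex=False))

print("Data type:", df['Price'].dtype)  # int64
print("Sum result:", df['Price'].sum())  # 3650
```

When the data comes from a CSV, passing thousands=',' to pd.read_csv handles this at load time instead.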
Converting Data Types with astype()
Use astype() to explicitly convert column types:
import pandas as pd
df = pd.DataFrame({
'ID': [1, 2, 3],
'Score': [85.5, 90.0, 78.3],
'Grade': ['A', 'A', 'B']
})
print("Before:")
print(df.dtypes)
# Convert types
df['ID'] = df['ID'].astype(str)
df['Score'] = df['Score'].astype(int)
df['Grade'] = df['Grade'].astype('category')
print("\nAfter:")
print(df.dtypes)
Output:
Before:
ID int64
Score float64
Grade object
dtype: object
After:
ID object
Score int64
Grade category
dtype: object
Tip: Convert columns with a small number of repeating values (like status codes, grades, or country names) to the category dtype. This significantly reduces memory usage:
df['Grade'] = df['Grade'].astype('category')
A column with 1 million rows but only 5 unique values can reduce memory usage by 90% or more when stored as category instead of object.
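A rough way to verify this on your own data is memory_usage(deep=True), which counts the actual string storage rather than just the pointers; a sketch with repeated values:

```python
import pandas as pd

# 100,000 rows drawn from only 5 distinct grades
grades = pd.Series(['A', 'B', 'C', 'D', 'F'] * 20_000)

as_object = grades.memory_usage(deep=True)
as_category = grades.astype('category').memory_usage(deep=True)

# Category stores one small integer code per row
# plus the 5 unique labels once
print(f"object:   {as_object:,} bytes")
print(f"category: {as_category:,} bytes")
print(f"saving:   {1 - as_category / as_object:.0%}")
```

The exact byte counts depend on your pandas version and platform, but the category version should come out dramatically smaller.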
Handling Mixed Types in a Column
Sometimes a column contains mixed types (numbers and strings), which Pandas stores as object:
import pandas as pd
data = {'Value': [10, '20', 30, 'N/A', 50]}
df = pd.DataFrame(data)
print("Dtype:", df['Value'].dtype)
# Convert with error handling
df['Value'] = pd.to_numeric(df['Value'], errors='coerce')
print("\nAfter conversion:")
print(df)
print("Dtype:", df['Value'].dtype)
Output:
Dtype: object
After conversion:
Value
0 10.0
1 20.0
2 30.0
3 NaN
4 50.0
Dtype: float64
The errors='coerce' parameter converts unparseable values to NaN instead of raising an error.
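The same errors='coerce' pattern applies to dates: pd.to_datetime parses string dates into datetime64[ns] and turns unparseable entries into NaT (the datetime equivalent of NaN); a sketch:

```python
import pandas as pd

df = pd.DataFrame({'When': ['2025-01-15', '2025-02-01', 'unknown']})

# Unparseable strings become NaT instead of raising an error
df['When'] = pd.to_datetime(df['When'], errors='coerce')

print(df['When'].dtype)         # datetime64[ns]
print(df['When'].isna().sum())  # 1
```

Once the column is datetime64[ns], time-based filtering and grouping (for example via the .dt accessor) work as expected.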
Summary
| Method | Returns | Best For |
|---|---|---|
| df.dtypes | Series of column types | Quick type inspection |
| df.info() | Type + null counts + memory | Comprehensive overview |
| df['col'].dtype | Single column's type | Checking a specific column |
| df.dtypes.to_dict() | Dictionary of types | Programmatic access |
| df.select_dtypes() | Filtered DataFrame | Selecting columns by type |

- Always check data types after loading external data using df.dtypes or df.info().
- Convert columns explicitly with pd.to_numeric(), pd.to_datetime(), or astype() when Pandas infers the wrong type.
- Making data type inspection a habit will save you from subtle bugs and unexpected behavior throughout your analysis workflow.
pd.to_numeric(),pd.to_datetime(), orastype()when Pandas infers the wrong type. - Making data type inspection a habit will save you from subtle bugs and unexpected behavior throughout your analysis workflow.