
Python Pandas: How to Get Column Data Types in a Pandas DataFrame

When working with data in Python, understanding the data types of each column in a Pandas DataFrame is essential. Data types affect memory usage, performance, and how operations such as filtering, aggregation, and mathematical computations behave. A column stored as object (string) instead of int64 can silently break calculations, inflate memory usage, and cause unexpected results.

In this guide, you'll learn multiple ways to inspect column data types in a Pandas DataFrame, interpret the results, convert types when needed, and avoid common pitfalls.

Why Checking Data Types Matters

Before performing any data analysis, you should always verify column data types because:

  • Incorrect types lead to wrong calculations: Summing a column of string numbers produces concatenation ("1" + "2" = "12"), not addition.
  • Memory usage increases unnecessarily: Storing integers as object type uses significantly more memory.
  • Date operations require proper parsing: String dates like "2025-01-15" can't be used for time-based filtering or grouping.
  • Machine learning models require numeric inputs: Most models reject string columns without explicit conversion.

Creating a Sample DataFrame

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'Age': [27, 24, 22, 32],
    'Salary': [55000.50, 62000.75, 48000.00, 71000.25],
    'City': ['New York', 'Chicago', 'Dallas', 'San Francisco'],
    'Is_Active': [True, False, True, True]
}

df = pd.DataFrame(data)
print(df)

Output:

      Name  Age    Salary           City  Is_Active
0    Alice   27  55000.50       New York       True
1      Bob   24  62000.75        Chicago      False
2  Charlie   22  48000.00         Dallas       True
3    Diana   32  71000.25  San Francisco       True

Method 1: Using the dtypes Attribute

The simplest and most common way to check column data types is the dtypes attribute:

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'Age': [27, 24, 22, 32],
    'Salary': [55000.50, 62000.75, 48000.00, 71000.25],
    'City': ['New York', 'Chicago', 'Dallas', 'San Francisco'],
    'Is_Active': [True, False, True, True]
}

df = pd.DataFrame(data)

print(df.dtypes)

Output:

Name          object
Age            int64
Salary       float64
City          object
Is_Active       bool
dtype: object

Understanding Common Data Types

Pandas dtype        Description              Example Values
object              Strings or mixed types   'Alice', 'Delhi'
int64               64-bit integer           27, 32
float64             64-bit floating-point    55000.50, 62000.75
bool                Boolean                  True, False
datetime64[ns]      Date and time            2025-01-15
category            Categorical data         Fixed set of values
Int64 (nullable)    Nullable integer         27, None
string              Dedicated string type    'Alice'
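
The difference between int64 and the nullable Int64 in this table matters in practice. As a quick sketch (the values here are illustrative): a plain integer column is silently upcast to float64 the moment a missing value appears, while the nullable Int64 dtype keeps the integers intact alongside a proper missing-value marker.

import pandas as pd

# Default integer dtype: a missing value forces an upcast to float64
s_default = pd.Series([27, 32, None])
print(s_default.dtype)   # float64 -- None became NaN

# Nullable integer dtype: missing values are stored as <NA>, integers stay integers
s_nullable = pd.Series([27, 32, None], dtype="Int64")
print(s_nullable.dtype)  # Int64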

Method 2: Using info() for a Complete Overview

The info() method provides a comprehensive summary including data types, non-null counts, and memory usage:

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'Age': [27, 24, 22, 32],
    'Salary': [55000.50, 62000.75, 48000.00, 71000.25],
    'City': ['New York', 'Chicago', 'Dallas', 'San Francisco'],
    'Is_Active': [True, False, True, True]
}

df = pd.DataFrame(data)

df.info()

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   Name       4 non-null      object
 1   Age        4 non-null      int64
 2   Salary     4 non-null      float64
 3   City       4 non-null      object
 4   Is_Active  4 non-null      bool
dtypes: bool(1), float64(1), int64(1), object(2)
memory usage: 264.0+ bytes

Tip: info() is especially useful after loading external data (CSV, Excel, JSON) because it shows both data types and missing value counts in a single call. This helps you spot issues immediately, such as a numeric column loaded as object because of text markers like "N/A", or an integer column silently loaded as float64 because of missing values.
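
To see this in action without needing a real file, here is a sketch that loads CSV text from an in-memory buffer via io.StringIO. The blank Age value for Bob forces the otherwise-integer column to load as float64, which info() reveals immediately:

import io
import pandas as pd

csv_data = "Name,Age\nAlice,27\nBob,\nCharlie,22\n"  # Bob's age is missing
df_csv = pd.read_csv(io.StringIO(csv_data))

df_csv.info()  # Age shows as float64 with only 2 non-null values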

Method 3: Checking a Single Column's Type

To check the data type of a specific column:

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'Age': [27, 24, 22, 32],
    'Salary': [55000.50, 62000.75, 48000.00, 71000.25],
    'City': ['New York', 'Chicago', 'Dallas', 'San Francisco'],
    'Is_Active': [True, False, True, True]
}

df = pd.DataFrame(data)

# Each column is a Series with a dtype attribute
print("Age dtype:", df['Age'].dtype)
print("Salary dtype:", df['Salary'].dtype)

Output:

Age dtype: int64
Salary dtype: float64

Method 4: Getting Data Types as a Dictionary

For programmatic use, convert data types to a dictionary:

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'Age': [27, 24, 22, 32],
    'Salary': [55000.50, 62000.75, 48000.00, 71000.25],
    'City': ['New York', 'Chicago', 'Dallas', 'San Francisco'],
    'Is_Active': [True, False, True, True]
}

df = pd.DataFrame(data)

type_dict = df.dtypes.to_dict()
print(type_dict)

Output:

{'Name': dtype('O'), 'Age': dtype('int64'), 'Salary': dtype('float64'), 'City': dtype('O'), 'Is_Active': dtype('bool')}

This is useful when you need to iterate over columns and apply logic based on their types.
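
As a small sketch of that pattern (the two-way split below is just an illustration), you can iterate over the dictionary and branch per column; the pd.api.types helpers are more robust than comparing dtype strings directly:

import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [27, 24],
    'Salary': [55000.50, 62000.75],
})

numeric_columns = []
text_columns = []

for column, dtype in df.dtypes.to_dict().items():
    # Sort each column into a bucket based on its dtype
    if pd.api.types.is_numeric_dtype(dtype):
        numeric_columns.append(column)
    else:
        text_columns.append(column)

print("Numeric:", numeric_columns)
print("Text:", text_columns)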

Method 5: Selecting Columns by Data Type

Use select_dtypes() to filter columns based on their type:

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'Age': [27, 24, 22, 32],
    'Salary': [55000.50, 62000.75, 48000.00, 71000.25],
    'City': ['New York', 'Chicago', 'Dallas', 'San Francisco'],
    'Is_Active': [True, False, True, True]
}

df = pd.DataFrame(data)

# Select only numeric columns
numeric_cols = df.select_dtypes(include=['number'])
print("Numeric columns:")
print(numeric_cols)

# Select only string (object) columns
string_cols = df.select_dtypes(include=['object'])
print("\nString columns:")
print(string_cols)

# Exclude boolean columns
non_bool = df.select_dtypes(exclude=['bool'])
print("\nNon-boolean columns:")
print(non_bool.columns.tolist())

Output:

Numeric columns:
   Age    Salary
0   27  55000.50
1   24  62000.75
2   22  48000.00
3   32  71000.25

String columns:
      Name           City
0    Alice       New York
1      Bob        Chicago
2  Charlie         Dallas
3    Diana  San Francisco

Non-boolean columns:
['Name', 'Age', 'Salary', 'City']

Common Mistake: Numeric Values Stored as Strings

One of the most frequent issues after loading data is having numeric columns stored as strings (object type). This happens when a CSV contains mixed values, missing data markers like "N/A", or formatting characters like commas in numbers.

import pandas as pd

data = {'Price': ['100', '200', '300', '400']}
df = pd.DataFrame(data)

print("Data type:", df['Price'].dtype)
print("Sum result:", df['Price'].sum()) # String concatenation!

Output:

Data type: object
Sum result: 100200300400

The sum() produces string concatenation ("100" + "200" + ...) instead of numeric addition.

Fix: Convert to the correct type

import pandas as pd

data = {'Price': ['100', '200', '300', '400']}
df = pd.DataFrame(data)

df['Price'] = pd.to_numeric(df['Price'])

print("Data type:", df['Price'].dtype)
print("Sum result:", df['Price'].sum())

Output:

Data type: int64
Sum result: 1000
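
The same approach extends to the formatting characters mentioned earlier. If the strings contain thousands separators, pd.to_numeric alone would raise an error, so strip the commas first (a sketch; the Revenue column is invented):

import pandas as pd

df = pd.DataFrame({'Revenue': ['1,200', '3,450', '980']})

# Remove the commas, then convert to a numeric dtype
df['Revenue'] = pd.to_numeric(df['Revenue'].str.replace(',', '', regex=False))

print("Data type:", df['Revenue'].dtype)
print("Sum result:", df['Revenue'].sum())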

Converting Data Types with astype()

Use astype() to explicitly convert column types:

import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3],
    'Score': [85.5, 90.0, 78.3],
    'Grade': ['A', 'A', 'B']
})

print("Before:")
print(df.dtypes)

# Convert types
df['ID'] = df['ID'].astype(str)
df['Score'] = df['Score'].astype(int)  # note: truncates the decimal part (85.5 -> 85)
df['Grade'] = df['Grade'].astype('category')

print("\nAfter:")
print(df.dtypes)

Output:

Before:
ID         int64
Score    float64
Grade     object
dtype: object

After:
ID        object
Score      int64
Grade    category
dtype: object

When to use category dtype

Convert columns with a small number of repeating values (like status codes, grades, or country names) to category type. This significantly reduces memory usage:

df['Grade'] = df['Grade'].astype('category')

A column with 1 million rows but only 5 unique values can reduce memory usage by 90% or more when stored as category instead of object.
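
You can check the savings yourself with memory_usage(deep=True). A sketch with a synthetic column (the exact byte counts vary by Pandas version, but the gap is always dramatic):

import pandas as pd

# 1 million rows, only 5 unique string values
grades = pd.Series(['A', 'B', 'C', 'D', 'F'] * 200_000)

as_object = grades.memory_usage(deep=True)
as_category = grades.astype('category').memory_usage(deep=True)

print(f"object:   {as_object:,} bytes")
print(f"category: {as_category:,} bytes")
print(f"reduction: {1 - as_category / as_object:.0%}")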

Handling Mixed Types in a Column

Sometimes a column contains mixed types (numbers and strings), which Pandas stores as object:

import pandas as pd

data = {'Value': [10, '20', 30, 'N/A', 50]}
df = pd.DataFrame(data)

print("Dtype:", df['Value'].dtype)

# Convert with error handling
df['Value'] = pd.to_numeric(df['Value'], errors='coerce')
print("\nAfter conversion:")
print(df)
print("Dtype:", df['Value'].dtype)

Output:

Dtype: object

After conversion:
   Value
0   10.0
1   20.0
2   30.0
3    NaN
4   50.0
Dtype: float64

The errors='coerce' parameter converts unparseable values to NaN instead of raising an error.
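
pd.to_datetime works the same way for string dates like the "2025-01-15" example mentioned earlier, including the errors='coerce' option for unparseable entries (a sketch; the Signup column is made up):

import pandas as pd

df = pd.DataFrame({'Signup': ['2025-01-15', '2025-02-20', 'unknown']})
print(df['Signup'].dtype)  # object -- just strings at this point

# Unparseable entries like 'unknown' become NaT (not-a-time)
df['Signup'] = pd.to_datetime(df['Signup'], errors='coerce')
print(df['Signup'].dtype)  # datetime64[ns]

# Time-based filtering now works
print(df[df['Signup'] > '2025-02-01'])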

Summary

Method                 Returns                      Best For
df.dtypes              Series of column types       Quick type inspection
df.info()              Type + null counts + memory  Comprehensive overview
df['col'].dtype        Single column's type         Checking a specific column
df.dtypes.to_dict()    Dictionary of types          Programmatic access
df.select_dtypes()     Filtered DataFrame           Selecting columns by type

  • Always check data types after loading external data using df.dtypes or df.info().
  • Convert columns explicitly with pd.to_numeric(), pd.to_datetime(), or astype() when Pandas infers the wrong type.
  • Making data type inspection a habit will save you from subtle bugs and unexpected behavior throughout your analysis workflow.