Python Pandas: How to Select Columns in a Pandas DataFrame

Selecting specific columns from a DataFrame is one of the most frequent operations in data analysis with Pandas. Whether you need a single column for calculations, a subset for visualization, or columns matching a specific pattern or data type, Pandas provides multiple ways to accomplish this. This guide covers all major column selection methods - from simple bracket notation to advanced filtering - with clear examples and practical guidance on when to use each.

Sample DataFrame

All examples in this guide use the following DataFrame:

import pandas as pd

data = {
    'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
    'Age': [25, 30, 22, 35, 28],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'Salary': [50000, 55000, 40000, 70000, 48000]
}

df = pd.DataFrame(data)
print(df)

Output:

      Name  Age  Gender  Salary
0     John   25    Male   50000
1    Alice   30  Female   55000
2      Bob   22    Male   40000
3      Eve   35  Female   70000
4  Charlie   28    Male   48000

Selecting a Single Column with Bracket Notation

The simplest way to select one column is by passing the column name inside square brackets. This returns a Series:

import pandas as pd

data = {
    'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
    'Age': [25, 30, 22, 35, 28],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'Salary': [50000, 55000, 40000, 70000, 48000]
}
df = pd.DataFrame(data)

age_column = df['Age']
print(age_column)
print(type(age_column))

Output:

0    25
1    30
2    22
3    35
4    28
Name: Age, dtype: int64
<class 'pandas.core.series.Series'>

Selecting Multiple Columns with Double Brackets

To select more than one column, pass a list of column names inside double brackets. This returns a DataFrame:

import pandas as pd

data = {
    'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
    'Age': [25, 30, 22, 35, 28],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'Salary': [50000, 55000, 40000, 70000, 48000]
}
df = pd.DataFrame(data)

subset = df[['Name', 'Salary']]
print(subset)
print(type(subset))

Output:

      Name  Salary
0     John   50000
1    Alice   55000
2      Bob   40000
3      Eve   70000
4  Charlie   48000
<class 'pandas.core.frame.DataFrame'>

Tip: Single brackets, as in df['Age'], return a Series, while double brackets, as in df[['Age']], return a DataFrame with one column. This distinction matters when chaining operations that expect a specific type.
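
To make the Series versus DataFrame distinction concrete, here is a minimal sketch that compares the shape and labels of both results. It uses a shortened version of the sample data; the variable names as_series and as_frame are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22],
})

as_series = df['Age']     # one-dimensional: a Series
as_frame = df[['Age']]    # two-dimensional: a DataFrame with one column

print(as_series.shape)    # (3,)
print(as_frame.shape)     # (3, 1)

# A Series carries a single name; a DataFrame carries a column index
print(as_series.name)            # Age
print(list(as_frame.columns))    # ['Age']
```

Code that expects a two-dimensional input, or that reads .columns, generally needs the double-bracket form.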

Selecting Columns with loc[] (Label-Based)

The loc[] accessor selects data by labels - row labels and column names. Use : to select all rows, and pass a list of column names:

import pandas as pd

data = {
    'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
    'Age': [25, 30, 22, 35, 28],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'Salary': [50000, 55000, 40000, 70000, 48000]
}
df = pd.DataFrame(data)

selected = df.loc[:, ['Name', 'Gender']]
print(selected)

Output:

      Name  Gender
0     John    Male
1    Alice  Female
2      Bob    Male
3      Eve  Female
4  Charlie    Male

Selecting a Range of Columns with loc[]

You can also select a contiguous range of columns by name using slice notation:

import pandas as pd

data = {
    'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
    'Age': [25, 30, 22, 35, 28],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'Salary': [50000, 55000, 40000, 70000, 48000]
}
df = pd.DataFrame(data)

# Select all columns from 'Age' through 'Salary'
selected = df.loc[:, 'Age':'Salary']
print(selected)

Output:

   Age  Gender  Salary
0   25    Male   50000
1   30  Female   55000
2   22    Male   40000
3   35  Female   70000
4   28    Male   48000

Note: Unlike Python's standard slicing, loc[] slicing is inclusive on both ends. 'Age':'Salary' includes the 'Salary' column.
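
The inclusive behaviour is easiest to see next to an ordinary Python slice. The following sketch uses a shortened version of the sample data and slices the list of column names both ways; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22],
    'Gender': ['Male', 'Female', 'Male'],
    'Salary': [50000, 55000, 40000],
})

column_names = list(df.columns)

# Plain Python list slicing excludes the end position
python_slice = column_names[1:3]
print(python_slice)   # ['Age', 'Gender']

# loc[] label slicing includes the end label
loc_slice = list(df.loc[:, 'Age':'Salary'].columns)
print(loc_slice)      # ['Age', 'Gender', 'Salary']
```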

Selecting Columns with iloc[] (Position-Based)

The iloc[] accessor selects by integer position rather than column name. This is useful when you know the column positions but not necessarily the names:

import pandas as pd

data = {
'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
'Age': [25, 30, 22, 35, 28],
'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
'Salary': [50000, 55000, 40000, 70000, 48000]
}
df = pd.DataFrame(data)

# Select columns at positions 0 and 1 (Name and Age)
selected = df.iloc[:, [0, 1]]
print(selected)

Output:

      Name  Age
0     John   25
1    Alice   30
2      Bob   22
3      Eve   35
4  Charlie   28

Selecting a Range of Columns by Position

import pandas as pd

data = {
    'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
    'Age': [25, 30, 22, 35, 28],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'Salary': [50000, 55000, 40000, 70000, 48000]
}
df = pd.DataFrame(data)

# Select columns at positions 1 and 2 (Age, Gender); the end position 3 is excluded
selected = df.iloc[:, 1:3]
print(selected)

Output:

   Age  Gender
0   25    Male
1   30  Female
2   22    Male
3   35  Female
4   28    Male

Note that iloc[] slicing is exclusive on the end, following standard Python convention - column at position 3 (Salary) is not included.
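
Because iloc[] follows Python's slicing rules, negative positions and open-ended slices work as well. The sketch below, on a shortened version of the sample data, also uses df.columns.get_loc() to translate a column name into its position; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22],
    'Gender': ['Male', 'Female', 'Male'],
    'Salary': [50000, 55000, 40000],
})

# Translate a column name into its integer position
age_pos = df.columns.get_loc('Age')
print(age_pos)                      # 1

# Negative positions count from the end: the last two columns
last_two = df.iloc[:, -2:]
print(list(last_two.columns))       # ['Gender', 'Salary']

# Open-ended slice: everything from 'Age' onward
from_age = df.iloc[:, age_pos:]
print(list(from_age.columns))       # ['Age', 'Gender', 'Salary']
```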

Selecting Columns with filter()

The filter() method selects columns whose names match a pattern. It supports exact name lists, substring matching with like, and regular expressions with regex.

Matching by Substring

import pandas as pd

data = {
'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
'Age': [25, 30, 22, 35, 28],
'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
'Salary': [50000, 55000, 40000, 70000, 48000]
}
df = pd.DataFrame(data)

# Select columns containing the substring 'a' (case-sensitive)
filtered = df.filter(like='a')
print(filtered)

Output:

      Name  Salary
0     John   50000
1    Alice   55000
2      Bob   40000
3      Eve   70000
4  Charlie   48000

Name and Salary match because both names contain a lowercase 'a'. Age is excluded because like is case-sensitive and its only 'A' is uppercase; Gender contains no 'a' at all.
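
Since like is case-sensitive, one way to match a letter regardless of case is to switch to the regex parameter and use the inline (?i) flag from Python's re syntax. This is a minimal sketch on a shortened version of the sample data; the variable name any_case is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22],
    'Gender': ['Male', 'Female', 'Male'],
    'Salary': [50000, 55000, 40000],
})

# (?i) makes the pattern case-insensitive, so 'Age' now matches too
any_case = df.filter(regex='(?i)a')
print(list(any_case.columns))   # ['Name', 'Age', 'Salary']
```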

Matching by Regular Expression

import pandas as pd

data = {
'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
'Age': [25, 30, 22, 35, 28],
'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
'Salary': [50000, 55000, 40000, 70000, 48000]
}
df = pd.DataFrame(data)

# Select columns that start with 'A' or 'S'
filtered = df.filter(regex='^[AS]')
print(filtered)

Output:

   Age  Salary
0   25   50000
1   30   55000
2   22   40000
3   35   70000
4   28   48000

Matching by Exact Names

import pandas as pd

data = {
'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
'Age': [25, 30, 22, 35, 28],
'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
'Salary': [50000, 55000, 40000, 70000, 48000]
}
df = pd.DataFrame(data)

# Select specific columns by name
filtered = df.filter(items=['Name', 'Age'])
print(filtered)

Output:

      Name  Age
0     John   25
1    Alice   30
2      Bob   22
3      Eve   35
4  Charlie   28
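
One practical difference between filter(items=...) and bracket selection is how each treats names that do not exist. The sketch below uses a hypothetical column name, 'Bonus', which is not part of the sample data, to show that filter() skips it while double brackets raise a KeyError.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22],
})

# 'Bonus' is a hypothetical column that is not in the DataFrame
lenient = df.filter(items=['Name', 'Bonus'])
print(list(lenient.columns))   # ['Name']

# Bracket selection is strict about missing names
try:
    df[['Name', 'Bonus']]
    raised = False
except KeyError:
    raised = True
print(raised)                  # True
```

This makes filter(items=...) convenient for optional columns, but it can also hide a misspelled name, so bracket notation is the safer choice when every column is required.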

Selecting Columns by Data Type

The select_dtypes() method filters columns based on their data type. This is especially useful when you need only numeric columns for calculations or only string columns for text processing:

import pandas as pd

data = {
'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
'Age': [25, 30, 22, 35, 28],
'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
'Salary': [50000, 55000, 40000, 70000, 48000]
}
df = pd.DataFrame(data)

# Select only numeric columns
numeric_cols = df.select_dtypes(include=['number'])
print(numeric_cols)

Output:

   Age  Salary
0   25   50000
1   30   55000
2   22   40000
3   35   70000
4   28   48000
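
As an illustration of the "numeric columns for calculations" use case, the following sketch chains select_dtypes() with mean() so that text columns never reach the aggregation. The variable name means is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
    'Age': [25, 30, 22, 35, 28],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'Salary': [50000, 55000, 40000, 70000, 48000],
})

# Keep only numeric columns, then compute the mean of each one
means = df.select_dtypes(include='number').mean()

print(means['Age'])      # 28.0
print(means['Salary'])   # 52600.0
```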

Selecting Non-Numeric Columns

import pandas as pd

data = {
'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
'Age': [25, 30, 22, 35, 28],
'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
'Salary': [50000, 55000, 40000, 70000, 48000]
}
df = pd.DataFrame(data)

# Select only object (string) columns
text_cols = df.select_dtypes(include=['object'])
print(text_cols)

Output:

      Name  Gender
0     John    Male
1    Alice  Female
2      Bob    Male
3      Eve  Female
4  Charlie    Male

Excluding Specific Types

import pandas as pd

data = {
'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
'Age': [25, 30, 22, 35, 28],
'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
'Salary': [50000, 55000, 40000, 70000, 48000]
}
df = pd.DataFrame(data)

# Select all columns except numeric ones
non_numeric = df.select_dtypes(exclude=['number'])
print(non_numeric)

Output:

      Name  Gender
0     John    Male
1    Alice  Female
2      Bob    Male
3      Eve  Female
4  Charlie    Male

Common Mistake: Confusing Single and Double Brackets

A frequent source of errors is using single brackets when double brackets are needed, or vice versa:

# Single brackets return a Series
result = df['Age']
print(type(result)) # <class 'pandas.core.series.Series'>

# Double brackets return a DataFrame
result = df[['Age']]
print(type(result)) # <class 'pandas.core.frame.DataFrame'>

This matters when you try to select multiple columns with single brackets:

# WRONG: this tries to find a single column named ('Age', 'Salary')
try:
    result = df['Age', 'Salary']
except KeyError as e:
    print(f"KeyError: {e}")

Output:

KeyError: ('Age', 'Salary')

The correct approach:

# CORRECT: pass a list inside the brackets
result = df[['Age', 'Salary']]
print(result.head(2))

Output:

   Age  Salary
0   25   50000
1   30   55000

Warning: Always use double brackets df[['col1', 'col2']] when selecting multiple columns. Single brackets df['col1', 'col2'] cause a KeyError because Pandas interprets the comma-separated values as a tuple key.
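
The tuple interpretation is not arbitrary: tuple keys are how columns are addressed when a DataFrame has MultiIndex (hierarchical) columns. The sketch below builds a small frame with a hypothetical top-level label, 'stats', which is not part of the sample data, to show a case where the comma form does succeed.

```python
import pandas as pd

# 'stats' is a hypothetical top-level column label used for illustration
columns = pd.MultiIndex.from_tuples([('stats', 'Age'), ('stats', 'Salary')])
multi = pd.DataFrame([[25, 50000], [30, 55000]], columns=columns)

# With MultiIndex columns, a tuple key selects exactly one column
age = multi['stats', 'Age']
print(age.tolist())   # [25, 30]
```

On a DataFrame with ordinary, single-level columns, no column is named by a tuple, which is why the same syntax fails there.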

Quick Reference

Method | Syntax | Selects By | Returns | Best For
--- | --- | --- | --- | ---
Single bracket | df['col'] | Column name | Series | Accessing one column
Double bracket | df[['col1', 'col2']] | Column names | DataFrame | Selecting multiple columns by name
loc[] | df.loc[:, ['col1', 'col2']] | Labels | DataFrame | Label-based selection, column ranges
iloc[] | df.iloc[:, [0, 1]] | Integer positions | DataFrame | Position-based selection
filter() | df.filter(like='pattern') | Name pattern | DataFrame | Substring or regex matching
select_dtypes() | df.select_dtypes(include=['number']) | Data type | DataFrame | Selecting by column type

Each method serves a specific purpose.

  • Use bracket notation for quick, straightforward selection.
  • Use loc[] and iloc[] when you need precise control over both rows and columns.
  • Use filter() for pattern-based selection and select_dtypes() when working with columns of specific data types.
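
As a sketch of the "control over both rows and columns" point, loc[] can take a boolean row condition and a list of column names in the same call. The threshold of 25 and the variable name older are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
    'Age': [25, 30, 22, 35, 28],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'Salary': [50000, 55000, 40000, 70000, 48000],
})

# Rows where Age is above 25, restricted to two columns
older = df.loc[df['Age'] > 25, ['Name', 'Salary']]

print(older['Name'].tolist())   # ['Alice', 'Eve', 'Charlie']
print(list(older.columns))      # ['Name', 'Salary']
```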