Python Pandas: How to Select Columns in a Pandas DataFrame

Selecting specific columns from a DataFrame is one of the most frequent operations in data analysis with Pandas. Whether you need a single column for calculations, a subset for visualization, or columns matching a specific pattern or data type, Pandas provides multiple ways to accomplish this. This guide covers all major column selection methods - from simple bracket notation to advanced filtering - with clear examples and practical guidance on when to use each.

Sample DataFrame

All examples in this guide use the following DataFrame:

import pandas as pd

data = {
    'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
    'Age': [25, 30, 22, 35, 28],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'Salary': [50000, 55000, 40000, 70000, 48000]
}

df = pd.DataFrame(data)
print(df)

Output:

      Name  Age  Gender  Salary
0     John   25    Male   50000
1    Alice   30  Female   55000
2      Bob   22    Male   40000
3      Eve   35  Female   70000
4  Charlie   28    Male   48000

Selecting a Single Column with Bracket Notation

The simplest way to select one column is by passing the column name inside square brackets. This returns a Series:

import pandas as pd

data = {
    'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
    'Age': [25, 30, 22, 35, 28],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'Salary': [50000, 55000, 40000, 70000, 48000]
}
df = pd.DataFrame(data)

age_column = df['Age']
print(age_column)
print(type(age_column))

Output:

0    25
1    30
2    22
3    35
4    28
Name: Age, dtype: int64
<class 'pandas.core.series.Series'>

Selecting Multiple Columns with Double Brackets

To select more than one column, pass a list of column names inside double brackets. This returns a DataFrame:

import pandas as pd

data = {
    'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
    'Age': [25, 30, 22, 35, 28],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'Salary': [50000, 55000, 40000, 70000, 48000]
}
df = pd.DataFrame(data)

subset = df[['Name', 'Salary']]
print(subset)
print(type(subset))

Output:

      Name  Salary
0     John   50000
1    Alice   55000
2      Bob   40000
3      Eve   70000
4  Charlie   48000
<class 'pandas.core.frame.DataFrame'>

tip

Notice the difference: single brackets df['Age'] return a Series, while double brackets df[['Age']] return a DataFrame with one column. This distinction matters when chaining operations that expect a specific type.
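To make the difference concrete, the short sketch below compares the shape and attributes of the two results. It uses a trimmed copy of the sample data, and the variable names as_series and as_frame are chosen here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
    'Age': [25, 30, 22, 35, 28],
})

as_series = df['Age']     # single brackets: one-dimensional Series
as_frame = df[['Age']]    # double brackets: two-dimensional DataFrame

print(as_series.shape)    # (5,)
print(as_frame.shape)     # (5, 1)

# A DataFrame exposes .columns; a Series carries a single .name instead
print(list(as_frame.columns))   # ['Age']
print(as_series.name)           # Age
```

Code that relies on DataFrame-only attributes such as .columns therefore needs the double-bracket form, even for a single column.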

Selecting Columns with `loc[]` (Label-Based)

The loc[] accessor selects data by labels - row labels and column names. Use : to select all rows, and pass a list of column names:

import pandas as pd

data = {
    'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
    'Age': [25, 30, 22, 35, 28],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'Salary': [50000, 55000, 40000, 70000, 48000]
}
df = pd.DataFrame(data)

selected = df.loc[:, ['Name', 'Gender']]
print(selected)

Output:

      Name  Gender
0     John    Male
1    Alice  Female
2      Bob    Male
3      Eve  Female
4  Charlie    Male

Selecting a Range of Columns with `loc[]`

You can also select a contiguous range of columns by name using slice notation:

import pandas as pd

data = {
    'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
    'Age': [25, 30, 22, 35, 28],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'Salary': [50000, 55000, 40000, 70000, 48000]
}
df = pd.DataFrame(data)

# Select all columns from 'Age' through 'Salary'
selected = df.loc[:, 'Age':'Salary']
print(selected)

Output:

   Age  Gender  Salary
0   25    Male   50000
1   30  Female   55000
2   22    Male   40000
3   35  Female   70000
4   28    Male   48000

info

Unlike Python's standard slicing, loc[] slicing is inclusive on both ends. 'Age':'Salary' includes the 'Salary' column.
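The contrast with ordinary Python slicing can be checked directly. The sketch below uses a two-row copy of the sample data and compares a label slice against a plain list slice over the same column names; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice'],
    'Age': [25, 30],
    'Gender': ['Male', 'Female'],
    'Salary': [50000, 55000],
})

# Label slice with loc[]: the end label 'Gender' is included
loc_cols = list(df.loc[:, 'Age':'Gender'].columns)
print(loc_cols)     # ['Age', 'Gender']

# Plain Python list slice: the end position 2 is excluded
names = list(df.columns)
list_cols = names[1:2]
print(list_cols)    # ['Age']
```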

Selecting Columns with `iloc[]` (Position-Based)

The iloc[] accessor selects by integer position rather than column name. This is useful when you know the column positions but not necessarily the names:

import pandas as pd

data = {
    'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
    'Age': [25, 30, 22, 35, 28],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'Salary': [50000, 55000, 40000, 70000, 48000]
}
df = pd.DataFrame(data)

# Select columns at positions 0 and 1 (Name and Age)
selected = df.iloc[:, [0, 1]]
print(selected)

Output:

      Name  Age
0     John   25
1    Alice   30
2      Bob   22
3      Eve   35
4  Charlie   28

Selecting a Range of Columns by Position

import pandas as pd

data = {
    'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
    'Age': [25, 30, 22, 35, 28],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'Salary': [50000, 55000, 40000, 70000, 48000]
}
df = pd.DataFrame(data)

# Select columns at positions 1 and 2 (Age, Gender); the end position 3 is excluded
selected = df.iloc[:, 1:3]
print(selected)

Output:

   Age  Gender
0   25    Male
1   30  Female
2   22    Male
3   35  Female
4   28    Male

Note that iloc[] slicing is exclusive on the end, following standard Python convention: the column at position 3 (Salary) is not included.
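Because iloc[] follows Python's positional rules, negative positions and computed positions should work as well. The following is a minimal sketch under that assumption, using a two-row copy of the sample data. It looks up a position with df.columns.get_loc(), which returns an integer when the column name is unique.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice'],
    'Age': [25, 30],
    'Gender': ['Male', 'Female'],
    'Salary': [50000, 55000],
})

# Negative positions count from the end: take the last two columns
last_two = df.iloc[:, -2:]
print(list(last_two.columns))        # ['Gender', 'Salary']

# Find a column's position by name, then slice up to (not including) it
pos = df.columns.get_loc('Gender')   # 2
before_gender = df.iloc[:, :pos]
print(list(before_gender.columns))   # ['Name', 'Age']
```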

Selecting Columns with `filter()`

The filter() method selects columns whose names match a pattern. It supports exact name lists, substring matching with like, and regular expressions with regex.

Matching by Substring

import pandas as pd

data = {
    'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
    'Age': [25, 30, 22, 35, 28],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'Salary': [50000, 55000, 40000, 70000, 48000]
}
df = pd.DataFrame(data)

# Select columns containing the substring 'a' (case-sensitive)
filtered = df.filter(like='a')
print(filtered)

Output:

      Name  Salary
0     John   50000
1    Alice   55000
2      Bob   40000
3      Eve   70000
4  Charlie   48000

Name and Salary match because both names contain a lowercase 'a'. Age is excluded because like is case-sensitive and its 'A' is uppercase, and Gender contains no 'a' at all.
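If a case-insensitive match is needed, one possible approach is to switch from like to the regex parameter and use Python's inline (?i) flag. This is a sketch that assumes the pattern is handled by Python's re module, which is how filter() treats regex.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice'],
    'Age': [25, 30],
    'Gender': ['Male', 'Female'],
    'Salary': [50000, 55000],
})

# (?i) makes the pattern case-insensitive, so 'Age' now matches too
filtered = df.filter(regex='(?i)a')
print(list(filtered.columns))   # ['Name', 'Age', 'Salary']
```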

Matching by Regular Expression

import pandas as pd

data = {
    'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
    'Age': [25, 30, 22, 35, 28],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'Salary': [50000, 55000, 40000, 70000, 48000]
}
df = pd.DataFrame(data)

# Select columns that start with 'A' or 'S'
filtered = df.filter(regex='^[AS]')
print(filtered)

Output:

   Age  Salary
0   25   50000
1   30   55000
2   22   40000
3   35   70000
4   28   48000

Matching by Exact Names

import pandas as pd

data = {
    'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
    'Age': [25, 30, 22, 35, 28],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'Salary': [50000, 55000, 40000, 70000, 48000]
}
df = pd.DataFrame(data)

# Select specific columns by name
filtered = df.filter(items=['Name', 'Age'])
print(filtered)

Output:

      Name  Age
0     John   25
1    Alice   30
2      Bob   22
3      Eve   35
4  Charlie   28
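One practical difference from bracket selection is how missing names are handled: filter(items=...) keeps only the names that exist, while df[[...]] raises a KeyError. The sketch below illustrates this with 'Bonus', a hypothetical column name that is not in the sample data.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice'],
    'Age': [25, 30],
})

wanted = ['Name', 'Bonus']   # 'Bonus' is hypothetical and does not exist in df

# filter() silently skips names that are not present
lenient = df.filter(items=wanted)
print(list(lenient.columns))   # ['Name']

# Bracket selection is strict and raises instead
raised = False
try:
    df[wanted]
except KeyError as err:
    raised = True
    print(f"KeyError: {err}")
```

Which behavior is preferable depends on the situation: the strict form surfaces typos early, while the lenient form suits optional columns.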

Selecting Columns by Data Type

The select_dtypes() method filters columns based on their data type. This is especially useful when you need only numeric columns for calculations or only string columns for text processing:

import pandas as pd

data = {
    'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
    'Age': [25, 30, 22, 35, 28],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'Salary': [50000, 55000, 40000, 70000, 48000]
}
df = pd.DataFrame(data)

# Select only numeric columns
numeric_cols = df.select_dtypes(include=['number'])
print(numeric_cols)

Output:

   Age  Salary
0   25   50000
1   30   55000
2   22   40000
3   35   70000
4   28   48000
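As an example of the "numeric columns for calculations" use case, the numeric subset can be passed straight to an aggregation. The sketch below computes column means on the sample data; the variable name means is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
    'Age': [25, 30, 22, 35, 28],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'Salary': [50000, 55000, 40000, 70000, 48000],
})

# Keep only numeric columns, then average each one
means = df.select_dtypes(include=['number']).mean()

print(means['Age'])      # 28.0
print(means['Salary'])   # 52600.0
```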

Selecting Non-Numeric Columns

import pandas as pd

data = {
    'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
    'Age': [25, 30, 22, 35, 28],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'Salary': [50000, 55000, 40000, 70000, 48000]
}
df = pd.DataFrame(data)

# Select only object (string) columns
text_cols = df.select_dtypes(include=['object'])
print(text_cols)

Output:

      Name  Gender
0     John    Male
1    Alice  Female
2      Bob    Male
3      Eve  Female
4  Charlie    Male

Excluding Specific Types

import pandas as pd

data = {
    'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
    'Age': [25, 30, 22, 35, 28],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'Salary': [50000, 55000, 40000, 70000, 48000]
}
df = pd.DataFrame(data)

# Select all columns except numeric ones
non_numeric = df.select_dtypes(exclude=['number'])
print(non_numeric)

Output:

      Name  Gender
0     John    Male
1    Alice  Female
2      Bob    Male
3      Eve  Female
4  Charlie    Male

Common Mistake: Confusing Single and Double Brackets

A frequent source of errors is using single brackets when double brackets are needed, or vice versa:

# Single brackets return a Series
result = df['Age']
print(type(result))  # <class 'pandas.core.series.Series'>

# Double brackets return a DataFrame
result = df[['Age']]
print(type(result))  # <class 'pandas.core.frame.DataFrame'>

This matters when you try to select multiple columns with single brackets:

# WRONG: this tries to find a single column named ('Age', 'Salary')
try:
    result = df['Age', 'Salary']
except KeyError as e:
    print(f"KeyError: {e}")

Output:

KeyError: ('Age', 'Salary')

The correct approach:

# CORRECT: pass a list inside the brackets
result = df[['Age', 'Salary']]
print(result.head(2))

Output:

   Age  Salary
0   25   50000
1   30   55000

danger

Always use double brackets df[['col1', 'col2']] when selecting multiple columns. Single brackets df['col1', 'col2'] cause a KeyError because Pandas interprets the comma-separated values as a tuple key.

Quick Reference

Method	Syntax	Selects By	Returns	Best For
Single bracket	`df['col']`	Column name	Series	Accessing one column
Double bracket	`df[['col1', 'col2']]`	Column names	DataFrame	Selecting multiple columns by name
`loc[]`	`df.loc[:, ['col1', 'col2']]`	Labels	DataFrame	Label-based selection, column ranges
`iloc[]`	`df.iloc[:, [0, 1]]`	Integer positions	DataFrame	Position-based selection
`filter()`	`df.filter(like='pattern')`	Name pattern	DataFrame	Substring or regex matching
`select_dtypes()`	`df.select_dtypes(include=['number'])`	Data type	DataFrame	Selecting by column type

Each method serves a specific purpose.

Use bracket notation for quick, straightforward selection.
Use loc[] and iloc[] when you need precise control over both rows and columns.
Use filter() for pattern-based selection and select_dtypes() when working with columns of specific data types.
