Python Pandas: How to Select Columns in a Pandas DataFrame
Selecting specific columns from a DataFrame is one of the most frequent operations in data analysis with Pandas. Whether you need a single column for calculations, a subset for visualization, or columns matching a specific pattern or data type, Pandas provides multiple ways to accomplish this. This guide covers all major column selection methods - from simple bracket notation to advanced filtering - with clear examples and practical guidance on when to use each.
Sample DataFrame
All examples in this guide use the following DataFrame:
import pandas as pd
data = {
'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
'Age': [25, 30, 22, 35, 28],
'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
'Salary': [50000, 55000, 40000, 70000, 48000]
}
df = pd.DataFrame(data)
print(df)
Output:
Name Age Gender Salary
0 John 25 Male 50000
1 Alice 30 Female 55000
2 Bob 22 Male 40000
3 Eve 35 Female 70000
4 Charlie 28 Male 48000
Selecting a Single Column with Bracket Notation
The simplest way to select one column is by passing the column name inside square brackets. This returns a Series:
import pandas as pd
data = {
'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
'Age': [25, 30, 22, 35, 28],
'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
'Salary': [50000, 55000, 40000, 70000, 48000]
}
df = pd.DataFrame(data)
age_column = df['Age']
print(age_column)
print(type(age_column))
Output:
0 25
1 30
2 22
3 35
4 28
Name: Age, dtype: int64
<class 'pandas.core.series.Series'>
Selecting Multiple Columns with Double Brackets
To select more than one column, pass a list of column names inside double brackets. This returns a DataFrame:
import pandas as pd
data = {
'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
'Age': [25, 30, 22, 35, 28],
'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
'Salary': [50000, 55000, 40000, 70000, 48000]
}
df = pd.DataFrame(data)
subset = df[['Name', 'Salary']]
print(subset)
print(type(subset))
Output:
Name Salary
0 John 50000
1 Alice 55000
2 Bob 40000
3 Eve 70000
4 Charlie 48000
<class 'pandas.core.frame.DataFrame'>
Notice the difference: single brackets df['Age'] return a Series, while double brackets df[['Age']] return a DataFrame with one column. This distinction matters when chaining operations that expect a specific type.
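The difference is visible in the shape of each result. The following is a minimal sketch using a trimmed copy of the sample data; the variable names as_series and as_frame are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
    'Age': [25, 30, 22, 35, 28],
})

as_series = df['Age']     # one-dimensional Series
as_frame = df[['Age']]    # two-dimensional DataFrame with a single column

print(as_series.shape)    # (5,)
print(as_frame.shape)     # (5, 1)

# A Series can be promoted to a one-column DataFrame when a DataFrame is expected
print(as_series.to_frame().equals(as_frame))  # True
```

If a later step expects a DataFrame, to_frame() is one way to convert a Series without reselecting the column.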
Selecting Columns with loc[] (Label-Based)
The loc[] accessor selects data by labels - row labels and column names. Use : to select all rows, and pass a list of column names:
import pandas as pd
data = {
'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
'Age': [25, 30, 22, 35, 28],
'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
'Salary': [50000, 55000, 40000, 70000, 48000]
}
df = pd.DataFrame(data)
selected = df.loc[:, ['Name', 'Gender']]
print(selected)
Output:
Name Gender
0 John Male
1 Alice Female
2 Bob Male
3 Eve Female
4 Charlie Male
Selecting a Range of Columns with loc[]
You can also select a contiguous range of columns by name using slice notation:
import pandas as pd
data = {
'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
'Age': [25, 30, 22, 35, 28],
'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
'Salary': [50000, 55000, 40000, 70000, 48000]
}
df = pd.DataFrame(data)
# Select all columns from 'Age' through 'Salary'
selected = df.loc[:, 'Age':'Salary']
print(selected)
Output:
Age Gender Salary
0 25 Male 50000
1 30 Female 55000
2 22 Male 40000
3 35 Female 70000
4 28 Male 48000
Unlike Python's standard slicing, loc[] slicing is inclusive on both ends. 'Age':'Salary' includes the 'Salary' column.
Selecting Columns with iloc[] (Position-Based)
The iloc[] accessor selects by integer position rather than column name. This is useful when you know the column positions but not necessarily the names:
import pandas as pd
data = {
'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
'Age': [25, 30, 22, 35, 28],
'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
'Salary': [50000, 55000, 40000, 70000, 48000]
}
df = pd.DataFrame(data)
# Select columns at positions 0 and 1 (Name and Age)
selected = df.iloc[:, [0, 1]]
print(selected)
Output:
Name Age
0 John 25
1 Alice 30
2 Bob 22
3 Eve 35
4 Charlie 28
Selecting a Range of Columns by Position
import pandas as pd
data = {
'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
'Age': [25, 30, 22, 35, 28],
'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
'Salary': [50000, 55000, 40000, 70000, 48000]
}
df = pd.DataFrame(data)
# Select columns at positions 1 and 2 (Age, Gender); the end position 3 is excluded
selected = df.iloc[:, 1:3]
print(selected)
Output:
Age Gender
0 25 Male
1 30 Female
2 22 Male
3 35 Female
4 28 Male
Note that iloc[] slicing excludes the end position, following standard Python convention: the column at position 3 (Salary) is not included.
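To make the contrast between the two slicing rules concrete, here is a small sketch that selects the same two columns both ways. The variable names are illustrative, and the negative-position slice at the end is an extra that relies on iloc[] following Python's list indexing rules:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
    'Age': [25, 30, 22, 35, 28],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'Salary': [50000, 55000, 40000, 70000, 48000],
})

by_label = df.loc[:, 'Age':'Gender']   # end label is included
by_position = df.iloc[:, 1:3]          # end position is excluded

print(list(by_label.columns))      # ['Age', 'Gender']
print(list(by_position.columns))   # ['Age', 'Gender']

# Negative positions count from the end, as with Python lists
last_two = df.iloc[:, -2:]
print(list(last_two.columns))      # ['Gender', 'Salary']
```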
Selecting Columns with filter()
The filter() method selects columns whose names match a pattern. It supports exact name lists with items, substring matching with like, and regular expressions with regex.
Matching by Substring
import pandas as pd
data = {
'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
'Age': [25, 30, 22, 35, 28],
'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
'Salary': [50000, 55000, 40000, 70000, 48000]
}
df = pd.DataFrame(data)
# Select columns containing the substring 'a' (case-sensitive)
filtered = df.filter(like='a')
print(filtered)
Output:
Name Salary
0 John 50000
1 Alice 55000
2 Bob 40000
3 Eve 70000
4 Charlie 48000
Name and Salary match because both contain a lowercase 'a'. Age is excluded because like is case-sensitive and its only 'A' is uppercase, and Gender contains no 'a' at all.
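If case should not matter, one option is to switch from like to regex and use Python's inline case-insensitive flag (?i). This is a sketch of that approach, not the only way to do it:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
    'Age': [25, 30, 22, 35, 28],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'Salary': [50000, 55000, 40000, 70000, 48000],
})

# (?i) makes the pattern case-insensitive, so 'Age' now matches as well
any_case = df.filter(regex='(?i)a')
print(list(any_case.columns))   # ['Name', 'Age', 'Salary']
```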
Matching by Regular Expression
import pandas as pd
data = {
'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
'Age': [25, 30, 22, 35, 28],
'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
'Salary': [50000, 55000, 40000, 70000, 48000]
}
df = pd.DataFrame(data)
# Select columns that start with 'A' or 'S'
filtered = df.filter(regex='^[AS]')
print(filtered)
Output:
Age Salary
0 25 50000
1 30 55000
2 22 40000
3 35 70000
4 28 48000
Matching by Exact Names
import pandas as pd
data = {
'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
'Age': [25, 30, 22, 35, 28],
'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
'Salary': [50000, 55000, 40000, 70000, 48000]
}
df = pd.DataFrame(data)
# Select specific columns by name
filtered = df.filter(items=['Name', 'Age'])
print(filtered)
Output:
Name Age
0 John 25
1 Alice 30
2 Bob 22
3 Eve 35
4 Charlie 28
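One practical difference from bracket notation is worth knowing: filter(items=...) skips names that are not present, while df[[...]] raises a KeyError. The sketch below illustrates this; 'Bonus' is a hypothetical column name chosen because it does not exist in the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
    'Age': [25, 30, 22, 35, 28],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'Salary': [50000, 55000, 40000, 70000, 48000],
})

# 'Bonus' is not a column in df; filter() silently ignores it
lenient = df.filter(items=['Name', 'Bonus'])
print(list(lenient.columns))    # ['Name']

# Bracket notation is strict and raises for the same missing name
try:
    df[['Name', 'Bonus']]
    raised = False
except KeyError:
    raised = True
print(raised)                   # True
```

The lenient behavior is convenient when column sets vary between files, but it can also hide a misspelled column name, so the strict form is often the safer default.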
Selecting Columns by Data Type
The select_dtypes() method filters columns based on their data type. This is especially useful when you need only numeric columns for calculations or only string columns for text processing:
import pandas as pd
data = {
'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
'Age': [25, 30, 22, 35, 28],
'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
'Salary': [50000, 55000, 40000, 70000, 48000]
}
df = pd.DataFrame(data)
# Select only numeric columns
numeric_cols = df.select_dtypes(include=['number'])
print(numeric_cols)
Output:
Age Salary
0 25 50000
1 30 55000
2 22 40000
3 35 70000
4 28 48000
Selecting Non-Numeric Columns
import pandas as pd
data = {
'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
'Age': [25, 30, 22, 35, 28],
'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
'Salary': [50000, 55000, 40000, 70000, 48000]
}
df = pd.DataFrame(data)
# Select only object (string) columns
text_cols = df.select_dtypes(include=['object'])
print(text_cols)
Output:
Name Gender
0 John Male
1 Alice Female
2 Bob Male
3 Eve Female
4 Charlie Male
Excluding Specific Types
import pandas as pd
data = {
'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
'Age': [25, 30, 22, 35, 28],
'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
'Salary': [50000, 55000, 40000, 70000, 48000]
}
df = pd.DataFrame(data)
# Select all columns except numeric ones
non_numeric = df.select_dtypes(exclude=['number'])
print(non_numeric)
Output:
Name Gender
0 John Male
1 Alice Female
2 Bob Male
3 Eve Female
4 Charlie Male
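A typical use of type-based selection is to restrict a calculation to the numeric columns. A brief sketch, assuming the same sample data (the name numeric_means is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
    'Age': [25, 30, 22, 35, 28],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'Salary': [50000, 55000, 40000, 70000, 48000],
})

# Keep only numeric columns, then compute the mean of each one
numeric_means = df.select_dtypes(include='number').mean()

print(numeric_means['Age'])      # 28.0
print(numeric_means['Salary'])   # 52600.0
```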
Common Mistake: Confusing Single and Double Brackets
A frequent source of errors is using single brackets when double brackets are needed, or vice versa:
# Single brackets return a Series
result = df['Age']
print(type(result)) # <class 'pandas.core.series.Series'>
# Double brackets return a DataFrame
result = df[['Age']]
print(type(result)) # <class 'pandas.core.frame.DataFrame'>
This matters when you try to select multiple columns with single brackets:
# WRONG: this tries to find a single column named ('Age', 'Salary')
try:
result = df['Age', 'Salary']
except KeyError as e:
print(f"KeyError: {e}")
Output:
KeyError: ('Age', 'Salary')
The correct approach:
# CORRECT: pass a list inside the brackets
result = df[['Age', 'Salary']]
print(result.head(2))
Output:
Age Salary
0 25 50000
1 30 55000
Always use double brackets df[['col1', 'col2']] when selecting multiple columns. Single brackets df['col1', 'col2'] cause a KeyError because Pandas interprets the comma-separated values as a tuple key.
Quick Reference
| Method | Syntax | Selects By | Returns | Best For |
|---|---|---|---|---|
| Single bracket | df['col'] | Column name | Series | Accessing one column |
| Double bracket | df[['col1', 'col2']] | Column names | DataFrame | Selecting multiple columns by name |
| loc[] | df.loc[:, ['col1', 'col2']] | Labels | DataFrame | Label-based selection, column ranges |
| iloc[] | df.iloc[:, [0, 1]] | Integer positions | DataFrame | Position-based selection |
| filter() | df.filter(like='pattern') | Name pattern | DataFrame | Substring or regex matching |
| select_dtypes() | df.select_dtypes(include=['number']) | Data type | DataFrame | Selecting by column type |
Each method serves a specific purpose.
- Use bracket notation for quick, straightforward selection.
- Use loc[] and iloc[] when you need precise control over both rows and columns.
- Use filter() for pattern-based selection and select_dtypes() when working with columns of specific data types.
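As an illustration of controlling rows and columns together, the sketch below pairs a row condition with a column list in loc[], and a row slice with column positions in iloc[]. The threshold of 25 and the variable names are arbitrary choices for the example:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
    'Age': [25, 30, 22, 35, 28],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'Salary': [50000, 55000, 40000, 70000, 48000],
})

# Rows where Age is greater than 25, restricted to two named columns
over_25 = df.loc[df['Age'] > 25, ['Name', 'Salary']]
print(list(over_25['Name']))     # ['Alice', 'Eve', 'Charlie']

# First two rows, columns at positions 0 and 3 (Name and Salary)
first_two = df.iloc[:2, [0, 3]]
print(list(first_two.columns))   # ['Name', 'Salary']
print(first_two.shape)           # (2, 2)
```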