Python Pandas: How to Select a Subset of a DataFrame in Pandas

Selecting subsets of data is one of the most fundamental operations in data analysis. Whether you need to extract specific columns for a report, filter rows based on conditions, or isolate a precise block of data, Pandas provides powerful and flexible tools to accomplish this.

In this guide, you'll learn how to select subsets by columns, rows, or both, using methods like bracket notation, loc[], iloc[], and conditional filtering.

Setting Up the Sample Data

All examples use a sample dataset. You can follow along with any CSV file or create one:

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [28, 35, 42, 31, 26],
    'Team': ['Lakers', 'Celtics', 'Lakers', 'Heat', 'Celtics'],
    'Salary': [8500000, 12000000, 15000000, 9500000, 7000000],
    'Experience': [4, 10, 18, 7, 3]
}

df = pd.DataFrame(data)
print(df)

Output:

      Name  Age     Team    Salary  Experience
  Alice   28   Lakers   8500000           4
    Bob   35  Celtics  12000000          10
Charlie   42   Lakers  15000000          18
  Diana   31     Heat   9500000           7
    Eve   26  Celtics   7000000           3

Selecting Columns

Selecting a Single Column

Use bracket notation with the column name to extract a single column as a Series:

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [28, 35, 42, 31, 26],
    'Team': ['Lakers', 'Celtics', 'Lakers', 'Heat', 'Celtics'],
    'Salary': [8500000, 12000000, 15000000, 9500000, 7000000],
    'Experience': [4, 10, 18, 7, 3]
}

df = pd.DataFrame(data)

ages = df['Age']
print(ages)
print(f"\nType: {type(ages)}")

Output:

  28
  35
  42
  31
  26
Name: Age, dtype: int64

Type: <class 'pandas.core.series.Series'>

Selecting Multiple Columns

Pass a list of column names inside brackets to select multiple columns. The result is a DataFrame:

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [28, 35, 42, 31, 26],
    'Team': ['Lakers', 'Celtics', 'Lakers', 'Heat', 'Celtics'],
    'Salary': [8500000, 12000000, 15000000, 9500000, 7000000],
    'Experience': [4, 10, 18, 7, 3]
}

df = pd.DataFrame(data)

subset = df[['Name', 'Age', 'Salary']]
print(subset)
print(f"\nType: {type(subset)}")

Output:

      Name  Age    Salary
  Alice   28   8500000
    Bob   35  12000000
Charlie   42  15000000
  Diana   31   9500000
    Eve   26   7000000

Type: <class 'pandas.core.frame.DataFrame'>

Common Mistake: Single vs Double Brackets

Using single brackets with a column name returns a Series, while double brackets return a DataFrame (even for one column):

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [28, 35, 42, 31, 26],
    'Team': ['Lakers', 'Celtics', 'Lakers', 'Heat', 'Celtics'],
    'Salary': [8500000, 12000000, 15000000, 9500000, 7000000],
    'Experience': [4, 10, 18, 7, 3]
}

df = pd.DataFrame(data)

# Series (1D)
series = df['Name']
print(type(series))     # <class 'pandas.core.series.Series'>

# DataFrame (2D)
dataframe = df[['Name']]
print(type(dataframe))  # <class 'pandas.core.frame.DataFrame'>

This distinction matters when chaining operations that expect a DataFrame input.

Selecting Rows by Condition

Single Condition

Use boolean indexing to filter rows where a condition is true:

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [28, 35, 42, 31, 26],
    'Team': ['Lakers', 'Celtics', 'Lakers', 'Heat', 'Celtics'],
    'Salary': [8500000, 12000000, 15000000, 9500000, 7000000],
    'Experience': [4, 10, 18, 7, 3]
}

df = pd.DataFrame(data)

# Select players older than 30
result = df[df['Age'] > 30]
print(result)

Output:

      Name  Age     Team    Salary  Experience
    Bob   35  Celtics  12000000          10
Charlie   42   Lakers  15000000          18
  Diana   31     Heat   9500000           7

Multiple Conditions

Combine conditions using & (AND) and | (OR). Each condition must be wrapped in parentheses:

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [28, 35, 42, 31, 26],
    'Team': ['Lakers', 'Celtics', 'Lakers', 'Heat', 'Celtics'],
    'Salary': [8500000, 12000000, 15000000, 9500000, 7000000],
    'Experience': [4, 10, 18, 7, 3]
}

df = pd.DataFrame(data)

# Players older than 30 AND salary above 10 million
result = df[(df['Age'] > 30) & (df['Salary'] > 10000000)]
print(result)

Output:

      Name  Age     Team    Salary  Experience
1      Bob   35  Celtics  12000000          10
2  Charlie   42   Lakers  15000000          18

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [28, 35, 42, 31, 26],
    'Team': ['Lakers', 'Celtics', 'Lakers', 'Heat', 'Celtics'],
    'Salary': [8500000, 12000000, 15000000, 9500000, 7000000],
    'Experience': [4, 10, 18, 7, 3]
}

df = pd.DataFrame(data)

# Players on the Lakers OR younger than 28
result = df[(df['Team'] == 'Lakers') | (df['Age'] < 28)]
print(result)

Output:

      Name  Age     Team    Salary  Experience
  Alice   28   Lakers   8500000           4
Charlie   42   Lakers  15000000          18
    Eve   26  Celtics   7000000           3

Using `isin()` for Multiple Values

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [28, 35, 42, 31, 26],
    'Team': ['Lakers', 'Celtics', 'Lakers', 'Heat', 'Celtics'],
    'Salary': [8500000, 12000000, 15000000, 9500000, 7000000],
    'Experience': [4, 10, 18, 7, 3]
}

df = pd.DataFrame(data)

# Select players on specific teams
teams = ['Lakers', 'Heat']
result = df[df['Team'].isin(teams)]
print(result)

Output:

      Name  Age    Team    Salary  Experience
  Alice   28  Lakers   8500000           4
Charlie   42  Lakers  15000000          18
  Diana   31    Heat   9500000           7

Selecting Rows and Columns Together

Using `loc[]` (Label-Based Selection)

loc[] selects data by labels - column names and row conditions or index labels:

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [28, 35, 42, 31, 26],
    'Team': ['Lakers', 'Celtics', 'Lakers', 'Heat', 'Celtics'],
    'Salary': [8500000, 12000000, 15000000, 9500000, 7000000],
    'Experience': [4, 10, 18, 7, 3]
}

df = pd.DataFrame(data)

# Select Name column for players older than 30
result = df.loc[df['Age'] > 30, 'Name']
print(result)

Output:

      Bob
  Charlie
    Diana
Name: Name, dtype: object

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [28, 35, 42, 31, 26],
    'Team': ['Lakers', 'Celtics', 'Lakers', 'Heat', 'Celtics'],
    'Salary': [8500000, 12000000, 15000000, 9500000, 7000000],
    'Experience': [4, 10, 18, 7, 3]
}

df = pd.DataFrame(data)


# Select Name and Salary for Lakers players
result = df.loc[df['Team'] == 'Lakers', ['Name', 'Salary']]
print(result)

Output:

      Name    Salary
0    Alice   8500000
2  Charlie  15000000

Using `iloc[]` (Position-Based Selection)

iloc[] selects data by integer positions - row and column indices:

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [28, 35, 42, 31, 26],
    'Team': ['Lakers', 'Celtics', 'Lakers', 'Heat', 'Celtics'],
    'Salary': [8500000, 12000000, 15000000, 9500000, 7000000],
    'Experience': [4, 10, 18, 7, 3]
}

df = pd.DataFrame(data)

# Select first 3 rows and first 2 columns
result = df.iloc[:3, :2]
print(result)

Output:

      Name  Age
  Alice   28
    Bob   35
Charlie   42

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [28, 35, 42, 31, 26],
    'Team': ['Lakers', 'Celtics', 'Lakers', 'Heat', 'Celtics'],
    'Salary': [8500000, 12000000, 15000000, 9500000, 7000000],
    'Experience': [4, 10, 18, 7, 3]
}

df = pd.DataFrame(data)


# Select specific rows and columns by position
result = df.iloc[[0, 2, 4], [0, 3]]
print(result)

Output:

      Name    Salary
  Alice   8500000
Charlie  15000000
    Eve   7000000

loc[] vs iloc[]

Feature	`loc[]`	`iloc[]`
Selection by	Labels (names)	Integer positions
Row selection	Condition or index label	Row index number
Column selection	Column name	Column index number
End of slice	Inclusive	Exclusive

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [28, 35, 42, 31, 26],
    'Team': ['Lakers', 'Celtics', 'Lakers', 'Heat', 'Celtics'],
    'Salary': [8500000, 12000000, 15000000, 9500000, 7000000],
    'Experience': [4, 10, 18, 7, 3]
}

df = pd.DataFrame(data)

# loc: end is INCLUSIVE
print(df.loc[0:2, 'Name':'Team'])  # Rows 0,1,2 (Columns Name, Age, Team)

# iloc: end is EXCLUSIVE
print(df.iloc[0:2, 0:3])           # Rows 0,1 (Columns 0, 1, 2)

Output:

      Name  Age     Team
  Alice   28   Lakers
    Bob   35  Celtics
Charlie   42   Lakers
    Name  Age     Team
Alice   28   Lakers
  Bob   35  Celtics

Quick Subset Methods

`head()` - First N Rows

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [28, 35, 42, 31, 26],
    'Team': ['Lakers', 'Celtics', 'Lakers', 'Heat', 'Celtics'],
    'Salary': [8500000, 12000000, 15000000, 9500000, 7000000],
    'Experience': [4, 10, 18, 7, 3]
}

df = pd.DataFrame(data)

# First 3 rows (default is 5)
print(df.head(3))

Output:

      Name  Age     Team    Salary  Experience
  Alice   28   Lakers   8500000           4
    Bob   35  Celtics  12000000          10
Charlie   42   Lakers  15000000          18

`tail()` - Last N Rows

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [28, 35, 42, 31, 26],
    'Team': ['Lakers', 'Celtics', 'Lakers', 'Heat', 'Celtics'],
    'Salary': [8500000, 12000000, 15000000, 9500000, 7000000],
    'Experience': [4, 10, 18, 7, 3]
}

df = pd.DataFrame(data)

# Last 2 rows
print(df.tail(2))

Output:

    Name  Age     Team   Salary  Experience
3  Diana   31     Heat  9500000           7
4    Eve   26  Celtics  7000000           3

`sample()` - Random Rows

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [28, 35, 42, 31, 26],
    'Team': ['Lakers', 'Celtics', 'Lakers', 'Heat', 'Celtics'],
    'Salary': [8500000, 12000000, 15000000, 9500000, 7000000],
    'Experience': [4, 10, 18, 7, 3]
}

df = pd.DataFrame(data)

# 3 random rows (set random_state for reproducibility)
print(df.sample(3, random_state=42))

Output:

      Name  Age     Team    Salary  Experience
    Bob   35  Celtics  12000000          10
    Eve   26  Celtics   7000000           3
Charlie   42   Lakers  15000000          18

`nlargest()` and `nsmallest()` - Top/Bottom by Value

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [28, 35, 42, 31, 26],
    'Team': ['Lakers', 'Celtics', 'Lakers', 'Heat', 'Celtics'],
    'Salary': [8500000, 12000000, 15000000, 9500000, 7000000],
    'Experience': [4, 10, 18, 7, 3]
}

df = pd.DataFrame(data)

# Top 3 highest salaries
print(df.nlargest(3, 'Salary'))

Output:

      Name  Age     Team    Salary  Experience
Charlie   42   Lakers  15000000          18
    Bob   35  Celtics  12000000          10
  Diana   31     Heat   9500000           7

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [28, 35, 42, 31, 26],
    'Team': ['Lakers', 'Celtics', 'Lakers', 'Heat', 'Celtics'],
    'Salary': [8500000, 12000000, 15000000, 9500000, 7000000],
    'Experience': [4, 10, 18, 7, 3]
}

df = pd.DataFrame(data)

# 2 youngest players
print(df.nsmallest(2, 'Age'))

Output:

    Name  Age     Team   Salary  Experience
4    Eve   26  Celtics  7000000           3
0  Alice   28   Lakers  8500000           4

Using `query()` for Readable Filtering

The query() method lets you write filter conditions as strings, which can be more readable for complex selections:

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [28, 35, 42, 31, 26],
    'Team': ['Lakers', 'Celtics', 'Lakers', 'Heat', 'Celtics'],
    'Salary': [8500000, 12000000, 15000000, 9500000, 7000000],
    'Experience': [4, 10, 18, 7, 3]
}

df = pd.DataFrame(data)


# Equivalent to: df[(df['Age'] > 30) & (df['Team'] == 'Lakers')]
result = df.query('Age > 30 and Team == "Lakers"')
print(result)

Output:

      Name  Age    Team    Salary  Experience
2  Charlie   42  Lakers  15000000          18

You can reference variables with @:

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [28, 35, 42, 31, 26],
    'Team': ['Lakers', 'Celtics', 'Lakers', 'Heat', 'Celtics'],
    'Salary': [8500000, 12000000, 15000000, 9500000, 7000000],
    'Experience': [4, 10, 18, 7, 3]
}

df = pd.DataFrame(data)

min_salary = 9000000
result = df.query('Salary >= @min_salary')
print(result)

Output:

      Name  Age     Team    Salary  Experience
    Bob   35  Celtics  12000000          10
Charlie   42   Lakers  15000000          18
  Diana   31     Heat   9500000           7

Summary

Task	Method	Example
Select one column	`df['col']`	`df['Age']`
Select multiple columns	`df[['col1', 'col2']]`	`df[['Name', 'Age']]`
Filter rows by condition	`df[condition]`	`df[df['Age'] > 30]`
Multiple conditions	`df[(cond1) & (cond2)]`	`df[(df['Age'] > 30) & (df['Salary'] > 1e7)]`
Rows + columns by label	`df.loc[condition, cols]`	`df.loc[df['Age'] > 30, ['Name', 'Salary']]`
Rows + columns by position	`df.iloc[rows, cols]`	`df.iloc[:3, :2]`
Match from a list	`df[df['col'].isin(list)]`	`df[df['Team'].isin(['Lakers', 'Heat'])]`
String-based query	`df.query('expression')`	`df.query('Age > 30')`
First N rows	`df.head(n)`	`df.head(10)`
Last N rows	`df.tail(n)`	`df.tail(5)`
Random rows	`df.sample(n)`	`df.sample(3)`
Top N by value	`df.nlargest(n, 'col')`	`df.nlargest(3, 'Salary')`

Pandas offers multiple ways to select subsets of a DataFrame.

Use bracket notation for quick column or row filtering.
Use loc[] for label-based selection combining rows and columns
Use iloc[] for integer-position-based selection
Use query() for readable string-based filtering.

Choose the method that makes your code clearest for the task at hand.

Setting Up the Sample Data​

Selecting Columns​

Selecting a Single Column​

Selecting Multiple Columns​

Selecting Rows by Condition​

Single Condition​

Multiple Conditions​

Using isin() for Multiple Values​

Selecting Rows and Columns Together​

Using loc[] (Label-Based Selection)​

Using iloc[] (Position-Based Selection)​

Quick Subset Methods​

head() - First N Rows​

tail() - Last N Rows​

sample() - Random Rows​

nlargest() and nsmallest() - Top/Bottom by Value​

Using query() for Readable Filtering​

Summary​

Table of Contents

Setting Up the Sample Data

Selecting Columns

Selecting a Single Column

Selecting Multiple Columns

Selecting Rows by Condition

Single Condition

Multiple Conditions

Using `isin()` for Multiple Values

Selecting Rows and Columns Together

Using `loc[]` (Label-Based Selection)

Using `iloc[]` (Position-Based Selection)

Quick Subset Methods

`head()` - First N Rows

`tail()` - Last N Rows

`sample()` - Random Rows

`nlargest()` and `nsmallest()` - Top/Bottom by Value

Using `query()` for Readable Filtering

Summary