Python Pandas: How to Select Rows from a DataFrame Based on Column Values in Python
When working with tabular data in Python, one of the most common tasks is filtering rows based on the values in one or more columns. Whether you need to isolate records that meet a specific threshold, match a category, or satisfy a combination of conditions, Pandas offers several powerful and expressive ways to accomplish this.
This guide walks you through the most practical methods for selecting rows from a Pandas DataFrame based on column values, from simple boolean indexing to SQL-like queries, complete with examples and outputs.
Setting Up the Sample DataFrame
All examples in this article use the following DataFrame:
import pandas as pd
data = {
"Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
"Age": [24, 27, 22, 32, 29],
"City": ["New York", "Los Angeles", "Chicago", "Houston", "New York"],
}
df = pd.DataFrame(data)
print(df)
Output:
Name Age City
0 Alice 24 New York
1 Bob 27 Los Angeles
2 Charlie 22 Chicago
3 David 32 Houston
4 Eva 29 New York
Boolean Indexing (The Most Common Approach)
Boolean indexing is the most widely used technique for filtering rows in Pandas. You pass a condition - which evaluates to a Series of True/False values - directly into the bracket operator of the DataFrame.
import pandas as pd
data = {
"Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
"Age": [24, 27, 22, 32, 29],
"City": ["New York", "Los Angeles", "Chicago", "Houston", "New York"],
}
df = pd.DataFrame(data)
selected = df[df["Age"] > 25]
print(selected)
Output:
Name Age City
1 Bob 27 Los Angeles
3 David 32 Houston
4 Eva 29 New York
Every row where the condition df["Age"] > 25 is True is included in the result.
Combining Multiple Conditions
You can chain multiple conditions using the bitwise operators & (AND), | (OR), and ~ (NOT). Each individual condition must be wrapped in parentheses; otherwise Python's operator precedence will raise an error.
# ❌ This raises a ValueError
result = df[df["Age"] > 25 & df["City"] == "New York"]
Because & has higher precedence than > and ==, the expression is evaluated incorrectly.
The correct way is to wrap each condition in parentheses:
import pandas as pd
data = {
"Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
"Age": [24, 27, 22, 32, 29],
"City": ["New York", "Los Angeles", "Chicago", "Houston", "New York"],
}
df = pd.DataFrame(data)
result = df[(df["Age"] > 25) & (df["City"] == "New York")]
print(result)
Output:
Name Age City
4 Eva 29 New York
You can also use | for OR logic:
import pandas as pd
data = {
"Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
"Age": [24, 27, 22, 32, 29],
"City": ["New York", "Los Angeles", "Chicago", "Houston", "New York"],
}
df = pd.DataFrame(data)
result = df[(df["Age"] < 24) | (df["City"] == "Houston")]
print(result)
Output:
Name Age City
2 Charlie 22 Chicago
3 David 32 Houston
Using loc for Conditional Row Selection
The loc accessor selects rows by label or boolean array. It works just like bracket-based boolean indexing but offers an important advantage: you can simultaneously filter rows and select specific columns.
import pandas as pd
data = {
"Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
"Age": [24, 27, 22, 32, 29],
"City": ["New York", "Los Angeles", "Chicago", "Houston", "New York"],
}
df = pd.DataFrame(data)
result = df.loc[df["City"] == "Chicago"]
print(result)
Output:
Name Age City
2 Charlie 22 Chicago
Filtering Rows and Selecting Columns at the Same Time
import pandas as pd
data = {
"Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
"Age": [24, 27, 22, 32, 29],
"City": ["New York", "Los Angeles", "Chicago", "Houston", "New York"],
}
df = pd.DataFrame(data)
result = df.loc[df["Age"] >= 27, ["Name", "City"]]
print(result)
Output:
Name City
1 Bob Los Angeles
3 David Houston
4 Eva New York
locUse loc when you want to filter rows and restrict columns in a single, readable expression. It also avoids the chained-indexing pitfalls that can lead to the SettingWithCopyWarning.
Using the query Method for SQL-Like Filtering
The query method lets you write filter conditions as a string expression, which can feel more natural - especially if you have a SQL background. It also tends to produce cleaner code when conditions are complex.
import pandas as pd
data = {
"Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
"Age": [24, 27, 22, 32, 29],
"City": ["New York", "Los Angeles", "Chicago", "Houston", "New York"],
}
df = pd.DataFrame(data)
result = df.query("Age < 30")
print(result)
Output:
Name Age City
0 Alice 24 New York
1 Bob 27 Los Angeles
2 Charlie 22 Chicago
4 Eva 29 New York
Referencing External Variables with @
You can reference Python variables inside a query string by prefixing them with @:
import pandas as pd
data = {
"Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
"Age": [24, 27, 22, 32, 29],
"City": ["New York", "Los Angeles", "Chicago", "Houston", "New York"],
}
df = pd.DataFrame(data)
min_age = 25
target_city = "New York"
result = df.query("Age > @min_age and City == @target_city")
print(result)
Output:
Name Age City
4 Eva 29 New York
Using isin for Membership-Based Selection
When you need to check whether column values belong to a set of allowed values, the isin method is the cleanest option. It replaces long chains of OR conditions with a single, readable call.
import pandas as pd
data = {
"Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
"Age": [24, 27, 22, 32, 29],
"City": ["New York", "Los Angeles", "Chicago", "Houston", "New York"],
}
df = pd.DataFrame(data)
result = df[df["City"].isin(["New York", "Chicago"])]
print(result)
Output:
Name Age City
0 Alice 24 New York
2 Charlie 22 Chicago
4 Eva 29 New York
To exclude certain values, negate the condition with ~:
import pandas as pd
data = {
"Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
"Age": [24, 27, 22, 32, 29],
"City": ["New York", "Los Angeles", "Chicago", "Houston", "New York"],
}
df = pd.DataFrame(data)
result = df[~df["City"].isin(["Los Angeles", "Houston"])]
print(result)
Output:
Name Age City
0 Alice 24 New York
2 Charlie 22 Chicago
4 Eva 29 New York
Quick Comparison of Methods
| Method | Best For | Supports Column Selection | String Syntax |
|---|---|---|---|
df[condition] | Simple, fast filtering | No | No |
df.loc[condition] | Filtering + column selection | Yes | No |
df.query("...") | Complex or SQL-like conditions | No (directly) | Yes |
df[df["col"].isin([...])] | Matching against a list of values | No | No |
Summary
Selecting rows based on column values is a foundational skill for any data analysis workflow in Python. Here is a quick recap of when to use each method:
- Boolean indexing (
df[condition]) - your go-to for straightforward filters. loc- ideal when you need to filter rows and pick specific columns in one step.query- great for readability, especially with multiple conditions or external variables.isin- the best choice when checking membership against a list of values.
All four approaches are performant for typical dataset sizes. Choose the one that makes your code the most readable and maintainable for the task at hand.