Python NumPy: How to Filter a Two-Dimensional NumPy Array Based on a Condition
Filtering data in multidimensional arrays is a fundamental operation in data analysis, scientific computing, and machine learning. NumPy makes this efficient and expressive through boolean indexing, conditional expressions, and specialized functions that let you extract exactly the elements or rows you need from a 2D array.
This guide demonstrates multiple techniques to filter a two-dimensional NumPy array based on conditions, with clear examples, outputs, and explanations for each approach.
Setting Up: Creating a 2D NumPy Array
Let's start by creating sample 2D arrays that we'll use throughout this guide:
import numpy as np
# A simple 2D array (2 rows, 3 columns)
arr = np.array([[1, 2, 3],
[4, 5, 6]])
print(arr)
Output:
[[1 2 3]
[4 5 6]]
Method 1: Boolean Indexing (Most Common)
The most straightforward and widely used way to filter a NumPy array is boolean indexing - applying a condition directly to the array to create a boolean mask, then using that mask to select elements.
Filtering Individual Elements
import numpy as np
arr = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
# Filter all elements greater than 4
result = arr[arr > 4]
print(result)
Output:
[5 6 7 8 9]
When you apply a boolean condition to a 2D array and use it directly for indexing, the result is a flattened 1D array containing only the elements that satisfy the condition. The original 2D structure is not preserved.
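If the 2D layout matters, one option is to keep the array's shape and substitute a placeholder for the non-matching elements instead of dropping them. The sketch below contrasts the two results using np.where(), which is covered under Method 3. The fill value 0 is an arbitrary choice made for illustration.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

# Boolean indexing drops non-matching elements and flattens the result
flat = arr[arr > 4]
print(flat.shape)    # (5,)

# Keeping the shape: replace non-matching elements with a fill value
# (0 is an arbitrary placeholder chosen for this example)
kept = np.where(arr > 4, arr, 0)
print(kept.shape)    # (3, 3)
print(kept)
```

Whether a placeholder is acceptable depends on the data: if 0 is a legitimate value in the array, a different sentinel would be needed.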
Understanding the Boolean Mask
The condition arr > 4 creates a boolean array of the same shape:
import numpy as np
arr = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
mask = arr > 4
print("Boolean mask:")
print(mask)
print("\nFiltered elements:")
print(arr[mask])
Output:
Boolean mask:
[[False False False]
[False True True]
[ True True True]]
Filtered elements:
[5 6 7 8 9]
Filtering Entire Rows Based on a Condition
To keep whole rows based on the value in one column, build the condition from that column and use the resulting mask as the row index:
import numpy as np
arr = np.array([[1, 20, 300],
[4, 50, 600],
[7, 80, 900],
[2, 10, 100]])
# Select rows where the second column (index 1) is greater than 30
filtered_rows = arr[arr[:, 1] > 30]
print(filtered_rows)
Output:
[[ 4 50 600]
[ 7 80 900]]
Here, arr[:, 1] selects all values in column 1, and arr[:, 1] > 30 creates a 1D boolean mask applied to the rows.
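To make the intermediate steps visible, the following sketch prints the column slice and the row mask separately before applying them. The names col and row_mask are introduced here only for illustration.

```python
import numpy as np

arr = np.array([[1, 20, 300],
                [4, 50, 600],
                [7, 80, 900],
                [2, 10, 100]])

col = arr[:, 1]        # values of column 1, one per row
row_mask = col > 30    # 1D boolean mask with one entry per row

print(col)                   # [20 50 80 10]
print(row_mask)              # [False  True  True False]
print(arr[row_mask].shape)   # (2, 3): two rows kept, all three columns intact
```

Because the mask has one entry per row, the 2D structure survives: rows are dropped, but each remaining row keeps all of its columns.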
Method 2: Combining Multiple Conditions
You can combine conditions using the element-wise operators & (AND), | (OR), and ~ (NOT). Each individual condition must be wrapped in parentheses, because & and | bind more tightly than comparison operators such as > and <.
import numpy as np
arr = np.array([[10, 21, 30],
[40, 15, 60],
[70, 82, 90],
[25, 11, 45]])
# Rows where column 0 > 20 AND column 1 < 50
filtered = arr[(arr[:, 0] > 20) & (arr[:, 1] < 50)]
print("AND condition:")
print(filtered)
# Rows where column 0 > 60 OR column 2 < 50
filtered = arr[(arr[:, 0] > 60) | (arr[:, 2] < 50)]
print("\nOR condition:")
print(filtered)
Output:
AND condition:
[[40 15 60]
[25 11 45]]
OR condition:
[[10 21 30]
[70 82 90]
[25 11 45]]
Using and Instead of &
Python's and keyword does not work with NumPy arrays, because it tries to reduce each array to a single True or False. Use the element-wise & operator instead:
# ❌ Wrong: raises ValueError
# filtered = arr[(arr[:, 0] > 20) and (arr[:, 1] < 50)]
# ✅ Correct: use & with parentheses
filtered = arr[(arr[:, 0] > 20) & (arr[:, 1] < 50)]
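The ~ (NOT) operator listed at the start of this method is not demonstrated in the examples above. A minimal sketch, reusing the same array, might look like this (the variable names not_big and combined are illustrative only):

```python
import numpy as np

arr = np.array([[10, 21, 30],
                [40, 15, 60],
                [70, 82, 90],
                [25, 11, 45]])

# NOT: rows where column 0 is NOT greater than 20
not_big = arr[~(arr[:, 0] > 20)]
print(not_big)     # [[10 21 30]]

# NOT combined with AND: column 0 > 20 and column 1 NOT below 50
combined = arr[(arr[:, 0] > 20) & ~(arr[:, 1] < 50)]
print(combined)    # [[70 82 90]]
```

As with & and |, the parentheses around the condition being negated are required; ~ applied to the bare column would invert the integers bitwise rather than negate the comparison.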
Method 3: Using np.where() for Conditional Selection
np.where() returns the indices where a condition is True, or performs element-wise selection between two values:
Getting Indices of Matching Elements
import numpy as np
arr = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
# Get row and column indices where values are greater than 5
row_idx, col_idx = np.where(arr > 5)
print(f"Row indices: {row_idx}")
print(f"Col indices: {col_idx}")
print(f"Values: {arr[row_idx, col_idx]}")
Output:
Row indices: [1 2 2 2]
Col indices: [2 0 1 2]
Values: [6 7 8 9]
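When it is more convenient to have one (row, column) pair per match rather than two separate index arrays, np.argwhere() is a possible alternative. The sketch below uses the same array and condition.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

# One (row, column) pair per matching element
coords = np.argwhere(arr > 5)
print(coords)
# [[1 2]
#  [2 0]
#  [2 1]
#  [2 2]]
```

The tuple returned by np.where() can be used directly as an index (arr[row_idx, col_idx]), whereas the pairs from np.argwhere() are better suited to iterating over coordinates.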
Replacing Values Based on a Condition
import numpy as np
arr = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
# Replace values > 5 with -1, keep others unchanged
result = np.where(arr > 5, -1, arr)
print(result)
Output:
[[ 1 2 3]
[ 4 5 -1]
[-1 -1 -1]]
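The same replacement can also be written as an assignment through a boolean mask. Unlike np.where(), which returns a new array, this form modifies the array it is applied to, so the sketch below works on a copy (the name modified is illustrative).

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

modified = arr.copy()          # copy so the original stays available
modified[modified > 5] = -1    # assign through the boolean mask, in place

print(modified)
print(np.array_equal(modified, np.where(arr > 5, -1, arr)))  # True
```

Which form fits better depends on whether the original values are still needed afterwards.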
Method 4: Using np.all() to Filter Columns or Rows
np.all() checks whether all elements along a specified axis satisfy a given condition. This is useful when you want to filter columns (or rows) where every value meets a specific criterion.
import numpy as np
arr = np.arange(12).reshape((3, 4))
print("Original array:")
print(arr)
# Keep only columns where ALL values are less than 10
col_mask = np.all(arr < 10, axis=0)
print(f"\nColumn mask (all values < 10): {col_mask}")
filtered = arr[:, col_mask]
print("\nFiltered array (columns where all values < 10):")
print(filtered)
Output:
Original array:
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
Column mask (all values < 10): [ True True False False]
Filtered array (columns where all values < 10):
[[0 1]
[4 5]
[8 9]]
Notice that column index 2 contains the value 10. Since 10 < 10 is False, that column does not satisfy the condition. Because np.all() requires every value in a column to meet the condition, the presence of a single False causes the entire column to be excluded.
You can also compute the mask directly inside the indexing expression:
filtered = arr[:, np.all(arr < 10, axis=0)]
print(filtered)
Output:
[[0 1]
[4 5]
[8 9]]
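The examples above filter columns. For rows, the same idea applies with axis=1, and the mask then goes in the first index position. A short sketch on the same array:

```python
import numpy as np

arr = np.arange(12).reshape((3, 4))

# axis=1 collapses each row to a single True/False
row_mask = np.all(arr < 10, axis=1)
print(row_mask)          # [ True  True False]

filtered_rows = arr[row_mask]
print(filtered_rows)
# [[0 1 2 3]
#  [4 5 6 7]]
```

The last row is excluded because it contains 10 and 11, which fail the condition.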
Method 5: Using np.any() to Filter Columns or Rows
np.any() tests whether at least one element along a specified axis satisfies a condition:
import numpy as np
arr = np.arange(12).reshape((3, 4))
print("Original array:")
print(arr)
# Keep columns where ANY value is less than 2
col_mask = np.any(arr < 2, axis=0)
print(f"\nColumn mask (any < 2): {col_mask}")
filtered = arr[:, col_mask]
print("\nFiltered columns:")
print(filtered)
Output:
Original array:
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
Column mask (any < 2): [ True True False False]
Filtered columns:
[[0 1]
[4 5]
[8 9]]
Columns 0 and 1 are kept because they contain at least one value less than 2 (values 0 and 1 respectively).
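A row-wise version works the same way with axis=1. The sketch below keeps rows that contain at least one value greater than 9; the threshold is chosen only for illustration.

```python
import numpy as np

arr = np.arange(12).reshape((3, 4))

# Keep rows where ANY value is greater than 9
row_mask = np.any(arr > 9, axis=1)
print(row_mask)          # [False False  True]
print(arr[row_mask])     # [[ 8  9 10 11]]
```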
Method 6: Using np.in1d() for Value-Based Filtering
np.in1d() tests whether each element of a 1D array is present in a set of target values. This is useful for keeping rows whose value in a given column belongs to a list of allowed values.
import numpy as np
arr = np.array([["1", "one"],
["2", "two"],
["3", "three"],
["4", "four"],
["5", "five"]])
# Filter rows where the second column is "two" or "four"
target_values = np.array(["two", "four"])
mask = np.in1d(arr[:, 1], target_values)
filtered = arr[mask]
print(filtered)
Output:
DeprecationWarning: `in1d` is deprecated. Use `np.isin` instead.
[['2' 'two']
['4' 'four']]
np.in1d() is deprecated as of NumPy 2.0, which is why the warning appears in the output. np.isin() is the recommended replacement and takes the same arguments here:
mask = np.isin(arr[:, 1], target_values)
filtered = arr[mask]
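np.isin() also accepts an invert argument, which can be handy for the opposite filter: dropping the rows that match instead of keeping them. A minimal sketch with the same data (exclude_mask and remaining are illustrative names):

```python
import numpy as np

arr = np.array([["1", "one"],
                ["2", "two"],
                ["3", "three"],
                ["4", "four"],
                ["5", "five"]])
target_values = np.array(["two", "four"])

# invert=True flips the membership test: True where the value is NOT a target
exclude_mask = np.isin(arr[:, 1], target_values, invert=True)
remaining = arr[exclude_mask]
print(remaining)
```

Writing ~np.isin(arr[:, 1], target_values) should give the same mask.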
Method 7: Using np.extract() for Element-Level Filtering
np.extract() returns elements from an array that satisfy a condition - similar to boolean indexing but as a dedicated function:
import numpy as np
arr = np.array([[10, 25, 30],
[40, 15, 60],
[70, 85, 90]])
# Extract all elements greater than 50
result = np.extract(arr > 50, arr)
print(result)
Output:
[60 70 85 90]
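For a boolean condition, np.extract() and plain boolean indexing should produce the same flattened result. The sketch below checks this on the array from the example; the variable names are illustrative.

```python
import numpy as np

arr = np.array([[10, 25, 30],
                [40, 15, 60],
                [70, 85, 90]])

condition = arr > 50
via_extract = np.extract(condition, arr)
via_indexing = arr[condition]

print(via_extract)                                  # [60 70 85 90]
print(np.array_equal(via_extract, via_indexing))    # True
```

In practice the choice between the two is mostly a matter of style.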
Practical Example: Filtering a Dataset
Here's a real-world-style example combining multiple filtering techniques:
import numpy as np
# Student data: [ID, Math Score, Science Score, Age]
students = np.array([
[101, 85, 90, 20],
[102, 72, 68, 22],
[103, 91, 95, 21],
[104, 60, 55, 23],
[105, 78, 82, 20],
[106, 95, 88, 22]
])
# Filter 1: Students with Math score > 80
high_math = students[students[:, 1] > 80]
print("High math scorers:")
print(high_math)
# Filter 2: Students with BOTH scores above 80
both_high = students[(students[:, 1] > 80) & (students[:, 2] > 80)]
print("\nHigh in both subjects:")
print(both_high)
# Filter 3: Students aged 20 OR with Math score > 90
young_or_top = students[(students[:, 3] == 20) | (students[:, 1] > 90)]
print("\nAge 20 or Math > 90:")
print(young_or_top)
Output:
High math scorers:
[[101 85 90 20]
[103 91 95 21]
[106 95 88 22]]
High in both subjects:
[[101 85 90 20]
[103 91 95 21]
[106 95 88 22]]
Age 20 or Math > 90:
[[101 85 90 20]
[103 91 95 21]
[105 78 82 20]
[106 95 88 22]]
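Often only some columns of the matching rows are needed, for example just the IDs. One way to do this is to pass the row mask together with a column index or slice, as sketched below. The names both_high_mask, ids, and scores are introduced here for illustration.

```python
import numpy as np

# Student data: [ID, Math Score, Science Score, Age]
students = np.array([
    [101, 85, 90, 20],
    [102, 72, 68, 22],
    [103, 91, 95, 21],
    [104, 60, 55, 23],
    [105, 78, 82, 20],
    [106, 95, 88, 22]
])

both_high_mask = (students[:, 1] > 80) & (students[:, 2] > 80)

# Row mask plus a column index: IDs only
ids = students[both_high_mask, 0]
print(ids)       # [101 103 106]

# Row mask plus a column slice: the two score columns
scores = students[both_high_mask, 1:3]
print(scores)
```

Storing the mask in a variable also avoids recomputing the condition when it is reused for several selections.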
Quick Reference: Filtering Methods
| Method | Use Case | Returns |
|---|---|---|
| arr[condition] | Element or row filtering | Matching elements/rows |
| arr[:, col_condition] | Column filtering | Matching columns |
| np.where(condition) | Get indices or conditional replacement | Indices or new array |
| np.all(condition, axis) | Filter where ALL values meet condition | Boolean mask |
| np.any(condition, axis) | Filter where ANY value meets condition | Boolean mask |
| np.in1d() / np.isin() | Match against a set of values | Boolean mask |
| np.extract(condition, arr) | Extract matching elements | 1D array of matches |
Conclusion
NumPy provides a rich set of tools for filtering 2D arrays based on conditions:
- Boolean indexing (arr[condition]) is the simplest and most common method for both element-level and row-level filtering.
- Combine conditions with &, |, and ~ operators (always with parentheses) for complex multi-condition filters.
- np.where() is ideal when you need indices or want to replace values conditionally.
- np.all() and np.any() let you filter entire rows or columns based on aggregate conditions across an axis.
- np.isin() handles membership-based filtering when matching against a set of known values.
Choose the method that best matches your filtering needs - for most everyday tasks, boolean indexing with conditions applied to specific columns will be your go-to approach.