Python NumPy: How to Filter a Two-Dimensional NumPy Array Based on a Condition

Filtering data in multidimensional arrays is a fundamental operation in data analysis, scientific computing, and machine learning. NumPy makes this efficient and expressive through boolean indexing, conditional expressions, and specialized functions that let you extract exactly the elements or rows you need from a 2D array.

This guide demonstrates multiple techniques to filter a two-dimensional NumPy array based on conditions, with clear examples, outputs, and explanations for each approach.

Setting Up: Creating a 2D NumPy Array

Let's start by creating sample 2D arrays that we'll use throughout this guide:

import numpy as np

# A simple 2D array (2 rows, 3 columns)
arr = np.array([[1, 2, 3],
                [4, 5, 6]])

print(arr)

Output:

[[1 2 3]
 [4 5 6]]

Method 1: Boolean Indexing (Most Common)

The most straightforward and widely used way to filter a NumPy array is boolean indexing - applying a condition directly to the array to create a boolean mask, then using that mask to select elements.

Filtering Individual Elements

import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

# Filter all elements greater than 4
result = arr[arr > 4]
print(result)

Output:

[5 6 7 8 9]

Note: When you apply a boolean condition to a 2D array and use it directly for indexing, the result is a flattened 1D array containing only the elements that satisfy the condition. The original 2D structure is not preserved.
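If you need to keep the 2D layout, here is a minimal sketch of two common options. The fill value 0 is an arbitrary choice for illustration; in your own data, pick a placeholder that cannot be confused with a real value, or use the masked-array option.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])
mask = arr > 4

# Option 1: keep the shape and substitute a placeholder (0, chosen arbitrarily)
# wherever the condition is False
filled = np.where(mask, arr, 0)
print(filled)

# Option 2: a masked array hides non-matching elements instead of replacing them.
# np.ma.masked_array hides positions where ITS mask is True, hence ~mask.
masked = np.ma.masked_array(arr, mask=~mask)
print(masked)

# Reductions on the masked array skip the hidden elements: 5 + 6 + 7 + 8 + 9
print(masked.sum())
```

The masked array is usually the safer choice when any placeholder value could be mistaken for genuine data.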

Understanding the Boolean Mask

The condition arr > 4 creates a boolean array of the same shape:

import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

mask = arr > 4
print("Boolean mask:")
print(mask)

print("\nFiltered elements:")
print(arr[mask])

Output:

Boolean mask:
[[False False False]
 [False  True  True]
 [ True  True  True]]

Filtered elements:
[5 6 7 8 9]
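Because the mask is an ordinary boolean array, you can also inspect or aggregate it before (or instead of) indexing. The short sketch below counts matches overall and per row; in sums, True counts as 1 and False as 0.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])
mask = arr > 4

print(mask.shape, mask.dtype)   # same shape as arr, dtype bool
print(np.count_nonzero(mask))   # total number of matching elements
print(mask.sum(axis=1))         # number of matches in each row
print(mask.any(axis=1))         # does each row contain at least one match?
```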

Filtering Entire Rows Based on a Condition

To filter complete rows where a condition applies to a specific column, use the condition on that column as the row index:

import numpy as np

arr = np.array([[1, 20, 300],
                [4, 50, 600],
                [7, 80, 900],
                [2, 10, 100]])

# Select rows where the second column (index 1) is greater than 30
filtered_rows = arr[arr[:, 1] > 30]
print(filtered_rows)

Output:

[[  4  50 600]
 [  7  80 900]]

Here, arr[:, 1] selects all values in column 1, and arr[:, 1] > 30 creates a 1D boolean mask applied to the rows.
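Two details are worth knowing when filtering rows this way: the result stays 2D even when no rows match, and you can restrict the columns in the same step. The sketch below reuses the array above; np.ix_() is a standard NumPy helper that accepts a boolean row mask together with a list of column indices. The threshold 1000 and the column list [0, 2] are illustrative.

```python
import numpy as np

arr = np.array([[1, 20, 300],
                [4, 50, 600],
                [7, 80, 900],
                [2, 10, 100]])

row_mask = arr[:, 1] > 30

# No row matches: the result is an empty (0, 3) array, not an error
empty = arr[arr[:, 1] > 1000]
print(empty.shape)

# Filter rows and keep only columns 0 and 2 in one indexing step
subset = arr[np.ix_(row_mask, [0, 2])]
print(subset)

# Equivalent two-step version: filter rows first, then select columns
subset_two_step = arr[row_mask][:, [0, 2]]
print(np.array_equal(subset, subset_two_step))
```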

Method 2: Combining Multiple Conditions

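You can combine conditions using the element-wise operators & (AND), | (OR), and ~ (NOT). Wrap each individual condition in parentheses: & and | bind more tightly than comparison operators such as > and <, so without parentheses Python would evaluate the bitwise operation first.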

import numpy as np

arr = np.array([[10, 21, 30],
                [40, 15, 60],
                [70, 82, 90],
                [25, 11, 45]])

# Rows where column 0 > 20 AND column 1 < 50
filtered = arr[(arr[:, 0] > 20) & (arr[:, 1] < 50)]
print("AND condition:")
print(filtered)

# Rows where column 0 > 60 OR column 2 < 50
filtered = arr[(arr[:, 0] > 60) | (arr[:, 2] < 50)]
print("\nOR condition:")
print(filtered)

Output:

AND condition:
[[40 15 60]
 [25 11 45]]

OR condition:
[[10 21 30]
 [70 82 90]
 [25 11 45]]
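The ~ (NOT) operator does not appear in the example above, so here is a minimal sketch using the same array. It also shows np.logical_and(), NumPy's function form of & for boolean arrays, which some readers find easier to read than the operator.

```python
import numpy as np

arr = np.array([[10, 21, 30],
                [40, 15, 60],
                [70, 82, 90],
                [25, 11, 45]])

# NOT: rows where column 1 is NOT less than 50
not_small = arr[~(arr[:, 1] < 50)]
print(not_small)

# np.logical_and builds the same mask as & with parenthesized conditions
mask_fn = np.logical_and(arr[:, 0] > 20, arr[:, 1] < 50)
mask_op = (arr[:, 0] > 20) & (arr[:, 1] < 50)
print(np.array_equal(mask_fn, mask_op))
```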

Common Mistake: Using and Instead of &

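Python's and keyword does not work element-wise on NumPy arrays. It tries to reduce each array to a single True or False, which raises a ValueError for arrays with more than one element. Use the element-wise & operator instead: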

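import numpy as np

arr = np.array([[10, 21, 30], [40, 15, 60], [70, 82, 90], [25, 11, 45]])

# ❌ Wrong: raises ValueError (truth value of an array is ambiguous)
# filtered = arr[(arr[:, 0] > 20) and (arr[:, 1] < 50)]

# ✅ Correct: use & with parentheses
filtered = arr[(arr[:, 0] > 20) & (arr[:, 1] < 50)]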
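To see both failure modes side by side, the sketch below wraps each attempt in a small hypothetical helper, try_filter(), that returns the error message instead of crashing. The second attempt shows why the parentheses matter: because & binds more tightly than >, the unparenthesized expression is parsed as the chained comparison arr[:, 0] > (20 & arr[:, 1]) < 50, which Python evaluates with an implicit and.

```python
import numpy as np

arr = np.array([[10, 21, 30],
                [40, 15, 60],
                [70, 82, 90],
                [25, 11, 45]])


def try_filter(build_mask):
    """Hypothetical helper: index arr with the mask from build_mask, or return the error text."""
    try:
        return arr[build_mask()]
    except ValueError as exc:
        return f"ValueError: {exc}"


# `and` asks each boolean array for a single True/False, which is ambiguous
with_and = try_filter(lambda: (arr[:, 0] > 20) and (arr[:, 1] < 50))
# Missing parentheses: parsed as arr[:, 0] > (20 & arr[:, 1]) < 50
no_parens = try_filter(lambda: arr[:, 0] > 20 & arr[:, 1] < 50)
# Element-wise & with each condition in parentheses
correct = try_filter(lambda: (arr[:, 0] > 20) & (arr[:, 1] < 50))

print(with_and)
print(no_parens)
print(correct)
```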

Method 3: Using `np.where()` for Conditional Selection

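np.where() works in two modes. Called with only a condition, it returns a tuple of index arrays (one per dimension) marking where the condition is True. Called as np.where(condition, x, y), it builds a new array that takes elements from x where the condition is True and from y elsewhere: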

Getting Indices of Matching Elements

import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

# Get row and column indices where values are greater than 5
row_idx, col_idx = np.where(arr > 5)

print(f"Row indices: {row_idx}")
print(f"Col indices: {col_idx}")
print(f"Values: {arr[row_idx, col_idx]}")

Output:

Row indices: [1 2 2 2]
Col indices: [2 0 1 2]
Values: [6 7 8 9]
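If you prefer one (row, column) pair per match rather than two parallel arrays, np.argwhere() returns the same positions stacked as rows. The sketch also checks that single-argument np.where() matches np.nonzero(), which the NumPy documentation describes it as shorthand for, and uses np.unique() to list each matching row once.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

# One row per match: [row, col]
pairs = np.argwhere(arr > 5)
print(pairs)

# Single-argument np.where() gives the same indices as np.nonzero()
rows_w, cols_w = np.where(arr > 5)
rows_n, cols_n = np.nonzero(arr > 5)
print(np.array_equal(rows_w, rows_n) and np.array_equal(cols_w, cols_n))

# Rows that contain at least one match, each row index listed once
print(np.unique(rows_w))
```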

Replacing Values Based on a Condition

import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

# Replace values > 5 with -1, keep others unchanged
result = np.where(arr > 5, -1, arr)
print(result)

Output:

[[ 1  2  3]
 [ 4  5 -1]
 [-1 -1 -1]]
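np.where() always returns a new array and leaves arr untouched. To modify an array in place instead, assign through a boolean mask; the sketch below does this on a copy so the original stays available for comparison. It also shows that both branches of np.where() can be arrays, here scaling the values above 5 by 10 (an arbitrary factor chosen for illustration).

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

# np.where builds a new array; arr itself is unchanged
result = np.where(arr > 5, -1, arr)

# In-place alternative: assign through a boolean mask on a copy
in_place = arr.copy()
in_place[in_place > 5] = -1
print(np.array_equal(result, in_place))

# Both branches can be arrays: scale values > 5 by 10, keep the rest
scaled = np.where(arr > 5, arr * 10, arr)
print(scaled)
```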

Method 4: Using `np.all()` to Filter Columns or Rows

np.all() checks whether all elements along a specified axis satisfy a given condition. This is useful when you want to filter columns (or rows) where every value meets a specific criterion.

import numpy as np

arr = np.arange(12).reshape((3, 4))
print("Original array:")
print(arr)

# Keep only columns where ALL values are less than 10
col_mask = np.all(arr < 10, axis=0)
print(f"\nColumn mask (all values < 10): {col_mask}")

filtered = arr[:, col_mask]
print("\nFiltered array (columns where all values < 10):")
print(filtered)

Output:

Original array:
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]

Column mask (all values < 10): [ True  True False False]

Filtered array (columns where all values < 10):
[[0 1]
 [4 5]
 [8 9]]

Notice that column 2 contains the value 10 and column 3 contains 11. Since 10 < 10 and 11 < 10 are both False, neither column satisfies the condition. np.all() requires every value in a column to meet the condition, so a single False is enough to exclude the entire column.

You can also compute the mask directly inside the indexing expression:

filtered = arr[:, np.all(arr < 10, axis=0)]
print(filtered)

Output:

[[0 1]
 [4 5]
 [8 9]]
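The same approach filters rows: with axis=1, np.all() evaluates the condition across each row, and the resulting mask goes in the row position of the index. Here is a sketch with the same array, keeping only rows whose values are all below 8 (a threshold picked for illustration):

```python
import numpy as np

arr = np.arange(12).reshape((3, 4))

# axis=1: one True/False per row
row_mask = np.all(arr < 8, axis=1)
print(row_mask)

# Use the mask in the row position to keep whole rows
filtered_rows = arr[row_mask]
print(filtered_rows)
```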

Method 5: Using `np.any()` to Filter Columns or Rows

np.any() tests whether at least one element along a specified axis satisfies a condition:

import numpy as np

arr = np.arange(12).reshape((3, 4))
print("Original array:")
print(arr)

# Keep columns where ANY value is less than 2
col_mask = np.any(arr < 2, axis=0)
print(f"\nColumn mask (any < 2): {col_mask}")

filtered = arr[:, col_mask]
print("\nFiltered columns:")
print(filtered)

Output:

Original array:
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]

Column mask (any < 2): [ True  True False False]

Filtered columns:
[[0 1]
 [4 5]
 [8 9]]

Columns 0 and 1 are kept because they contain at least one value less than 2 (values 0 and 1 respectively).
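np.any() also works across rows with axis=1. One common use, sketched below with a small hypothetical float array, is dropping every row that contains a missing value: np.isnan() marks the NaNs, np.any(..., axis=1) flags the affected rows, and ~ keeps the rest.

```python
import numpy as np

# Hypothetical data with one missing value
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, np.nan, 6.0],
                 [7.0, 8.0, 9.0]])

# True for each row that contains at least one NaN
has_nan = np.any(np.isnan(data), axis=1)
print(has_nan)

# Keep only the rows without NaNs
clean = data[~has_nan]
print(clean)
```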

Method 6: Using `np.in1d()` for Value-Based Filtering

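np.in1d() tests whether each element of a 1D array is present in a set of target values and returns a boolean mask of the same length. Applied to a single column, it selects the rows whose value in that column matches one of the targets. np.in1d() is deprecated as of NumPy 2.0, so recent versions emit the DeprecationWarning shown in the output.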

import numpy as np

arr = np.array([["1", "one"],
                ["2", "two"],
                ["3", "three"],
                ["4", "four"],
                ["5", "five"]])

# Filter rows where the second column is "two" or "four"
target_values = np.array(["two", "four"])
mask = np.in1d(arr[:, 1], target_values)

filtered = arr[mask]
print(filtered)

Output:

DeprecationWarning: `in1d` is deprecated. Use `np.isin` instead.
[['2' 'two']
 ['4' 'four']]

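Tip: np.isin() is the recommended replacement for np.in1d(). It has been available since NumPy 1.13 and, unlike np.in1d(), returns a mask with the same shape as its input instead of flattening it: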

mask = np.isin(arr[:, 1], target_values)
filtered = arr[mask]
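Two features of np.isin() are handy for filtering. The invert=True parameter selects rows whose value is not in the target set, and because np.isin() keeps the shape of its input, it can test a whole 2D array at once. In the sketch below, the numeric array and the target values [1, 6] are illustrative.

```python
import numpy as np

arr = np.array([["1", "one"],
                ["2", "two"],
                ["3", "three"],
                ["4", "four"],
                ["5", "five"]])
target_values = np.array(["two", "four"])

# invert=True: rows whose second column is NOT "two" or "four"
excluded = arr[np.isin(arr[:, 1], target_values, invert=True)]
print(excluded)

# On a 2D input, np.isin returns a 2D mask of the same shape
nums = np.arange(12).reshape((3, 4))
mask = np.isin(nums, [1, 6])
print(mask.shape)

# Keep the rows that contain at least one of the target values
print(nums[mask.any(axis=1)])
```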

Method 7: Using `np.extract()` for Element-Level Filtering

np.extract() returns elements from an array that satisfy a condition - similar to boolean indexing but as a dedicated function:

import numpy as np

arr = np.array([[10, 25, 30],
                [40, 15, 60],
                [70, 85, 90]])

# Extract all elements greater than 50
result = np.extract(arr > 50, arr)
print(result)

Output:

[60 70 85 90]
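For a boolean condition, np.extract(condition, arr) returns the same elements as arr[condition], so the choice between them is mostly stylistic. The sketch below checks this with a compound condition (values above 20 that are also even, chosen for illustration).

```python
import numpy as np

arr = np.array([[10, 25, 30],
                [40, 15, 60],
                [70, 85, 90]])

# Compound condition built with & as in Method 2
condition = (arr > 20) & (arr % 2 == 0)

via_extract = np.extract(condition, arr)
via_indexing = arr[condition]

print(via_extract)
print(np.array_equal(via_extract, via_indexing))
```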

Practical Example: Filtering a Dataset

Here's a real-world-style example combining multiple filtering techniques:

import numpy as np

# Student data: [ID, Math Score, Science Score, Age]
students = np.array([
    [101, 85, 90, 20],
    [102, 72, 68, 22],
    [103, 91, 95, 21],
    [104, 60, 55, 23],
    [105, 78, 82, 20],
    [106, 95, 88, 22]
])

# Filter 1: Students with Math score > 80
high_math = students[students[:, 1] > 80]
print("High math scorers:")
print(high_math)

# Filter 2: Students with BOTH scores above 80
both_high = students[(students[:, 1] > 80) & (students[:, 2] > 80)]
print("\nHigh in both subjects:")
print(both_high)

# Filter 3: Students aged 20 OR with Math score > 90
young_or_top = students[(students[:, 3] == 20) | (students[:, 1] > 90)]
print("\nAge 20 or Math > 90:")
print(young_or_top)

Output:

High math scorers:
[[101  85  90  20]
 [103  91  95  21]
 [106  95  88  22]]

High in both subjects:
[[101  85  90  20]
 [103  91  95  21]
 [106  95  88  22]]

Age 20 or Math > 90:
[[101  85  90  20]
 [103  91  95  21]
 [105  78  82  20]
 [106  95  88  22]]
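Filters 1 and 2 return the same rows here because every student with a Math score above 80 also scores above 80 in Science. Once rows are filtered, you can keep working with the subset. The sketch below is one way to do that: it names the column positions with hypothetical constants (ID, MATH, SCIENCE, AGE) for readability, averages a column of the filtered rows, and sorts them by Math score from highest to lowest.

```python
import numpy as np

# Student data: [ID, Math Score, Science Score, Age]
students = np.array([
    [101, 85, 90, 20],
    [102, 72, 68, 22],
    [103, 91, 95, 21],
    [104, 60, 55, 23],
    [105, 78, 82, 20],
    [106, 95, 88, 22]
])

# Hypothetical names for the column positions, for readability
ID, MATH, SCIENCE, AGE = 0, 1, 2, 3

both_high = students[(students[:, MATH] > 80) & (students[:, SCIENCE] > 80)]
print(both_high[:, ID])

# Average Science score of the filtered students
print(both_high[:, SCIENCE].mean())

# Sort the filtered rows by Math score, highest first
ranked = both_high[np.argsort(-both_high[:, MATH])]
print(ranked[:, ID])
```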

Quick Reference: Filtering Methods

| Method | Use Case | Returns |
|---|---|---|
| `arr[condition]` | Element or row filtering | Matching elements (flattened 1D) or matching rows |
| `arr[:, col_condition]` | Column filtering | Matching columns |
| `np.where(condition)` / `np.where(condition, x, y)` | Get indices or conditional replacement | Tuple of index arrays, or new array |
| `np.all(condition, axis)` | Filter where ALL values meet condition | Boolean mask |
| `np.any(condition, axis)` | Filter where ANY value meets condition | Boolean mask |
| `np.isin()` (`np.in1d()` is deprecated) | Match against a set of values | Boolean mask |
| `np.extract(condition, arr)` | Extract matching elements | 1D array of matches |

Conclusion

NumPy provides a rich set of tools for filtering 2D arrays based on conditions:

- Boolean indexing (arr[condition]) is the simplest and most common method for both element-level and row-level filtering.
- Combine conditions with &, |, and ~ operators (always with parentheses) for complex multi-condition filters.
- np.where() is ideal when you need indices or want to replace values conditionally.
- np.all() and np.any() let you filter entire rows or columns based on aggregate conditions across an axis.
- np.isin() handles membership-based filtering when matching against a set of known values.

Choose the method that best matches your filtering needs - for most everyday tasks, boolean indexing with conditions applied to specific columns will be your go-to approach.
