Python NumPy: How to Filter a Two-Dimensional NumPy Array Based on a Condition

Filtering data in multidimensional arrays is a fundamental operation in data analysis, scientific computing, and machine learning. NumPy makes this efficient and expressive through boolean indexing, conditional expressions, and specialized functions that let you extract exactly the elements or rows you need from a 2D array.

This guide demonstrates multiple techniques to filter a two-dimensional NumPy array based on conditions, with clear examples, outputs, and explanations for each approach.

Setting Up: Creating a 2D NumPy Array

Let's start by creating sample 2D arrays that we'll use throughout this guide:

import numpy as np

# A simple 2D array (2 rows, 3 columns)
arr = np.array([[1, 2, 3],
                [4, 5, 6]])

print(arr)

Output:

[[1 2 3]
 [4 5 6]]

Method 1: Boolean Indexing (Most Common)

The most straightforward and widely used way to filter a NumPy array is boolean indexing - applying a condition directly to the array to create a boolean mask, then using that mask to select elements.

Filtering Individual Elements

import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

# Filter all elements greater than 4
result = arr[arr > 4]
print(result)

Output:

[5 6 7 8 9]

Note

When you apply a boolean condition to a 2D array and use it directly for indexing, the result is a flattened 1D array containing only the elements that satisfy the condition. The original 2D structure is not preserved.
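
If the 2D layout needs to survive the filter, one possible workaround is to keep every position and fill the non-matching ones with a placeholder. The sketch below assumes 0 is an acceptable fill value (an arbitrary choice for illustration) and also shows a masked array as an alternative:

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

# Keep matching values in place; fill everything else with 0
# (0 is an arbitrary placeholder chosen for this illustration)
kept_shape = np.where(arr > 4, arr, 0)
print(kept_shape)
# [[0 0 0]
#  [0 5 6]
#  [7 8 9]]

# Alternative: a masked array hides non-matching entries instead of replacing them
masked = np.ma.masked_where(arr <= 4, arr)
print(masked.compressed())   # [5 6 7 8 9]
```

Both results keep the original (3, 3) shape, so row and column positions still line up with the source array.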

Understanding the Boolean Mask

The condition arr > 4 creates a boolean array of the same shape:

import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

mask = arr > 4
print("Boolean mask:")
print(mask)

print("\nFiltered elements:")
print(arr[mask])

Output:

Boolean mask:
[[False False False]
 [False  True  True]
 [ True  True  True]]

Filtered elements:
[5 6 7 8 9]

Filtering Entire Rows Based on a Condition

To filter complete rows where a condition applies to a specific column, use the condition on that column as the row index:

import numpy as np

arr = np.array([[1, 20, 300],
                [4, 50, 600],
                [7, 80, 900],
                [2, 10, 100]])

# Select rows where the second column (index 1) is greater than 30
filtered_rows = arr[arr[:, 1] > 30]
print(filtered_rows)

Output:

[[  4  50 600]
 [  7  80 900]]

Here, arr[:, 1] selects all values in column 1, and arr[:, 1] > 30 creates a 1D boolean mask applied to the rows.
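
To make the row mask concrete, the following sketch prints it and then narrows the filtered rows to a subset of columns. The names row_mask and subset are illustrative only:

```python
import numpy as np

arr = np.array([[1, 20, 300],
                [4, 50, 600],
                [7, 80, 900],
                [2, 10, 100]])

row_mask = arr[:, 1] > 30      # one boolean per row
print(row_mask)                # [False  True  True False]
print(row_mask.shape)          # (4,)

# Keep the matching rows, then pick columns 0 and 2 from them
subset = arr[row_mask][:, [0, 2]]
print(subset)
# [[  4 600]
#  [  7 900]]
```

The mask has one entry per row, which is why it can be used directly as the first index of the 2D array.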

Method 2: Combining Multiple Conditions

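You can combine conditions using the element-wise operators & (AND), | (OR), and ~ (NOT). Each individual condition must be wrapped in parentheses, because these operators bind more tightly than comparison operators such as > and <.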

import numpy as np

arr = np.array([[10, 21, 30],
                [40, 15, 60],
                [70, 82, 90],
                [25, 11, 45]])

# Rows where column 0 > 20 AND column 1 < 50
filtered = arr[(arr[:, 0] > 20) & (arr[:, 1] < 50)]
print("AND condition:")
print(filtered)

# Rows where column 0 > 60 OR column 2 < 50
filtered = arr[(arr[:, 0] > 60) | (arr[:, 2] < 50)]
print("\nOR condition:")
print(filtered)

Output:

AND condition:
[[40 15 60]
 [25 11 45]]

OR condition:
[[10 21 30]
 [70 82 90]
 [25 11 45]]
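
The ~ (NOT) operator is listed above but not demonstrated. A minimal sketch on the same sample data might look like this (the variable names are illustrative):

```python
import numpy as np

arr = np.array([[10, 21, 30],
                [40, 15, 60],
                [70, 82, 90],
                [25, 11, 45]])

# Rows where column 0 is NOT greater than 20
not_large = arr[~(arr[:, 0] > 20)]
print(not_large)   # [[10 21 30]]

# NOT combined with AND: column 0 > 20 and column 1 not below 50
mixed = arr[(arr[:, 0] > 20) & ~(arr[:, 1] < 50)]
print(mixed)       # [[70 82 90]]
```

As with & and |, the condition being negated needs its own parentheses so that ~ applies to the whole comparison.
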

Common Mistake: Using and Instead of &

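Python's and keyword does not work with NumPy arrays: it tries to reduce each array to a single True/False value, which raises a ValueError for arrays with more than one element. Use the element-wise & operator instead: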

# ❌ Wrong: raises ValueError
# filtered = arr[(arr[:, 0] > 20) and (arr[:, 1] < 50)]

# ✅ Correct: use & with parentheses
filtered = arr[(arr[:, 0] > 20) & (arr[:, 1] < 50)]
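
To see the failure without crashing a script, the error can be caught and printed. This is only an illustrative sketch on a smaller array; np.logical_and() is shown as a function-call equivalent of &:

```python
import numpy as np

arr = np.array([[10, 21, 30],
                [40, 15, 60]])

try:
    # `and` asks Python for a single True/False value for each whole array
    bad = (arr[:, 0] > 20) and (arr[:, 1] < 50)
except ValueError as exc:
    error_message = str(exc)
    print(f"ValueError: {error_message}")

# The element-wise function np.logical_and() behaves like the & operator
good = np.logical_and(arr[:, 0] > 20, arr[:, 1] < 50)
print(good)   # [False  True]
```

The error message mentions that the truth value of a multi-element array is ambiguous, which is a useful hint that and, or, or not was used where &, |, or ~ was intended.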

Method 3: Using np.where() for Conditional Selection

np.where() has two modes. Called with only a condition, it returns the indices where the condition is True. Called with a condition and two values (or arrays), it selects element-wise between them:

Getting Indices of Matching Elements

import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

# Get row and column indices where values are greater than 5
row_idx, col_idx = np.where(arr > 5)

print(f"Row indices: {row_idx}")
print(f"Col indices: {col_idx}")
print(f"Values: {arr[row_idx, col_idx]}")

Output:

Row indices: [1 2 2 2]
Col indices: [2 0 1 2]
Values: [6 7 8 9]
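
When coordinate pairs are more convenient than two separate index arrays, np.argwhere() is one option. The sketch below also derives which rows contain at least one match; the names coords and rows_with_match are illustrative:

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

# Each row of `coords` is one (row, column) pair
coords = np.argwhere(arr > 5)
print(coords)
# [[1 2]
#  [2 0]
#  [2 1]
#  [2 2]]

# Rows that contain at least one match
rows_with_match = np.unique(np.where(arr > 5)[0])
print(rows_with_match)   # [1 2]
```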

Replacing Values Based on a Condition

import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

# Replace values > 5 with -1, keep others unchanged
result = np.where(arr > 5, -1, arr)
print(result)

Output:

[[ 1  2  3]
 [ 4  5 -1]
 [-1 -1 -1]]
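
np.where() builds a new array. If modifying an array in place is acceptable, assigning through a boolean mask is a possible alternative. This sketch works on a copy (named modified here for illustration) so the original stays untouched:

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

modified = arr.copy()          # work on a copy so `arr` stays intact
modified[modified > 5] = -1    # overwrite the matching elements in place
print(modified)
# [[ 1  2  3]
#  [ 4  5 -1]
#  [-1 -1 -1]]
```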

Method 4: Using np.all() to Filter Columns or Rows

np.all() checks whether all elements along a specified axis satisfy a given condition. This is useful when you want to filter columns (or rows) where every value meets a specific criterion.

import numpy as np

arr = np.arange(12).reshape((3, 4))
print("Original array:")
print(arr)

# Keep only columns where ALL values are less than 10
col_mask = np.all(arr < 10, axis=0)
print(f"\nColumn mask (all values < 10): {col_mask}")

filtered = arr[:, col_mask]
print("\nFiltered array (columns where all values < 10):")
print(filtered)

Output:

Original array:
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]

Column mask (all values < 10): [ True  True False False]

Filtered array (columns where all values < 10):
[[0 1]
 [4 5]
 [8 9]]

Notice that column index 2 contains the value 10 and column index 3 contains 11. Since neither 10 < 10 nor 11 < 10 is True, those columns do not satisfy the condition. Because np.all() requires every value in a column to meet the condition, a single False is enough to exclude the entire column.

You can also compute the mask directly inside the indexing expression:

filtered = arr[:, np.all(arr < 10, axis=0)]
print(filtered)

Output:

[[0 1]
 [4 5]
 [8 9]]
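
The same idea applies to rows by switching to axis=1. A minimal sketch, assuming the same sample array:

```python
import numpy as np

arr = np.arange(12).reshape((3, 4))

# axis=1 collapses the columns, giving one boolean per row
row_mask = np.all(arr < 10, axis=1)
print(row_mask)          # [ True  True False]

# Keep only rows where ALL values are less than 10
filtered_rows = arr[row_mask]
print(filtered_rows)
# [[0 1 2 3]
#  [4 5 6 7]]
```

The last row is dropped because it contains 10 and 11.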

Method 5: Using np.any() to Filter Columns or Rows

np.any() tests whether at least one element along a specified axis satisfies a condition:

import numpy as np

arr = np.arange(12).reshape((3, 4))
print("Original array:")
print(arr)

# Keep columns where ANY value is less than 2
col_mask = np.any(arr < 2, axis=0)
print(f"\nColumn mask (any < 2): {col_mask}")

filtered = arr[:, col_mask]
print("\nFiltered columns:")
print(filtered)

Output:

Original array:
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]

Column mask (any < 2): [ True  True False False]

Filtered columns:
[[0 1]
 [4 5]
 [8 9]]

Columns 0 and 1 are kept because they contain at least one value less than 2 (values 0 and 1 respectively).
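
For row filtering, the same call with axis=1 yields one boolean per row. The threshold of 9 below is an arbitrary choice for illustration:

```python
import numpy as np

arr = np.arange(12).reshape((3, 4))

# axis=1 checks each row: does ANY value in the row exceed 9?
row_mask = np.any(arr > 9, axis=1)
print(row_mask)          # [False False  True]

filtered_rows = arr[row_mask]
print(filtered_rows)     # [[ 8  9 10 11]]
```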

Method 6: Using np.in1d() for Value-Based Filtering

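np.in1d() tests whether each element of a 1D array is present in a set of target values. This is useful for filtering rows whose value in a given column matches one of several specific values. Note that np.in1d() is deprecated as of NumPy 2.0 in favor of np.isin().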

import numpy as np

arr = np.array([["1", "one"],
                ["2", "two"],
                ["3", "three"],
                ["4", "four"],
                ["5", "five"]])

# Filter rows where the second column is "two" or "four"
target_values = np.array(["two", "four"])
mask = np.in1d(arr[:, 1], target_values)

filtered = arr[mask]
print(filtered)

Output:

DeprecationWarning: `in1d` is deprecated. Use `np.isin` instead.
[['2' 'two']
 ['4' 'four']]

Tip

np.isin() (available since NumPy 1.13) is the recommended replacement for np.in1d(), which is deprecated as of NumPy 2.0. For a 1D input it is a drop-in substitute:

mask = np.isin(arr[:, 1], target_values)
filtered = arr[mask]
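
np.isin() also accepts an invert flag and preserves the shape of its first argument. The following sketch illustrates both on the same sample data; the names excluded and mask_2d are illustrative:

```python
import numpy as np

arr = np.array([["1", "one"],
                ["2", "two"],
                ["3", "three"],
                ["4", "four"],
                ["5", "five"]])

target_values = ["two", "four"]

# invert=True flips the membership test, so this keeps the non-matching rows
excluded = arr[np.isin(arr[:, 1], target_values, invert=True)]
print(excluded)
# [['1' 'one']
#  ['3' 'three']
#  ['5' 'five']]

# np.isin() returns a mask with the same shape as its first argument
mask_2d = np.isin(arr, ["2", "three"])
print(mask_2d.shape)   # (5, 2)
```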

Method 7: Using np.extract() for Element-Level Filtering

np.extract() returns elements from an array that satisfy a condition - similar to boolean indexing but as a dedicated function:

import numpy as np

arr = np.array([[10, 25, 30],
                [40, 15, 60],
                [70, 85, 90]])

# Extract all elements greater than 50
result = np.extract(arr > 50, arr)
print(result)

Output:

[60 70 85 90]
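
For a boolean condition of the same shape as the array, np.extract() and boolean indexing should give the same result. A quick check, using illustrative variable names:

```python
import numpy as np

arr = np.array([[10, 25, 30],
                [40, 15, 60],
                [70, 85, 90]])

via_extract = np.extract(arr > 50, arr)
via_mask = arr[arr > 50]

# Both approaches return the same flattened 1D result
print(np.array_equal(via_extract, via_mask))   # True
```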

Practical Example: Filtering a Dataset

Here's a real-world-style example combining multiple filtering techniques:

import numpy as np

# Student data: [ID, Math Score, Science Score, Age]
students = np.array([
    [101, 85, 90, 20],
    [102, 72, 68, 22],
    [103, 91, 95, 21],
    [104, 60, 55, 23],
    [105, 78, 82, 20],
    [106, 95, 88, 22]
])

# Filter 1: Students with Math score > 80
high_math = students[students[:, 1] > 80]
print("High math scorers:")
print(high_math)

# Filter 2: Students with BOTH scores above 80
both_high = students[(students[:, 1] > 80) & (students[:, 2] > 80)]
print("\nHigh in both subjects:")
print(both_high)

# Filter 3: Students aged 20 OR with Math score > 90
young_or_top = students[(students[:, 3] == 20) | (students[:, 1] > 90)]
print("\nAge 20 or Math > 90:")
print(young_or_top)

Output:

High math scorers:
[[101  85  90  20]
 [103  91  95  21]
 [106  95  88  22]]

High in both subjects:
[[101  85  90  20]
 [103  91  95  21]
 [106  95  88  22]]

Age 20 or Math > 90:
[[101  85  90  20]
 [103  91  95  21]
 [105  78  82  20]
 [106  95  88  22]]
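
A filter is often only the first step. As a hedged sketch of a possible follow-up, the same mask can be reused to count matches and summarize a single column. The names high_math_mask, num_high_math, high_math_ids, and mean_science are illustrative:

```python
import numpy as np

# Student data: [ID, Math Score, Science Score, Age]
students = np.array([
    [101, 85, 90, 20],
    [102, 72, 68, 22],
    [103, 91, 95, 21],
    [104, 60, 55, 23],
    [105, 78, 82, 20],
    [106, 95, 88, 22]
])

high_math_mask = students[:, 1] > 80

# Count the matching rows without building the filtered array
num_high_math = np.count_nonzero(high_math_mask)
print(num_high_math)      # 3

# Combine the row mask with a column index to pull one column of the matches
high_math_ids = students[high_math_mask, 0]
print(high_math_ids)      # [101 103 106]

mean_science = students[high_math_mask, 2].mean()
print(mean_science)       # 91.0
```

Storing the mask in a variable avoids recomputing the condition each time it is needed.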

Quick Reference: Filtering Methods

| Method | Use Case | Returns |
| --- | --- | --- |
| arr[condition] | Element or row filtering | Matching elements/rows |
| arr[:, col_condition] | Column filtering | Matching columns |
| np.where(condition) | Get indices or conditional replacement | Indices or new array |
| np.all(condition, axis) | Filter where ALL values meet condition | Boolean mask |
| np.any(condition, axis) | Filter where ANY value meets condition | Boolean mask |
| np.in1d() / np.isin() | Match against a set of values | Boolean mask |
| np.extract(condition, arr) | Extract matching elements | 1D array of matches |

Conclusion

NumPy provides a rich set of tools for filtering 2D arrays based on conditions:

  • Boolean indexing (arr[condition]) is the simplest and most common method for both element-level and row-level filtering.
  • Combine conditions with &, |, and ~ operators (always with parentheses) for complex multi-condition filters.
  • np.where() is ideal when you need indices or want to replace values conditionally.
  • np.all() and np.any() let you filter entire rows or columns based on aggregate conditions across an axis.
  • np.isin() handles membership-based filtering when matching against a set of known values.

Choose the method that best matches your filtering needs - for most everyday tasks, boolean indexing with conditions applied to specific columns will be your go-to approach.