Python NumPy: How to Remove NaN Values from a NumPy Array

NaN (Not a Number) values are a common issue in numerical computing and data analysis. They appear when data is missing or corrupted, or when a value results from an undefined mathematical operation (like 0/0). Because NaN values can silently corrupt calculations - producing incorrect sums, means, and comparisons - removing them is often the first step in data cleaning.

In this guide, you'll learn multiple methods to remove NaN values from NumPy arrays, understand the differences between each approach, and choose the right one for your use case.

Understanding NaN in NumPy

NaN is a special floating-point value defined by the IEEE 754 standard. In NumPy, it's represented as np.nan:

import numpy as np

arr = np.array([1.0, np.nan, 3.0, np.nan, 5.0])
print("Array:", arr)
print("Sum:", np.sum(arr))
print("Mean:", np.mean(arr))

Output:

Array: [ 1. nan  3. nan  5.]
Sum: nan
Mean: nan

Caution

NaN values propagate through calculations. A single NaN in an array makes aggregate functions such as np.sum(), np.mean(), and np.max() return nan, which is why you need to remove or otherwise handle NaN values before performing any analysis.
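
Before cleaning an array, it can help to check whether it contains NaN values and where they are. The following is a minimal sketch of one way to do that; the variable names are illustrative only.

```python
import numpy as np

arr = np.array([1.0, np.nan, 3.0, np.nan, 5.0])

# np.isnan() returns a boolean array; summing it counts the True entries
nan_count = int(np.isnan(arr).sum())

# np.flatnonzero() returns the positions of the True entries
nan_positions = np.flatnonzero(np.isnan(arr))

print("Contains NaN:", bool(np.isnan(arr).any()))
print("NaN count:", nan_count)
print("NaN positions:", nan_positions)
```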

Method 1: Using np.isnan() with Boolean Indexing

The most common approach combines np.isnan() with the bitwise NOT operator (~) to create a boolean mask that selects only non-NaN elements:

import numpy as np

arr = np.array([1.0, np.nan, 3.0, np.nan, 5.0])

clean = arr[~np.isnan(arr)]
print("Original:", arr)
print("Cleaned: ", clean)

Output:

Original: [ 1. nan  3. nan  5.]
Cleaned:  [1. 3. 5.]

How It Works

  1. np.isnan(arr) creates a boolean array: [False, True, False, True, False] - True where values are NaN.
  2. ~np.isnan(arr) inverts it: [True, False, True, False, True] - True where values are valid.
  3. arr[~np.isnan(arr)] uses boolean indexing to select only the valid elements.
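
The three steps above can be sketched with each intermediate result stored in its own variable. The names is_nan and is_valid are chosen here for illustration.

```python
import numpy as np

arr = np.array([1.0, np.nan, 3.0, np.nan, 5.0])

is_nan = np.isnan(arr)   # step 1: True where the value is NaN
is_valid = ~is_nan       # step 2: inverted mask, True where the value is valid
clean = arr[is_valid]    # step 3: boolean indexing keeps only the valid values

print("is_nan:  ", is_nan)
print("is_valid:", is_valid)
print("clean:   ", clean)
print("arr:     ", arr)  # the original array is left unchanged
```

Boolean indexing returns a new array rather than a view, so arr still contains its NaN values afterwards.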

With 2D Arrays

When you index a 2D array with a boolean mask of the same shape, the result is flattened into a 1D array:

import numpy as np

arr = np.array([[12, 5, np.nan, 7],
                [2, 61, 1, np.nan],
                [np.nan, 1, np.nan, 5]])

clean = arr[~np.isnan(arr)]
print("Cleaned (flattened):", clean)

Output:

Cleaned (flattened): [12.  5.  7.  2. 61.  1.  1.  5.]

Info

Indexing a multi-dimensional array with a boolean mask of the same shape always returns a 1D array, because the selected elements don't necessarily form a regular grid. If you need to preserve the 2D structure, see the sections on removing rows or columns containing NaN.
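
Because the flattened result no longer records where each value came from, it may be useful to capture the NaN positions before filtering. This sketch uses np.argwhere() for that; it is one possible approach, not the only one.

```python
import numpy as np

arr = np.array([[12, 5, np.nan, 7],
                [2, 61, 1, np.nan],
                [np.nan, 1, np.nan, 5]])

mask = np.isnan(arr)
clean = arr[~mask]

print("Original shape:", arr.shape)
print("Cleaned shape: ", clean.shape)

# np.argwhere() lists the (row, column) index of every True entry in the mask
nan_coords = np.argwhere(mask)
print("NaN coordinates:")
print(nan_coords)
```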

Method 2: Using np.isfinite()

The np.isfinite() function returns True for all finite numbers, filtering out both NaN and infinity values:

import numpy as np

arr = np.array([1.0, np.nan, 3.0, np.inf, 5.0, -np.inf])

clean = arr[np.isfinite(arr)]
print("Original:", arr)
print("Cleaned: ", clean)

Output:

Original: [  1.  nan   3.  inf   5. -inf]
Cleaned:  [1. 3. 5.]

When to use isfinite() vs isnan()
  • Use ~np.isnan() when you only want to remove NaN values but keep infinite values.
  • Use np.isfinite() when you want to remove both NaN and infinity values.

In most data analysis scenarios, np.isfinite() is the safer choice because infinite values distort sums and means just as NaN values do.

Method 3: Using np.logical_not() with np.isnan()

This approach is functionally identical to ~np.isnan() but uses an explicit function call instead of the bitwise NOT operator:

import numpy as np

arr = np.array([6.0, 2.0, np.nan, 8.0, np.nan, 1.0])

clean = arr[np.logical_not(np.isnan(arr))]
print("Cleaned:", clean)

Output:

Cleaned: [6. 2. 8. 1.]

The ~ operator and np.logical_not() produce the same result. Use whichever you find more readable.
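
A quick way to confirm the equivalence is to compare the two masks directly. The second half of this sketch adds a caveat that goes beyond the text above: the two only agree on boolean input, because ~ flips bits while np.logical_not() tests truthiness.

```python
import numpy as np

arr = np.array([6.0, 2.0, np.nan, 8.0, np.nan, 1.0])

mask_operator = ~np.isnan(arr)
mask_function = np.logical_not(np.isnan(arr))

# On a boolean mask, both forms give the same result
same = np.array_equal(mask_operator, mask_function)
print("Masks identical:", same)

# On integer input they differ: ~ is bitwise, logical_not is logical
ints = np.array([0, 1, 2])
bitwise = ~ints
logical = np.logical_not(ints)
print("~ints:      ", bitwise)
print("logical_not:", logical)
```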

Removing Rows Containing NaN (Preserving 2D Structure)

If you need to keep the 2D shape of your array, you can remove entire rows that contain any NaN values:

import numpy as np

arr = np.array([[1.0, 2.0, 3.0],
                [4.0, np.nan, 6.0],
                [7.0, 8.0, 9.0],
                [np.nan, 11.0, 12.0]])

# Remove rows where ANY value is NaN
clean = arr[~np.isnan(arr).any(axis=1)]
print("Original shape:", arr.shape)
print("Cleaned shape: ", clean.shape)
print("Cleaned array:")
print(clean)

Output:

Original shape: (4, 3)
Cleaned shape:  (2, 3)
Cleaned array:
[[1. 2. 3.]
 [7. 8. 9.]]
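
Dropping every row that has any NaN can discard a lot of data. As a hedged variation on the same idea, the sketch below drops a row only when all of its values are NaN, or when it exceeds a NaN limit. The max_nan threshold is a hypothetical parameter introduced for this example.

```python
import numpy as np

arr = np.array([[1.0, 2.0, 3.0],
                [4.0, np.nan, 6.0],
                [np.nan, np.nan, np.nan],
                [np.nan, np.nan, 12.0]])

nan_mask = np.isnan(arr)

# Drop a row only if EVERY value in it is NaN
drop_all_nan = arr[~nan_mask.all(axis=1)]

# Keep rows with at most max_nan NaN values (max_nan is a hypothetical threshold)
max_nan = 1
drop_over_limit = arr[nan_mask.sum(axis=1) <= max_nan]

print("Rows that are not entirely NaN:", drop_all_nan.shape)
print("Rows with at most 1 NaN:       ", drop_over_limit.shape)
```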

Removing Columns Containing NaN

Similarly, you can remove columns that contain any NaN values:

import numpy as np

arr = np.array([[1.0, np.nan, 3.0],
                [4.0, np.nan, 6.0],
                [7.0, np.nan, 9.0]])

# Remove columns where ANY value is NaN
clean = arr[:, ~np.isnan(arr).any(axis=0)]
print("Cleaned array:")
print(clean)

Output:

Cleaned array:
[[1. 3.]
 [4. 6.]
 [7. 9.]]
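
The row and column cases differ only in which axis is reduced and which axis is indexed, so they can be wrapped in one small function. drop_nan is a hypothetical helper written for this example, not a NumPy function; here axis=0 is assumed to mean "drop rows" and axis=1 "drop columns".

```python
import numpy as np

def drop_nan(arr, axis=0):
    """Hypothetical helper: drop rows (axis=0) or columns (axis=1) containing NaN."""
    arr = np.asarray(arr, dtype=float)
    if arr.ndim != 2:
        raise ValueError("drop_nan expects a 2D array")
    if axis == 0:
        # A row is dropped if any value along its columns is NaN
        return arr[~np.isnan(arr).any(axis=1)]
    if axis == 1:
        # A column is dropped if any value along its rows is NaN
        return arr[:, ~np.isnan(arr).any(axis=0)]
    raise ValueError("axis must be 0 or 1")

data = np.array([[1.0, np.nan, 3.0],
                 [4.0, 5.0, 6.0],
                 [7.0, 8.0, np.nan]])

rows_kept = drop_nan(data, axis=0)
cols_kept = drop_nan(data, axis=1)
print("Rows without NaN:")
print(rows_kept)
print("Columns without NaN:")
print(cols_kept)
```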

Replacing NaN Instead of Removing

Sometimes you don't want to remove NaN values (which changes the array size) but instead replace them with a specific value:

Replace with Zero

import numpy as np

arr = np.array([1.0, np.nan, 3.0, np.nan, 5.0])

clean = np.nan_to_num(arr, nan=0.0)
print("Replaced with 0:", clean)

Output:

Replaced with 0: [1. 0. 3. 0. 5.]
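
np.nan_to_num() also handles infinities. The sketch below contrasts its defaults with explicit replacement values; it assumes NumPy 1.17 or later, where the nan, posinf, and neginf keyword arguments are available.

```python
import numpy as np

arr = np.array([1.0, np.nan, np.inf, -np.inf])

# Defaults: NaN -> 0.0, inf -> largest finite float, -inf -> most negative finite float
default_fill = np.nan_to_num(arr)

# Explicit replacement for each kind of non-finite value
custom_fill = np.nan_to_num(arr, nan=-1.0, posinf=1e6, neginf=-1e6)

print("Default fill:", default_fill)
print("Custom fill: ", custom_fill)
print("Original:    ", arr)  # unchanged, because copy=True is the default
```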

Replace with the Mean

import numpy as np

arr = np.array([1.0, np.nan, 3.0, np.nan, 5.0])

mean_val = np.nanmean(arr) # Computes mean ignoring NaN
arr[np.isnan(arr)] = mean_val
print(f"Replaced with mean ({mean_val}):", arr)

Output:

Replaced with mean (3.0): [1. 3. 3. 3. 5.]
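
For 2D data, a common variation is to fill each NaN with the mean of its own column rather than one global mean. This is a minimal sketch of that idea using np.where(); the sample values are made up for illustration.

```python
import numpy as np

arr = np.array([[1.0, 10.0],
                [np.nan, 20.0],
                [3.0, np.nan]])

# Per-column means, ignoring NaN
col_means = np.nanmean(arr, axis=0)

# Where arr is NaN, take the column mean (broadcast across rows); otherwise keep arr
filled = np.where(np.isnan(arr), col_means, arr)

print("Column means:", col_means)
print(filled)
```

Unlike the in-place assignment shown earlier, np.where() returns a new array and leaves arr untouched.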
Tip

NumPy provides NaN-aware versions of common functions: np.nansum(), np.nanmean(), np.nanmax(), np.nanmin(), np.nanstd(). These compute results while ignoring NaN values, which can be simpler than removing NaN values first.

import numpy as np

arr = np.array([1.0, np.nan, 3.0, np.nan, 5.0])
print("nanmean:", np.nanmean(arr)) # nanmean: 3.0
print("nansum:", np.nansum(arr)) # nansum: 9.0
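
The NaN-aware functions also accept an axis argument, which makes them usable on 2D data without any filtering. The sketch below is illustrative; np.nanmedian() is not listed above but belongs to the same family.

```python
import numpy as np

arr = np.array([[1.0, np.nan, 3.0],
                [4.0, 5.0, np.nan]])

row_sums = np.nansum(arr, axis=1)   # sum of each row, ignoring NaN
col_max = np.nanmax(arr, axis=0)    # max of each column, ignoring NaN
overall_median = np.nanmedian(arr)  # median of the four non-NaN values

print("Row sums:      ", row_sums)
print("Column maxima: ", col_max)
print("Overall median:", overall_median)
```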

Common Mistake: Using == or != to Check for NaN

A frequent error is trying to compare values with np.nan using the equality operator:

import numpy as np

arr = np.array([1.0, np.nan, 3.0])

# ❌ Wrong: NaN is NOT equal to itself
mask = arr != np.nan
print("Mask:", mask)
print("Result:", arr[mask])

Output:

Mask: [ True  True  True]
Result: [ 1. nan  3.]

The NaN value is not removed because, under IEEE 754, NaN compares as not equal to every value, including itself. arr != np.nan is therefore True for every element, and arr == np.nan is False for every element, so neither comparison can detect NaN.

Always use np.isnan():

import numpy as np

arr = np.array([1.0, np.nan, 3.0])

# ✅ Correct: use np.isnan()
mask = ~np.isnan(arr)
print("Result:", arr[mask])

Output:

Result: [1. 3.]
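
The same rule affects plain Python floats and whole-array comparisons. This sketch shows math.isnan() for scalars and the equal_nan option of np.array_equal(), assuming NumPy 1.19 or later for that keyword.

```python
import math
import numpy as np

value = float("nan")

equal_to_itself = value == value    # False: NaN never equals anything
detected_math = math.isnan(value)   # True: correct check for a Python float
detected_numpy = bool(np.isnan(value))

a = np.array([1.0, np.nan, 3.0])
b = a.copy()

# Element-wise equality fails at the NaN position, so the arrays look different
plain_equal = np.array_equal(a, b)
# equal_nan=True treats NaN values in the same position as equal
nan_aware_equal = np.array_equal(a, b, equal_nan=True)

print("value == value:", equal_to_itself)
print("math.isnan:", detected_math)
print("np.isnan:", detected_numpy)
print("array_equal:", plain_equal)
print("array_equal with equal_nan=True:", nan_aware_equal)
```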

Comparison of Methods

Method | Removes NaN | Removes inf | Preserves Shape | Use Case
arr[~np.isnan(arr)] | ✅ | ❌ | ❌ (flattens) | Most common, NaN only
arr[np.isfinite(arr)] | ✅ | ✅ | ❌ (flattens) | Remove all non-finite values
arr[~np.isnan(arr).any(axis=1)] | ✅ | ❌ | ✅ (removes rows) | Clean rows in 2D arrays
np.nan_to_num(arr) | Replaces | Replaces | ✅ | Keep shape, fill with defaults
np.nanmean(), np.nansum() | Ignores | ❌ | N/A | Compute stats without cleaning

Summary

To remove NaN values from a NumPy array:

  • Use arr[~np.isnan(arr)] for the simplest, most common approach - it filters out all NaN values from a floating-point array and flattens multi-dimensional input to 1D.
  • Use arr[np.isfinite(arr)] when you also need to remove infinity values.
  • Use arr[~np.isnan(arr).any(axis=1)] to remove entire rows containing NaN while preserving the 2D structure.
  • Use np.nan_to_num() or assignment with np.isnan() when you want to replace NaN values instead of removing them.
  • Never use == or != to check for NaN - always use np.isnan().