Python NumPy: How to Remove NaN Values from a NumPy Array
NaN (Not a Number) values are a common issue in numerical computing and data analysis. They appear when data is missing or corrupted, or when a calculation is mathematically undefined (like 0/0). Because NaN values silently corrupt calculations, producing incorrect sums, means, and comparisons, removing them is often the first step in data cleaning.
In this guide, you'll learn multiple methods to remove NaN values from NumPy arrays, understand the differences between each approach, and choose the right one for your use case.
Understanding NaN in NumPy
NaN is a special floating-point value defined by the IEEE 754 standard. In NumPy, it's represented as np.nan:
import numpy as np
arr = np.array([1.0, np.nan, 3.0, np.nan, 5.0])
print("Array:", arr)
print("Sum:", np.sum(arr))
print("Mean:", np.mean(arr))
Output:
Array: [ 1. nan 3. nan 5.]
Sum: nan
Mean: nan
NaN values propagate through calculations. A single NaN in an array makes sum(), mean(), max(), and other aggregate functions return nan. This is why removing or handling NaN values is critical before performing any analysis.
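As a small illustration of where NaN comes from and how it spreads, the sketch below produces NaN from two undefined operations and then feeds a NaN through max() and a running sum. The variable names are illustrative only, and np.errstate is used just to silence the warning NumPy would otherwise emit.

```python
import numpy as np

# Undefined operations produce NaN. errstate hides the "invalid value" warning.
with np.errstate(invalid="ignore"):
    zero_over_zero = np.array([0.0]) / np.array([0.0])        # 0/0
    inf_minus_inf = np.array([np.inf]) - np.array([np.inf])   # inf - inf

print("0/0:", zero_over_zero)
print("inf - inf:", inf_minus_inf)

# One NaN is enough to poison an aggregate or everything after it in a scan.
arr = np.array([1.0, np.nan, 3.0])
largest = np.max(arr)
running_total = np.cumsum(arr)

print("max:", largest)               # nan
print("cumsum:", running_total)      # every entry from the NaN onward is nan
```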
Method 1: Using ~np.isnan() (Recommended)
The most common and Pythonic approach combines np.isnan() with the bitwise NOT operator (~) to create a boolean mask that selects only non-NaN elements:
import numpy as np
arr = np.array([1.0, np.nan, 3.0, np.nan, 5.0])
clean = arr[~np.isnan(arr)]
print("Original:", arr)
print("Cleaned: ", clean)
Output:
Original: [ 1. nan 3. nan 5.]
Cleaned: [1. 3. 5.]
How It Works
- np.isnan(arr) creates a boolean array: [False, True, False, True, False] - True where values are NaN.
- ~np.isnan(arr) inverts it: [True, False, True, False, True] - True where values are valid.
- arr[~np.isnan(arr)] uses boolean indexing to select only the valid elements.
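The same steps can be written out one at a time, which makes the intermediate masks easy to inspect. The names nan_mask and valid_mask are chosen here for illustration.

```python
import numpy as np

arr = np.array([1.0, np.nan, 3.0, np.nan, 5.0])

nan_mask = np.isnan(arr)      # True where the value is NaN
valid_mask = ~nan_mask        # True where the value is valid
clean = arr[valid_mask]       # boolean indexing returns a new array

print("NaN mask:  ", nan_mask)
print("Valid mask:", valid_mask)
print("Cleaned:   ", clean)

# Summing a boolean mask counts its True entries.
print("NaN count: ", nan_mask.sum())   # 2
```

Because boolean indexing returns a copy, arr itself still contains its NaN values after this runs.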
With 2D Arrays
When applied to a 2D array, boolean indexing flattens the result into a 1D array:
import numpy as np
arr = np.array([[12, 5, np.nan, 7],
                [2, 61, 1, np.nan],
                [np.nan, 1, np.nan, 5]])
clean = arr[~np.isnan(arr)]
print("Cleaned (flattened):", clean)
Output:
Cleaned (flattened): [12. 5. 7. 2. 61. 1. 1. 5.]
Boolean indexing on multi-dimensional arrays always returns a 1D array because the selected elements don't necessarily form a regular grid. If you need to preserve the 2D structure, see the section on removing rows or columns containing NaN.
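A quick way to see the flattening is to compare shapes before and after. The sketch below also uses np.argwhere to record where the NaN values were, since those positions are lost once the array is flattened. This is an optional extra step, not part of the method above.

```python
import numpy as np

arr = np.array([[12, 5, np.nan, 7],
                [2, 61, 1, np.nan],
                [np.nan, 1, np.nan, 5]])

clean = arr[~np.isnan(arr)]
print("Original shape:", arr.shape)    # (3, 4)
print("Cleaned shape: ", clean.shape)  # (8,)

# (row, column) index of every NaN in the original array
nan_positions = np.argwhere(np.isnan(arr))
print("NaN positions:")
print(nan_positions)
```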
Method 2: Using np.isfinite()
The np.isfinite() function returns True for all finite numbers, filtering out both NaN and infinity values:
import numpy as np
arr = np.array([1.0, np.nan, 3.0, np.inf, 5.0, -np.inf])
clean = arr[np.isfinite(arr)]
print("Original:", arr)
print("Cleaned: ", clean)
Output:
Original: [ 1. nan 3. inf 5. -inf]
Cleaned: [1. 3. 5.]
isfinite() vs isnan()
- Use ~np.isnan() when you only want to remove NaN values but keep infinite values.
- Use np.isfinite() when you want to remove both NaN and infinity values.
In most data analysis scenarios, np.isfinite() is the safer choice because infinite values are usually just as problematic as NaN.
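To make the difference concrete, the sketch below applies both filters to the same array and then takes the mean of each result. It illustrates one way leftover infinities can cause trouble: +inf and -inf in the same array add up to NaN. np.errstate only silences the resulting warning.

```python
import numpy as np

arr = np.array([1.0, np.nan, 3.0, np.inf, 5.0, -np.inf])

nan_removed = arr[~np.isnan(arr)]     # keeps inf and -inf
finite_only = arr[np.isfinite(arr)]   # drops NaN, inf, and -inf

with np.errstate(invalid="ignore"):
    mean_nan_removed = np.mean(nan_removed)   # inf + (-inf) is NaN again
mean_finite_only = np.mean(finite_only)

print("NaN removed:", nan_removed, "-> mean:", mean_nan_removed)
print("Finite only:", finite_only, "-> mean:", mean_finite_only)
```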
Method 3: Using np.logical_not() with np.isnan()
This approach is functionally identical to ~np.isnan() but uses an explicit function call instead of the bitwise NOT operator:
import numpy as np
arr = np.array([6.0, 2.0, np.nan, 8.0, np.nan, 1.0])
clean = arr[np.logical_not(np.isnan(arr))]
print("Cleaned:", clean)
Output:
Cleaned: [6. 2. 8. 1.]
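On a boolean mask such as the one np.isnan() returns, the ~ operator and np.logical_not() produce the same result, so use whichever you find more readable. On integer arrays they differ: ~ flips the bits of each value, while np.logical_not() tests whether each value is zero.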
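The equivalence holds for boolean masks, which is what np.isnan() returns. The short check below confirms that, and also shows that the two operations are not interchangeable on integer arrays. The ints array is purely illustrative.

```python
import numpy as np

mask = np.isnan(np.array([6.0, np.nan, 1.0]))

# On a boolean mask the two forms agree element for element.
same_on_bool = np.array_equal(~mask, np.logical_not(mask))
print("Same on boolean mask:", same_on_bool)   # True

# On integers, ~ is a bitwise NOT while logical_not tests for zero.
ints = np.array([0, 1, 2])
bitwise = ~ints                  # [-1, -2, -3]
logical = np.logical_not(ints)   # [True, False, False]
print("~ints:           ", bitwise)
print("logical_not(ints):", logical)
```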
Removing Rows Containing NaN (Preserving 2D Structure)
If you need to keep the 2D shape of your array, you can remove entire rows that contain any NaN values:
import numpy as np
arr = np.array([[1.0, 2.0, 3.0],
                [4.0, np.nan, 6.0],
                [7.0, 8.0, 9.0],
                [np.nan, 11.0, 12.0]])
# Remove rows where ANY value is NaN
clean = arr[~np.isnan(arr).any(axis=1)]
print("Original shape:", arr.shape)
print("Cleaned shape: ", clean.shape)
print("Cleaned array:")
print(clean)
Output:
Original shape: (4, 3)
Cleaned shape: (2, 3)
Cleaned array:
[[1. 2. 3.]
[7. 8. 9.]]
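It can help to look at the per-row mask on its own. The sketch below uses a different example array that includes a row made up entirely of NaN, and contrasts .any(axis=1) with a variation not covered above: .all(axis=1), which drops a row only when every value in it is NaN.

```python
import numpy as np

arr = np.array([[1.0, 2.0, 3.0],
                [4.0, np.nan, 6.0],
                [np.nan, np.nan, np.nan],
                [7.0, 8.0, 9.0]])

row_has_nan = np.isnan(arr).any(axis=1)   # at least one NaN in the row
row_all_nan = np.isnan(arr).all(axis=1)   # the whole row is NaN

print("Rows with any NaN:", row_has_nan)  # [False  True  True False]
print("Rows all NaN:     ", row_all_nan)  # [False False  True False]

complete_rows = arr[~row_has_nan]     # strict: keep only fully valid rows
non_empty_rows = arr[~row_all_nan]    # lenient: drop only the empty row

print("Strict shape: ", complete_rows.shape)   # (2, 3)
print("Lenient shape:", non_empty_rows.shape)  # (3, 3)
```

The lenient result still contains a NaN in its second row, so it suits cases where partially filled rows are handled later.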
Removing Columns Containing NaN
Similarly, you can remove columns that contain any NaN values:
import numpy as np
arr = np.array([[1.0, np.nan, 3.0],
                [4.0, np.nan, 6.0],
                [7.0, np.nan, 9.0]])
# Remove columns where ANY value is NaN
clean = arr[:, ~np.isnan(arr).any(axis=0)]
print("Cleaned array:")
print(clean)
Output:
Cleaned array:
[[1. 3.]
[4. 6.]
[7. 9.]]
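Dropping a column for a single NaN can discard a lot of usable data. As a hedged variation on the approach above, the sketch below drops a column only when more than a chosen fraction of its values are NaN. The max_nan_fraction threshold of 0.5 is a hypothetical value picked for this example.

```python
import numpy as np

arr = np.array([[1.0, np.nan, 3.0],
                [4.0, np.nan, np.nan],
                [7.0, np.nan, 9.0]])

# The mean of a boolean mask along axis 0 is the fraction of NaN per column.
nan_fraction = np.isnan(arr).mean(axis=0)

max_nan_fraction = 0.5   # hypothetical threshold, tune for your data
keep_columns = nan_fraction <= max_nan_fraction

clean = arr[:, keep_columns]
print("NaN fraction per column:", nan_fraction)
print("Columns kept:", keep_columns)   # [ True False  True]
print(clean)
```

The third column survives with one NaN still in it, so a replacement step may still be needed afterwards.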
Replacing NaN Instead of Removing
Sometimes you don't want to remove NaN values (which changes the array size) but instead replace them with a specific value:
Replace with Zero
import numpy as np
arr = np.array([1.0, np.nan, 3.0, np.nan, 5.0])
clean = np.nan_to_num(arr, nan=0.0)
print("Replaced with 0:", clean)
Output:
Replaced with 0: [1. 0. 3. 0. 5.]
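np.nan_to_num() also replaces infinities. By default, positive and negative infinity become the largest and smallest finite values of the array's dtype, which is rarely what you want in an analysis. The sketch below shows the defaults and then sets explicit replacements with the posinf and neginf parameters. The value 1e6 is an arbitrary choice for illustration.

```python
import numpy as np

arr = np.array([1.0, np.nan, np.inf, -np.inf])

# Defaults: NaN -> 0.0, inf -> largest finite float, -inf -> most negative float
default_fill = np.nan_to_num(arr)
print("Defaults:", default_fill)

# Explicit replacements for all three special values
custom_fill = np.nan_to_num(arr, nan=0.0, posinf=1e6, neginf=-1e6)
print("Custom:  ", custom_fill)

# copy=True is the default, so the input array is left unchanged.
print("Original:", arr)
```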
Replace with the Mean
import numpy as np
arr = np.array([1.0, np.nan, 3.0, np.nan, 5.0])
mean_val = np.nanmean(arr) # Computes mean ignoring NaN
arr[np.isnan(arr)] = mean_val
print(f"Replaced with mean ({mean_val}):", arr)
Output:
Replaced with mean (3.0): [1. 3. 3. 3. 5.]
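For 2D data it is often more meaningful to fill each NaN with the mean of its own column rather than the mean of the whole array. One possible sketch, assuming columns represent separate features, combines np.nanmean(axis=0) with np.where(), which returns a new array instead of modifying the original in place.

```python
import numpy as np

arr = np.array([[1.0, 10.0],
                [np.nan, 20.0],
                [3.0, np.nan]])

# Per-column means that ignore NaN: [2.0, 15.0]
col_means = np.nanmean(arr, axis=0)

# Where arr is NaN take the column mean (broadcast across rows), else keep arr.
filled = np.where(np.isnan(arr), col_means, arr)

print("Column means:", col_means)
print(filled)
```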
NumPy provides NaN-aware versions of common functions: np.nansum(), np.nanmean(), np.nanmax(), np.nanmin(), np.nanstd(). These compute results while ignoring NaN values, which can be simpler than removing NaN values first.
import numpy as np
arr = np.array([1.0, np.nan, 3.0, np.nan, 5.0])
print("nanmean:", np.nanmean(arr)) # nanmean: 3.0
print("nansum:", np.nansum(arr)) # nansum: 9.0
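The remaining NaN-aware functions work the same way. The sketch below also checks that each one matches the ordinary function applied to the filtered array, which is the sense in which they "ignore" NaN.

```python
import numpy as np

arr = np.array([1.0, np.nan, 3.0, np.nan, 5.0])
clean = arr[~np.isnan(arr)]

nan_max = np.nanmax(arr)
nan_min = np.nanmin(arr)
nan_std = np.nanstd(arr)

print("nanmax:", nan_max)   # 5.0
print("nanmin:", nan_min)   # 1.0
print("nanstd:", nan_std)

# Each NaN-aware result equals the plain function on the cleaned array.
print(np.isclose(nan_std, np.std(clean)))                # True
print(np.isclose(np.nanmean(arr), np.mean(clean)))       # True
```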
Common Mistake: Using == to Check for NaN
A frequent error is trying to compare values with np.nan using the equality operator:
import numpy as np
arr = np.array([1.0, np.nan, 3.0])
# ❌ Wrong: NaN is NOT equal to itself
mask = arr != np.nan
print("Mask:", mask)
print("Result:", arr[mask])
Output:
Mask: [ True True True]
Result: [ 1. nan 3.]
The NaN value is not removed because, under IEEE 754, NaN compares unequal to every value, including itself. np.nan == np.nan returns False and np.nan != np.nan returns True, so arr != np.nan is True for every element and the mask filters nothing.
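The comparison rules are easy to verify directly. This short check shows both scalar comparisons and what happens when an array is compared against np.nan element-wise.

```python
import numpy as np

equal_to_itself = np.nan == np.nan
unequal_to_itself = np.nan != np.nan
print("np.nan == np.nan:", equal_to_itself)     # False
print("np.nan != np.nan:", unequal_to_itself)   # True

arr = np.array([1.0, np.nan, 3.0])

eq_mask = arr == np.nan    # all False, even at the NaN position
ne_mask = arr != np.nan    # all True, so nothing gets filtered out
isnan_mask = np.isnan(arr) # the only mask that finds the NaN

print("arr == np.nan:", eq_mask)
print("arr != np.nan:", ne_mask)
print("np.isnan(arr):", isnan_mask)
```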
Always use np.isnan():
import numpy as np
arr = np.array([1.0, np.nan, 3.0])
# ✅ Correct: use np.isnan()
mask = ~np.isnan(arr)
print("Result:", arr[mask])
Output:
Result: [1. 3.]
Comparison of Methods
| Method | Removes NaN | Removes inf | Preserves Shape | Use Case |
|---|---|---|---|---|
| arr[~np.isnan(arr)] | ✅ | ❌ | ❌ (flattens) | Most common, NaN only |
| arr[np.isfinite(arr)] | ✅ | ✅ | ❌ (flattens) | Remove all non-finite values |
| arr[~np.isnan(arr).any(axis=1)] | ✅ | ❌ | ✅ (removes rows) | Clean rows in 2D arrays |
| np.nan_to_num(arr) | Replaces | Replaces | ✅ | Keep shape, fill with defaults |
| np.nanmean(), np.nansum() | Ignores | ❌ | ✅ | Compute stats without cleaning |
Summary
To remove NaN values from a NumPy array:
- Use arr[~np.isnan(arr)] for the simplest, most common approach - it filters out all NaN values from any array.
- Use arr[np.isfinite(arr)] when you also need to remove infinity values.
- Use arr[~np.isnan(arr).any(axis=1)] to remove entire rows containing NaN while preserving the 2D structure.
- Use np.nan_to_num() or assignment with np.isnan() when you want to replace NaN values instead of removing them.
- Never use == or != to check for NaN - always use np.isnan().