
How to Represent Missing Values with NaN and None

Python has two distinct ways to represent missing or absent values: NaN (Not a Number) for numerical contexts and None for general-purpose absence. Understanding their differences is essential for data handling and avoiding subtle bugs.

Core Differences

| Feature | NaN | None |
|---|---|---|
| Type | float | NoneType |
| Purpose | Missing numeric data | Absence of any value |
| Equality | nan == nan is False | None == None is True |
| Math operations | Propagates (nan + 1 is nan) | Raises TypeError |
| Check method | math.isnan(x) | x is None |
| Boolean value | True (truthy) | False (falsy) |

Working with None

None is Python's built-in singleton representing the absence of a value. It's used for optional parameters, uninitialized variables, and functions without explicit returns.

```python
# None is falsy and has its own type
x = None

print(type(x))    # <class 'NoneType'>
print(bool(x))    # False
print(x is None)  # True
print(x == None)  # True (but 'is' is preferred)
```
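Two of the uses mentioned above, functions without an explicit return and optional parameters, can be seen in a short sketch. The function names greet and append_item are hypothetical, chosen only for illustration; the None default in append_item is the common idiom for avoiding a shared mutable default list.

```python
def greet(name):
    # No return statement, so calling this evaluates to None
    print(f"Hello, {name}")

def append_item(item, items=None):
    # None as the default means a fresh list is created on every call
    if items is None:
        items = []
    items.append(item)
    return items

result = greet("Alice")   # prints "Hello, Alice"
print(result is None)     # True

print(append_item(1))     # [1]
print(append_item(2))     # [2] (not [1, 2]: each call gets its own list)
```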

Checking for None

Always use the identity operator is rather than equality ==:

```python
value = None

# Correct approach
if value is None:
    print("No value provided")

# Also correct: check that a value is present
if value is not None:
    print(f"Value exists: {value}")
```

Tip: Use is None instead of == None. The is operator checks identity, which is faster and more explicit, while == can be overridden by custom classes to return unexpected results.
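To illustrate the tip, here is a minimal sketch with a hypothetical class, AlwaysEqual, whose __eq__ claims equality with everything. The == test is fooled; the identity test is not, because is cannot be overridden.

```python
class AlwaysEqual:
    # Hypothetical class: __eq__ returns True for any comparison
    def __eq__(self, other):
        return True

obj = AlwaysEqual()

print(obj == None)  # True  (misleading: obj is a real object)
print(obj is None)  # False (identity checks cannot be overridden)
```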

None in Functions

```python
def find_user(user_id):
    users = {1: "Alice", 2: "Bob"}
    return users.get(user_id)  # Returns None if not found

result = find_user(999)

if result is None:
    print("User not found")
```
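The explicit is None test matters when a legitimate result can itself be falsy. In this sketch, user 3 with an empty name is a hypothetical case added for illustration: a truthiness check lumps it together with a missing user, while the identity check tells them apart.

```python
def find_user(user_id):
    users = {1: "Alice", 2: "Bob", 3: ""}  # user 3 exists but has an empty name
    return users.get(user_id)  # Returns None if the key is absent

for user_id in (3, 999):
    result = find_user(user_id)
    # Truthiness treats "" and None the same way
    truthiness = "falsy" if not result else "truthy"
    # Identity distinguishes a missing user from an empty name
    status = "not found" if result is None else "found"
    print(user_id, truthiness, status)
# 3 falsy found
# 999 falsy not found
```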

Working with NaN

NaN represents undefined or unrepresentable numerical results. It's part of the IEEE 754 floating-point standard and behaves uniquely in comparisons.

```python
import math

# Creating NaN
nan1 = float('nan')
nan2 = math.nan

print(type(nan1))  # <class 'float'>
print(type(nan2))  # <class 'float'>
```
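NaN also appears as the result of float operations that have no meaningful numeric answer, such as subtracting infinity from infinity. A small sketch using only the standard library:

```python
import math

inf = math.inf

# Operations with no meaningful numeric result produce NaN
a = inf - inf
b = inf * 0
c = float("NaN")  # the string passed to float() is case-insensitive

print(a, b, c)                                      # nan nan nan
print(math.isnan(a), math.isnan(b), math.isnan(c))  # True True True
```

Note that Python does not produce NaN for every undefined operation: 0.0 / 0.0 raises ZeroDivisionError and math.sqrt(-1) raises ValueError instead.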

The Unusual Equality Behavior

Unlike every other float value, NaN is not equal to itself:

```python
import math

val = math.nan

# NaN is never equal to anything, including itself
print(val == val)           # False
print(val == float('nan'))  # False
print(val != val)           # True
```

Output:

```
False
False
True
```
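One consequence worth knowing: built-in containers check identity before equality, so membership tests and list comparisons can appear to "find" NaN. The sketch below shows that the same NaN object matches itself inside a list, while a separately created NaN does not.

```python
import math

same = math.nan
values = [same]

print(same in values)          # True  (identity match: it is the same object)
print(float("nan") in values)  # False (a different NaN object, and == is False)
print([same] == [same])        # True  (identical elements count as equal)
print(same == same)            # False (a direct comparison still fails)
```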

Checking for NaN

Because NaN is not equal even to itself, an == test can never detect it. Use a dedicated function such as math.isnan() instead:

```python
import math

val = float('nan')

# Wrong approach: always False
if val == float('nan'):
    print("This never executes")

# Correct approach
if math.isnan(val):
    print("Value is NaN")
```

Output:

```
Value is NaN
```
Warning: Never use == float('nan') or == math.nan to check for NaN. The comparison always returns False, even when a NaN is compared with itself.
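Two related details, sketched below: the self-inequality rule gives an alternative NaN test (x != x is True only for NaN among floats), and math.isnan accepts only real numbers, so passing None raises TypeError. The helper name is_nan is hypothetical; math.isnan is usually the more readable choice.

```python
import math

def is_nan(x):
    # Hypothetical helper: NaN is the only float for which x != x is True
    return x != x

val = float("nan")
print(is_nan(val), is_nan(1.5))  # True False

# math.isnan only accepts real numbers, so None raises TypeError
try:
    math.isnan(None)
except TypeError as e:
    print(f"TypeError: {e}")
```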

NaN Propagation in Math

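NaN propagates through arithmetic, contaminating every result it touches. Comparison-based functions such as max() behave differently: every comparison involving NaN is False, so their result depends on argument order.

```python
import math

nan = math.nan

print(nan + 100)    # nan
print(nan * 0)      # nan
print(nan / nan)    # nan (no ZeroDivisionError)
print(max(1, nan))  # 1   (NaN silently dropped)
print(max(nan, 1))  # nan (same values, different order)
```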
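Because of that order dependence, a reduction over data that may contain NaN is safer if the NaN values are filtered out first. A minimal sketch with made-up readings, where NaN happens to come first:

```python
import math

readings = [math.nan, 3.2, 7.5, 1.1]

print(sum(readings))  # nan (arithmetic propagates NaN)
print(max(readings))  # nan (NaN came first, so every later comparison was False)

# Drop NaN explicitly before reducing
valid = [r for r in readings if not math.isnan(r)]
print(max(valid))     # 7.5
print(min(valid))     # 1.1
```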

None vs NaN in Practice

Type Behavior

```python
import math

# None breaks numeric operations
try:
    result = None + 1
except TypeError as e:
    print(f"None error: {e}")  # unsupported operand type(s) ...

# NaN propagates silently
result = math.nan + 1
print(f"NaN result: {result}")  # nan
```
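One way to reconcile the two behaviors is to normalize None to NaN before numeric work, so the data stays purely float and a single math.isnan check covers every gap. A sketch with made-up input values:

```python
import math

raw = [10.0, None, 12.5, None]  # example values, e.g. parsed from some input

# Replace None with NaN so every element is a float
as_float = [math.nan if v is None else v for v in raw]

print(as_float)                                       # [10.0, nan, 12.5, nan]
print(sum(v for v in as_float if not math.isnan(v)))  # 22.5
```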

Boolean Context

```python
import math

# None is falsy
if not None:
    print("None is falsy")  # Prints

# NaN is truthy!
if math.nan:
    print("NaN is truthy")  # Prints
```

Output:

```
None is falsy
NaN is truthy
```
Note: This truthy behavior of NaN can cause bugs. A condition like if value: passes for NaN but fails for None, even though both represent missing data.
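The bug described in the note shows up in a simple filter. In this sketch with made-up readings, a truthiness filter drops None and the valid reading 0.0 but keeps NaN; an explicit check removes exactly the missing values.

```python
import math

readings = [0.0, None, math.nan, 4.2]

# Buggy filter: drops None AND the valid 0.0, but keeps NaN
truthy = [r for r in readings if r]
print(truthy)   # [nan, 4.2]

# Explicit filter: drops exactly the missing values
present = [r for r in readings if r is not None and not math.isnan(r)]
print(present)  # [0.0, 4.2]
```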

Handling Both in Pandas

Pandas treats None and NaN similarly in DataFrames, converting None to NaN in numeric columns:

```python
import pandas as pd

# Mixed missing values
df = pd.DataFrame({
    'numbers': [1.0, None, float('nan'), 4.0],
    'strings': ['a', None, 'c', None]
})

print(df)

# isna() detects both None and NaN
print(df.isna())
```

Output:

```
   numbers strings
0      1.0       a
1      NaN    None
2      NaN       c
3      4.0    None
   numbers strings
0    False   False
1     True    True
2     True   False
3    False    True
```
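The conversion in the numeric column can be confirmed directly. This sketch checks the column dtype and the stored value, and also shows that pd.isna() accepts individual scalars as well as whole DataFrames.

```python
import pandas as pd

df = pd.DataFrame({
    'numbers': [1.0, None, float('nan'), 4.0],
    'strings': ['a', None, 'c', None]
})

# The numeric column is float64, so the None was stored as NaN
print(df['numbers'].dtype)            # float64
print(df.loc[1, 'numbers'] is None)   # False
print(pd.isna(df.loc[1, 'numbers']))  # True

# pd.isna also accepts individual scalars
print(pd.isna(None), pd.isna(float('nan')), pd.isna(''))  # True True False
```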

Checking and Filling Missing Values

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'value': [1, None, np.nan, 4],
    'category': ['A', None, 'B', 'C']
})

# Count missing values
print(df.isna().sum())
# value       2
# category    1

# Fill missing values
df_filled = df.fillna({'value': 0, 'category': 'Unknown'})
print(df_filled)
```

Output:

```
value       2
category    1
dtype: int64
   value category
0    1.0        A
1    0.0  Unknown
2    0.0        B
3    4.0        C
```
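Filling a numeric column with 0 can distort statistics computed afterwards. One common alternative, sketched here as an option rather than a recommendation, is to fill with the column mean; Series.mean() skips NaN by default, so it averages only the values that are present.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'value': [1, None, np.nan, 4],
    'category': ['A', None, 'B', 'C']
})

# mean() skips NaN by default, so this averages 1 and 4
mean_value = df['value'].mean()
df['value'] = df['value'].fillna(mean_value)

print(mean_value)            # 2.5
print(df['value'].tolist())  # [1.0, 2.5, 2.5, 4.0]
```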

Distinguishing None from NaN in Pandas

When you need to differentiate between them:

```python
import pandas as pd
import numpy as np

# Use object dtype to preserve None
data = pd.Series([1, None, np.nan], dtype=object)

def check_missing(val):
    if val is None:
        return 'None'
    try:
        if np.isnan(val):
            return 'NaN'
    except (TypeError, ValueError):
        pass
    return 'Value'

print(data.apply(check_missing))
```

Output:

```
0    Value
1     None
2      NaN
dtype: object
```
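The dtype=object argument is what keeps the distinction alive. Without it, as this sketch shows, pandas infers float64 and stores the None as NaN, so there is nothing left to tell apart.

```python
import pandas as pd
import numpy as np

# Without dtype=object, pandas infers float64 and stores None as NaN
data = pd.Series([1, None, np.nan])

print(data.dtype)            # float64
print(data.isna().tolist())  # [False, True, True]
print(data.iloc[1] is None)  # False: the None is gone
```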

Comprehensive Missing Value Check

When you need to handle both types robustly:

```python
import math

def is_missing(value):
    """Check if value is None or NaN."""
    if value is None:
        return True
    try:
        return math.isnan(value)
    except (TypeError, ValueError):
        return False

# Test cases
test_values = [None, float('nan'), 0, '', [], 42, 'hello']

for val in test_values:
    print(f"{str(val):10} -> missing: {is_missing(val)}")
```

Output:

```
None       -> missing: True
nan        -> missing: True
0          -> missing: False
           -> missing: False
[]         -> missing: False
42         -> missing: False
hello      -> missing: False
```
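A check like is_missing composes naturally into other helpers. As one example, this sketch uses a hypothetical fill_missing helper to replace every None or NaN in a plain list with a default, while leaving falsy but valid values such as 0 and '' untouched. The is_missing definition is repeated so the block runs on its own.

```python
import math

def is_missing(value):
    """Check if value is None or NaN (same logic as above)."""
    if value is None:
        return True
    try:
        return math.isnan(value)
    except (TypeError, ValueError):
        return False

def fill_missing(values, default):
    # Hypothetical helper: replace each None/NaN with a default value
    return [default if is_missing(v) else v for v in values]

print(fill_missing([1, None, float('nan'), 0, ''], -1))  # [1, -1, -1, 0, '']
```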

When to Use Each

| Scenario | Recommended Type | Reason |
|---|---|---|
| Function returns no result | None | Pythonic convention |
| Optional parameter default | None | Standard practice |
| Missing sensor reading | NaN | Preserves float dtype |
| Database NULL in numeric column | NaN | Allows vectorized operations |
| Database NULL in object column | None | Natural representation |
| Uninitialized variable | None | Explicit absence |
| Mathematical undefined result | NaN | IEEE 754 standard |
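The choice also matters when data leaves Python. For example, the standard json module writes None as null but writes NaN as the bare token NaN, which is not valid JSON, and json.dumps(..., allow_nan=False) rejects it with ValueError. A sketch of converting NaN to None before serializing; the nan_to_none helper and the record contents are hypothetical.

```python
import json
import math

record = {"sensor": "t1", "reading": math.nan, "note": None}

print(json.dumps(record))
# {"sensor": "t1", "reading": NaN, "note": null}

def nan_to_none(value):
    # Hypothetical helper: map float NaN to None, leave everything else alone
    if isinstance(value, float) and math.isnan(value):
        return None
    return value

clean = {k: nan_to_none(v) for k, v in record.items()}
print(json.dumps(clean, allow_nan=False))
# {"sensor": "t1", "reading": null, "note": null}
```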

Summary

  • Use None for general absence of values: function returns, optional parameters, and non-numeric missing data.
  • Use NaN for missing numerical data where you need to maintain float dtype and perform vectorized operations.
  • In Pandas, both are treated as missing values and detected by isna(), but understanding their distinct behaviors prevents subtle bugs in your data processing pipelines.