How to Represent Missing Values with NaN and None
Python has two distinct ways to represent missing or absent values: NaN (Not a Number) for numerical contexts and None for general-purpose absence. Understanding their differences is essential for data handling and avoiding subtle bugs.
Core Differences
| Feature | NaN | None |
|---|---|---|
| Type | float | NoneType |
| Purpose | Missing numeric data | Absence of any value |
| Equality | nan == nan is False | None == None is True |
| Math operations | Propagates (nan + 1 = nan) | Raises TypeError |
| Check method | math.isnan(x) | x is None |
| Boolean value | True (truthy) | False (falsy) |
Working with None
None is Python's built-in singleton representing the absence of a value. It's used for optional parameters, uninitialized variables, and functions without explicit returns.
```python
# None is falsy and has its own type
x = None
print(type(x))    # <class 'NoneType'>
print(bool(x))    # False
print(x is None)  # True
print(x == None)  # True (but 'is' preferred)
```
Checking for None
Always use the identity operator is rather than equality ==:
```python
value = None

# Correct approach
if value is None:
    print("No value provided")

# Also correct for checking existence
if value is not None:
    print(f"Value exists: {value}")
```
Use is None instead of == None because is checks identity (faster and more explicit), while == can be overridden by custom classes to return unexpected results.
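To see why == None is risky, here is a small illustrative class. AlwaysEqual is hypothetical, written only for this demonstration: its __eq__ claims equality with everything, so == None gives the wrong answer while is None cannot be fooled.

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ returns True for any comparison."""

    def __eq__(self, other):
        return True


obj = AlwaysEqual()

print(obj == None)  # True: the overridden __eq__ is consulted
print(obj is None)  # False: identity cannot be overridden
```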
None in Functions
```python
def find_user(user_id):
    users = {1: "Alice", 2: "Bob"}
    return users.get(user_id)  # Returns None if not found

result = find_user(999)
if result is None:
    print("User not found")
```
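The same convention covers optional parameters and functions that end without a return statement. The sketch below uses hypothetical add_tag and log_message functions; None stands in for "no list supplied", which also avoids sharing one mutable default list between calls.

```python
def add_tag(tag, tags=None):
    """Append tag to tags, creating a fresh list when none is supplied."""
    if tags is None:
        tags = []  # a new list on every call, never shared between calls
    tags.append(tag)
    return tags


def log_message(msg):
    print(msg)  # no return statement, so the call evaluates to None


print(add_tag("a"))  # ['a']
print(add_tag("b"))  # ['b'] (not ['a', 'b'])

result = log_message("hello")  # prints: hello
print(result is None)          # True
```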
Working with NaN
NaN represents undefined or unrepresentable numerical results. It's part of the IEEE 754 floating-point standard and behaves uniquely in comparisons.
```python
import math

# Creating NaN
nan1 = float('nan')
nan2 = math.nan
print(type(nan1))  # <class 'float'>
print(type(nan2))  # <class 'float'>
```
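NaN also appears as the result of operations that IEEE 754 leaves undefined. The sketch below uses only the math module. Note that Python raises an exception for some undefined cases, such as 0.0 / 0.0 or math.sqrt(-1), instead of returning NaN.

```python
import math

inf = math.inf

# Undefined IEEE 754 operations quietly produce NaN
print(inf - inf)  # nan
print(inf * 0)    # nan
print(inf / inf)  # nan

# Python raises instead of returning NaN in these cases
try:
    0.0 / 0.0
except ZeroDivisionError:
    print("0.0 / 0.0 raised ZeroDivisionError")

try:
    math.sqrt(-1)
except ValueError:
    print("math.sqrt(-1) raised ValueError")
```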
The Unusual Equality Behavior
Unlike every other float, NaN is not equal to itself:
```python
import math

val = math.nan

# NaN is never equal to anything, including itself
print(val == val)           # False
print(val == float('nan'))  # False
print(val != val)           # True
```
Output:
```text
False
False
True
```
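One consequence worth knowing: container operations such as in and list equality check identity before calling ==. The very same NaN object is therefore "found" in a list, while a separately created NaN is not.

```python
import math

nan = math.nan
values = [1.0, nan]

# `in` tries identity first, so the very same NaN object is found
print(nan in values)           # True
# float('nan') creates a different object, and NaN never equals NaN
print(float('nan') in values)  # False

# List equality uses the same identity-then-equality rule
print([nan] == [nan])                    # True
print([float('nan')] == [float('nan')])  # False
```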
Checking for NaN
Because nan != nan, you must use dedicated functions:
```python
import math

val = float('nan')

# Wrong approach: always False
if val == float('nan'):
    print("This never executes")

# Correct approach
if math.isnan(val):
    print("Value is NaN")
```
Output:
```text
Value is NaN
```
Never use == float('nan') or == math.nan to check for NaN. This comparison always returns False, even when comparing NaN to itself.
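Two related details. The self-inequality test x != x also detects a float NaN, since NaN is the only float unequal to itself. And math.isnan() only accepts real numbers, so calling it on None raises TypeError rather than returning False.

```python
import math

val = float('nan')
print(val != val)  # True: among floats, only NaN is unequal to itself

# math.isnan() requires a real number
try:
    math.isnan(None)
except TypeError:
    print("math.isnan(None) raised TypeError")
```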
NaN Propagation in Math
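NaN propagates through arithmetic, contaminating every result it touches. Comparison-based functions such as max() are the exception, because every comparison involving NaN is False:
```python
import math

nan = math.nan

print(nan + 100)    # nan
print(nan * 0)      # nan
print(nan / nan)    # nan
print(max(1, nan))  # 1: max() compares values, so NaN is silently dropped
```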
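The max() result also depends on argument order: max(nan, 1) returns nan. If NaN entries should simply be ignored, one approach is to filter them out first. The sketch below uses nan_safe_max, a hypothetical helper rather than a standard library function.

```python
import math

nan = math.nan
print(max(1, nan))  # 1
print(max(nan, 1))  # nan: the order of arguments changes the answer


def nan_safe_max(values):
    """Hypothetical helper: max of the non-NaN values, or NaN if none remain."""
    present = [v for v in values if not math.isnan(v)]
    return max(present) if present else math.nan


print(nan_safe_max([3.0, nan, 7.5, nan]))  # 7.5
print(nan_safe_max([nan, nan]))            # nan
```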
None vs NaN in Practice
Type Behavior
```python
import math

# None breaks numeric operations
try:
    result = None + 1
except TypeError as e:
    print(f"None error: {e}")  # unsupported operand type(s)

# NaN propagates silently
result = math.nan + 1
print(f"NaN result: {result}")  # nan
```
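This difference is one reason numeric code often normalizes None to NaN up front: afterwards arithmetic never raises, and missing entries can still be found with math.isnan(). A minimal sketch, where the readings list and the to_float helper are illustrative assumptions:

```python
import math

readings = [12.5, None, 13.1, None, 12.9]  # hypothetical sensor data


def to_float(value):
    """Map None to NaN so the value can flow through float arithmetic."""
    return math.nan if value is None else float(value)


floats = [to_float(r) for r in readings]
scaled = [r * 2 for r in floats]  # no TypeError; missing entries stay NaN

present = [r for r in scaled if not math.isnan(r)]
print(scaled)
print(len(present))  # 3
```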
Boolean Context
```python
import math

# None is falsy
if not None:
    print("None is falsy")  # Prints

# NaN is truthy!
if math.nan:
    print("NaN is truthy")  # Prints
```
Output:
```text
None is falsy
NaN is truthy
```
This truthy behavior of NaN can cause bugs. A condition like if value: will pass for NaN but fail for None, even though both represent missing data.
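A safer pattern spells out what "has a value" means instead of relying on truthiness. The has_value helper below is a hypothetical example; note that if value: would also reject a legitimate 0.0 reading.

```python
import math


def has_value(value):
    """Hypothetical helper: True unless value is None or a float NaN."""
    if value is None:
        return False
    if isinstance(value, float) and math.isnan(value):
        return False
    return True


for v in [None, math.nan, 0.0, 3.5]:
    print(repr(v), bool(v), has_value(v))
```

For 0.0, bool() is False while has_value() is True, which is usually what you want for a numeric reading.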
Handling Both in Pandas
Pandas treats both None and NaN as missing in DataFrames. In numeric columns it converts None to NaN, while object (string) columns keep None as-is:
```python
import pandas as pd

# Mixed missing values
df = pd.DataFrame({
    'numbers': [1.0, None, float('nan'), 4.0],
    'strings': ['a', None, 'c', None]
})
print(df)

# isna() detects both None and NaN
print(df.isna())
```
Output:
```text
   numbers strings
0      1.0       a
1      NaN    None
2      NaN       c
3      4.0    None
   numbers  strings
0    False    False
1     True     True
2     True    False
3    False     True
```
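The same detection works on individual scalars through the top-level pd.isna() function, which is handy inside plain Python code that may receive either marker:

```python
import math

import numpy as np
import pandas as pd

# pd.isna() accepts scalars as well as arrays, Series, and DataFrames
print(pd.isna(None))      # True
print(pd.isna(np.nan))    # True
print(pd.isna(math.nan))  # True
print(pd.isna(0))         # False
print(pd.isna("a"))       # False
```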
Checking and Filling Missing Values
```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'value': [1, None, np.nan, 4],
    'category': ['A', None, 'B', 'C']
})

# Count missing values per column
print(df.isna().sum())

# Fill missing values with a per-column default
df_filled = df.fillna({'value': 0, 'category': 'Unknown'})
print(df_filled)
```
Output:
```text
value       2
category    1
dtype: int64
   value category
0    1.0        A
1    0.0  Unknown
2    0.0        B
3    4.0        C
```
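Pandas aggregations also differ from plain Python here: methods such as Series.mean() skip missing values by default (skipna=True), whereas the built-in sum() lets NaN propagate. A short comparison using the same value column:

```python
import math

import numpy as np
import pandas as pd

values = pd.Series([1, None, np.nan, 4])  # None becomes NaN (float64)

print(values.mean())              # 2.5: missing entries are skipped
print(values.mean(skipna=False))  # nan
print(math.isnan(sum(values.tolist())))  # True: built-in sum propagates NaN
```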
Distinguishing None from NaN in Pandas
When you need to differentiate between them:
```python
import numpy as np
import pandas as pd

# Use object dtype to preserve None
data = pd.Series([1, None, np.nan], dtype=object)

def check_missing(val):
    if val is None:
        return 'None'
    try:
        if np.isnan(val):
            return 'NaN'
    except (TypeError, ValueError):
        pass  # not a numeric value, so it cannot be NaN
    return 'Value'

print(data.apply(check_missing))
```
Output:
```text
0    Value
1     None
2      NaN
dtype: object
```
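The object dtype matters. Without it, pandas infers float64 and converts None to NaN during construction, so the distinction is gone before check_missing ever sees the data. check_missing is repeated below so the snippet runs on its own:

```python
import numpy as np
import pandas as pd


def check_missing(val):
    if val is None:
        return 'None'
    try:
        if np.isnan(val):
            return 'NaN'
    except (TypeError, ValueError):
        pass
    return 'Value'


inferred = pd.Series([1, None, np.nan])  # no dtype given, so pandas infers one
print(inferred.dtype)                    # float64
print(inferred.apply(check_missing).tolist())  # ['Value', 'NaN', 'NaN']
```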
Comprehensive Missing Value Check
When you need to handle both types robustly:
```python
import math

def is_missing(value):
    """Check if value is None or NaN."""
    if value is None:
        return True
    try:
        return math.isnan(value)
    except (TypeError, ValueError):
        return False  # non-numeric values cannot be NaN

# Test cases
test_values = [None, float('nan'), 0, '', [], 42, 'hello']
for val in test_values:
    print(f"{str(val):10} -> missing: {is_missing(val)}")
```
Output:
```text
None       -> missing: True
nan        -> missing: True
0          -> missing: False
           -> missing: False
[]         -> missing: False
42         -> missing: False
hello      -> missing: False
```
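With is_missing available, a common follow-up is picking the first usable value from several candidates, similar to SQL's COALESCE. A minimal sketch: coalesce is a hypothetical helper, and is_missing is repeated so the snippet stands alone. Unlike chaining with or, it keeps a legitimate 0.

```python
import math


def is_missing(value):
    """Check if value is None or NaN."""
    if value is None:
        return True
    try:
        return math.isnan(value)
    except (TypeError, ValueError):
        return False


def coalesce(*candidates, default=None):
    """Hypothetical helper: first candidate that is neither None nor NaN."""
    for candidate in candidates:
        if not is_missing(candidate):
            return candidate
    return default


print(coalesce(None, float('nan'), 0, 5))        # 0
print(coalesce(None, float('nan'), default=-1))  # -1
```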
When to Use Each
| Scenario | Recommended Type | Reason |
|---|---|---|
| Function returns no result | None | Pythonic convention |
| Optional parameter default | None | Standard practice |
| Missing sensor reading | NaN | Preserves float dtype |
| Database NULL in numeric column | NaN | Allows vectorized operations |
| Database NULL in object column | None | Natural representation |
| Uninitialized variable | None | Explicit absence |
| Mathematical undefined result | NaN | IEEE 754 standard |
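The two database rows of the table can be sketched with the standard library's sqlite3 module, which returns SQL NULL as None. The table and column names are illustrative assumptions. The numeric column is converted to floats with NaN, while the text column keeps None:

```python
import math
import sqlite3

# In-memory database with NULLs in both a numeric and a text column
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (temp REAL, label TEXT)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(21.5, "lab"), (None, None), (19.0, "roof")],
)
rows = conn.execute("SELECT temp, label FROM readings ORDER BY rowid").fetchall()
conn.close()

print(rows)  # [(21.5, 'lab'), (None, None), (19.0, 'roof')]

# Numeric column: map None to NaN so it stays a list of floats
temps = [math.nan if t is None else t for t, _ in rows]
# Text column: None is the natural "no value" marker, so keep it
labels = [label for _, label in rows]

print(temps)   # [21.5, nan, 19.0]
print(labels)  # ['lab', None, 'roof']
```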
Summary
- Use None for general absence of values: function returns, optional parameters, and non-numeric missing data.
- Use NaN for missing numerical data where you need to maintain float dtype and perform vectorized operations.
- In Pandas, both are treated as missing values and detected by isna(), but understanding their distinct behaviors prevents subtle bugs in your data processing pipelines.