Python Pandas: How to Count Non-Missing (non-NaN) Values in Your DataFrame
A fundamental step in data exploration and cleaning with Pandas is understanding the completeness of your dataset. Knowing how many actual data points (non-missing or non-NaN values) exist in each column or row, or even in the entire DataFrame, is crucial for assessing data quality and planning subsequent analytical steps.
This guide provides a comprehensive overview of how to count non-NaN values in Pandas DataFrames effectively. You'll learn how to use the DataFrame.count() method for per-column and per-row counts, how to get a total count for the entire DataFrame, how to treat empty strings as missing values, and how to use DataFrame.notna().sum() as an alternative approach.
Understanding Non-NaN Values in Pandas
In Pandas, "missing" or "Not a Number" (NaN) values represent data that is absent. When we talk about counting "non-NaN" values, we are essentially counting how many actual, valid data points are present. This count excludes any cells that Pandas recognizes as missing.
Let's create a sample DataFrame with some missing values (None or np.nan):
import pandas as pd
import numpy as np # For np.nan
data = {
    'product_name': ['Laptop', 'Mouse', None, 'Keyboard', None, 'Monitor'],
    'category': ['Electronics', 'Accessories', 'Electronics', None, 'Office', 'Electronics'],
    'price': [1200.00, 25.00, np.nan, 75.00, np.nan, 300.00],
    'stock_qty': [10, np.nan, 5, 0, 15, np.nan]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
Output:
Original DataFrame:
   product_name     category   price  stock_qty
0        Laptop  Electronics  1200.0       10.0
1         Mouse  Accessories    25.0        NaN
2          None  Electronics     NaN        5.0
3      Keyboard         None    75.0        0.0
4          None       Office     NaN       15.0
5       Monitor  Electronics   300.0        NaN
Method 1: Using DataFrame.count()
The DataFrame.count() method is the most direct way to count non-NA (Not Available/Not a Number) cells.
Counting Non-NaN Values per Column (Default)
By default, df.count() operates along axis=0, meaning it counts non-NA values for each column.
import pandas as pd
import numpy as np
data = {
    'product_name': ['Laptop', 'Mouse', None, 'Keyboard', None, 'Monitor'],
    'category': ['Electronics', 'Accessories', 'Electronics', None, 'Office', 'Electronics'],
    'price': [1200.00, 25.00, np.nan, 75.00, np.nan, 300.00],
    'stock_qty': [10, np.nan, 5, 0, 15, np.nan]
}
df = pd.DataFrame(data)
# Count non-NaN values in each column
column_counts = df.count()
print("Number of non-NaN values per column:")
print(column_counts)
Output:
Number of non-NaN values per column:
product_name    4
category        5
price           4
stock_qty       4
dtype: int64
- product_name has 4 non-missing values (2 None values were excluded).
- category has 5 non-missing values (1 None value was excluded).
- price has 4 non-missing values (2 np.nan values were excluded).
- stock_qty has 4 non-missing values (2 np.nan values were excluded).
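These per-column counts also translate directly into a completeness ratio. Here's a minimal sketch (reusing the same data) that divides each column's count by the total number of rows:
import pandas as pd
import numpy as np

data = {
    'product_name': ['Laptop', 'Mouse', None, 'Keyboard', None, 'Monitor'],
    'category': ['Electronics', 'Accessories', 'Electronics', None, 'Office', 'Electronics'],
    'price': [1200.00, 25.00, np.nan, 75.00, np.nan, 300.00],
    'stock_qty': [10, np.nan, 5, 0, 15, np.nan]
}
df = pd.DataFrame(data)

# Fraction of non-missing values per column (1.0 means fully populated)
completeness_ratio = df.count() / len(df)
print("Completeness ratio per column:")
print(completeness_ratio)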
Counting Non-NaN Values per Row (axis=1)
To count non-NA values for each row, set the axis parameter to 1 or 'columns'.
import pandas as pd
import numpy as np
data = {
    'product_name': ['Laptop', 'Mouse', None, 'Keyboard', None, 'Monitor'],
    'category': ['Electronics', 'Accessories', 'Electronics', None, 'Office', 'Electronics'],
    'price': [1200.00, 25.00, np.nan, 75.00, np.nan, 300.00],
    'stock_qty': [10, np.nan, 5, 0, 15, np.nan]
}
df = pd.DataFrame(data)
# Count non-NaN values in each row
row_counts = df.count(axis=1) # or axis='columns'
print("Number of non-NaN values per row:")
print(row_counts)
Output:
Number of non-NaN values per row:
0    4
1    3
2    2
3    3
4    2
5    3
dtype: int64
For example, the first row (index 0) has 4 non-missing values, while the second row (index 1) has 3 (because stock_qty is NaN).
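Per-row counts are often a stepping stone to filtering out sparse rows. As a minimal sketch (the threshold of 3 valid values here is an arbitrary choice), you can filter with the count directly, or use the built-in DataFrame.dropna(thresh=...):
import pandas as pd
import numpy as np

data = {
    'product_name': ['Laptop', 'Mouse', None, 'Keyboard', None, 'Monitor'],
    'category': ['Electronics', 'Accessories', 'Electronics', None, 'Office', 'Electronics'],
    'price': [1200.00, 25.00, np.nan, 75.00, np.nan, 300.00],
    'stock_qty': [10, np.nan, 5, 0, 15, np.nan]
}
df = pd.DataFrame(data)

# Keep only rows that contain at least 3 non-NaN values
dense_rows = df[df.count(axis=1) >= 3]

# Built-in equivalent: dropna keeps rows with at least `thresh` valid values
dense_rows_alt = df.dropna(thresh=3)

print(dense_rows)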
What Pandas Considers an "NA" Value
The count() method (and other NA-related functions like isnull(), notna()) considers the following as NA values by default:
- None (Python's null object)
- numpy.nan (the IEEE 754 floating-point representation of Not a Number)
- pandas.NaT (Not a Time, for datetime-like missing values)
- Optionally, numpy.inf and -numpy.inf can also be treated as NA if pd.options.mode.use_inf_as_na is set to True (it is False by default).
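As a quick sanity check (a minimal sketch), all three default NA markers are excluded by count():
import pandas as pd
import numpy as np

# None, np.nan, and pd.NaT are all recognized as NA
s = pd.Series([None, np.nan, pd.NaT, 'valid'])
print(s.count())  # 1 -- only 'valid' is a non-NA value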
Method 2: Getting the Total Count of Non-NaN Values in the DataFrame
If you need the grand total of all non-missing cells in the entire DataFrame, you can sum the result of df.count().
import pandas as pd
import numpy as np
data = {
    'product_name': ['Laptop', 'Mouse', None, 'Keyboard', None, 'Monitor'],
    'category': ['Electronics', 'Accessories', 'Electronics', None, 'Office', 'Electronics'],
    'price': [1200.00, 25.00, np.nan, 75.00, np.nan, 300.00],
    'stock_qty': [10, np.nan, 5, 0, 15, np.nan]
}
df = pd.DataFrame(data)
column_counts = df.count()
# Total number of non-NaN values in the entire DataFrame
total_non_nan_values = column_counts.sum()
# Alternatively, NumPy's np.sum also works on the counts Series:
# total_non_nan_values = np.sum(df.count())
print(f"Total number of non-NaN values in the DataFrame: {total_non_nan_values}")
Output:
Total number of non-NaN values in the DataFrame: 17
This sums the non-NaN counts from each column (4 + 5 + 4 + 4 = 17).
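Two equivalent one-liners, sketched here for comparison: chain the boolean approach from Method 4 (below), or subtract the number of missing cells from the DataFrame's total cell count:
import pandas as pd
import numpy as np

data = {
    'product_name': ['Laptop', 'Mouse', None, 'Keyboard', None, 'Monitor'],
    'category': ['Electronics', 'Accessories', 'Electronics', None, 'Office', 'Electronics'],
    'price': [1200.00, 25.00, np.nan, 75.00, np.nan, 300.00],
    'stock_qty': [10, np.nan, 5, 0, 15, np.nan]
}
df = pd.DataFrame(data)

# Same total via a boolean mask (see Method 4 below)
total_alt = df.notna().sum().sum()

# Or: total number of cells minus the number of missing cells
total_alt2 = df.size - df.isna().sum().sum()

print(total_alt, total_alt2)  # 17 17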
Method 3: Counting Empty Strings as Missing Values
By default, df.count() considers empty strings ('') as valid, non-missing values. If you want to treat empty strings as if they were NaN for counting purposes, you first need to replace them with np.nan.
import pandas as pd
import numpy as np
df_with_empty_strings = pd.DataFrame({
    'col_A': ['Apple', '', 'Banana', None],
    'col_B': [10, 20, 30, '']
})
print("DataFrame with empty strings:")
print(df_with_empty_strings)
print()
print("Counts (empty strings are NOT NaN by default):")
print(df_with_empty_strings.count())
print()
# ✅ Replace empty strings with np.nan, then count
counts_empty_as_nan = df_with_empty_strings.replace('', np.nan).count()
print("Counts (treating empty strings as NaN):")
print(counts_empty_as_nan)
Output:
DataFrame with empty strings:
    col_A  col_B
0   Apple     10
1             20
2  Banana     30
3    None
Counts (empty strings are NOT NaN by default):
col_A    3
col_B    4
dtype: int64
Counts (treating empty strings as NaN):
col_A    2
col_B    3
dtype: int64
The df.replace('', np.nan) step temporarily converts empty strings to NaN before the count is performed.
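A related gotcha: strings containing only whitespace (e.g. '  ') are also counted as valid. If those should be treated as missing as well, a regex-based replace covers both cases. The pattern below is one reasonable definition of "blank", not the only one:
import pandas as pd
import numpy as np

df_with_blank_strings = pd.DataFrame({
    'col_A': ['Apple', '  ', 'Banana', None],
    'col_B': [10, 20, 30, '']
})

# Treat empty and whitespace-only strings as missing, then count
counts_blank_as_nan = df_with_blank_strings.replace(r'^\s*$', np.nan, regex=True).count()
print(counts_blank_as_nan)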
Method 4: Using DataFrame.notna().sum() (and notnull().sum())
An alternative way to count non-missing values is to first create a boolean DataFrame indicating non-missing values using DataFrame.notna() and then sum these boolean values (where True counts as 1 and False as 0).
import pandas as pd
import numpy as np
data = {
    'product_name': ['Laptop', 'Mouse', None, 'Keyboard', None, 'Monitor'],
    'category': ['Electronics', 'Accessories', 'Electronics', None, 'Office', 'Electronics'],
    'price': [1200.00, 25.00, np.nan, 75.00, np.nan, 300.00],
    'stock_qty': [10, np.nan, 5, 0, 15, np.nan]
}
df = pd.DataFrame(data)
# df.notna() returns a boolean DataFrame (True where not NA, False where NA)
print("Output of df.notna():")
print(df.notna())
print()
# Summing the True values per column gives the count of non-NA values
column_counts_notna = df.notna().sum()
print("Number of non-NaN values per column (using .notna().sum()):")
print(column_counts_notna)
Output:
Output of df.notna():
   product_name  category  price  stock_qty
0          True      True   True       True
1          True      True   True      False
2         False      True  False       True
3          True     False   True       True
4         False      True  False       True
5          True      True   True      False
Number of non-NaN values per column (using .notna().sum()):
product_name    4
category        5
price           4
stock_qty       4
dtype: int64
- This produces the same result as df.count().
- The method DataFrame.notnull() is an alias for DataFrame.notna(), so df.notnull().sum() will also yield the same result (a per-row variant is sketched below).
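Because it is just a boolean mask plus a reduction, this style composes naturally with other axes. Here is a minimal sketch of a per-row count analogous to df.count(axis=1), using the same data:
import pandas as pd
import numpy as np

data = {
    'product_name': ['Laptop', 'Mouse', None, 'Keyboard', None, 'Monitor'],
    'category': ['Electronics', 'Accessories', 'Electronics', None, 'Office', 'Electronics'],
    'price': [1200.00, 25.00, np.nan, 75.00, np.nan, 300.00],
    'stock_qty': [10, np.nan, 5, 0, 15, np.nan]
}
df = pd.DataFrame(data)

# Per-row non-NaN counts via the boolean mask (matches df.count(axis=1))
row_counts_notna = df.notna().sum(axis=1)
print(row_counts_notna)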
Conclusion
Counting non-NaN values is essential for understanding data completeness in Pandas.
- df.count() is the primary method for getting per-column (default) or per-row (axis=1) counts of non-missing values.
- To get a total count of non-missing values for the entire DataFrame, sum the results of df.count().
- If empty strings ('') should be treated as missing, replace them with np.nan before counting: df.replace('', np.nan).count().
- df.notna().sum() (or df.notnull().sum()) offers an alternative, often equally clear, way to achieve the same per-column counts.
These methods provide you with the flexibility to accurately assess the presence of valid data points in your DataFrames, guiding your data cleaning and analysis efforts.