Python Pandas: How to Count NaN or Missing Values in Pandas
Quantifying missing data is an essential early step in any data analysis workflow. Before you can decide whether to drop incomplete rows, fill in default values, or investigate why data is missing, you need to know exactly how much is missing and where. A column with 2% missing values might be safe to fill with a median, while a column with 80% missing might need to be dropped entirely. Pandas provides a consistent set of methods built around .isna() that make this assessment straightforward.
In this guide, you will learn how to count missing values per column, per row, and across the entire DataFrame, calculate missing percentages, and build comprehensive data quality reports.
Counting Missing Values per Column
The most common data quality check is counting how many NaN values exist in each column. This tells you immediately which columns need attention:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'A': [1, np.nan, 3, np.nan],
    'B': [np.nan, 2, 3, 4],
    'C': [1, 2, 3, 4]
})
print(df.isna().sum())
Output:
A 2
B 1
C 0
dtype: int64
The .isna() method returns a DataFrame of boolean values (True where data is missing), and .sum() counts the True values in each column. Column A has 2 missing values, column B has 1, and column C is complete.
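If it helps to see the intermediate step, you can print the boolean mask that .isna() returns before summing it. This sketch uses the same DataFrame as above:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, np.nan, 3, np.nan],
    'B': [np.nan, 2, 3, 4],
    'C': [1, 2, 3, 4]
})

# The intermediate boolean mask: True marks a missing value
mask = df.isna()
print(mask)

# Summing a boolean column counts its True entries
print(mask['A'].sum())  # → 2
```

Because True is treated as 1 and False as 0, .sum() on the mask is exactly a count of missing cells.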
Counting Missing Values per Row
To find which individual records are incomplete, sum across columns with axis=1:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'A': [1, np.nan, 3],
    'B': [np.nan, np.nan, 6],
    'C': [7, 8, 9]
})
row_nulls = df.isna().sum(axis=1)
print(row_nulls)
Output:
0 1
1 2
2 0
dtype: int64
Row 0 is missing 1 value, row 1 is missing 2 values, and row 2 is complete. This is useful for identifying records that are too incomplete to be usable.
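A natural follow-up is to filter out those overly incomplete records. This is a sketch, assuming a cutoff of 2 missing values per row; the cutoff itself is a choice you would tune to your data:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, np.nan, 3],
    'B': [np.nan, np.nan, 6],
    'C': [7, 8, 9]
})

# Keep only rows with fewer than 2 missing values
complete_enough = df[df.isna().sum(axis=1) < 2]
print(complete_enough)
```

Here row 1 (missing 2 values) is dropped, while rows 0 and 2 survive.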
Adding the Count as a Column
You can attach the missing count directly to the DataFrame for filtering or sorting:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'A': [1, np.nan, 3],
    'B': [np.nan, np.nan, 6],
    'C': [7, 8, 9]
})
df['Missing_Count'] = df.isna().sum(axis=1)
# Find rows with the most missing values
print(df.sort_values('Missing_Count', ascending=False))
Output:
A B C Missing_Count
1 NaN NaN 8 2
0 1.0 NaN 7 1
2 3.0 6.0 9 0
Total Missing Count Across the Entire DataFrame
For a single number representing all missing values in the DataFrame, chain two .sum() calls:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'A': [1, np.nan, 3],
    'B': [np.nan, np.nan, 6]
})
total_nulls = df.isna().sum().sum()
print(f"Total missing: {total_nulls}")
Output:
Total missing: 3
The first .sum() counts per column, and the second .sum() adds those column counts together.
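The total count can also be turned into an overall percentage by dividing by df.size, which is the total number of cells (rows times columns). A small sketch:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, np.nan, 3],
    'B': [np.nan, np.nan, 6]
})

total_nulls = df.isna().sum().sum()

# df.size is rows * columns, so this gives the overall missing fraction
overall_pct = total_nulls / df.size * 100
print(f"Overall missing: {overall_pct:.1f}%")  # 3 of 6 cells → 50.0%
```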
Calculating Missing Percentages
Raw counts are helpful, but percentages give a better sense of severity, especially when columns have different lengths or when comparing across datasets:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'A': [1, np.nan, np.nan, 4],
    'B': [np.nan, 2, 3, 4],
    'C': [1, 2, 3, 4]
})
pct_missing = (df.isna().sum() / len(df)) * 100
print(pct_missing.round(1))
Output:
A 50.0
B 25.0
C 0.0
dtype: float64
A column with 50% missing data requires a very different handling strategy than one with 25%.
Shorter Alternative with .mean()
Since .mean() on a boolean Series calculates the proportion of True values, you can get the same result more concisely:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'A': [1, np.nan, np.nan, 4],
    'B': [np.nan, 2, 3, 4],
    'C': [1, 2, 3, 4]
})
pct_missing = df.isna().mean() * 100
print(pct_missing.round(1))
Output:
A 50.0
B 25.0
C 0.0
dtype: float64
The .mean() approach is shorter and less error-prone, since you don't have to divide by len(df) yourself. Use df.isna().mean() * 100 as your standard pattern for calculating missing percentages.
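If you want to convince yourself that the two patterns agree, you can compare them directly on the same DataFrame:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, np.nan, np.nan, 4],
    'B': [np.nan, 2, 3, 4],
    'C': [1, 2, 3, 4]
})

# Both patterns produce the same Series of percentages
a = (df.isna().sum() / len(df)) * 100
b = df.isna().mean() * 100
print(a.equals(b))  # → True
```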
Building a Comprehensive Missing Data Report
For a thorough data quality assessment, combine multiple metrics into a single summary DataFrame:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'A': [1, np.nan, np.nan, 4, 5],
    'B': [np.nan, 2, 3, 4, 5],
    'C': [1, 2, 3, 4, 5]
})
report = pd.DataFrame({
    'Missing': df.isna().sum(),
    'Present': df.notna().sum(),
    'Total': len(df),
    'Pct_Missing': (df.isna().mean() * 100).round(2)
})
# Sort by percentage missing, highest first
report = report.sort_values('Pct_Missing', ascending=False)
print(report)
Output:
Missing Present Total Pct_Missing
A 2 3 5 40.0
B 1 4 5 20.0
C 0 5 5 0.0
This report gives you a complete picture at a glance: how many values are missing, how many are present, and what percentage is affected for each column.
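One optional extension, not part of the report above but often useful, is to record each column's dtype alongside the missing counts, since columns containing NaN are typically upcast to float64. A sketch:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, np.nan, np.nan, 4, 5],
    'B': [np.nan, 2, 3, 4, 5],
    'C': [1, 2, 3, 4, 5]
})

# Add each column's dtype to the quality report; columns with NaN
# show up as float64 even when the underlying data is integer-like
report = pd.DataFrame({
    'Dtype': df.dtypes.astype(str),
    'Missing': df.isna().sum(),
    'Pct_Missing': (df.isna().mean() * 100).round(2)
})
print(report)
```

Here only column C, which has no missing values, keeps its int64 dtype.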
Filtering Columns by Missing Threshold
When working with wide datasets that have many columns, you may want to automatically identify columns that exceed a certain missing threshold:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'A': [1, np.nan, np.nan, np.nan, 5],
    'B': [np.nan, 2, 3, 4, 5],
    'C': [1, 2, 3, 4, 5],
    'D': [np.nan, np.nan, np.nan, np.nan, 5]
})
# Find columns with more than 50% missing
threshold = 0.5
high_missing = df.columns[df.isna().mean() > threshold]
print(f"Columns with >{threshold*100:.0f}% missing: {list(high_missing)}")
# Optionally drop them
df_cleaned = df.drop(columns=high_missing)
print(f"Remaining columns: {list(df_cleaned.columns)}")
Output:
Columns with >50% missing: ['A', 'D']
Remaining columns: ['B', 'C']
This is a common preprocessing step when preparing data for machine learning, where highly incomplete features add noise rather than signal.
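Pandas also has a built-in shortcut for this: dropna() with the thresh parameter, which keeps only the columns that have at least that many non-missing values. A sketch reproducing the result above, where thresh is derived from the same 50% cutoff:

```python
import math

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, np.nan, np.nan, np.nan, 5],
    'B': [np.nan, 2, 3, 4, 5],
    'C': [1, 2, 3, 4, 5],
    'D': [np.nan, np.nan, np.nan, np.nan, 5]
})

# A column survives only if it has at least `thresh` non-missing values;
# ceil(len(df) * 0.5) matches "drop if more than 50% missing"
thresh = math.ceil(len(df) * 0.5)
df_cleaned = df.dropna(axis=1, thresh=thresh)
print(list(df_cleaned.columns))  # → ['B', 'C']
```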
isna() and isnull() are aliases and behave identically in Pandas. Modern code tends to prefer isna() because it aligns with the pd.NA missing-value sentinel introduced in pandas 1.0, but either works.
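You can verify the equivalence directly:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, np.nan, 3]})

# isnull() is an alias of isna(), so the results always match
print(df.isna().equals(df.isnull()))  # → True
```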
Quick Reference
| Goal | Method |
|---|---|
| Count per column | df.isna().sum() |
| Count per row | df.isna().sum(axis=1) |
| Total count | df.isna().sum().sum() |
| Percentage per column | df.isna().mean() * 100 |
| Check if any missing exist | df.isna().any().any() |
| Columns above threshold | df.columns[df.isna().mean() > 0.5] |
- Use df.isna().sum() as a standard first step in data cleaning to identify which columns need attention.
- Calculate percentages with df.isna().mean() * 100 to understand the severity of missing data, and build summary reports for a comprehensive data quality assessment before deciding on your handling strategy.