Python Pandas: How to Count NaN or Missing Values in Pandas

Quantifying missing data is an essential early step in any data analysis workflow. Before you can decide whether to drop incomplete rows, fill in default values, or investigate why data is missing, you need to know exactly how much is missing and where. A column with 2% missing values might be safe to fill with a median, while a column with 80% missing might need to be dropped entirely. Pandas provides a consistent set of methods built around .isna() that make this assessment straightforward.

In this guide, you will learn how to count missing values per column, per row, and across the entire DataFrame, calculate missing percentages, and build comprehensive data quality reports.

Counting Missing Values per Column

The most common data quality check is counting how many NaN values exist in each column. This tells you immediately which columns need attention:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, np.nan, 3, np.nan],
    'B': [np.nan, 2, 3, 4],
    'C': [1, 2, 3, 4]
})

print(df.isna().sum())

Output:

A    2
B    1
C    0
dtype: int64

The .isna() method returns a DataFrame of boolean values (True where data is missing), and .sum() counts the True values in each column. Column A has 2 missing values, column B has 1, and column C is complete.
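To see the intermediate boolean mask that .sum() aggregates, you can print the result of .isna() itself (same example frame as above):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, np.nan, 3, np.nan],
    'B': [np.nan, 2, 3, 4],
    'C': [1, 2, 3, 4]
})

# Boolean DataFrame: True marks a missing value
print(df.isna())
```

Summing this mask column-wise is exactly what df.isna().sum() does, since True counts as 1 and False as 0.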

Counting Missing Values per Row

To find which individual records are incomplete, sum across columns with axis=1:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, np.nan, 3],
    'B': [np.nan, np.nan, 6],
    'C': [7, 8, 9]
})

row_nulls = df.isna().sum(axis=1)

print(row_nulls)

Output:

0    1
1    2
2    0
dtype: int64

Row 0 is missing 1 value, row 1 is missing 2 values, and row 2 is complete. This is useful for identifying records that are too incomplete to be usable.
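A natural follow-up (a sketch, using the same example frame) is to keep only the rows that fall below a missing-value threshold you choose:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, np.nan, 3],
    'B': [np.nan, np.nan, 6],
    'C': [7, 8, 9]
})

# Keep only rows with fewer than 2 missing values
usable = df[df.isna().sum(axis=1) < 2]
print(usable)  # drops row 1, which is missing both A and B
```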

Adding the Count as a Column

You can attach the missing count directly to the DataFrame for filtering or sorting:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, np.nan, 3],
    'B': [np.nan, np.nan, 6],
    'C': [7, 8, 9]
})

df['Missing_Count'] = df.isna().sum(axis=1)

# Find rows with the most missing values
print(df.sort_values('Missing_Count', ascending=False))

Output:

     A    B  C  Missing_Count
1  NaN  NaN  8              2
0  1.0  NaN  7              1
2  3.0  6.0  9              0

Total Missing Count Across the Entire DataFrame

For a single number representing all missing values in the DataFrame, chain two .sum() calls:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, np.nan, 3],
    'B': [np.nan, np.nan, 6]
})

total_nulls = df.isna().sum().sum()

print(f"Total missing: {total_nulls}")

Output:

Total missing: 3

The first .sum() counts per column, and the second .sum() adds those column counts together.
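If you prefer a single aggregation, an equivalent route (a minor variation, not a requirement) is to sum over the underlying NumPy array directly:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, np.nan, 3],
    'B': [np.nan, np.nan, 6]
})

# Sum the flat boolean array instead of chaining .sum() twice
total_nulls = df.isna().to_numpy().sum()
print(total_nulls)  # 3
```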

Calculating Missing Percentages

Raw counts are helpful, but percentages give a better sense of severity, especially when columns have different lengths or when comparing across datasets:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, np.nan, np.nan, 4],
    'B': [np.nan, 2, 3, 4],
    'C': [1, 2, 3, 4]
})

pct_missing = (df.isna().sum() / len(df)) * 100

print(pct_missing.round(1))

Output:

A    50.0
B    25.0
C     0.0
dtype: float64

A column with 50% missing data requires a very different handling strategy than one with 25%.

Shorter Alternative with .mean()

Since .mean() on a boolean Series calculates the proportion of True values, you can get the same result more concisely:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, np.nan, np.nan, 4],
    'B': [np.nan, 2, 3, 4],
    'C': [1, 2, 3, 4]
})

pct_missing = df.isna().mean() * 100

print(pct_missing.round(1))

Output:

A    50.0
B    25.0
C     0.0
dtype: float64

Tip: The .mean() approach is not only shorter but also slightly more efficient, since it skips the separate division step. Use df.isna().mean() * 100 as your standard pattern for calculating missing percentages.

Building a Comprehensive Missing Data Report

For a thorough data quality assessment, combine multiple metrics into a single summary DataFrame:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, np.nan, np.nan, 4, 5],
    'B': [np.nan, 2, 3, 4, 5],
    'C': [1, 2, 3, 4, 5]
})

report = pd.DataFrame({
    'Missing': df.isna().sum(),
    'Present': df.notna().sum(),
    'Total': len(df),
    'Pct_Missing': (df.isna().mean() * 100).round(2)
})

# Sort by percentage missing, highest first
report = report.sort_values('Pct_Missing', ascending=False)

print(report)

Output:

   Missing  Present  Total  Pct_Missing
A        2        3      5         40.0
B        1        4      5         20.0
C        0        5      5          0.0

This report gives you a complete picture at a glance: how many values are missing, how many are present, and what percentage is affected for each column.
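When you only need a yes/no answer rather than a full report, .any() chains the same way .sum() does:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, np.nan, 3],
    'B': [1, 2, 3]
})

# .any() per column, then .any() again for a single boolean
print(df.isna().any())        # per-column: A is True, B is False
print(df.isna().any().any())  # True
```

This is a cheap guard clause to run before committing to a full missing-data analysis.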

Filtering Columns by Missing Threshold

When working with wide datasets that have many columns, you may want to automatically identify columns that exceed a certain missing threshold:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, np.nan, np.nan, np.nan, 5],
    'B': [np.nan, 2, 3, 4, 5],
    'C': [1, 2, 3, 4, 5],
    'D': [np.nan, np.nan, np.nan, np.nan, 5]
})

# Find columns with more than 50% missing
threshold = 0.5
high_missing = df.columns[df.isna().mean() > threshold]

print(f"Columns with >{threshold*100:.0f}% missing: {list(high_missing)}")

# Optionally drop them
df_cleaned = df.drop(columns=high_missing)
print(f"Remaining columns: {list(df_cleaned.columns)}")

Output:

Columns with >50% missing: ['A', 'D']
Remaining columns: ['B', 'C']

This is a common preprocessing step when preparing data for machine learning, where highly incomplete features add noise rather than signal.
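For the row-wise analogue, pandas provides DataFrame.dropna(thresh=...), which keeps rows that have at least thresh non-missing values, so no manual mask is needed (a sketch with a small example frame):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, np.nan, np.nan],
    'B': [np.nan, np.nan, 6],
    'C': [7, 8, 9]
})

# Keep rows with at least 2 non-NaN values
trimmed = df.dropna(thresh=2)
print(trimmed)  # drops row 1, where only C is present
```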

Note: isna() and isnull() are completely identical in Pandas; isnull() is simply an alias of isna(). Modern code tends to prefer isna() because it aligns with the pd.NA sentinel value introduced in recent Pandas versions, but either works.

Quick Reference

Goal                        Method
Count per column            df.isna().sum()
Count per row               df.isna().sum(axis=1)
Total count                 df.isna().sum().sum()
Percentage per column       df.isna().mean() * 100
Check if any missing exist  df.isna().any().any()
Columns above threshold     df.columns[df.isna().mean() > 0.5]
  • Use df.isna().sum() as a standard first step in data cleaning to identify which columns need attention.
  • Calculate percentages with df.isna().mean() * 100 to understand the severity of missing data, and build summary reports for comprehensive data quality assessment before deciding on your handling strategy.