How to Count the Number of Pages in a PDF File in Python

Counting the number of pages in a PDF file is a common task when building document management systems, validating uploads, automating report generation, or processing large batches of PDF documents. Python provides several libraries to work with PDFs, with PyPDF2 being one of the most popular and straightforward options.

In this guide, you'll learn how to count PDF pages using PyPDF2 and other libraries, handle common errors, and process multiple PDF files efficiently.

Installing PyPDF2

Install the library using pip:

pip install PyPDF2

Counting Pages with len(reader.pages)

The recommended way to count pages in modern versions of PyPDF2 is to use the pages property with len():

import PyPDF2

with open('document.pdf', 'rb') as file:
    reader = PyPDF2.PdfReader(file)
    total_pages = len(reader.pages)

print(f"Total pages: {total_pages}")

Output (assuming document.pdf has 10 pages):

Total pages: 10

Step-by-step breakdown:

  1. Open the file in read binary mode ('rb'). PDF files are binary, not text.
  2. Create a PdfReader object to parse the PDF structure.
  3. Access reader.pages, which returns a list-like object of all pages.
  4. Use len() to count the total number of pages.

Using the with Statement

Always open PDF files with the with statement to ensure the file is properly closed after processing, even if an error occurs:

# ✅ Recommended: file is automatically closed
with open('document.pdf', 'rb') as file:
    reader = PyPDF2.PdfReader(file)
    pages = len(reader.pages)

# ❌ Avoid: must remember to close manually
file = open('document.pdf', 'rb')
reader = PyPDF2.PdfReader(file)
pages = len(reader.pages)
file.close()
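
To see why the with form is safer, here is a minimal sketch that uses only the standard library. It simulates a failure while a file is open and then checks that the handle was closed anyway. The temporary file and the simulated ValueError are illustrative assumptions that stand in for a real PDF and a real parsing error.

```python
import os
import tempfile

# Create a throwaway file to stand in for a PDF (illustrative only).
fd, path = tempfile.mkstemp(suffix=".pdf")
os.write(fd, b"%PDF-1.4 placeholder bytes")
os.close(fd)

handle = None
try:
    with open(path, "rb") as file:
        handle = file
        # Simulate a parser failing partway through the file.
        raise ValueError("simulated parsing error")
except ValueError:
    pass

# The with block closed the file even though an exception was raised.
print(handle.closed)  # True

os.remove(path)
```

With the manual open()/close() pattern, the same exception would skip the close() call unless it were wrapped in try/finally.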

Handling Errors Gracefully

PDF files can be corrupted, password-protected, or missing. Always wrap your code in error handling:

import PyPDF2

def count_pdf_pages(filepath):
    """Count the number of pages in a PDF file."""
    try:
        with open(filepath, 'rb') as file:
            reader = PyPDF2.PdfReader(file)

            if reader.is_encrypted:
                # decrypt() signals a wrong password with a falsy return
                # value rather than an exception, so check the result.
                if not reader.decrypt(''):  # Try empty password
                    return None, "PDF is encrypted and requires a password"

            return len(reader.pages), None

    except FileNotFoundError:
        return None, f"File not found: {filepath}"
    except PyPDF2.errors.PdfReadError:
        return None, "File is not a valid PDF or is corrupted"
    except Exception as e:
        return None, f"Unexpected error: {e}"


# Usage
pages, error = count_pdf_pages('document.pdf')

if error:
    print(f"Error: {error}")
else:
    print(f"Total pages: {pages}")

Output (success):

Total pages: 10

Output (file not found):

Error: File not found: document.pdf
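
When validating uploads, it can help to reject obvious non-PDF files before handing them to a parser. The sketch below is one possible pre-check, not part of PyPDF2: it relies on the fact that PDF files begin with the marker %PDF-. The function name looks_like_pdf is hypothetical, and the check is deliberately strict, so a file with stray bytes before the header would be rejected even though some readers tolerate it.

```python
def looks_like_pdf(filepath, header=b"%PDF-"):
    """Return True if the file starts with the PDF header marker.

    This is only a cheap sanity check: a file can pass it and
    still be truncated or corrupted further in.
    """
    try:
        with open(filepath, "rb") as file:
            return file.read(len(header)) == header
    except OSError:
        # Missing or unreadable files are treated as "not a PDF" here.
        return False
```

A caller could run looks_like_pdf(filepath) first and only call count_pdf_pages(filepath) when it returns True, keeping the full error handling above as the real safety net.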

Counting Pages in Password-Protected PDFs

If the PDF is encrypted, you need to decrypt it first:

import PyPDF2

with open('protected.pdf', 'rb') as file:
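    reader = PyPDF2.PdfReader(file)

    if reader.is_encrypted:
        # Provide the password; a falsy result means it was rejected
        if not reader.decrypt('your_password'):
            raise ValueError("Incorrect password for protected.pdf")

    total_pages = len(reader.pages)

print(f"Total pages: {total_pages}")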

Output:

Total pages: 15

Processing Multiple PDF Files

To count pages across multiple PDF files in a directory:

import PyPDF2
import os

def count_pages_in_directory(directory):
    """Count pages in all PDF files within a directory."""
    results = []

    for filename in sorted(os.listdir(directory)):
        if filename.lower().endswith('.pdf'):
            filepath = os.path.join(directory, filename)
            try:
                with open(filepath, 'rb') as file:
                    reader = PyPDF2.PdfReader(file)
                    pages = len(reader.pages)
                results.append((filename, pages))
            except Exception as e:
                results.append((filename, f"Error: {e}"))

    return results


# Usage
pdf_dir = '/path/to/pdf/folder'
results = count_pages_in_directory(pdf_dir)

total = 0
for filename, pages in results:
    if isinstance(pages, int):
        print(f" {filename:30s} {pages:>5} pages")
        total += pages
    else:
        print(f" {filename:30s} {pages}")

print(f"\n {'TOTAL':30s} {total:>5} pages")

Output:

 annual_report.pdf                 42 pages
 contract.pdf                      12 pages
 presentation.pdf                  25 pages
 summary.pdf                        3 pages

 TOTAL                             82 pages
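
If the PDFs are spread across subfolders, a recursive variant may be more convenient. The following is a sketch under a few assumptions: the function name count_pages_recursive is hypothetical, and the page-counting step is passed in as a counter callable so the directory walk stays independent of any particular PDF library.

```python
from pathlib import Path


def count_pages_recursive(root, counter):
    """Walk root recursively and apply counter to every .pdf file.

    counter is any callable that takes a path and returns a page
    count, for example a thin wrapper around len(reader.pages).
    Returns (counts, errors): two dicts keyed by path relative to root.
    """
    root = Path(root)
    counts, errors = {}, {}
    for path in sorted(root.rglob("*")):
        # Compare the suffix case-insensitively so .PDF is included too.
        if path.is_file() and path.suffix.lower() == ".pdf":
            key = str(path.relative_to(root))
            try:
                counts[key] = counter(path)
            except Exception as exc:
                errors[key] = str(exc)
    return counts, errors
```

Keeping successes and failures in separate dictionaries means the grand total is simply sum(counts.values()), with no isinstance check needed. With PyPDF2, the counter could be a small function that opens the path in 'rb' mode and returns len(PyPDF2.PdfReader(file).pages).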

Using Alternative Libraries

Using pikepdf

pikepdf is a modern, actively maintained PDF library:

pip install pikepdf

import pikepdf

with pikepdf.open('document.pdf') as pdf:
    total_pages = len(pdf.pages)
    print(f"Total pages: {total_pages}")

Output:

Total pages: 10

Using pdfplumber

pdfplumber is great when you also need to extract text or tables:

pip install pdfplumber

import pdfplumber

with pdfplumber.open('document.pdf') as pdf:
    total_pages = len(pdf.pages)
    print(f"Total pages: {total_pages}")

Output:

Total pages: 10

Using fitz (PyMuPDF)

PyMuPDF is one of the fastest PDF libraries available:

pip install PyMuPDF

import fitz  # PyMuPDF

# The document object works as a context manager, so it is closed automatically
with fitz.open('document.pdf') as doc:
    total_pages = doc.page_count
    print(f"Total pages: {total_pages}")

Output:

Total pages: 10

Deprecated Method: getNumPages()

In older versions of PyPDF2, getNumPages() was used to count pages. This method has been deprecated since version 1.28.0:

# ❌ Deprecated: avoid in new code
total = reader.getNumPages()

# ✅ Use this instead
total = len(reader.pages)

If you see getNumPages() in existing code, replace it with len(reader.pages). Versions 1.28 through 2.x emit a deprecation warning for the old method, and PyPDF2 3.0.0 raises an error instead.

Quick Comparison of Libraries

Library        | Installation           | Speed      | Handles Encrypted | Extra Features
PyPDF2         | pip install PyPDF2     | Good       | Yes               | Merge, split, extract text
pikepdf        | pip install pikepdf    | Fast       | Yes               | Low-level PDF manipulation
pdfplumber     | pip install pdfplumber | Moderate   | Yes               | Text/table extraction
PyMuPDF (fitz) | pip install PyMuPDF    | ⚡ Fastest | Yes               | Rendering, annotations
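
If a script has to run in environments where different libraries from the table may be installed, one option is to detect what is available and use the first match. The sketch below is an illustration rather than a recommended pattern: the names BACKENDS, available_backends, and count_pages are hypothetical, the preference order is an arbitrary assumption, and each branch reuses the calls shown in the sections above.

```python
import importlib.util

# Import names in order of preference (an assumption; reorder to taste).
BACKENDS = ["fitz", "pikepdf", "PyPDF2", "pdfplumber"]


def available_backends():
    """Return the entries of BACKENDS that can be imported here."""
    return [
        name for name in BACKENDS
        if importlib.util.find_spec(name) is not None
    ]


def count_pages(filepath):
    """Count pages using the first available backend."""
    backends = available_backends()
    if not backends:
        raise RuntimeError("No supported PDF library is installed")

    name = backends[0]
    # Each library is imported lazily, only when it is actually used.
    if name == "fitz":
        import fitz
        with fitz.open(filepath) as doc:
            return doc.page_count
    if name == "pikepdf":
        import pikepdf
        with pikepdf.open(filepath) as pdf:
            return len(pdf.pages)
    if name == "PyPDF2":
        import PyPDF2
        with open(filepath, "rb") as file:
            return len(PyPDF2.PdfReader(file).pages)
    import pdfplumber
    with pdfplumber.open(filepath) as pdf:
        return len(pdf.pages)
```

Calling available_backends() on its own is a quick way to check which of the four libraries a given environment can use.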

Conclusion

Counting pages in a PDF file with Python is straightforward using PyPDF2 or alternative libraries:

  • Use len(reader.pages) with PyPDF2 for a simple, reliable page count. This is the modern, recommended approach.
  • Handle errors with try-except to gracefully manage missing, corrupted, or encrypted files.
  • Use reader.decrypt() for password-protected PDFs before counting pages.
  • For batch processing, iterate through a directory and collect results for all PDF files.
  • Consider PyMuPDF (fitz) for the fastest performance, or pdfplumber if you also need text extraction.

For most use cases, PyPDF2 with len(reader.pages) is the simplest and most effective solution.