How to Count the Number of Pages in a PDF File in Python
Counting the number of pages in a PDF file is a common task when building document management systems, validating uploads, automating report generation, or processing large batches of PDF documents. Python provides several libraries to work with PDFs, with PyPDF2 being one of the most popular and straightforward options.
In this guide, you'll learn how to count PDF pages using PyPDF2 and other libraries, handle common errors, and process multiple PDF files efficiently.
Installing PyPDF2
Install the library using pip:
pip install PyPDF2
Counting Pages with len(reader.pages)
The recommended way to count pages in modern versions of PyPDF2 is to use the pages property with len():
import PyPDF2

with open('document.pdf', 'rb') as file:
    reader = PyPDF2.PdfReader(file)
    total_pages = len(reader.pages)

print(f"Total pages: {total_pages}")
Output (assuming document.pdf has 10 pages):
Total pages: 10
Step-by-step breakdown:
- Open the file in read binary mode ('rb'). PDF files are binary, not text.
- Create a PdfReader object to parse the PDF structure.
- Access reader.pages, which returns a list-like object of all pages.
- Use len() to count the total number of pages.
Use the with Statement
Always open PDF files with the with statement to ensure the file is properly closed after processing, even if an error occurs:
# ✅ Recommended: file is automatically closed
with open('document.pdf', 'rb') as file:
    reader = PyPDF2.PdfReader(file)
    pages = len(reader.pages)

# ❌ Avoid: must remember to close manually
file = open('document.pdf', 'rb')
reader = PyPDF2.PdfReader(file)
pages = len(reader.pages)
file.close()
Handling Errors Gracefully
PDF files can be corrupted, password-protected, or missing. Always wrap your code in error handling:
import PyPDF2

def count_pdf_pages(filepath):
    """Count the number of pages in a PDF file."""
    try:
        with open(filepath, 'rb') as file:
            reader = PyPDF2.PdfReader(file)
            if reader.is_encrypted:
                # decrypt() returns a falsy value when the password is wrong,
                # so check the result as well as catching exceptions
                try:
                    unlocked = reader.decrypt('')  # Try empty password
                except Exception:
                    unlocked = False
                if not unlocked:
                    return None, "PDF is encrypted and requires a password"
            return len(reader.pages), None
    except FileNotFoundError:
        return None, f"File not found: {filepath}"
    except PyPDF2.errors.PdfReadError:
        return None, "File is not a valid PDF or is corrupted"
    except Exception as e:
        return None, f"Unexpected error: {e}"

# Usage
pages, error = count_pdf_pages('document.pdf')
if error:
    print(f"Error: {error}")
else:
    print(f"Total pages: {pages}")
Output (success):
Total pages: 10
Output (file not found):
Error: File not found: document.pdf
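Files that are obviously not PDFs can be filtered out before they reach the parser. The following is a minimal sketch, assuming a hypothetical helper named looks_like_pdf and an assumed probe size of 1024 bytes: it looks for the %PDF- marker that PDF files carry at or near the start. It is only a cheap pre-filter, since a file can contain the marker and still be corrupted, so the error handling above is still needed.

```python
def looks_like_pdf(filepath, probe_size=1024):
    """Return True if the '%PDF-' marker appears near the start of the file.

    Hypothetical helper for illustration; probe_size is an assumed default.
    """
    try:
        with open(filepath, 'rb') as file:
            head = file.read(probe_size)
    except OSError:
        # Missing or unreadable files are treated as "not a PDF"
        return False
    return b'%PDF-' in head
```

A caller might run this check first and only pass files that return True on to count_pdf_pages().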
Counting Pages in Password-Protected PDFs
If the PDF is encrypted, you need to decrypt it first:
import PyPDF2

with open('protected.pdf', 'rb') as file:
    reader = PyPDF2.PdfReader(file)
    if reader.is_encrypted:
        # Provide the password; decrypt() returns a falsy value if it is wrong
        if not reader.decrypt('your_password'):
            raise ValueError("Incorrect password for protected.pdf")
    total_pages = len(reader.pages)

print(f"Total pages: {total_pages}")
Output:
Total pages: 15
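When the correct password is not known in advance, several candidates can be tried in turn. The sketch below assumes a hypothetical helper named unlock and a reader that behaves like PyPDF2.PdfReader, where is_encrypted is a boolean and decrypt(password) returns a falsy value when the password does not match. The file name and passwords in the usage comment are placeholders.

```python
def unlock(reader, candidates):
    """Try each candidate password; return True once the reader is usable.

    Hypothetical helper. Assumes reader.decrypt() returns a falsy value
    for a wrong password, as PyPDF2's PdfReader does.
    """
    if not reader.is_encrypted:
        return True  # Nothing to unlock
    for password in candidates:
        if reader.decrypt(password):
            return True
    return False

# Usage with PyPDF2 (placeholder file name and passwords):
# with open('protected.pdf', 'rb') as file:
#     reader = PyPDF2.PdfReader(file)
#     if unlock(reader, ['', 'your_password']):
#         print(f"Total pages: {len(reader.pages)}")
```

Putting the empty string first covers PDFs that are encrypted but open without a user password.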
Processing Multiple PDF Files
To count pages across multiple PDF files in a directory:
import PyPDF2
import os

def count_pages_in_directory(directory):
    """Count pages in all PDF files within a directory."""
    results = []
    for filename in sorted(os.listdir(directory)):
        if filename.lower().endswith('.pdf'):
            filepath = os.path.join(directory, filename)
            try:
                with open(filepath, 'rb') as file:
                    reader = PyPDF2.PdfReader(file)
                    pages = len(reader.pages)
                results.append((filename, pages))
            except Exception as e:
                results.append((filename, f"Error: {e}"))
    return results

# Usage
pdf_dir = '/path/to/pdf/folder'
results = count_pages_in_directory(pdf_dir)
total = 0
for filename, pages in results:
    if isinstance(pages, int):
        print(f" {filename:30s} {pages:>5} pages")
        total += pages
    else:
        print(f" {filename:30s} {pages}")
print(f"\n {'TOTAL':30s} {total:>5} pages")
Output:
 annual_report.pdf                 42 pages
 contract.pdf                      12 pages
 presentation.pdf                  25 pages
 summary.pdf                        3 pages

 TOTAL                             82 pages
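The function above only looks at the top level of the directory, because os.listdir() does not descend into subfolders. If the PDFs are nested, one option is to collect the paths with pathlib first. This is a minimal sketch using only the standard library; find_pdfs is a hypothetical helper name.

```python
from pathlib import Path

def find_pdfs(directory):
    """Return a sorted list of PDF paths under directory, including subfolders."""
    return sorted(
        path for path in Path(directory).rglob('*')
        # Compare the suffix case-insensitively so 'REPORT.PDF' is included
        if path.is_file() and path.suffix.lower() == '.pdf'
    )
```

Each returned path can then be opened in 'rb' mode and counted in the same way as in count_pages_in_directory().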
Using Alternative Libraries
Using pikepdf
pikepdf is a modern, actively maintained PDF library:
pip install pikepdf
import pikepdf

with pikepdf.open('document.pdf') as pdf:
    total_pages = len(pdf.pages)

print(f"Total pages: {total_pages}")
Output:
Total pages: 10
Using pdfplumber
pdfplumber is great when you also need to extract text or tables:
pip install pdfplumber
import pdfplumber

with pdfplumber.open('document.pdf') as pdf:
    total_pages = len(pdf.pages)

print(f"Total pages: {total_pages}")
Output:
Total pages: 10
Using fitz (PyMuPDF)
PyMuPDF is one of the fastest PDF libraries available:
pip install PyMuPDF
import fitz # PyMuPDF
doc = fitz.open('document.pdf')
total_pages = doc.page_count
print(f"Total pages: {total_pages}")
doc.close()
Output:
Total pages: 10
Deprecated Method: getNumPages()
In older versions of PyPDF2, getNumPages() was used to count pages. This method has been deprecated since version 1.28.0, and in PyPDF2 3.0.0 and later calling it raises an error instead of returning a count:
# ❌ Deprecated: avoid in new code
total = reader.getNumPages()
# ✅ Use this instead
total = len(reader.pages)
If you see getNumPages() in existing code, replace it with len(reader.pages) to avoid deprecation warnings in older releases and errors in PyPDF2 3.x.
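For a codebase with many such calls, the replacement can be scripted. The following is a rough sketch, assuming a hypothetical helper named modernize that rewrites simple calls of the form name.getNumPages() or obj.attr.getNumPages(). It does not handle calls made on an expression such as get_reader().getNumPages(), so the result should be reviewed before committing.

```python
import re

# Matches a dotted name followed by .getNumPages(), e.g. self.reader.getNumPages()
_GET_NUM_PAGES = re.compile(r'\b([A-Za-z_][\w.]*)\.getNumPages\(\)')

def modernize(source):
    """Rewrite simple getNumPages() calls in source text to len(<name>.pages)."""
    return _GET_NUM_PAGES.sub(r'len(\1.pages)', source)
```

For example, modernize("total = reader.getNumPages()") returns "total = len(reader.pages)".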
Quick Comparison of Libraries
| Library | Installation | Speed | Handles Encrypted | Extra Features |
|---|---|---|---|---|
| PyPDF2 | pip install PyPDF2 | Good | ✅ | Merge, split, extract text |
| pikepdf | pip install pikepdf | Fast | ✅ | Low-level PDF manipulation |
| pdfplumber | pip install pdfplumber | Moderate | ✅ (via the password argument of open()) | Text/table extraction |
| PyMuPDF (fitz) | pip install PyMuPDF | ⚡ Fastest | ✅ | Rendering, annotations |
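When a script has to run in environments where different libraries are installed, it can check which of the options in the table are available before choosing one. This is a minimal sketch using only the standard library. The helper name available_backends and the preference order are assumptions for illustration, and the names are the import names used in this guide (fitz for PyMuPDF).

```python
import importlib.util

# Import names of the libraries covered in this guide, in an assumed preference order
BACKENDS = ('fitz', 'pikepdf', 'pdfplumber', 'PyPDF2')

def available_backends(candidates=BACKENDS):
    """Return the candidate module names that can be imported in this environment."""
    return [
        name for name in candidates
        if importlib.util.find_spec(name) is not None
    ]
```

The first entry of the returned list can then decide which of the counting snippets shown earlier to run.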
Conclusion
Counting pages in a PDF file with Python is straightforward using PyPDF2 or alternative libraries:
- Use len(reader.pages) with PyPDF2 for a simple, reliable page count. This is the modern, recommended approach.
- Handle errors with try-except to gracefully manage missing, corrupted, or encrypted files.
- Use reader.decrypt() for password-protected PDFs before counting pages.
- For batch processing, iterate through a directory and collect results for all PDF files.
- Consider PyMuPDF (fitz) for the fastest performance, or pdfplumber if you also need text extraction.
For most use cases, PyPDF2 with len(reader.pages) is the simplest and most effective solution.