How to Delete Pages from a PDF in Python
Removing pages from a PDF is a common task when cleaning up reports, extracting specific sections, or dropping pages that contain confidential content. The PyMuPDF library (imported as fitz in this guide; recent releases also accept import pymupdf) handles this quickly, even for large files.
In this guide, you will learn how to delete single pages, multiple pages, ranges, blank pages, and pages matching specific content criteria. Each method is explained with clear examples so you can choose the right approach for your situation.
Installation
pip install pymupdf
Deleting a Single Page
PDF pages in PyMuPDF are 0-indexed, so the first page of the document is index 0, the second is index 1, and so on:
import fitz
doc = fitz.open("document.pdf")
print(f"Pages before: {len(doc)}")
# Delete the first page (index 0)
doc.delete_page(0)
# Delete the last page using negative indexing
doc.delete_page(-1)
print(f"Pages after: {len(doc)}")
doc.save("document_modified.pdf")
doc.close()
Example output:
Pages before: 10
Pages after: 8
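Because people count pages from 1 and PyMuPDF counts from 0, off-by-one mistakes are easy to make. The following is a minimal sketch of two conversion helpers; page_number_to_index and resolve_index are hypothetical names used for illustration, not part of PyMuPDF:

```python
def page_number_to_index(page_number: int, page_count: int) -> int:
    """Convert a 1-based page number to a 0-based index (hypothetical helper)."""
    if not 1 <= page_number <= page_count:
        raise ValueError(f"page {page_number} is outside 1..{page_count}")
    return page_number - 1


def resolve_index(index: int, page_count: int) -> int:
    """Turn a possibly negative index (-1 = last page) into a non-negative one."""
    resolved = index + page_count if index < 0 else index
    if not 0 <= resolved < page_count:
        raise IndexError(f"index {index} is out of range for {page_count} pages")
    return resolved


print(page_number_to_index(1, 10))  # 0
print(resolve_index(-1, 10))        # 9
```

Validating the index before calling delete_page() gives a clear error message instead of a failure deep inside the library.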
Deleting Multiple Specific Pages
To remove several non-consecutive pages, you must delete them in reverse order. Deleting a page shifts all subsequent page indices down by one, so working from the highest index first prevents index errors:
import fitz
doc = fitz.open("report.pdf")
# Remove pages 2, 5, and 8 in human numbering (indices 1, 4, 7)
pages_to_delete = [1, 4, 7]
for page_num in sorted(pages_to_delete, reverse=True):
    doc.delete_page(page_num)
doc.save("report_cleaned.pdf")
doc.close()
A Common Mistake: Deleting in Forward Order
Deleting pages from lowest index to highest causes incorrect results because each deletion shifts the remaining indices:
import fitz
doc = fitz.open("report.pdf")
# Wrong: deleting in forward order
pages_to_delete = [1, 4, 7]
for page_num in pages_to_delete:
    doc.delete_page(page_num)  # After the first deletion, index 4 has shifted to index 3!
After deleting index 1, what was page 5 (index 4) is now at index 3. The loop then deletes the wrong page. Always sort in reverse order:
for page_num in sorted(pages_to_delete, reverse=True):
    doc.delete_page(page_num)
When deleting multiple pages, always process in reverse order (highest index first). Deleting a page shifts all subsequent page indices down by one, which causes the loop to target the wrong pages if processed in forward order.
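You can see the index shift without touching a PDF. This sketch uses a plain Python list as a stand-in for a 10-page document (the labels p1 to p10 and the delete_indices helper are illustrative assumptions) and compares forward and reverse deletion of indices 1, 4, and 7:

```python
def delete_indices(pages: list[str], indices: list[int], reverse: bool) -> list[str]:
    """Delete the given indices from a copy of pages, one at a time."""
    remaining = list(pages)
    order = sorted(indices, reverse=True) if reverse else list(indices)
    for index in order:
        del remaining[index]
    return remaining


pages = [f"p{n}" for n in range(1, 11)]  # p1 .. p10
targets = [1, 4, 7]                      # intended: p2, p5, p8

forward = delete_indices(pages, targets, reverse=False)
backward = delete_indices(pages, targets, reverse=True)

print(forward)   # ['p1', 'p3', 'p4', 'p5', 'p7', 'p8', 'p9']  (removed p2, p6, p10)
print(backward)  # ['p1', 'p3', 'p4', 'p6', 'p7', 'p9', 'p10'] (removed p2, p5, p8)
```

Forward deletion removes p2, p6, and p10 instead of the intended pages, and on a shorter document the last deletion could point past the end entirely.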
Deleting a Range of Pages
To remove a continuous block of pages, use delete_pages() with start and end indices. Both ends are inclusive:
import fitz
doc = fitz.open("book.pdf")
print(f"Pages before: {len(doc)}")
# Delete pages 5 through 10 in human numbering (indices 4 through 9)
doc.delete_pages(from_page=4, to_page=9)
print(f"Pages after: {len(doc)}")
doc.save("book_shortened.pdf")
doc.close()
Example output:
Pages before: 50
Pages after: 44
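If page selections come from users, they usually arrive in human numbering, often as a mix of single pages and ranges. The sketch below parses such a string into 0-based indices. The "2,5,8-10" format and the parse_page_spec name are assumptions made for this example, not a PyMuPDF feature:

```python
def parse_page_spec(spec: str, page_count: int) -> list[int]:
    """Parse a spec such as "2,5,8-10" (1-based, inclusive) into 0-based indices."""
    indices: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        first, sep, last = part.partition("-")
        start = int(first)
        end = int(last) if sep else start
        if not 1 <= start <= end <= page_count:
            raise ValueError(f"invalid page range {part!r} for {page_count} pages")
        # Human pages start..end map to indices start-1..end-1.
        indices.update(range(start - 1, end))
    return sorted(indices)


print(parse_page_spec("5-10", 50))      # [4, 5, 6, 7, 8, 9]
print(parse_page_spec("2,5,8-10", 50))  # [1, 4, 7, 8, 9]
```

The resulting list can then be deleted in reverse order with delete_page(), as in the previous section.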
Keeping Only Specific Pages
When you want to extract a small number of pages from a large document, it is easier to specify which pages to keep rather than which to delete. The select() method does exactly this:
import fitz
doc = fitz.open("presentation.pdf")
# Keep only the first page, third page, and last page (0-indexed)
pages_to_keep = [0, 2, len(doc) - 1]
doc.select(pages_to_keep)
doc.save("highlights.pdf")
doc.close()
This is also useful for extracting every Nth page:
import fitz
doc = fitz.open("document.pdf")
# Keep every other page (odd pages in human numbering)
odd_pages = list(range(0, len(doc), 2))
doc.select(odd_pages)
doc.save("odd_pages_only.pdf")
doc.close()
Use doc.select() when you want to keep a small subset of a large document. Use delete_page() when you are removing just a few pages from a document you are mostly keeping.
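The two approaches are interchangeable: deleting a set of pages gives the same result as keeping all the others. As a rough sketch, a delete list can be turned into a keep list like this (complement_indices is a hypothetical helper name):

```python
def complement_indices(page_count: int, pages_to_delete: list[int]) -> list[int]:
    """Return the indices that remain after removing pages_to_delete."""
    # Resolve negative indices (-1 = last page) before comparing.
    doomed = {i + page_count if i < 0 else i for i in pages_to_delete}
    return [i for i in range(page_count) if i not in doomed]


print(complement_indices(10, [0, 4, 7]))  # [1, 2, 3, 5, 6, 8, 9]
print(complement_indices(5, [-1]))        # [0, 1, 2, 3]
```

Passing the result to doc.select() applies the whole change in one call, so the reverse-order rule does not come into play.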
Removing Blank Pages
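Documents often contain blank separator pages. You can detect and remove them by checking each page for text and embedded images. This heuristic suits PDFs with a text layer; pages in a purely scanned PDF usually consist of one full-page image, so they will never count as blank here: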
import fitz
def is_blank_page(page, text_threshold=100):
    """Check if a page is essentially blank."""
    text = page.get_text().strip()
    if len(text) > text_threshold:
        return False
    images = page.get_images()
    if images:
        return False
    return True
doc = fitz.open("scanned_document.pdf")
blank_pages = []
for i, page in enumerate(doc):
    if is_blank_page(page):
        blank_pages.append(i)
print(f"Found {len(blank_pages)} blank page(s): {blank_pages}")
# Delete blank pages in reverse order
for page_num in reversed(blank_pages):
    doc.delete_page(page_num)
doc.save("document_no_blanks.pdf")
doc.close()
Example output:
Found 3 blank page(s): [2, 5, 8]
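For image-only pages, one alternative is to look at the rendered pixels instead. The sketch below assumes you already have the page as 8-bit grayscale bytes, one byte per pixel (for example from page.get_pixmap(colorspace=fitz.csGRAY).samples, which is not exercised here). The is_mostly_white name and both thresholds are illustrative assumptions that you would need to tune for real scans:

```python
def is_mostly_white(
    samples: bytes,
    white_level: int = 250,
    min_fraction: float = 0.995,
) -> bool:
    """Return True if at least min_fraction of the pixels are near white."""
    if not samples:
        raise ValueError("no pixel data")
    white = sum(1 for value in samples if value >= white_level)
    return white / len(samples) >= min_fraction


blank = bytes([255] * 1000)             # all white
speck = bytes([255] * 998 + [0] * 2)    # 0.2% dark pixels, e.g. scanner dust
text = bytes([255] * 900 + [0] * 100)   # 10% dark pixels

print(is_mostly_white(blank))  # True
print(is_mostly_white(speck))  # True
print(is_mostly_white(text))   # False
```

Allowing a small fraction of dark pixels keeps scanner noise from making an empty page look like content.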
Conditional Page Deletion Based on Content
You can remove pages that contain specific text, such as draft watermarks or confidential markers:
import fitz
doc = fitz.open("report.pdf")
pages_to_delete = []
for i, page in enumerate(doc):
    text = page.get_text().lower()
    if "confidential" in text or "draft" in text:
        pages_to_delete.append(i)
        print(f"Marking page {i + 1} for deletion")
for page_num in reversed(pages_to_delete):
    doc.delete_page(page_num)
print(f"\nRemoved {len(pages_to_delete)} page(s)")
doc.save("report_final.pdf")
doc.close()
Example output:
Marking page 3 for deletion
Marking page 7 for deletion
Removed 2 page(s)
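A plain substring test such as "draft" in text also matches words like "overdraft" or "drafting", which can flag pages you meant to keep. One way to tighten the check is a whole-word, case-insensitive regular expression. This is a sketch; contains_keyword is a hypothetical helper that operates on the string returned by get_text():

```python
import re


def contains_keyword(text: str, keywords: list[str]) -> bool:
    """Return True if any keyword appears in text as a whole word."""
    if not keywords:
        return False
    # \b anchors the match at word boundaries; re.escape protects special characters.
    pattern = r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


markers = ["confidential", "draft"]
print(contains_keyword("This is a DRAFT copy", markers))    # True
print(contains_keyword("Overdraft fees apply", markers))    # False
print(contains_keyword("Strictly confidential.", markers))  # True
```

In the loop above, the condition would then become contains_keyword(page.get_text(), markers).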
Using PyPDF2 as an Alternative
If you prefer PyPDF2, the approach is slightly different. Instead of deleting pages from an existing document, you create a new document and add only the pages you want to keep. PyPDF2 has since been succeeded by pypdf, which provides the same PdfReader and PdfWriter classes:
pip install pypdf2
from PyPDF2 import PdfReader, PdfWriter
reader = PdfReader("document.pdf")
writer = PdfWriter()
# Pages to skip (0-indexed)
pages_to_delete = {0, 4, 7}
for i, page in enumerate(reader.pages):
    if i not in pages_to_delete:
        writer.add_page(page)
with open("document_modified.pdf", "wb") as output_file:
    writer.write(output_file)
print(f"Original: {len(reader.pages)} pages")
print(f"Modified: {len(writer.pages)} pages")
Example output:
Original: 10 pages
Modified: 7 pages
Complete Reusable Utility Function
A reusable function that covers the deletion scenarios above. If more than one option is passed, keep_pages takes precedence, then delete_range, then delete_pages:
import fitz
def modify_pdf_pages(
    input_path: str,
    output_path: str,
    delete_pages: list[int] | None = None,
    keep_pages: list[int] | None = None,
    delete_range: tuple[int, int] | None = None
) -> int:
    """
    Modify a PDF by deleting or keeping specific pages.
    Args:
        input_path: Source PDF file path.
        output_path: Destination PDF file path.
        delete_pages: List of page indices to delete (0-indexed).
        keep_pages: List of page indices to keep (0-indexed).
        delete_range: Tuple of (start, end) page indices to delete (inclusive).
    Returns:
        Number of pages in the output document.
    """
    doc = fitz.open(input_path)
    original_count = len(doc)
    if keep_pages is not None:
        resolved = [p if p >= 0 else original_count + p for p in keep_pages]
        doc.select(resolved)
    elif delete_range is not None:
        start, end = delete_range
        doc.delete_pages(from_page=start, to_page=end)
    elif delete_pages is not None:
        # Resolve negative indices against the original page count before any
        # deletion, and use a set so a page listed twice is deleted only once.
        resolved = {p if p >= 0 else original_count + p for p in delete_pages}
        for page_num in sorted(resolved, reverse=True):
            if 0 <= page_num < original_count:
                doc.delete_page(page_num)
    doc.save(output_path)
    final_count = len(doc)
    doc.close()
    print(f"Pages: {original_count} -> {final_count}")
    return final_count
# Usage examples
modify_pdf_pages("report.pdf", "report_v2.pdf", delete_pages=[0, 5, 10])
modify_pdf_pages("book.pdf", "excerpt.pdf", keep_pages=[0, 1, -1])
modify_pdf_pages("document.pdf", "trimmed.pdf", delete_range=(10, 20))
Example output:
Pages: 25 -> 22
Pages: 100 -> 3
Pages: 50 -> 39
Method Comparison
| Method | Use Case | PyMuPDF Syntax |
|---|---|---|
| Single page | Remove one specific page | doc.delete_page(n) |
| Multiple pages | Remove several non-consecutive pages | Loop with delete_page() in reverse |
| Page range | Remove a continuous block | doc.delete_pages(start, end) |
| Keep specific pages | Extract a subset from a large document | doc.select([indices]) |
| Content-based | Remove pages matching criteria | Loop, check content, delete in reverse |
Conclusion
PyMuPDF provides a fast and reliable toolkit for deleting pages from PDF files.
- Use delete_page() for removing individual pages, delete_pages() for continuous ranges, and doc.select() when it is easier to specify which pages to keep.
- Always delete multiple pages in reverse index order to prevent index shifting bugs.
- For content-based removal, combine get_text() with conditional logic to identify and remove pages matching specific criteria.
- If you prefer not to use PyMuPDF, PyPDF2 offers an alternative approach by building a new document from the pages you want to keep.