PyPDF2 Tutorial: Read, Merge, Split, and Encrypt PDFs Using Python

26 Jan 2026

PyPDF2 is a popular Python library used to work with PDF files. With PyPDF2, you can read PDFs, extract text, merge multiple PDFs, split pages, rotate pages, and even add password protection.

If your project involves PDF automation, such as generating reports, processing invoices, or handling documents in bulk, PyPDF2 is worth learning.


What Is PyPDF2?

PyPDF2 is a pure-Python PDF library that allows you to:

  • Read PDF files
  • Extract text
  • Merge and split PDFs
  • Rotate pages
  • Encrypt and decrypt PDFs

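📌 It works on Windows, Linux, and macOS.

📌 PyPDF2 3.0.x is the final release line. Development continues in the successor package pypdf, which provides the same PdfReader and PdfWriter classes used in this tutorial.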


Install PyPDF2

Install using pip:

pip install PyPDF2

Import the library:

from PyPDF2 import PdfReader, PdfWriter

Read a PDF File Using PyPDF2

from PyPDF2 import PdfReader

reader = PdfReader("sample.pdf")
print(len(reader.pages))

📌 Use case:

Counting pages in reports or documents.

Extract Text from a PDF

page = reader.pages[0]
text = page.extract_text()
print(text)

📌 Real-world use:

  • Resume parsing
  • Invoice data extraction
  • Report analysis

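⚠️ Text extraction quality depends on how the PDF was created. PDFs with a real text layer usually extract well, while scanned (image-only) pages return little or no text.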
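
Because extraction results vary from file to file, it can help to check which pages produce no text before relying on the output. The following is a minimal sketch, not part of PyPDF2 itself: `pages_without_text` is a hypothetical helper name, and it assumes `reader` is a `PdfReader` like the one created above.

```python
def pages_without_text(reader):
    """Return the 1-based numbers of pages that yield no extractable text.

    `reader` is expected to be a PyPDF2 PdfReader. Any object with a
    `.pages` sequence whose items offer `extract_text()` works too.
    """
    empty_pages = []
    for number, page in enumerate(reader.pages, start=1):
        # extract_text() can return an empty string, for example on
        # image-only pages; `or ""` also guards against a None result.
        text = page.extract_text() or ""
        if not text.strip():
            empty_pages.append(number)
    return empty_pages
```

Calling `pages_without_text(PdfReader("sample.pdf"))` returns a list of page numbers. A non-empty list suggests that those pages are scanned images and would need OCR.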


Read All Pages from a PDF

for page in reader.pages:
    print(page.extract_text())

Merge Multiple PDF Files

from PyPDF2 import PdfReader, PdfWriter

writer = PdfWriter()

for pdf in ["file1.pdf", "file2.pdf"]:
    reader = PdfReader(pdf)
    for page in reader.pages:
        writer.add_page(page)

with open("merged.pdf", "wb") as f:
    writer.write(f)

📌 Use case:

Combining reports, bills, or scanned documents.

Split a PDF into Multiple Files

reader = PdfReader("sample.pdf")

for i, page in enumerate(reader.pages):
    writer = PdfWriter()
    writer.add_page(page)

    with open(f"page_{i+1}.pdf", "wb") as f:
        writer.write(f)

📌 Use case:

Splitting invoices or certificates page-wise.
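
Splitting does not have to be one page per file. The sketch below groups pages into fixed-size chunks instead. The function names, the `chunk_size` parameter, and the `chunk_N.pdf` naming scheme are all assumptions made for illustration. The PyPDF2 import sits inside the function so that the small `chunk_ranges` helper can be tried on its own.

```python
def chunk_ranges(total_pages, chunk_size):
    """Return (start, stop) index pairs that cover total_pages in chunks."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        (start, min(start + chunk_size, total_pages))
        for start in range(0, total_pages, chunk_size)
    ]


def split_into_chunks(pdf_path, chunk_size, prefix="chunk"):
    """Write prefix_1.pdf, prefix_2.pdf, ... and return the file names."""
    # Imported here so chunk_ranges() stays usable without PyPDF2.
    from PyPDF2 import PdfReader, PdfWriter

    reader = PdfReader(pdf_path)
    ranges = chunk_ranges(len(reader.pages), chunk_size)
    written = []
    for number, (start, stop) in enumerate(ranges, start=1):
        writer = PdfWriter()
        for index in range(start, stop):
            writer.add_page(reader.pages[index])
        name = f"{prefix}_{number}.pdf"
        with open(name, "wb") as f:
            writer.write(f)
        written.append(name)
    return written
```

For example, `split_into_chunks("sample.pdf", 3)` on a 7-page file would write three files holding 3, 3, and 1 pages.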

Rotate PDF Pages

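writer = PdfWriter()
page = reader.pages[0]
page.rotate(90)  # rotate clockwise; the angle must be a multiple of 90
writer.add_page(page)  # only this first page is written to rotated.pdf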

with open("rotated.pdf", "wb") as f:
    writer.write(f)
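
The snippet above rotates and saves a single page. To rotate a whole document, the same two calls can be applied to every page. This is a sketch under the assumption that `reader` is a `PdfReader` and `writer` is a `PdfWriter`; `rotate_all_pages` is a hypothetical helper name.

```python
def rotate_all_pages(reader, writer, angle=90):
    """Add every page of `reader` to `writer`, rotated clockwise by `angle`."""
    # Check up front so a bad angle fails before any page is touched.
    if angle % 90 != 0:
        raise ValueError("angle must be a multiple of 90 degrees")
    for page in reader.pages:
        page.rotate(angle)  # positive angles rotate clockwise
        writer.add_page(page)
    return writer
```

A typical call would be `rotate_all_pages(PdfReader("sample.pdf"), PdfWriter(), 90)`, followed by the same `writer.write(f)` step used above.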

Encrypt a PDF with Password

writer = PdfWriter()
writer.append_pages_from_reader(reader)
writer.encrypt("mypassword")

with open("protected.pdf", "wb") as f:
    writer.write(f)

📌 Use case:

Securing confidential documents.
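
Hardcoding a password in a script is convenient for a demo but risky in practice. One possible alternative, sketched below, reads it from an environment variable. The variable name `PDF_PASSWORD` and both function names are assumptions for illustration, not PyPDF2 conventions.

```python
import os


def read_password(env_var="PDF_PASSWORD"):
    """Fetch the password from an environment variable (name is an assumption)."""
    password = os.environ.get(env_var)
    if not password:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return password


def encrypt_pdf(source, target, user_password, owner_password=None):
    """Copy `source` to `target`, protected with the given passwords."""
    # Imported here so read_password() stays usable without PyPDF2.
    from PyPDF2 import PdfReader, PdfWriter

    writer = PdfWriter()
    writer.append_pages_from_reader(PdfReader(source))
    # Passed positionally: user password first, then the owner password.
    # When no owner password is given, the user password is used for both.
    writer.encrypt(user_password, owner_password)
    with open(target, "wb") as f:
        writer.write(f)
```

With the variable set in the shell, `encrypt_pdf("sample.pdf", "protected.pdf", read_password())` produces the protected copy without the password ever appearing in the source code.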

Decrypt a Password-Protected PDF

reader = PdfReader("protected.pdf")
reader.decrypt("mypassword")
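
The two lines above do not check whether the password was actually accepted. A small wrapper can make a wrong password fail loudly. This is a sketch that assumes `reader` is a `PdfReader`; `unlock` is a hypothetical helper name.

```python
def unlock(reader, password):
    """Decrypt `reader` if it is encrypted and return it.

    Raises ValueError when the password does not match.
    """
    if reader.is_encrypted:
        # decrypt() returns a falsy value (0) when the password is wrong.
        if not reader.decrypt(password):
            raise ValueError("Incorrect password")
    return reader
```

After `reader = unlock(PdfReader("protected.pdf"), "mypassword")`, the reader can be used as usual, for example with `len(reader.pages)` or `extract_text()`.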

Real-World PyPDF2 Examples

Merge All PDFs in a Folder

from pathlib import Path
from PyPDF2 import PdfReader, PdfWriter

writer = PdfWriter()

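output = Path("final.pdf")

# Sort for a predictable order and skip the output left by an earlier run,
# otherwise final.pdf would be merged into itself on the next run
for pdf in sorted(Path(".").glob("*.pdf")):
    if pdf.name == output.name:
        continue
    reader = PdfReader(pdf)
    writer.append_pages_from_reader(reader)

with open(output, "wb") as f:
    writer.write(f)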

Extract Text from All PDFs Automatically

for pdf in Path(".").glob("*.pdf"):
    reader = PdfReader(pdf)
    for page in reader.pages:
        print(page.extract_text())
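
Printing is fine for a quick look, but for batch jobs it is often more useful to save each PDF's text to its own file. The sketch below does that. The `text` output folder and both function names are assumptions for illustration.

```python
from pathlib import Path


def text_output_path(pdf_path, out_dir):
    """Map a PDF path such as pdfs/report.pdf to <out_dir>/report.txt."""
    return Path(out_dir) / (Path(pdf_path).stem + ".txt")


def export_folder_text(folder=".", out_dir="text"):
    """Write one .txt file per PDF found in `folder`."""
    # Imported here so text_output_path() stays usable without PyPDF2.
    from PyPDF2 import PdfReader

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        reader = PdfReader(pdf_path)
        # `or ""` keeps the join safe if a page yields no text.
        pages = [page.extract_text() or "" for page in reader.pages]
        target = text_output_path(pdf_path, out_dir)
        target.write_text("\n".join(pages), encoding="utf-8")
```

Running `export_folder_text("pdfs")` would create `text/file1.txt`, `text/file2.txt`, and so on, one per input PDF.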


Merge All PDFs from a Folder 

Folder structure example:

pdfs/
 ├── file1.pdf
 ├── file2.pdf
 └── report.pdf


from pathlib import Path
from PyPDF2 import PdfReader, PdfWriter

pdf_folder = Path("pdfs")  # folder containing PDFs
output_file = "merged.pdf"

writer = PdfWriter()

for pdf_path in sorted(pdf_folder.glob("*.pdf")):
    reader = PdfReader(pdf_path)
    for page in reader.pages:
        writer.add_page(page)

with open(output_file, "wb") as f:
    writer.write(f)

print("PDFs merged successfully!")

🔍 What This Code Does (Simple Explanation)

  • Path("pdfs") → points to the folder
  • glob("*.pdf") → finds all PDF files
  • sorted() → merges in alphabetical order
  • PdfReader → reads each PDF
  • PdfWriter → collects all pages
  • Writes everything into merged.pdf
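
One caveat with alphabetical order: `file10.pdf` sorts before `file2.pdf`. If the files are numbered, a "natural" sort key may give the order you expect. The helper below is a sketch and not part of PyPDF2; `natural_key` is a hypothetical name.

```python
import re


def natural_key(name):
    """Sort key that compares digit runs as numbers, so file2 < file10."""
    # re.split with a capturing group alternates text and digit parts,
    # e.g. "file10.pdf" -> ["file", "10", ".pdf"].
    return [
        int(part) if part.isdigit() else part.lower()
        for part in re.split(r"(\d+)", name)
    ]
```

In the merge script it could be used as `sorted(pdf_folder.glob("*.pdf"), key=lambda p: natural_key(p.name))`.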


Common Limitations of PyPDF2

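❌ Cannot extract text from scanned PDFs (page images with no text layer)

❌ Extracted text may lose layout such as columns, tables, and spacing

❌ As a pure-Python library, it can be slow on very large PDFs

📌 For scanned PDFs, use an OCR tool such as pytesseract (a Python wrapper for the Tesseract OCR engine).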


Frequently Asked Questions (FAQs)

What is PyPDF2 used for?

PyPDF2 is used for reading, writing, merging, splitting, and encrypting PDF files in Python.


Can PyPDF2 extract text from scanned PDFs?

No. Scanned PDFs contain page images rather than text, so they require OCR tools.


Is PyPDF2 free?

Yes, it is open source (BSD license) and free to use.




Final Thoughts

PyPDF2 is a powerful and beginner-friendly library for PDF automation in Python. If your project involves document handling, reports, or PDF processing, PyPDF2 can save you hours of manual work.

📄 + 🐍 = Automation Magic