PyPDF2 Python Tutorial (Beginner to Advanced)

PyPDF2 is a popular Python library used to work with PDF files. With PyPDF2, you can read PDFs, extract text, merge multiple PDFs, split pages, rotate pages, and even add password protection.

If your project involves PDF automation, reports, invoices, or document processing, PyPDF2 is well worth learning.

What Is PyPDF2?

PyPDF2 is a pure-Python PDF library that allows you to:

Read PDF files
Extract text
Merge and split PDFs
Rotate pages
Encrypt and decrypt PDFs

📌 It works on Windows, Linux, and macOS.

Install PyPDF2

Install using pip:

pip install PyPDF2

Import the library:

from PyPDF2 import PdfReader, PdfWriter

Read a PDF File Using PyPDF2

from PyPDF2 import PdfReader

reader = PdfReader("sample.pdf")
print(len(reader.pages))

📌 Use case:

Counting pages in reports or documents.

Extract Text from a PDF

page = reader.pages[0]
text = page.extract_text()
print(text)

📌 Real-world use:

Resume parsing
Invoice data extraction
Report analysis

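⚠️ Text extraction depends on how the PDF was created. PDFs with a real text layer usually extract well; scanned pages are images, so extract_text() typically returns little or nothing for them.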
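A quick way to see this in practice is to check which pages yield no text at all, which often points to a scanned page. The sketch below assumes each page object exposes extract_text() as in the snippet above; the helper name pages_without_text is made up for this illustration.

```python
def pages_without_text(pages):
    """Return the 1-based numbers of pages whose extracted text is empty."""
    empty = []
    for number, page in enumerate(pages, start=1):
        # Treat a missing value the same as an empty string.
        text = page.extract_text() or ""
        if not text.strip():
            empty.append(number)
    return empty
```

With the reader from earlier, pages_without_text(reader.pages) would list the pages that probably need OCR instead.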

Read All Pages from a PDF

for page in reader.pages:
    print(page.extract_text())
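Printing page by page is fine for a quick look, but you will often want the whole document as one string. Here is a minimal sketch that labels each page; collect_text is a hypothetical helper, and it only assumes that every page has an extract_text() method.

```python
def collect_text(pages):
    """Join the text of all pages into one string with page headers."""
    chunks = []
    for number, page in enumerate(pages, start=1):
        # Strip stray whitespace; fall back to "" if nothing was extracted.
        text = (page.extract_text() or "").strip()
        chunks.append(f"--- Page {number} ---\n{text}")
    return "\n\n".join(chunks)
```

For example, collect_text(reader.pages) could be written to a .txt file or passed on to a search or parsing step.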

Merge Multiple PDF Files

from PyPDF2 import PdfReader, PdfWriter

writer = PdfWriter()

for pdf in ["file1.pdf", "file2.pdf"]:
    reader = PdfReader(pdf)
    for page in reader.pages:
        writer.add_page(page)

with open("merged.pdf", "wb") as f:
    writer.write(f)

📌 Use case:

Combining reports, bills, or scanned documents.

Split a PDF into Multiple Files

reader = PdfReader("sample.pdf")

for i, page in enumerate(reader.pages):
    writer = PdfWriter()
    writer.add_page(page)

    with open(f"page_{i+1}.pdf", "wb") as f:
        writer.write(f)

📌 Use case:

Splitting invoices or certificates page-wise.
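If one file per page is too fine-grained, the same loop can write several pages per file. The sketch below only does the arithmetic: it returns the start and stop page indexes for each chunk. The name chunk_ranges and its parameters are illustrative, not part of PyPDF2.

```python
def chunk_ranges(total_pages, chunk_size):
    """Return (start, stop) index pairs that cover total_pages in chunks."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        # stop is exclusive, so range(start, stop) gives the page indexes
        (start, min(start + chunk_size, total_pages))
        for start in range(0, total_pages, chunk_size)
    ]
```

For each pair from chunk_ranges(len(reader.pages), 3), you could create a fresh PdfWriter, add reader.pages[i] for every i in range(start, stop), and write it to its own file.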

Rotate PDF Pages

writer = PdfWriter()
page = reader.pages[0]
page.rotate(90)  # clockwise; the angle must be a multiple of 90
writer.add_page(page)

with open("rotated.pdf", "wb") as f:
    writer.write(f)
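To rotate only some pages, for example the landscape scans in a report, a small helper can apply the same call selectively. This is a sketch under the assumption that page.rotate() changes the page in place, as the snippet above relies on; rotate_pages and page_numbers are hypothetical names.

```python
def rotate_pages(pages, angle, page_numbers=None):
    """Rotate the selected 1-based pages clockwise; return all pages."""
    if angle % 90 != 0:
        raise ValueError("angle must be a multiple of 90")
    result = []
    for number, page in enumerate(pages, start=1):
        # page_numbers=None means "rotate every page".
        if page_numbers is None or number in page_numbers:
            page.rotate(angle)
        result.append(page)
    return result
```

Each page returned by rotate_pages(reader.pages, 90, page_numbers={2, 5}) can then be passed to writer.add_page() as before.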

Encrypt a PDF with Password

writer = PdfWriter()
writer.append_pages_from_reader(reader)
writer.encrypt("mypassword")

with open("protected.pdf", "wb") as f:
    writer.write(f)

📌 Use case:

Securing confidential documents.

Decrypt a Password-Protected PDF

reader = PdfReader("protected.pdf")
reader.decrypt("mypassword")
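In a script it helps to decrypt only when needed and to fail loudly on a wrong password. The sketch below assumes the reader exposes is_encrypted and that decrypt() returns a falsy value (0) when the password does not match; the helper name unlock is made up for this example.

```python
def unlock(reader, password):
    """Decrypt the reader in place if needed; raise if the password fails."""
    if not reader.is_encrypted:
        return reader  # nothing to do for unprotected files
    # Assumption: decrypt() returns 0 (falsy) for a wrong password.
    if not reader.decrypt(password):
        raise ValueError("wrong password")
    return reader
```

After unlock(PdfReader("protected.pdf"), "mypassword") succeeds, the pages can be read as usual.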

Real-World PyPDF2 Examples

Merge All PDFs in a Folder

from pathlib import Path
from PyPDF2 import PdfReader, PdfWriter

writer = PdfWriter()

for pdf in sorted(Path(".").glob("*.pdf")):
    if pdf.name == "final.pdf":
        continue  # skip the output file left by a previous run
    reader = PdfReader(pdf)
    writer.append_pages_from_reader(reader)

with open("final.pdf", "wb") as f:
    writer.write(f)

Extract Text from All PDFs Automatically

for pdf in Path(".").glob("*.pdf"):
    reader = PdfReader(pdf)
    for page in reader.pages:
        print(page.extract_text())

Merge All PDFs from a Folder

Folder structure example

pdfs/
├── file1.pdf
├── file2.pdf
└── report.pdf

from pathlib import Path
from PyPDF2 import PdfReader, PdfWriter

pdf_folder = Path("pdfs")  # folder containing PDFs
output_file = "merged.pdf"

writer = PdfWriter()

for pdf_path in sorted(pdf_folder.glob("*.pdf")):
    reader = PdfReader(pdf_path)
    for page in reader.pages:
        writer.add_page(page)

with open(output_file, "wb") as f:
    writer.write(f)

print("PDFs merged successfully!")

🔍 What This Code Does (Simple Explanation)

Path("pdfs") → points to the folder
glob("*.pdf") → finds all PDF files
sorted() → sorts the files by name so they merge in alphabetical order
PdfReader → reads each PDF
PdfWriter → collects all pages
Writes everything into merged.pdf
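One catch with alphabetical order: names are compared as plain strings, so file10.pdf comes before file2.pdf. If your files are numbered, a natural sort key fixes that. Here is a minimal sketch; natural_key is a hypothetical helper, not part of PyPDF2.

```python
import re


def natural_key(name):
    """Sort key that compares runs of digits as numbers, ignoring case."""
    # Splitting on a captured group keeps the digit runs at odd indexes.
    parts = re.split(r"(\d+)", name.lower())
    return [int(part) if index % 2 else part for index, part in enumerate(parts)]
```

It would plug into the loop above as sorted(pdf_folder.glob("*.pdf"), key=lambda p: natural_key(p.name)).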

Common Limitations of PyPDF2

❌ Cannot extract text from scanned (image-only) PDFs

❌ Extracted text may lose its layout, such as columns and tables

❌ Can be slow on very large PDFs

📌 For scanned PDFs, use OCR tools like pytesseract.

Frequently Asked Questions (FAQs)

What is PyPDF2 used for?

PyPDF2 is used for reading, writing, merging, splitting, and encrypting PDF files in Python.

Can PyPDF2 extract text from scanned PDFs?

No. Scanned PDFs contain images of text rather than text itself, so they require OCR tools.

Is PyPDF2 free?

Yes, it is open-source and free to use.

Final Thoughts

PyPDF2 is a powerful and beginner-friendly library for PDF automation in Python. If your project involves document handling, reports, or PDF processing, PyPDF2 can save you hours of manual work.

📄 + 🐍 = Automation Magic
