PyPDF2 is a popular Python library used to work with PDF files. With PyPDF2, you can read PDFs, extract text, merge multiple PDFs, split pages, rotate pages, and even add password protection.
If your project involves PDF automation, reports, invoices, or document processing, PyPDF2 is well worth learning.
What Is PyPDF2?
PyPDF2 is a pure-Python PDF library that allows you to:
- Read PDF files
- Extract text
- Merge and split PDFs
- Rotate pages
- Encrypt and decrypt PDFs
📌 It works on Windows, Linux, and macOS.
Install PyPDF2
Install using pip:
pip install PyPDF2
Import the library:
from PyPDF2 import PdfReader, PdfWriter
Read a PDF File Using PyPDF2
from PyPDF2 import PdfReader
reader = PdfReader("sample.pdf")
print(len(reader.pages))
📌 Use case:
Counting pages in reports or documents.
Extract Text from a PDF
page = reader.pages[0]
text = page.extract_text()
print(text)
📌 Real-world use:
- Resume parsing
- Invoice data extraction
- Report analysis
⚠️ Text extraction depends on how the PDF was created.
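One way to see this in practice is to check which pages return no text at all, since those are often image-only (scanned) pages. The sketch below is a minimal illustration, not part of PyPDF2 itself: the helper names pages_without_text and find_textless_pages are hypothetical, and it assumes an empty result from extract_text() means the page has no text layer.

```python
def pages_without_text(page_texts):
    """Return 1-based page numbers whose extracted text is empty or whitespace."""
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if not (text or "").strip()
    ]


def find_textless_pages(pdf_path):
    """Extract text from every page of pdf_path and report the empty ones."""
    # Imported inside the function so the helper above works without PyPDF2.
    from PyPDF2 import PdfReader

    reader = PdfReader(pdf_path)
    texts = [page.extract_text() for page in reader.pages]
    return pages_without_text(texts)
```

Calling find_textless_pages("sample.pdf") gives a list of page numbers that are likely candidates for OCR rather than plain text extraction.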
Read All Pages from a PDF
for page in reader.pages:
    print(page.extract_text())
Merge Multiple PDF Files
from PyPDF2 import PdfReader, PdfWriter
writer = PdfWriter()
for pdf in ["file1.pdf", "file2.pdf"]:
    reader = PdfReader(pdf)
    for page in reader.pages:
        writer.add_page(page)
with open("merged.pdf", "wb") as f:
    writer.write(f)
📌 Use case:
Combining reports, bills, or scanned documents.
Split a PDF into Multiple Files
reader = PdfReader("sample.pdf")
for i, page in enumerate(reader.pages):
    writer = PdfWriter()
    writer.add_page(page)
    with open(f"page_{i+1}.pdf", "wb") as f:
        writer.write(f)
📌 Use case:
Splitting invoices or certificates page-wise.
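Page-wise splitting is one option; sometimes a document needs to be split into groups of several pages instead. The following is a rough sketch under that assumption. The names chunk_ranges and split_pdf, the chunk_size parameter, and the part_N.pdf output naming are all made up for illustration.

```python
def chunk_ranges(total_pages, chunk_size):
    """Return (start, stop) index pairs that cover total_pages in chunks."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        (start, min(start + chunk_size, total_pages))
        for start in range(0, total_pages, chunk_size)
    ]


def split_pdf(pdf_path, chunk_size, prefix="part"):
    """Write pdf_path out as prefix_1.pdf, prefix_2.pdf, ... and return the names."""
    # Imported inside the function so chunk_ranges works without PyPDF2.
    from PyPDF2 import PdfReader, PdfWriter

    reader = PdfReader(pdf_path)
    ranges = chunk_ranges(len(reader.pages), chunk_size)
    outputs = []
    for number, (start, stop) in enumerate(ranges, start=1):
        writer = PdfWriter()
        for index in range(start, stop):
            writer.add_page(reader.pages[index])
        out_name = f"{prefix}_{number}.pdf"
        with open(out_name, "wb") as f:
            writer.write(f)
        outputs.append(out_name)
    return outputs
```

For a 7-page file and chunk_size=3, chunk_ranges yields three groups: pages 1-3, 4-6, and the final page on its own.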
Rotate PDF Pages
writer = PdfWriter()
page = reader.pages[0]
page.rotate(90)
writer.add_page(page)
with open("rotated.pdf", "wb") as f:
    writer.write(f)
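The example above rotates only the first page. A possible extension is to rotate every page and to validate the angle first, because page.rotate() only accepts multiples of 90 degrees. This is a sketch only; normalize_angle and rotate_all_pages are hypothetical helper names, not PyPDF2 functions.

```python
def normalize_angle(angle):
    """Map any multiple of 90 to the equivalent clockwise angle in 0..270."""
    if angle % 90 != 0:
        raise ValueError("angle must be a multiple of 90 degrees")
    return angle % 360


def rotate_all_pages(pdf_path, output_path, angle):
    """Rotate every page of pdf_path clockwise by angle and save to output_path."""
    # Imported inside the function so normalize_angle works without PyPDF2.
    from PyPDF2 import PdfReader, PdfWriter

    clockwise = normalize_angle(angle)
    reader = PdfReader(pdf_path)
    writer = PdfWriter()
    for page in reader.pages:
        page.rotate(clockwise)  # rotates the page object in place
        writer.add_page(page)
    with open(output_path, "wb") as f:
        writer.write(f)
```

Normalizing first means a counter-clockwise request such as -90 is passed to PyPDF2 as 270.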
Encrypt a PDF with Password
writer = PdfWriter()
writer.append_pages_from_reader(reader)
writer.encrypt("mypassword")
with open("protected.pdf", "wb") as f:
    writer.write(f)
📌 Use case:
Securing confidential documents.
Decrypt a Password-Protected PDF
reader = PdfReader("protected.pdf")
reader.decrypt("mypassword")
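In a script it can help to check whether the file is encrypted at all and whether the password was accepted before reading pages. Here is a small sketch of that idea. The helpers unlock and open_protected are hypothetical, and the code assumes decrypt() returns a falsy value (0) when the password does not match, as it does in recent PyPDF2 releases.

```python
def unlock(reader, password):
    """Decrypt reader in place if needed; raise ValueError on a wrong password."""
    if reader.is_encrypted:
        # Assumption: decrypt() returns 0 (falsy) when the password is rejected.
        if not reader.decrypt(password):
            raise ValueError("incorrect password")
    return reader


def open_protected(pdf_path, password):
    """Open pdf_path and unlock it with password when it is encrypted."""
    # Imported inside the function so unlock works with any reader-like object.
    from PyPDF2 import PdfReader

    return unlock(PdfReader(pdf_path), password)
```

After reader = open_protected("protected.pdf", "mypassword"), the pages can be read as in the earlier examples, for instance with reader.pages[0].extract_text().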
Real-World PyPDF2 Examples
Merge All PDFs in a Folder
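from pathlib import Path
from PyPDF2 import PdfReader, PdfWriter
writer = PdfWriter()
# Sort for a predictable order, and skip a final.pdf left over from an earlier run.
for pdf in sorted(Path(".").glob("*.pdf")):
    if pdf.name == "final.pdf":
        continue
    reader = PdfReader(pdf)
    writer.append_pages_from_reader(reader)
with open("final.pdf", "wb") as f:
    writer.write(f)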
Extract Text from All PDFs Automatically
for pdf in Path(".").glob("*.pdf"):
    reader = PdfReader(pdf)
    for page in reader.pages:
        print(page.extract_text())
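Printing is fine for a quick look, but for batch jobs it is usually more useful to save each PDF's text to its own file. One possible sketch is shown below. The function names text_output_path and export_text and the out_dir parameter are assumptions made for this example.

```python
from pathlib import Path


def text_output_path(pdf_path, out_dir):
    """Return out_dir/<pdf name without .pdf>.txt for the given PDF path."""
    return Path(out_dir) / (Path(pdf_path).stem + ".txt")


def export_text(pdf_folder, out_dir):
    """Save the extracted text of every PDF in pdf_folder as a .txt file."""
    # Imported inside the function so text_output_path works without PyPDF2.
    from PyPDF2 import PdfReader

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf_path in sorted(Path(pdf_folder).glob("*.pdf")):
        reader = PdfReader(pdf_path)
        # Pages with no extractable text contribute an empty string.
        text = "\n".join(page.extract_text() or "" for page in reader.pages)
        target = text_output_path(pdf_path, out_dir)
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written
```

For example, export_text(".", "text") would create a text folder with one .txt file per PDF in the current directory.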
Merge All PDFs from a Folder
Folder structure example
pdfs/
├── file1.pdf
├── file2.pdf
└── report.pdf
from pathlib import Path
from PyPDF2 import PdfReader, PdfWriter
pdf_folder = Path("pdfs")  # folder containing PDFs
output_file = "merged.pdf"
writer = PdfWriter()
for pdf_path in sorted(pdf_folder.glob("*.pdf")):
    reader = PdfReader(pdf_path)
    for page in reader.pages:
        writer.add_page(page)
with open(output_file, "wb") as f:
    writer.write(f)
print("PDFs merged successfully!")
🔍 What This Code Does (Simple Explanation)
- Path("pdfs") → points to the folder
- glob("*.pdf") → finds all PDF files
- sorted() → merges in alphabetical order
- PdfReader → reads each PDF
- PdfWriter → collects all pages
- Writes everything into merged.pdf
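One caveat about sorted(): it compares names as plain strings, so file10.pdf is merged before file2.pdf. If numbered files should merge in numeric order, a "natural" sort key is one way to handle it. The sketch below is an illustration only. natural_key and pdfs_in_merge_order are hypothetical helpers, and the exclude parameter is an assumption added to keep an existing output file out of the merge.

```python
import re
from pathlib import Path


def natural_key(path):
    """Sort key that compares digit runs as numbers, so file2 sorts before file10."""
    return [
        int(part) if part.isdecimal() else part.lower()
        for part in re.split(r"(\d+)", Path(path).name)
    ]


def pdfs_in_merge_order(folder, exclude=()):
    """List the PDFs in folder in natural order, skipping any names in exclude."""
    excluded = {Path(name).name for name in exclude}
    return sorted(
        (p for p in Path(folder).glob("*.pdf") if p.name not in excluded),
        key=natural_key,
    )
```

In the merge script, sorted(pdf_folder.glob("*.pdf")) could then be swapped for pdfs_in_merge_order(pdf_folder, exclude=[output_file]).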
Common Limitations of PyPDF2
❌ Not good for scanned PDFs (images)
❌ Layout formatting may be lost
❌ Slower for very large PDFs
📌 For scanned PDFs, use OCR tools like pytesseract.
Frequently Asked Questions (FAQs)
What is PyPDF2 used for?
PyPDF2 is used for reading, writing, merging, splitting, and encrypting PDF files in Python.
Can PyPDF2 extract text from scanned PDFs?
No. Scanned PDFs require OCR tools.
Is PyPDF2 free?
Yes, it is open-source and free to use.
Final Thoughts
PyPDF2 is a powerful and beginner-friendly library for PDF automation in Python. If your project involves document handling, reports, or PDF processing, PyPDF2 can save you hours of manual work.
📄 + 🐍 = Automation Magic