PDF OCR Tool

Documents · November 2025

Turn scanned PDFs into searchable documents.

The Quick Version

uvx --from git+https://github.com/sameerbajaj/pdf-ocr pdf-ocr

Requires Tesseract and Poppler on your system.

The Problem

I collect old documents — research papers from the 1960s, scanned textbooks, journal articles someone uploaded as images. The content is valuable. But it's locked inside pictures.

Cmd+F doesn't work. Copy-paste doesn't work. Highlighting text for notes? Impossible. These PDFs are basically useless for how I actually use documents.

What This Does

Runs OCR on your PDF and creates an invisible text layer underneath the original images. The document looks exactly the same — but now you can search it, copy from it, and use it like a real PDF.
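For a sense of what that involves mechanically, here is a rough Python sketch of the same idea, driving the Poppler and Tesseract command-line tools directly. It is not this tool's actual code. The function names, the 300 DPI default, and the use of pdfunite for merging are assumptions made for illustration.

```python
import subprocess
import tempfile
from pathlib import Path


def rasterize_cmd(pdf: str, prefix: str, dpi: int = 300) -> list[str]:
    # Poppler's pdftoppm renders each page to a PNG named <prefix>-<n>.png.
    # The 300 DPI default here is illustrative, not the tool's default.
    return ["pdftoppm", "-r", str(dpi), "-png", pdf, prefix]


def ocr_cmd(image: str, out_base: str) -> list[str]:
    # Tesseract's "pdf" config writes <out_base>.pdf: the page image plus
    # an invisible text layer aligned with the recognized words.
    return ["tesseract", image, out_base, "pdf"]


def merge_cmd(pages: list[str], output: str) -> list[str]:
    # Poppler's pdfunite concatenates the per-page PDFs into one file.
    return ["pdfunite", *pages, output]


def make_searchable(pdf: str, output: str, dpi: int = 300) -> None:
    with tempfile.TemporaryDirectory() as tmp:
        subprocess.run(rasterize_cmd(pdf, str(Path(tmp) / "page"), dpi), check=True)
        page_pdfs = []
        # pdftoppm zero-pads page numbers, so a plain sort keeps page order.
        for image in sorted(Path(tmp).glob("page-*.png")):
            out_base = image.with_suffix("")
            subprocess.run(ocr_cmd(str(image), str(out_base)), check=True)
            page_pdfs.append(f"{out_base}.pdf")
        subprocess.run(merge_cmd(page_pdfs, output), check=True)
```

Calling `make_searchable("scan.pdf", "scan-ocr.pdf")` would need Tesseract and Poppler on your PATH, with `scan.pdf` standing in for your own file.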

High-Quality OCR

Uses Tesseract 5.x for text recognition. Not perfect on handwriting, but surprisingly good on printed text, even from decades-old documents.

Invisible Text Layer

The original appearance stays intact. You're not replacing the scanned images; you're adding an invisible, searchable text layer to them, so the PDF still looks like the original.

Page Range

Process the whole document or just specific pages. Useful when you only care about one chapter of a 400-page book.
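As an illustration of how a page selection can be expanded into individual pages, here is a small Python sketch. The `"1-3,7"` spec syntax and the `parse_pages` name are hypothetical. Check the README for the option the tool actually accepts.

```python
def parse_pages(spec: str, total: int) -> list[int]:
    """Expand a spec like "1-3,7" into sorted 1-based page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        # "120-145" is a range; a bare "5" is a single page.
        start, sep, end = part.partition("-")
        first = int(start)
        last = int(end) if sep else first
        if not 1 <= first <= last <= total:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(first, last + 1))
    return sorted(pages)
```

For a 400-page book, `parse_pages("120-145", 400)` selects 26 pages, so only those would need to be rasterized and recognized.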

Adjustable DPI

Higher DPI means better accuracy but slower processing. Default is usually fine. Crank it up for tiny text or poor-quality scans.
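The slowdown follows from simple arithmetic: the rendered image grows with the square of the DPI. A quick Python sketch, assuming a US Letter page (8.5 x 11 inches) and DPI values picked only for illustration:

```python
def raster_size(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    # Pixels per side = inches * dots per inch.
    return round(width_in * dpi), round(height_in * dpi)


for dpi in (150, 300, 600):
    w, h = raster_size(8.5, 11, dpi)
    print(f"{dpi} DPI: {w} x {h} px, {w * h / 1e6:.1f} megapixels")
```

This prints 2.1, 8.4, and 33.7 megapixels respectively. Each doubling of DPI quadruples the number of pixels the OCR engine has to examine for every page.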

Setup

You need Tesseract and Poppler installed first:

# macOS
brew install tesseract poppler

# Ubuntu/Debian
sudo apt install tesseract-ocr poppler-utils
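To confirm both are visible on your PATH, a small Python check like this can help. It is a sketch, not part of the tool. It looks for `tesseract` and for `pdftoppm`, one of the commands that ships with Poppler, and `missing_tools` is a name made up for this example.

```python
import shutil


def missing_tools(names=("tesseract", "pdftoppm")) -> list[str]:
    # shutil.which returns None when a command is not found on PATH.
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("All set." if not missing else f"Missing from PATH: {', '.join(missing)}")
```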

Then run directly with uv:

uvx --from git+https://github.com/sameerbajaj/pdf-ocr pdf-ocr

Details in the README.

Sameer Bajaj © 2026