ocr‑vision poc

Detect a book spine with an oriented bounding box (OpenCV) → deskew → run any OCR engine on demand.

Notes & caveats

Detection (new): classical CV via OpenCV.js, using Canny edges, contour finding, and minAreaRect to fit an oriented rectangle, so tilted spines are handled. Frame the spine clearly; the largest elongated rectangle wins. OpenCV.js (~9 MB) is lazy-loaded on first capture.
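
The "largest elongated rectangle wins" rule can be sketched as a small selection step over candidate rectangles. This is a minimal sketch rather than the demo's actual code: the `RotatedRect` shape only mirrors the kind of result a minAreaRect-style call reports, and `pickSpine`, `MIN_ASPECT`, and `MIN_AREA_FRACTION` are hypothetical names and thresholds chosen for illustration.

```typescript
// Hypothetical shape mirroring a minAreaRect-style result:
// a center, a size, and a rotation angle in degrees.
interface RotatedRect {
  center: { x: number; y: number };
  size: { width: number; height: number };
  angle: number;
}

// Assumed threshold (not from the demo): long side at least 2.5x the short side.
const MIN_ASPECT = 2.5;
// Assumed threshold (not from the demo): ignore rectangles under 2% of the frame.
const MIN_AREA_FRACTION = 0.02;

// Ratio of the long side to the short side, independent of orientation.
function aspectRatio(r: RotatedRect): number {
  const longSide = Math.max(r.size.width, r.size.height);
  const shortSide = Math.min(r.size.width, r.size.height);
  return shortSide > 0 ? longSide / shortSide : Infinity;
}

// Return the largest candidate that is both big enough and elongated enough,
// or null when nothing qualifies.
function pickSpine(
  candidates: RotatedRect[],
  frameWidth: number,
  frameHeight: number,
): RotatedRect | null {
  const minArea = MIN_AREA_FRACTION * frameWidth * frameHeight;
  let best: RotatedRect | null = null;
  let bestArea = 0;
  for (const r of candidates) {
    const area = r.size.width * r.size.height;
    if (area < minArea) continue; // too small to be the framed spine
    if (aspectRatio(r) < MIN_ASPECT) continue; // not elongated
    if (area > bestArea) {
      best = r;
      bestArea = area;
    }
  }
  return best;
}
```

Filtering on aspect ratio before comparing areas keeps a large, roughly square region (a book cover or a tabletop) from beating the spine. Both thresholds would need tuning against real photos.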

Rotation: portrait crops are rotated 90° counter-clockwise so top-of-spine reads as left-of-text. Whether the title ends up right-side-up depends on how the book was shot — there's no way to tell without OCR. Use Flip 180° on the OCR input if the result is upside-down.
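
The rotation rule can be illustrated on a plain 2D grid standing in for the crop's pixels. This is a sketch under assumptions: the demo presumably rotates real image data, and `rotate90Ccw`, `flip180`, and `toReadingOrientation` are hypothetical helper names, not functions from the page.

```typescript
// Rotate a grid 90° counter-clockwise: the source's top edge becomes
// the output's left edge (top-right corner moves to top-left).
function rotate90Ccw<T>(src: T[][]): T[][] {
  const h = src.length;
  const w = h > 0 ? src[0].length : 0;
  const out: T[][] = [];
  for (let i = 0; i < w; i++) {
    const row: T[] = [];
    for (let j = 0; j < h; j++) {
      row.push(src[j][w - 1 - i]);
    }
    out.push(row);
  }
  return out;
}

// Rotate a grid by 180°: reverse each row, then reverse the row order.
function flip180<T>(src: T[][]): T[][] {
  return src.map((row) => [...row].reverse()).reverse();
}

// Portrait crops (taller than wide) are rotated; landscape crops pass through.
function toReadingOrientation<T>(crop: T[][]): T[][] {
  const h = crop.length;
  const w = h > 0 ? crop[0].length : 0;
  return h > w ? rotate90Ccw(crop) : crop;
}
```

The crop's shape says nothing about which end of the spine is the top, so the automatic step can only produce a landscape crop. `flip180` corresponds to the manual correction applied when the text comes out upside-down.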

Tesseract Fast vs Best: same engine, different traineddata files (~11MB vs ~25MB).

PaddleOCR PP-OCRv4 via esearch-ocr with detector + recognizer ONNX models from paddleocr-browser. Loads onnxruntime-web.

PaddleOCR PP-OCRv5 via ppu-paddle-ocr — newer mobile model. Loads its own onnxruntime-web build internally.

TrOCR models are q8-quantized (~120MB each). Use the printed model for typeset spines and the handwritten model for hand-lettered ones.
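
With engines this large, running one "on demand" suggests loading each engine only when it is first selected and reusing it afterwards. One way to sketch that is a memoizing registry. Everything here is hypothetical and for illustration only: the `EngineId` values, the `OcrEngine` interface, and the `EngineRegistry` class are not the APIs of Tesseract, esearch-ocr, ppu-paddle-ocr, or any TrOCR runtime.

```typescript
// Hypothetical engine identifiers matching the options listed above.
type EngineId =
  | "tesseract-fast"
  | "tesseract-best"
  | "paddle-v4"
  | "paddle-v5"
  | "trocr-printed"
  | "trocr-handwritten";

// Hypothetical minimal interface every wrapped engine would expose.
interface OcrEngine {
  recognize(image: Uint8Array): Promise<string>;
}

type Loader = () => Promise<OcrEngine>;

class EngineRegistry {
  private readonly loaders = new Map<EngineId, Loader>();
  private readonly cache = new Map<EngineId, Promise<OcrEngine>>();

  // Registering a loader does not run it; nothing is downloaded yet.
  register(id: EngineId, loader: Loader): void {
    this.loaders.set(id, loader);
  }

  // The first call starts the load; later calls reuse the same promise.
  get(id: EngineId): Promise<OcrEngine> {
    const cached = this.cache.get(id);
    if (cached) return cached;
    const loader = this.loaders.get(id);
    if (!loader) return Promise.reject(new Error(`unknown engine: ${id}`));
    const pending = loader().catch((err) => {
      this.cache.delete(id); // allow a retry after a failed load
      throw err;
    });
    this.cache.set(id, pending);
    return pending;
  }
}
```

Caching the promise rather than the resolved engine means two quick clicks on the same engine share one download instead of starting two.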

All client-side. Nothing leaves the device.