ocr‑vision manual select

Pipeline: tap-drag to mark each spine → rotate → count → OCR per spine.

1 · mark 2 · rotate 3 · count 4 · ocr

Upload image

camera off

Ready. Start camera or upload an image.

Notes & caveats

Why manual: auto-detection (OpenCV.js, COCO-SSD, etc.) is either too heavy on mobile (~9MB+ libs, multi-second WASM init) or too unreliable on cluttered scenes. A finger drag is faster and always picks the right thing.

How to mark: drag from the top of a spine to the bottom. The drag direction encodes the orientation, so the deskewed crop reads left-to-right automatically. Add multiple by dragging again.

Width slider: sets each new spine's width as a fraction of the drag length. Default 20% works for typical hardcovers.

Use full frame: creates a single spine = whole captured image. Fastest path when you've already framed the spine in the camera view.

Engines: Tesseract (fast/best), PaddleOCR (PP-OCRv4 + PP-OCRv5), TrOCR (printed/handwritten). Each loads on first Run; results cached per spine. Toggle Apply to all to batch.

Flip 180°: if results read upside-down, flip and re-run. Cached results dim to indicate they're stale.