# Manga Translator OCR Pipeline

A robust manga/comic OCR + translation pipeline with:

- EasyOCR (default, reliable on macOS M1)
- Optional PaddleOCR (auto-fallback if unavailable)
- Bubble clustering and line-level boxes
- Robust reread pass (multi-preprocessing + slight rotation)
- Translation export + debug overlays

---

## ✨ Features

- OCR from raw manga pages
- Noise filtering (`BOX` debug artifacts, tiny garbage tokens, symbols)
- Speech bubble grouping
- Reading order estimation (`ltr` / `rtl`)
- Translation output (`output.txt`)
- Structured bubble metadata (`bubbles.json`)
- Visual debug output (`debug_clusters.png`)

---

## 🧰 Requirements

- macOS (Apple Silicon supported)
- Python **3.11** recommended
- Homebrew (for Python install)

---

## 🚀 Setup (Python 3.11 venv)

```bash
cd /path/to/manga-translator

# 1) Create venv with 3.11
/opt/homebrew/bin/python3.11 -m venv venv

# 2) Activate
source venv/bin/activate

# 3) Verify interpreter
python -V
# expected: Python 3.11.x

# 4) Install dependencies
python -m pip install --upgrade pip setuptools wheel
python -m pip install -r requirements.txt

# Optional Paddle runtime
python -m pip install paddlepaddle || true
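
# Optional check, not part of the original steps: report which OCR runtimes
# import cleanly in the active venv. This is a sketch that assumes
# requirements.txt installs EasyOCR under the import name `easyocr`;
# `paddle` is the import name of the paddlepaddle package.
# `check_backend` is a hypothetical helper name used only for illustration.
check_backend() {
  if python -c "import $1" >/dev/null 2>&1; then
    echo "$1: available"
  else
    echo "$1: not installed"
  fi
}
check_backend easyocr
check_backend paddle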