Manga Translator OCR Pipeline
A robust manga/comic OCR + translation pipeline with:
- EasyOCR (default, reliable on macOS M1)
- Optional PaddleOCR (auto-fallback if unavailable)
- Bubble clustering and line-level boxes
- Robust reread pass (multi-preprocessing + slight rotation)
- Translation export + debug overlays
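
The auto-fallback between the two OCR engines could look roughly like the sketch below. It is an illustration only, not this project's actual code: the function names `module_available` and `choose_backend` are hypothetical, and it assumes the optional engine is importable as `paddleocr` with its `paddle` runtime (from the `paddlepaddle` package).

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module can be imported in this environment."""
    return importlib.util.find_spec(name) is not None


def choose_backend(preferred: str = "easyocr", is_available=module_available) -> str:
    """Pick an OCR backend name, falling back to EasyOCR.

    `preferred` and the returned strings are hypothetical labels for this
    sketch; the real pipeline may name its engines differently.
    """
    if preferred == "paddle":
        # PaddleOCR needs both the `paddleocr` package and the `paddle` runtime.
        if is_available("paddleocr") and is_available("paddle"):
            return "paddle"
        print("PaddleOCR unavailable, falling back to EasyOCR")
    return "easyocr"


if __name__ == "__main__":
    print(choose_backend("paddle"))
```

Checking with `find_spec` avoids importing the heavy Paddle runtime just to discover that it is missing, and the injectable `is_available` argument keeps the decision easy to test.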
✨ Features
- OCR from raw manga pages
- Noise filtering (BOX debug artifacts, tiny garbage tokens, symbols)
- Speech bubble grouping
- Reading order estimation (ltr/rtl)
- Translation output (output.txt)
- Structured bubble metadata (bubbles.json)
- Visual debug output (debug_clusters.png)
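
To make the noise filtering, bubble grouping, and reading order steps more concrete, here is a minimal pure-Python sketch. Everything in it is an assumption for illustration: the `Token` type, the thresholds (`min_conf`, `min_side`, `max_gap`, `row_height`), and the simple center-distance clustering are hypothetical and may differ from what the pipeline actually does.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """One OCR detection: text, confidence, and a box (left, top, width, height)."""
    text: str
    conf: float
    x: float
    y: float
    w: float
    h: float

    @property
    def cx(self) -> float:
        return self.x + self.w / 2

    @property
    def cy(self) -> float:
        return self.y + self.h / 2


def is_noise(tok: Token, min_conf: float = 0.3, min_side: float = 6.0) -> bool:
    """Flag BOX artifacts, symbol-only tokens, and tiny or low-confidence boxes."""
    text = tok.text.strip()
    if not text or text.upper() == "BOX":
        return True
    if not any(ch.isalnum() for ch in text):
        return True  # punctuation or symbols only
    return tok.conf < min_conf or min(tok.w, tok.h) < min_side


def _near(tok: Token, bubble: list, max_gap: float) -> bool:
    return any(
        abs(tok.cx - o.cx) <= max_gap and abs(tok.cy - o.cy) <= max_gap
        for o in bubble
    )


def group_bubbles(tokens: list, max_gap: float = 40.0) -> list:
    """Greedy single-link clustering: merge every bubble a token is close to."""
    bubbles = []
    for tok in tokens:
        near = [b for b in bubbles if _near(tok, b, max_gap)]
        far = [b for b in bubbles if not _near(tok, b, max_gap)]
        merged = [t for b in near for t in b] + [tok]
        bubbles = far + [merged]
    return bubbles


def reading_order(bubbles: list, direction: str = "rtl", row_height: float = 120.0) -> list:
    """Sort bubbles top to bottom by coarse row, then right-to-left or left-to-right."""
    def key(bubble):
        cx = sum(t.cx for t in bubble) / len(bubble)
        cy = sum(t.cy for t in bubble) / len(bubble)
        row = int(cy // row_height)
        return (row, -cx if direction == "rtl" else cx)
    return sorted(bubbles, key=key)


if __name__ == "__main__":
    raw = [
        Token("BOX", 0.9, 0, 0, 20, 20),
        Token("!!", 0.9, 100, 100, 20, 20),
        Token("Hello", 0.9, 300, 10, 40, 20),
        Token("there", 0.9, 300, 35, 40, 20),
        Token("Hi", 0.9, 50, 20, 30, 20),
    ]
    kept = [t for t in raw if not is_noise(t)]
    for bubble in reading_order(group_bubbles(kept), direction="rtl"):
        print(" ".join(t.text for t in bubble))
```

With `direction="rtl"` the right-hand bubble is emitted first, which matches the usual manga page layout; `"ltr"` reverses the order within each row.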
🧰 Requirements
- macOS (Apple Silicon supported)
- Python 3.11 recommended
- Homebrew (for Python install)
🚀 Setup (Python 3.11 venv)
cd /path/to/manga-translator
# 1) Create venv with 3.11
/opt/homebrew/bin/python3.11 -m venv venv
# 2) Activate
source venv/bin/activate
# 3) Verify interpreter
python -V
# expected: Python 3.11.x
# 4) Install dependencies
python -m pip install --upgrade pip setuptools wheel
python -m pip install -r requirements.txt
# Optional Paddle runtime
python -m pip install paddlepaddle || true