Added all

Guillem Hernandez Sola
2026-04-23 16:20:37 +02:00
parent 3ca01dae8c
commit 243e5bad47
5 changed files with 500 additions and 579 deletions

README.md

@@ -1,53 +1,185 @@
# 🎨 Manga Translator OCR Pipeline

An intelligent manga/comic OCR and translation pipeline designed for accurate text extraction and multi-language translation, optimized for macOS with Apple Silicon support.
---
## ✨ Key Features
- **Dual OCR Support**: EasyOCR (primary) with automatic fallback to PaddleOCR
- **Smart Bubble Detection**: Advanced speech bubble clustering with line-level precision
- **Robust Text Recognition**: Multi-pass preprocessing with rotation-based reread for accuracy
- **Intelligent Noise Filtering**: Removes debug artifacts, garbage tokens, and unwanted symbols
- **Reading Order Detection**: Automatic LTR/RTL detection for proper translation sequencing
- **Multi-Language Translation**: Powered by Deep Translator
- **Structured Output**: JSON metadata for bubble locations and properties
- **Visual Debugging**: Detailed debug overlays for quality control
- **Batch Processing**: Shell script support for processing multiple pages
---
## 📋 Requirements
- **OS**: macOS (Apple Silicon M1/M2/M3 supported)
- **Python**: 3.11+ (recommended 3.11.x)
- **Package Manager**: Homebrew (for Python installation)
- **Disk Space**: ~2-3GB for dependencies (OCR models, ML libraries)
---
## 🚀 Quick Start
### 1. **Create Virtual Environment**
```bash
cd /path/to/manga-translator
# Create venv with Python 3.11
/opt/homebrew/bin/python3.11 -m venv venv

# Activate environment
source venv/bin/activate

# Verify correct Python version
python -V
# Expected output: Python 3.11.x
```
### 2. **Install Dependencies**
```bash
# Upgrade pip and build tools
python -m pip install --upgrade pip setuptools wheel
# Install required packages
python -m pip install -r requirements.txt
# Optional: Install PaddleOCR fallback
python -m pip install paddlepaddle || true
```
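`requirements.txt` itself is not shown in this commit; judging from the libraries the pipeline uses (EasyOCR, OpenCV, Deep Translator), it presumably contains entries along these lines (package versions here are illustrative assumptions):

```text
easyocr>=1.7
opencv-python>=4.8
deep-translator>=1.11
numpy>=1.24
```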
### 3. **Prepare Your Manga**
Place manga page images in a directory (e.g., `your-manga-series/`)
---
## 📖 Usage
### Single Page Translation
```bash
python manga-translator.py page.png --source en --target ca \
    --json bubbles.json --txt output.txt --debug debug_clusters.png
```
### Batch Processing Multiple Pages
```bash
# Output lands in pages-folder/translated/<page>/
bash batch-translate.sh pages-folder/ --source en --target es
```
### Generate Rendered Output
```bash
python manga-renderer.py --bubbles bubbles.json --original input.png --output rendered.png
```
---
## 📂 Project Structure
```
manga-translator/
├── manga-translator.py # Main OCR + translation pipeline
├── manga-renderer.py # Visualization & debug rendering
├── batch-translate.sh # Batch processing script
├── requirements.txt # Python dependencies
├── fonts/ # Custom fonts for rendering
├── pages-for-tests/ # Test data
│ └── translated/ # Sample outputs
├── Dandadan_059/ # Sample manga series
├── Spy_x_Family_076/ # Sample manga series
└── older-code/ # Legacy scripts & experiments
```
---
## 📤 Output Files
For each processed page, the pipeline generates:
- **`bubbles.json`**: Structured metadata with bubble coordinates, text, and properties
- **`output.txt`**: Translated text in reading order
- **`debug_clusters.png`**: Visual overlay showing detected bubbles and processing
- **`rendered_output.png`**: Final rendered manga with translations overlaid
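The exact `bubbles.json` schema is not documented in this README; the sketch below assumes the shape the pipeline scripts read back (a dict keyed by bubble id, each entry carrying a `box` of `x`/`y`/`w`/`h` plus a list of recognised `lines`), so adjust field names to the actual output:

```python
import json

# Hypothetical bubbles.json content, mirroring the assumed schema.
sample = json.loads("""
{
  "1": {"box": {"x": 40, "y": 25, "w": 180, "h": 90},
        "lines": ["GOOD", "MORNING."]},
  "2": {"box": {"x": 300, "y": 60, "w": 140, "h": 70},
        "lines": ["HUH?"]}
}
""")

def bubble_texts(bubbles):
    """Return (id, joined text, (x, y) origin) per bubble in id order."""
    out = []
    for bid in sorted(bubbles, key=int):
        data = bubbles[bid]
        box = data["box"]
        out.append((int(bid), " ".join(data["lines"]), (box["x"], box["y"])))
    return out

for bid, text, origin in bubble_texts(sample):
    print(f"bubble {bid} @ {origin}: {text}")
```

This is handy for spot-checking OCR results without opening the debug overlay.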
---
## 🔧 Configuration
Key processing parameters (adjustable in `manga-translator.py`):
- **OCR Engine**: EasyOCR with automatic fallback to PaddleOCR
- **Bubble Clustering**: Adaptive threshold-based grouping
- **Text Preprocessing**: Multi-pass noise reduction and enhancement
- **Translation Target**: Configurable language (default: English)
---
## 🐛 Troubleshooting
### "ModuleNotFoundError" Errors
```bash
# Ensure venv is activated
source venv/bin/activate
# Reinstall dependencies
python -m pip install -r requirements.txt --force-reinstall
```
### OCR Accuracy Issues
- Ensure images are high quality (300+ DPI recommended)
- Check that manga is not rotated
- Try adjusting clustering parameters in the code
### Out of Memory Errors
- Process pages in smaller batches
- Reduce image resolution before processing
- Check available RAM: `vm_stat` on macOS
### Translation Issues
- Verify internet connection (translations require API calls)
- Check language codes in Deep Translator documentation
- Test with a single page first
---
## 🛠️ Development
### Running Tests
Test data is available in `pages-for-tests/translated/`
```bash
python manga-translator.py pages-for-tests/example.png --source en --target ca \
    --json test-output/bubbles.json --txt test-output/output.txt
```
### Debugging
Enable verbose output by modifying the logging level in `manga-translator.py`
---
## 📝 Notes
- Processing time: ~10-30 seconds per page (varies by image size and hardware)
- ML models are downloaded automatically on first run
- GPU acceleration available with compatible CUDA setup (optional)
- Tested on macOS 13+ with Python 3.11

batch-translate.sh (executable file)

@@ -0,0 +1,269 @@
#!/usr/bin/env bash
# ============================================================
# batch-translate.sh
# Batch manga OCR + translation for all images in a folder.
#
# Usage:
# ./batch-translate.sh <folder>
# ./batch-translate.sh <folder> --source en --target es
# ./batch-translate.sh <folder> --start 3 --end 7
# ./batch-translate.sh <folder> -s en -t fr --start 2
#
# Output per page lands in:
# <folder>/translated/<page_stem>/
# ├── bubbles.json
# ├── output.txt
# └── debug_clusters.png
# ============================================================
set -uo pipefail
# ─────────────────────────────────────────────────────────────
# CONFIGURATION
# ─────────────────────────────────────────────────────────────
SOURCE_LANG="en"
TARGET_LANG="ca"
START_PAGE=1
END_PAGE=999999
PYTHON_BIN="python"
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
TRANSLATOR="${SCRIPT_DIR}/manga-translator.py"
# ─────────────────────────────────────────────────────────────
# COLOURS
# ─────────────────────────────────────────────────────────────
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
CYAN='\033[0;36m'
BOLD='\033[1m'
RESET='\033[0m'
# ─────────────────────────────────────────────────────────────
# HELPERS
# ─────────────────────────────────────────────────────────────
usage() {
echo ""
echo -e "${BOLD}Usage:${RESET}"
echo " $0 <folder> [options]"
echo ""
echo -e "${BOLD}Options:${RESET}"
echo " --source, -s Source language code (default: en)"
echo " --target, -t Target language code (default: ca)"
echo " --start First page number (default: 1)"
echo " --end Last page number (default: all)"
echo " --python Python binary (default: python)"
echo " --help, -h Show this help"
echo ""
echo -e "${BOLD}Examples:${RESET}"
echo " $0 pages-for-tests"
echo " $0 pages-for-tests --source en --target es"
echo " $0 pages-for-tests --start 3 --end 7"
echo " $0 pages-for-tests -s en -t fr --start 2"
echo ""
}
log_info() { echo -e "${CYAN}ℹ️  $*${RESET}"; }
log_ok() { echo -e "${GREEN}✅ $*${RESET}"; }
log_warn() { echo -e "${YELLOW}⚠️ $*${RESET}"; }
log_error() { echo -e "${RED}❌ $*${RESET}"; }
log_section() {
echo -e "\n${BOLD}${CYAN}══════════════════════════════════════════${RESET}"
echo -e "${BOLD}${CYAN} 📖 $*${RESET}"
echo -e "${BOLD}${CYAN}══════════════════════════════════════════${RESET}"
}
# ─────────────────────────────────────────────────────────────
# ARGUMENT PARSING
# ─────────────────────────────────────────────────────────────
if [[ $# -eq 0 ]]; then
log_error "No folder specified."
usage
exit 1
fi
FOLDER="$1"
shift
while [[ $# -gt 0 ]]; do
case "$1" in
--source|-s) SOURCE_LANG="$2"; shift 2 ;;
--target|-t) TARGET_LANG="$2"; shift 2 ;;
--start) START_PAGE="$2"; shift 2 ;;
--end) END_PAGE="$2"; shift 2 ;;
--python) PYTHON_BIN="$2"; shift 2 ;;
--help|-h) usage; exit 0 ;;
*)
log_error "Unknown option: $1"
usage
exit 1
;;
esac
done
# ─────────────────────────────────────────────────────────────
# VALIDATION
# ─────────────────────────────────────────────────────────────
if [[ ! -d "$FOLDER" ]]; then
log_error "Folder not found: $FOLDER"
exit 1
fi
if [[ ! -f "$TRANSLATOR" ]]; then
log_error "manga-translator.py not found at: $TRANSLATOR"
exit 1
fi
if ! command -v "$PYTHON_BIN" &>/dev/null; then
log_error "Python binary not found: $PYTHON_BIN"
log_error "Try --python python3"
exit 1
fi
# ─────────────────────────────────────────────────────────────
# PURGE BYTECODE CACHE
# ─────────────────────────────────────────────────────────────
log_info "🗑️ Purging Python bytecode caches..."
find "${SCRIPT_DIR}" -type d -name __pycache__ -exec rm -rf {} + 2>/dev/null || true
log_ok "Cache cleared."
# ─────────────────────────────────────────────────────────────
# DISCOVER IMAGES
# NOTE: uses while-read loop instead of mapfile for Bash 3.2
# compatibility (macOS default shell)
# ─────────────────────────────────────────────────────────────
ALL_IMAGES=()
while IFS= read -r -d '' img; do
ALL_IMAGES+=("$img")
done < <(
find "$FOLDER" -maxdepth 1 -type f \
\( -iname "*.jpg" -o -iname "*.jpeg" \
-o -iname "*.png" -o -iname "*.webp" \) \
-print0 | sort -z
)
TOTAL=${#ALL_IMAGES[@]}
if [[ $TOTAL -eq 0 ]]; then
log_error "No image files found in: $FOLDER"
exit 1
fi
# ─────────────────────────────────────────────────────────────
# SLICE TO REQUESTED PAGE RANGE (1-based)
# ─────────────────────────────────────────────────────────────
PAGES=()
for i in "${!ALL_IMAGES[@]}"; do
PAGE_NUM=$(( i + 1 ))
if [[ $PAGE_NUM -ge $START_PAGE && $PAGE_NUM -le $END_PAGE ]]; then
PAGES+=("${ALL_IMAGES[$i]}")
fi
done
if [[ ${#PAGES[@]} -eq 0 ]]; then
log_error "No pages in range [${START_PAGE}, ${END_PAGE}] (total: ${TOTAL})"
exit 1
fi
# ─────────────────────────────────────────────────────────────
# SUMMARY HEADER
# ─────────────────────────────────────────────────────────────
log_section "BATCH MANGA TRANSLATOR"
log_info "📂 Folder : $(realpath "$FOLDER")"
log_info "📄 Pages : ${#PAGES[@]} of ${TOTAL} total"
log_info "🔢 Range : ${START_PAGE}${END_PAGE}"
log_info "🌐 Source : ${SOURCE_LANG}"
log_info "🎯 Target : ${TARGET_LANG}"
log_info "💾 Output : ${FOLDER}/translated/<page>/"
echo ""
# ─────────────────────────────────────────────────────────────
# PROCESS EACH PAGE
# ─────────────────────────────────────────────────────────────
PASS=0
FAIL=0
FAIL_LIST=()
for i in "${!PAGES[@]}"; do
IMAGE="${PAGES[$i]}"
PAGE_NUM=$(( START_PAGE + i ))
STEM="$(basename "${IMAGE%.*}")"
WORKDIR="${FOLDER}/translated/${STEM}"
echo ""
echo -e "${BOLD}──────────────────────────────────────────${RESET}"
echo -e "${BOLD} 🖼️ [${PAGE_NUM}/${TOTAL}] ${STEM}${RESET}"
echo -e "${BOLD}──────────────────────────────────────────${RESET}"
mkdir -p "$WORKDIR"
OUTPUT_JSON="${WORKDIR}/bubbles.json"
OUTPUT_TXT="${WORKDIR}/output.txt"
OUTPUT_DEBUG="${WORKDIR}/debug_clusters.png"
log_info "🗂️ Image : $(basename "$IMAGE")"
log_info "📁 Out : ${WORKDIR}"
# ── Run the translator ────────────────────────────────────
if "$PYTHON_BIN" "$TRANSLATOR" \
"$IMAGE" \
--source "$SOURCE_LANG" \
--target "$TARGET_LANG" \
--json "$OUTPUT_JSON" \
--txt "$OUTPUT_TXT" \
--debug "$OUTPUT_DEBUG"; then
# Verify outputs exist and are non-empty
MISSING=0
for FNAME in "bubbles.json" "output.txt"; do
FPATH="${WORKDIR}/${FNAME}"
if [[ ! -f "$FPATH" || ! -s "$FPATH" ]]; then
log_warn "${FNAME} is missing or empty."
MISSING=$(( MISSING + 1 ))
else
SIZE=$(wc -c < "$FPATH" | tr -d ' ')
log_ok "${FNAME}${SIZE} bytes"
fi
done
if [[ -f "$OUTPUT_DEBUG" ]]; then
log_ok "debug_clusters.png written."
fi
if [[ $MISSING -eq 0 ]]; then
log_ok "Page ${PAGE_NUM} complete."
PASS=$(( PASS + 1 ))
else
log_warn "Page ${PAGE_NUM} finished with warnings."
FAIL=$(( FAIL + 1 ))
FAIL_LIST+=("${STEM}")
fi
else
log_error "Page ${PAGE_NUM} FAILED — check output above."
FAIL=$(( FAIL + 1 ))
FAIL_LIST+=("${STEM}")
fi
done
# ─────────────────────────────────────────────────────────────
# FINAL SUMMARY
# ─────────────────────────────────────────────────────────────
log_section "BATCH COMPLETE"
echo -e "${GREEN}Passed : ${PASS}${RESET}"
echo -e "${RED}Failed : ${FAIL}${RESET}"
if [[ ${#FAIL_LIST[@]} -gt 0 ]]; then
echo ""
log_warn "Failed pages:"
for NAME in "${FAIL_LIST[@]}"; do
echo -e "${RED}${NAME}${RESET}"
done
fi
echo ""
log_info "📦 Output folder: $(realpath "${FOLDER}/translated")"
echo ""
[[ $FAIL -eq 0 ]] && exit 0 || exit 1


@@ -47,7 +47,6 @@ SHORT_ENGLISH_WORDS_2 = {
# Combined protected set used by is_meaningful_text()
SHORT_ENGLISH_PROTECTED = SHORT_ENGLISH_WORDS_1 | SHORT_ENGLISH_WORDS_2
DIALOGUE_STOPWORDS = {
"I", "YOU", "HE", "SHE", "WE", "THEY", "IT", "ME", "MY", "YOUR", "OUR",
"IS", "ARE", "WAS", "WERE", "AM", "DO", "DID", "DON'T", "DIDN'T", "NOT",
@@ -55,6 +54,38 @@ DIALOGUE_STOPWORDS = {
"AND", "BUT", "SO", "THAT", "THIS", "THERE", "HERE", "THAN", "ALL", "RIGHT"
}
PROTECTED_SHORT_TOKENS = {
# ... existing entries ...
"HUH", "HUH?", "HUH??", "HUH?!",
"OH", "OH!", "OOH", "OOH!",
"AH", "AH!", "UH", "UH...",
"HEY", "HEY!", "EH", "EH?",
"WOW", "WOW!",
"MORNING", "MORNING.",
"BECKY", "BECKY!",
"DAMIAN", "CECILE", "WALD",
"OMIGOSH", "EEEP", "EEEEP",
# FIX: common short words that appear alone on a manga line
"GOOD", "WELL", "YEAH", "OKAY", "SURE",
"WAIT", "STOP", "LOOK", "COME", "BACK",
"HERE", "OVER", "JUST", "EVEN", "ONLY",
"ALSO", "THEN", "WHEN", "WHAT", "THAT",
"THIS", "WITH", "FROM", "HAVE", "WILL",
}
_MANGA_INTERJECTIONS = {
# ... existing entries ...
# FIX: short words that appear isolated on their own OCR line
'GOOD', 'WELL', 'YEAH', 'OKAY', 'SURE',
'WAIT', 'STOP', 'LOOK', 'COME', 'BACK',
'HERE', 'OVER', 'JUST', 'EVEN', 'ONLY',
'ALSO', 'THEN', 'WHEN', 'WHAT', 'THAT',
'THIS', 'WITH', 'FROM', 'HAVE', 'WILL',
'TRUE', 'REAL', 'FINE', 'DONE', 'GONE',
'HELP', 'MOVE', 'STAY', 'CALM', 'COOL',
}
# FIX: SFX_HINTS contains ONLY pure onomatopoeia — no words
# that could appear in dialogue (MORNING, GOOD, etc. removed)
SFX_HINTS = {
@@ -520,10 +551,39 @@ def postprocess_translation_general(text: str) -> str:
def fix_common_ocr_errors(text: str) -> str:
result = text
# existing fixes
result = re.sub(r'(\d)O(\d)', r'\g<1>0\g<2>', result)
result = re.sub(r'(\d)O([^a-zA-Z])', r'\g<1>0\g<2>', result)
result = result.replace('|', 'I')
result = result.replace('`', "'")
# FIX: Replace digit-zero used as letter-O in common English words.
# Vision OCR sometimes reads O → 0 in bold/stylised manga fonts.
# Pattern: word containing digits that look like letters.
DIGIT_AS_LETTER = {
'0': 'O',
'1': 'I',
'3': 'E',
'4': 'A',
'5': 'S',
'8': 'B',
}
# Only apply inside tokens that are otherwise all-alpha
# e.g. "G00D" → "GOOD", "M0RNING" → "MORNING"
def fix_digit_letters(m):
word = m.group(0)
# Skip pure-number tokens ("10", "2024"): converting them would
# corrupt real numbers (e.g. "10" -> "IO").
if not any(c.isalpha() for c in word):
return word
fixed = word
for digit, letter in DIGIT_AS_LETTER.items():
fixed = fixed.replace(digit, letter)
# Only accept the fix if the result is all-alpha (real word)
if fixed.isalpha():
return fixed
return word
result = re.sub(r'\b[A-Za-z0-9]{2,12}\b', fix_digit_letters, result)
return result
def is_valid_language(text: str, source_lang: str) -> bool:
@@ -1173,15 +1233,24 @@ def ocr_candidate_score(text: str) -> float:
n = len(t)
if n == 0:
return 0.0
alpha = sum(c.isalpha() for c in t) / n
spaces = sum(c.isspace() for c in t) / n
punct_ok = sum(c in ".,!?'-:;()[]\"¡¿" for c in t) / n
bad = len(re.findall(r"[^\w\s\.\,\!\?\-\'\:\;\(\)\[\]\"¡¿]", t)) / n
penalty = 0.0
# FIX: Only penalise isolated single letters when the WHOLE token
# is a single letter — not when a word like "I" or "A" appears
# inside a longer sentence. Old pattern \b[A-Z]\b fired on "I"
# inside "I CAN'T" which incorrectly penalised valid dialogue.
if re.fullmatch(r"[A-Z]", t.strip()):
penalty += 0.05
if re.search(r"[0-9]{2,}", t):
penalty += 0.08
score = (0.62 * alpha) + (0.10 * spaces) + (0.20 * punct_ok) - (0.45 * bad) - penalty
return max(0.0, min(1.0, score))
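Standalone, the digit-as-letter repair from the hunk above can be sketched like this (a minimal rendition, not the pipeline's exact code; it also guards against converting pure numbers):

```python
import re

# Digits that stylised manga fonts are often misread as letters.
DIGIT_AS_LETTER = {'0': 'O', '1': 'I', '3': 'E', '4': 'A', '5': 'S', '8': 'B'}

def fix_digit_letters_token(word: str) -> str:
    """Repair a single token like 'G00D' -> 'GOOD'; leave numbers alone."""
    if not any(c.isalpha() for c in word):
        return word  # pure number: converting '10' -> 'IO' would be wrong
    fixed = word
    for digit, letter in DIGIT_AS_LETTER.items():
        fixed = fixed.replace(digit, letter)
    # Only accept the repair when the result is a plausible all-alpha word.
    return fixed if fixed.isalpha() else word

def repair_line(text: str) -> str:
    return re.sub(r'\b[A-Za-z0-9]{2,12}\b',
                  lambda m: fix_digit_letters_token(m.group(0)), text)

print(repair_line("G00D M0RNING, page 10"))  # GOOD MORNING, page 10
```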


@@ -1,159 +0,0 @@
#!/usr/bin/env python3
"""
pipeline-render.py
───────────────────────────────────────────────────────────────
Standalone Rendering Pipeline
Usage:
python pipeline-render.py /path/to/chapter/folder
"""
import os
import sys
import argparse
import zipfile
import importlib.util
from pathlib import Path
import cv2 # ✅ Added OpenCV to load the image
# ─────────────────────────────────────────────
# CONFIG
# ─────────────────────────────────────────────
DEFAULT_FONT_PATH = "fonts/ComicNeue-Regular.ttf"
# ─────────────────────────────────────────────
# DYNAMIC MODULE LOADER
# ─────────────────────────────────────────────
def load_module(name, filepath):
spec = importlib.util.spec_from_file_location(name, filepath)
if spec is None or spec.loader is None:
raise FileNotFoundError(f"Cannot load spec for {filepath}")
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
return module
# ─────────────────────────────────────────────
# HELPERS
# ─────────────────────────────────────────────
def sorted_pages(chapter_dir):
exts = {".jpg", ".jpeg", ".png", ".webp"}
pages = [
p for p in Path(chapter_dir).iterdir()
if p.is_file() and p.suffix.lower() in exts
]
return sorted(pages, key=lambda p: p.stem)
def pack_rendered_cbz(chapter_dir, output_cbz, rendered_files):
if not rendered_files:
print("⚠️ No rendered pages found — CBZ not created.")
return
with zipfile.ZipFile(output_cbz, "w", compression=zipfile.ZIP_STORED) as zf:
for rp in rendered_files:
arcname = rp.name
zf.write(rp, arcname)
print(f"\n✅ Rendered CBZ saved → {output_cbz}")
print(f"📦 Contains: {len(rendered_files)} translated pages ready to read.")
# ─────────────────────────────────────────────
# PER-PAGE PIPELINE
# ─────────────────────────────────────────────
def process_render(page_path, workdir, renderer_module, font_path):
print(f"\n{'─' * 70}")
print(f"🎨 RENDERING: {page_path.name}")
print(f"{'─' * 70}")
txt_path = workdir / "output.txt"
json_path = workdir / "bubbles.json"
out_img = workdir / page_path.name
if not txt_path.exists() or not json_path.exists():
print(" ⚠️ Missing output.txt or bubbles.json. Did you run the OCR pipeline first?")
return None
# ✅ FIX: Load the image into memory (as a NumPy array) before passing it
img_array = cv2.imread(str(page_path.resolve()))
if img_array is None:
print(f" ❌ Failed to load image: {page_path.name}")
return None
orig_dir = os.getcwd()
try:
os.chdir(workdir)
# Pass the loaded image array instead of the string path
renderer_module.render_translations(
img_array, # 1st arg: Image Data (NumPy array)
str(out_img.resolve()), # 2nd arg: Output image path
str(txt_path.resolve()), # 3rd arg: Translations text
str(json_path.resolve()), # 4th arg: Bubbles JSON
font_path # 5th arg: Font Path
)
print(" ✅ Render complete")
return out_img
except Exception as e:
print(f" ❌ Failed: {e}")
return None
finally:
os.chdir(orig_dir)
# ─────────────────────────────────────────────
# MAIN
# ─────────────────────────────────────────────
def main():
parser = argparse.ArgumentParser(description="Manga Rendering Pipeline")
parser.add_argument("chapter_dir", help="Path to the folder containing original manga pages")
args = parser.parse_args()
chapter_dir = Path(args.chapter_dir).resolve()
output_cbz = chapter_dir.parent / f"{chapter_dir.name}_rendered.cbz"
script_dir = Path(__file__).parent
absolute_font_path = str((script_dir / DEFAULT_FONT_PATH).resolve())
print("Loading renderer module...")
try:
renderer = load_module("manga_renderer", str(script_dir / "manga-renderer.py"))
except Exception as e:
print(f"❌ Could not load manga-renderer.py: {e}")
sys.exit(1)
pages = sorted_pages(chapter_dir)
if not pages:
print(f"❌ No images found in: {chapter_dir}")
sys.exit(1)
print(f"\n📖 Chapter : {chapter_dir}")
print(f" Pages : {len(pages)}\n")
succeeded, failed = [], []
rendered_files = []
for i, page_path in enumerate(pages, start=1):
print(f"[{i}/{len(pages)}] Checking data for {page_path.name}...")
workdir = Path(chapter_dir) / "translated" / page_path.stem
out_file = process_render(page_path, workdir, renderer, absolute_font_path)
if out_file:
succeeded.append(page_path.name)
rendered_files.append(out_file)
else:
failed.append(page_path.name)
print(f"\n{'─' * 70}")
print("RENDER PIPELINE COMPLETE")
print(f"{len(succeeded)} page(s) rendered successfully")
if failed:
print(f"{len(failed)} page(s) skipped or failed:")
for f in failed:
print(f" - {f}")
print(f"{'─' * 70}\n")
print("Packing final CBZ...")
pack_rendered_cbz(chapter_dir, output_cbz, rendered_files)
if __name__ == "__main__":
main()


@@ -1,390 +0,0 @@
#!/usr/bin/env python3
"""
pipeline-translator.py
───────────────────────────────────────────────────────────────
Translation OCR pipeline (Batch Processing Only)
Usage:
python pipeline-translator.py /path/to/chapter/folder
python pipeline-translator.py /path/to/chapter/folder --start 2 --end 5
python pipeline-translator.py /path/to/chapter/folder --source en --target es
"""
import os
import sys
import argparse
import importlib.util
from pathlib import Path
# ─────────────────────────────────────────────────────────────
# PIPELINE CONFIGURATION
# Maps to the process_manga_page() signature in manga-translator.py
# ─────────────────────────────────────────────────────────────
PIPELINE_CONFIG = dict(
source_lang = "en",
target_lang = "ca",
)
# ─────────────────────────────────────────────────────────────
# DYNAMIC MODULE LOADER
# FIX: Always evicts stale sys.modules entry and deletes
# __pycache__ for manga-translator.py before loading,
# so edits are ALWAYS picked up on every run.
# ─────────────────────────────────────────────────────────────
def purge_bytecode_cache(filepath: str) -> None:
"""
Delete the compiled .pyc file for the given .py path so Python
cannot silently use a stale cached version of the module.
"""
from importlib.util import cache_from_source
try:
pyc_path = cache_from_source(filepath)
if os.path.exists(pyc_path):
os.remove(pyc_path)
print(f"🗑️ Purged bytecode cache: {pyc_path}")
except Exception as e:
# Non-fatal — just warn and continue
print(f"⚠️ Could not purge bytecode cache: {e}")
def load_module(name: str, filepath: str):
"""
Dynamically load a .py file as a module.
FIX 1: Purge the .pyc cache so edits are always reflected.
FIX 2: Evict any previously loaded version from sys.modules
to prevent Python reusing a stale module object across
multiple calls (e.g. when running in a REPL or test loop).
"""
# FIX 1: delete stale bytecode
purge_bytecode_cache(filepath)
# FIX 2: evict from module registry
if name in sys.modules:
del sys.modules[name]
spec = importlib.util.spec_from_file_location(name, filepath)
if spec is None or spec.loader is None:
raise FileNotFoundError(f"Cannot load module spec for: {filepath}")
module = importlib.util.module_from_spec(spec)
sys.modules[name] = module # register before exec (handles self-refs)
spec.loader.exec_module(module)
return module
# ─────────────────────────────────────────────────────────────
# HELPERS
# ─────────────────────────────────────────────────────────────
def sorted_pages(chapter_dir: Path):
"""Return all image files in chapter_dir sorted by filename stem."""
exts = {".jpg", ".jpeg", ".png", ".webp"}
pages = [
p for p in chapter_dir.iterdir()
if p.is_file() and p.suffix.lower() in exts
]
return sorted(pages, key=lambda p: p.stem)
def make_page_workdir(chapter_dir: Path, page_stem: str) -> Path:
"""Create and return translated/<page_stem>/ inside chapter_dir."""
workdir = chapter_dir / "translated" / page_stem
workdir.mkdir(parents=True, exist_ok=True)
return workdir
def verify_translator_api(module) -> bool:
"""
Checks that the loaded module exposes process_manga_page() and
that it accepts all keys defined in PIPELINE_CONFIG.
Prints a clear warning for any missing parameter.
"""
import inspect
fn = getattr(module, "process_manga_page", None)
if fn is None:
print("❌ manga-translator.py does not expose process_manga_page()")
return False
sig = inspect.signature(fn)
params = set(sig.parameters.keys())
ok = True
for key in PIPELINE_CONFIG:
if key not in params:
print(
f"⚠️ PIPELINE_CONFIG key '{key}' not found in "
f"process_manga_page() — update pipeline or translator."
)
ok = False
return ok
def sanity_check_fixes(module_path: Path) -> None:
"""
Grep the translator source for key fix signatures and warn if
any are missing. Helps catch cases where an edit was not saved.
"""
checks = {
"Fix A (gap_factor=4.0)": "gap_factor=4.0",
"Fix B (_majority_contour_id)": "_majority_contour_id",
"Fix C (median_inter adaptive gap)": "median_inter",
"Fix D (merge_same_column_dialogue)": "merge_same_column_dialogue_boxes",
"Fix E (lang_code from self.langs)": "lang_code = self.langs",
}
print("\n🔎 Sanity-checking fixes in manga-translator.py:")
source = module_path.read_text(encoding="utf-8")
all_ok = True
for label, token in checks.items():
found = token in source
status = "✅" if found else "❌ MISSING"
print(f" {status} {label}")
if not found:
all_ok = False
if not all_ok:
print(
"\n⚠️ One or more fixes are missing from manga-translator.py.\n"
" Save the file and re-run. Aborting.\n"
)
sys.exit(1)
else:
print(" ✅ All fixes present.\n")
# ─────────────────────────────────────────────────────────────
# PER-PAGE PIPELINE
# ─────────────────────────────────────────────────────────────
def process_page(page_path: Path, workdir: Path, translator_module) -> bool:
print(f"\n{'─' * 70}")
print(f" PAGE : {page_path.name}")
print(f" OUT : {workdir}")
print(f"{'─' * 70}")
orig_dir = os.getcwd()
try:
os.chdir(workdir)
# Use absolute paths so output always lands in workdir
# regardless of any internal os.getcwd() calls.
output_json = str(workdir / "bubbles.json")
output_txt = str(workdir / "output.txt")
debug_path = str(workdir / "debug_clusters.png")
print(" ⏳ Extracting text and translating...")
results = translator_module.process_manga_page(
image_path = str(page_path.resolve()),
output_json = output_json,
output_txt = output_txt,
**PIPELINE_CONFIG,
)
# ── Debug visualisation ───────────────────────────────
# FIX: process_manga_page() already writes debug_clusters.png
# internally with full OCR quad data.
# We do NOT call draw_debug_clusters() here with ocr=[]
# because that would OVERWRITE the correct debug image with
# a degraded version that has no quad outlines.
#
# If process_manga_page() did not write a debug image
# (e.g. older version), we do a minimal fallback draw.
if results and not os.path.exists(debug_path):
try:
import cv2
image_bgr = cv2.imread(str(page_path.resolve()))
if image_bgr is not None:
vis_boxes: dict = {}
vis_lines: dict = {}
vis_indices: dict = {}
for bid_str, data in results.items():
bid = int(bid_str)
xywh = data["box"]
vis_boxes[bid] = (
xywh["x"],
xywh["y"],
xywh["x"] + xywh["w"],
xywh["y"] + xywh["h"],
)
vis_lines[bid] = data.get("lines", [])
vis_indices[bid] = []
# Fallback only — ocr=[] means no quad outlines
translator_module.draw_debug_clusters(
image_bgr = image_bgr,
out_boxes = vis_boxes,
out_lines = vis_lines,
out_indices = vis_indices,
ocr = [],
save_path = debug_path,
)
print(f" 🖼️ Fallback debug image written → {debug_path}")
except Exception as e:
print(f" ⚠️ Debug visualisation failed (non-fatal): {e}")
# ── Sanity-check output files ─────────────────────────
all_good = True
for fname in ("output.txt", "bubbles.json"):
fpath = workdir / fname
if not fpath.exists():
print(f" ⚠️ {fname} was NOT created.")
all_good = False
elif fpath.stat().st_size == 0:
print(f" ⚠️ {fname} exists but is EMPTY.")
all_good = False
else:
print(f" 📄 {fname}{fpath.stat().st_size} bytes")
if not results:
print(" ⚠️ process_manga_page() returned no results.")
return False
print(f" ✅ Done — {len(results)} box(es) processed.")
return True
except Exception as e:
import traceback
print(f" ❌ Failed: {e}")
traceback.print_exc()
return False
finally:
os.chdir(orig_dir)
# ─────────────────────────────────────────────────────────────
# MAIN
# ─────────────────────────────────────────────────────────────
def main():
parser = argparse.ArgumentParser(
description="Manga Translation OCR Batch Pipeline",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
python pipeline-translator.py pages-for-tests
python pipeline-translator.py pages-for-tests --start 2 --end 4
python pipeline-translator.py pages-for-tests --source en --target es
"""
)
parser.add_argument(
"chapter_dir",
help="Path to the folder containing manga page images"
)
parser.add_argument(
"--start", type=int, default=1,
help="Start from this page number (1-based, default: 1)"
)
parser.add_argument(
"--end", type=int, default=None,
help="Stop after this page number inclusive (default: all)"
)
parser.add_argument(
"--source", "-s", default=None,
help=f"Override source language (default: {PIPELINE_CONFIG['source_lang']})"
)
parser.add_argument(
"--target", "-t", default=None,
help=f"Override target language (default: {PIPELINE_CONFIG['target_lang']})"
)
parser.add_argument(
"--skip-sanity", action="store_true",
help="Skip the fix sanity check (not recommended)"
)
args = parser.parse_args()
# ── Apply CLI language overrides ─────────────────────────
config = dict(PIPELINE_CONFIG)
if args.source:
config["source_lang"] = args.source
if args.target:
config["target_lang"] = args.target
PIPELINE_CONFIG.update(config)
# ── Resolve chapter directory ─────────────────────────────
chapter_dir = Path(args.chapter_dir).resolve()
if not chapter_dir.is_dir():
print(f"❌ Not a directory: {chapter_dir}")
sys.exit(1)
# ── Locate manga-translator.py ────────────────────────────
script_dir = Path(__file__).parent
module_path = script_dir / "manga-translator.py"
if not module_path.exists():
print(f"❌ manga-translator.py not found in {script_dir}")
sys.exit(1)
# ── Sanity-check that all fixes are present ───────────────
if not args.skip_sanity:
sanity_check_fixes(module_path)
# ── Load translator module ────────────────────────────────
print(f"📦 Loading translator from: {module_path}")
try:
translator = load_module("manga_translator", str(module_path))
except Exception as e:
print(f"❌ Could not load manga-translator.py: {e}")
sys.exit(1)
# ── API compatibility check ───────────────────────────────
if not verify_translator_api(translator):
print("❌ Aborting — fix the parameter mismatch above first.")
sys.exit(1)
# ── Discover and slice pages ──────────────────────────────
all_pages = sorted_pages(chapter_dir)
if not all_pages:
print(f"❌ No image files found in: {chapter_dir}")
sys.exit(1)
start_idx = max(0, args.start - 1)
end_idx = args.end if args.end is not None else len(all_pages)
pages = all_pages[start_idx:end_idx]
if not pages:
print(f"❌ No pages in range [{args.start}, {args.end}]")
sys.exit(1)
print(f"\n📚 Chapter : {chapter_dir.name}")
print(f" Pages : {len(pages)} of {len(all_pages)} total")
print(f" Source : {PIPELINE_CONFIG['source_lang']}")
print(f" Target : {PIPELINE_CONFIG['target_lang']}")
print(f" Output : {chapter_dir / 'translated'}\n")
# ── Process each page ─────────────────────────────────────
results_summary = []
for page_num, page_path in enumerate(pages, start=start_idx + 1):
workdir = make_page_workdir(chapter_dir, page_path.stem)
success = process_page(page_path, workdir, translator)
results_summary.append((page_num, page_path.name, success))
# ── Final summary ─────────────────────────────────────────
print(f"\n{'─' * 70}")
print(f" BATCH COMPLETE")
print(f"{'─' * 70}")
passed = sum(1 for _, _, ok in results_summary if ok)
failed = len(results_summary) - passed
for page_num, name, ok in results_summary:
status = "✅" if ok else "❌"
print(f" {status} [{page_num:>3}] {name}")
print(f"\n Total: {passed} succeeded, {failed} failed")
print(f"{'─' * 70}\n")
if failed:
sys.exit(1)
if __name__ == "__main__":
main()