Added all

README.md (186 changed lines)

@@ -1,53 +1,185 @@
# 🎨 Manga Translator OCR Pipeline

An intelligent manga/comic OCR and translation pipeline designed for accurate text extraction and multi-language translation support. Optimized for macOS with Apple Silicon support.

---

## ✨ Key Features

- **Dual OCR Support**: EasyOCR (primary) with automatic fallback to PaddleOCR
- **Smart Bubble Detection**: Advanced speech bubble clustering with line-level precision
- **Robust Text Recognition**: Multi-pass preprocessing with rotation-based reread for accuracy
- **Intelligent Noise Filtering**: Removes debug artifacts, garbage tokens, and unwanted symbols
- **Reading Order Detection**: Automatic LTR/RTL detection for proper translation sequencing
- **Multi-Language Translation**: Powered by Deep Translator
- **Structured Output**: JSON metadata for bubble locations and properties
- **Visual Debugging**: Detailed debug overlays for quality control
- **Batch Processing**: Shell script support for processing multiple pages

---
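As a rough illustration of the reading-order step listed above, here is a minimal sketch. It assumes each detected bubble exposes box-center coordinates `x`/`y`; the real pipeline's bubble structure may differ.

```python
# Hypothetical sketch of LTR/RTL reading-order sequencing.
# Assumption: each bubble is a dict with "x" and "y" box centers.

def reading_order(bubbles, direction="rtl", row_tolerance=40):
    """Sort bubbles top-to-bottom, then right-to-left (manga) or
    left-to-right (western comics) within each row band."""
    rows = []
    for b in sorted(bubbles, key=lambda b: b["y"]):
        # Group bubbles whose vertical centers are close into one "row"
        if rows and abs(rows[-1][0]["y"] - b["y"]) <= row_tolerance:
            rows[-1].append(b)
        else:
            rows.append([b])
    reverse = direction == "rtl"
    ordered = []
    for row in rows:
        ordered.extend(sorted(row, key=lambda b: b["x"], reverse=reverse))
    return ordered
```

With `direction="rtl"` bubbles on the same row are emitted right first, which matches manga page flow.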
## 📋 Requirements

- **OS**: macOS (Apple Silicon M1/M2/M3 supported)
- **Python**: 3.11+ (3.11.x recommended)
- **Package Manager**: Homebrew (for Python installation)
- **Disk Space**: ~2-3 GB for dependencies (OCR models, ML libraries)

---

## 🚀 Quick Start

### 1. **Create Virtual Environment**

```bash
cd /path/to/manga-translator

# Create venv with Python 3.11
/opt/homebrew/bin/python3.11 -m venv venv

# Activate environment
source venv/bin/activate

# Verify correct Python version
python -V
# Expected output: Python 3.11.x
```

### 2. **Install Dependencies**

```bash
# Upgrade pip and build tools
python -m pip install --upgrade pip setuptools wheel

# Install required packages
python -m pip install -r requirements.txt

# Optional: install the PaddleOCR fallback (non-fatal if it fails)
python -m pip install paddlepaddle || true
```
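After installing, a quick check that the core imports resolve inside the venv can save a confusing first run. The package names below are assumptions based on the features listed above (EasyOCR, Deep Translator, OpenCV); match them to your `requirements.txt`.

```python
# Dependency smoke check: report any core package that cannot be imported.
import importlib.util

def missing_packages(names):
    """Return the subset of module names that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Assumed module names; adjust to your requirements.txt
missing = missing_packages(["easyocr", "deep_translator", "cv2"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All core dependencies found.")
```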
### 3. **Prepare Your Manga**

Place manga page images in a directory (e.g., `your-manga-series/`).

---

## 📖 Usage

### Single Page Translation

```bash
python manga-translator.py --input path/to/page.png --output output_dir/
```

### Batch Processing Multiple Pages

```bash
bash batch-translate.sh input_folder/
```

Per-page results are written to `input_folder/translated/<page>/`.

### Generate Rendered Output

```bash
python manga-renderer.py --bubbles bubbles.json --original input.png --output rendered.png
```

---

## 📂 Project Structure

```
manga-translator/
├── manga-translator.py      # Main OCR + translation pipeline
├── manga-renderer.py        # Visualization & debug rendering
├── batch-translate.sh       # Batch processing script
├── requirements.txt         # Python dependencies
│
├── fonts/                   # Custom fonts for rendering
├── pages-for-tests/         # Test data
│   └── translated/          # Sample outputs
│
├── Dandadan_059/            # Sample manga series
├── Spy_x_Family_076/        # Sample manga series
│
└── older-code/              # Legacy scripts & experiments
```

---

## 📤 Output Files

For each processed page, the pipeline generates:

- **`bubbles.json`** – Structured metadata with bubble coordinates, text, and properties
- **`output.txt`** – Translated text in reading order
- **`debug_clusters.png`** – Visual overlay showing detected bubbles and processing steps
- **`rendered_output.png`** – Final rendered manga with translations overlaid
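A minimal sketch of consuming `bubbles.json` downstream. The field names below are assumptions for illustration only; the actual schema is defined by `manga-translator.py`.

```python
# Illustrative only: assumes bubbles.json maps bubble ids to objects
# that carry at least a "text" field.
import json

def load_bubbles(path):
    """Load bubble metadata and return (bubble_id, text) pairs."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    return [(bid, info.get("text", "")) for bid, info in data.items()]
```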
---

## 🔧 Configuration

Key processing parameters (adjustable in `manga-translator.py`):

- **OCR Engine**: EasyOCR with auto-fallback to PaddleOCR
- **Bubble Clustering**: Adaptive threshold-based grouping
- **Text Preprocessing**: Multi-pass noise reduction and enhancement
- **Translation Target**: Configurable language (default: English)

---

## 🐛 Troubleshooting

### "ModuleNotFoundError" Errors

```bash
# Ensure venv is activated
source venv/bin/activate

# Reinstall dependencies
python -m pip install -r requirements.txt --force-reinstall
```

### OCR Accuracy Issues

- Ensure images are high quality (300+ DPI recommended)
- Check that manga pages are not rotated
- Try adjusting clustering parameters in the code

### Out of Memory Errors

- Process pages in smaller batches
- Reduce image resolution before processing
- Check available RAM: `vm_stat` on macOS

### Translation Issues

- Verify internet connection (translations require API calls)
- Check language codes in the Deep Translator documentation
- Test with a single page first

---

## 🛠️ Development

### Running Tests

Test data is available in `pages-for-tests/translated/`.

```bash
python manga-translator.py --input pages-for-tests/example.png --output test-output/
```

### Debugging

Enable verbose output by modifying the logging level in `manga-translator.py`.

---

## 📝 Notes

- Processing time: ~10-30 seconds per page (varies by image size and hardware)
- ML models are downloaded automatically on first run
- GPU acceleration is available with a compatible CUDA setup (optional)
- Tested on macOS 13+ with Python 3.11
batch-translate.sh (new executable file, 269 lines)

@@ -0,0 +1,269 @@
#!/usr/bin/env bash
# ============================================================
# batch-translate.sh
# Batch manga OCR + translation for all images in a folder.
#
# Usage:
#   ./batch-translate.sh <folder>
#   ./batch-translate.sh <folder> --source en --target es
#   ./batch-translate.sh <folder> --start 3 --end 7
#   ./batch-translate.sh <folder> -s en -t fr --start 2
#
# Output per page lands in:
#   <folder>/translated/<page_stem>/
#     ├── bubbles.json
#     ├── output.txt
#     └── debug_clusters.png
# ============================================================

set -uo pipefail
# ─────────────────────────────────────────────────────────────
# CONFIGURATION
# ─────────────────────────────────────────────────────────────
SOURCE_LANG="en"
TARGET_LANG="ca"
START_PAGE=1
END_PAGE=999999
PYTHON_BIN="python"
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
TRANSLATOR="${SCRIPT_DIR}/manga-translator.py"

# ─────────────────────────────────────────────────────────────
# COLOURS
# ─────────────────────────────────────────────────────────────
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
CYAN='\033[0;36m'
BOLD='\033[1m'
RESET='\033[0m'

# ─────────────────────────────────────────────────────────────
# HELPERS
# ─────────────────────────────────────────────────────────────
usage() {
    echo ""
    echo -e "${BOLD}Usage:${RESET}"
    echo "  $0 <folder> [options]"
    echo ""
    echo -e "${BOLD}Options:${RESET}"
    echo "  --source, -s   Source language code (default: en)"
    echo "  --target, -t   Target language code (default: ca)"
    echo "  --start        First page number (default: 1)"
    echo "  --end          Last page number (default: all)"
    echo "  --python       Python binary (default: python)"
    echo "  --help, -h     Show this help"
    echo ""
    echo -e "${BOLD}Examples:${RESET}"
    echo "  $0 pages-for-tests"
    echo "  $0 pages-for-tests --source en --target es"
    echo "  $0 pages-for-tests --start 3 --end 7"
    echo "  $0 pages-for-tests -s en -t fr --start 2"
    echo ""
}

log_info()  { echo -e "${CYAN}ℹ️  $*${RESET}"; }
log_ok()    { echo -e "${GREEN}✅ $*${RESET}"; }
log_warn()  { echo -e "${YELLOW}⚠️  $*${RESET}"; }
log_error() { echo -e "${RED}❌ $*${RESET}"; }
log_section() {
    echo -e "\n${BOLD}${CYAN}══════════════════════════════════════════${RESET}"
    echo -e "${BOLD}${CYAN}  📖 $*${RESET}"
    echo -e "${BOLD}${CYAN}══════════════════════════════════════════${RESET}"
}

# ─────────────────────────────────────────────────────────────
# ARGUMENT PARSING
# ─────────────────────────────────────────────────────────────
if [[ $# -eq 0 ]]; then
    log_error "No folder specified."
    usage
    exit 1
fi

FOLDER="$1"
shift

while [[ $# -gt 0 ]]; do
    case "$1" in
        --source|-s) SOURCE_LANG="$2"; shift 2 ;;
        --target|-t) TARGET_LANG="$2"; shift 2 ;;
        --start)     START_PAGE="$2";  shift 2 ;;
        --end)       END_PAGE="$2";    shift 2 ;;
        --python)    PYTHON_BIN="$2";  shift 2 ;;
        --help|-h)   usage; exit 0 ;;
        *)
            log_error "Unknown option: $1"
            usage
            exit 1
            ;;
    esac
done

# ─────────────────────────────────────────────────────────────
# VALIDATION
# ─────────────────────────────────────────────────────────────
if [[ ! -d "$FOLDER" ]]; then
    log_error "Folder not found: $FOLDER"
    exit 1
fi

if [[ ! -f "$TRANSLATOR" ]]; then
    log_error "manga-translator.py not found at: $TRANSLATOR"
    exit 1
fi

if ! command -v "$PYTHON_BIN" &>/dev/null; then
    log_error "Python binary not found: $PYTHON_BIN"
    log_error "Try --python python3"
    exit 1
fi
# ─────────────────────────────────────────────────────────────
# PURGE BYTECODE CACHE
# ─────────────────────────────────────────────────────────────
log_info "🗑️ Purging Python bytecode caches..."
find "${SCRIPT_DIR}" -type d -name __pycache__ -exec rm -rf {} + 2>/dev/null || true
log_ok "Cache cleared."

# ─────────────────────────────────────────────────────────────
# DISCOVER IMAGES
# NOTE: uses a while-read loop instead of mapfile for Bash 3.2
#       compatibility (macOS default shell)
# ─────────────────────────────────────────────────────────────
ALL_IMAGES=()
while IFS= read -r -d '' img; do
    ALL_IMAGES+=("$img")
done < <(
    find "$FOLDER" -maxdepth 1 -type f \
        \( -iname "*.jpg" -o -iname "*.jpeg" \
           -o -iname "*.png" -o -iname "*.webp" \) \
        -print0 | sort -z
)

TOTAL=${#ALL_IMAGES[@]}

if [[ $TOTAL -eq 0 ]]; then
    log_error "No image files found in: $FOLDER"
    exit 1
fi

# ─────────────────────────────────────────────────────────────
# SLICE TO REQUESTED PAGE RANGE (1-based)
# ─────────────────────────────────────────────────────────────
PAGES=()
for i in "${!ALL_IMAGES[@]}"; do
    PAGE_NUM=$(( i + 1 ))
    if [[ $PAGE_NUM -ge $START_PAGE && $PAGE_NUM -le $END_PAGE ]]; then
        PAGES+=("${ALL_IMAGES[$i]}")
    fi
done

if [[ ${#PAGES[@]} -eq 0 ]]; then
    log_error "No pages in range [${START_PAGE}, ${END_PAGE}] (total: ${TOTAL})"
    exit 1
fi
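The 1-based range selection above can be restated as a small Python function for clarity. This is a sketch for explanation, not part of the pipeline.

```python
# Same semantics as the shell loop: keep pages whose 1-based index
# falls inside the inclusive [start, end] range.
def slice_pages(pages, start=1, end=None):
    """Keep pages whose 1-based index falls in [start, end]."""
    if end is None:
        end = len(pages)
    return [p for i, p in enumerate(pages, start=1) if start <= i <= end]
```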
# ─────────────────────────────────────────────────────────────
# SUMMARY HEADER
# ─────────────────────────────────────────────────────────────
log_section "BATCH MANGA TRANSLATOR"
log_info "📂 Folder : $(realpath "$FOLDER")"
log_info "📄 Pages  : ${#PAGES[@]} of ${TOTAL} total"
log_info "🔢 Range  : ${START_PAGE} → ${END_PAGE}"
log_info "🌐 Source : ${SOURCE_LANG}"
log_info "🎯 Target : ${TARGET_LANG}"
log_info "💾 Output : ${FOLDER}/translated/<page>/"
echo ""

# ─────────────────────────────────────────────────────────────
# PROCESS EACH PAGE
# ─────────────────────────────────────────────────────────────
PASS=0
FAIL=0
FAIL_LIST=()

for i in "${!PAGES[@]}"; do
    IMAGE="${PAGES[$i]}"
    PAGE_NUM=$(( START_PAGE + i ))
    STEM="$(basename "${IMAGE%.*}")"
    WORKDIR="${FOLDER}/translated/${STEM}"

    echo ""
    echo -e "${BOLD}──────────────────────────────────────────${RESET}"
    echo -e "${BOLD} 🖼️  [${PAGE_NUM}/${TOTAL}] ${STEM}${RESET}"
    echo -e "${BOLD}──────────────────────────────────────────${RESET}"

    mkdir -p "$WORKDIR"

    OUTPUT_JSON="${WORKDIR}/bubbles.json"
    OUTPUT_TXT="${WORKDIR}/output.txt"
    OUTPUT_DEBUG="${WORKDIR}/debug_clusters.png"

    log_info "🗂️  Image : $(basename "$IMAGE")"
    log_info "📁 Out   : ${WORKDIR}"

    # ── Run the translator ────────────────────────────────────
    if "$PYTHON_BIN" "$TRANSLATOR" \
        "$IMAGE" \
        --source "$SOURCE_LANG" \
        --target "$TARGET_LANG" \
        --json "$OUTPUT_JSON" \
        --txt "$OUTPUT_TXT" \
        --debug "$OUTPUT_DEBUG"; then

        # Verify outputs exist and are non-empty
        MISSING=0
        for FNAME in "bubbles.json" "output.txt"; do
            FPATH="${WORKDIR}/${FNAME}"
            if [[ ! -f "$FPATH" || ! -s "$FPATH" ]]; then
                log_warn "${FNAME} is missing or empty."
                MISSING=$(( MISSING + 1 ))
            else
                SIZE=$(wc -c < "$FPATH" | tr -d ' ')
                log_ok "${FNAME} → ${SIZE} bytes"
            fi
        done

        if [[ -f "$OUTPUT_DEBUG" ]]; then
            log_ok "debug_clusters.png written."
        fi

        if [[ $MISSING -eq 0 ]]; then
            log_ok "Page ${PAGE_NUM} complete."
            PASS=$(( PASS + 1 ))
        else
            log_warn "Page ${PAGE_NUM} finished with warnings."
            FAIL=$(( FAIL + 1 ))
            FAIL_LIST+=("${STEM}")
        fi

    else
        log_error "Page ${PAGE_NUM} FAILED — check output above."
        FAIL=$(( FAIL + 1 ))
        FAIL_LIST+=("${STEM}")
    fi

done

# ─────────────────────────────────────────────────────────────
# FINAL SUMMARY
# ─────────────────────────────────────────────────────────────
log_section "BATCH COMPLETE"
echo -e "   ✅ ${GREEN}Passed : ${PASS}${RESET}"
echo -e "   ❌ ${RED}Failed : ${FAIL}${RESET}"

if [[ ${#FAIL_LIST[@]} -gt 0 ]]; then
    echo ""
    log_warn "Failed pages:"
    for NAME in "${FAIL_LIST[@]}"; do
        echo -e "   ❌ ${RED}${NAME}${RESET}"
    done
fi

echo ""
log_info "📦 Output folder: $(realpath "${FOLDER}/translated")"
echo ""

[[ $FAIL -eq 0 ]] && exit 0 || exit 1
manga-translator.py

@@ -47,7 +47,6 @@ SHORT_ENGLISH_WORDS_2 = {
# Combined protected set used by is_meaningful_text()
SHORT_ENGLISH_PROTECTED = SHORT_ENGLISH_WORDS_1 | SHORT_ENGLISH_WORDS_2

DIALOGUE_STOPWORDS = {
    "I", "YOU", "HE", "SHE", "WE", "THEY", "IT", "ME", "MY", "YOUR", "OUR",
    "IS", "ARE", "WAS", "WERE", "AM", "DO", "DID", "DON'T", "DIDN'T", "NOT",
@@ -55,6 +54,38 @@ DIALOGUE_STOPWORDS = {
    "AND", "BUT", "SO", "THAT", "THIS", "THERE", "HERE", "THAN", "ALL", "RIGHT"
}

PROTECTED_SHORT_TOKENS = {
    # ... existing entries ...
    "HUH", "HUH?", "HUH??", "HUH?!",
    "OH", "OH!", "OOH", "OOH!",
    "AH", "AH!", "UH", "UH...",
    "HEY", "HEY!", "EH", "EH?",
    "WOW", "WOW!",
    "MORNING", "MORNING.",
    "BECKY", "BECKY!",
    "DAMIAN", "CECILE", "WALD",
    "OMIGOSH", "EEEP", "EEEEP",
    # FIX: common short words that appear alone on a manga line
    "GOOD", "WELL", "YEAH", "OKAY", "SURE",
    "WAIT", "STOP", "LOOK", "COME", "BACK",
    "HERE", "OVER", "JUST", "EVEN", "ONLY",
    "ALSO", "THEN", "WHEN", "WHAT", "THAT",
    "THIS", "WITH", "FROM", "HAVE", "WILL",
}

_MANGA_INTERJECTIONS = {
    # ... existing entries ...
    # FIX: short words that appear isolated on their own OCR line
    'GOOD', 'WELL', 'YEAH', 'OKAY', 'SURE',
    'WAIT', 'STOP', 'LOOK', 'COME', 'BACK',
    'HERE', 'OVER', 'JUST', 'EVEN', 'ONLY',
    'ALSO', 'THEN', 'WHEN', 'WHAT', 'THAT',
    'THIS', 'WITH', 'FROM', 'HAVE', 'WILL',
    'TRUE', 'REAL', 'FINE', 'DONE', 'GONE',
    'HELP', 'MOVE', 'STAY', 'CALM', 'COOL',
}
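A minimal sketch of how such a protected-token set might gate short OCR tokens. The real `is_meaningful_text()` logic is more involved, and the abbreviated set below is illustrative only.

```python
# Hypothetical illustration: very short tokens are dropped as OCR noise
# unless they appear in the protected word/interjection set.
PROTECTED = {"HUH", "OH", "HEY", "GOOD", "WELL", "YEAH"}  # abbreviated

def keep_token(tok):
    """Keep a short OCR token only if it is a known word/interjection."""
    t = tok.strip().upper().rstrip("!?.")
    return len(t) > 4 or t in PROTECTED
```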
# FIX: SFX_HINTS contains ONLY pure onomatopoeia — no words
# that could appear in dialogue (MORNING, GOOD, etc. removed)
SFX_HINTS = {
@@ -520,10 +551,39 @@ def postprocess_translation_general(text: str) -> str:

def fix_common_ocr_errors(text: str) -> str:
    result = text

    # existing fixes
    result = re.sub(r'(\d)O(\d)', r'\g<1>0\g<2>', result)
    result = re.sub(r'(\d)O([^a-zA-Z])', r'\g<1>0\g<2>', result)
    result = result.replace('|', 'I')
    result = result.replace('`', "'")

    # FIX: Replace digit-zero used as letter-O in common English words.
    # Vision OCR sometimes reads O → 0 in bold/stylised manga fonts.
    # Pattern: word containing digits that look like letters.
    DIGIT_AS_LETTER = {
        '0': 'O',
        '1': 'I',
        '3': 'E',
        '4': 'A',
        '5': 'S',
        '8': 'B',
    }

    # Only apply inside tokens that are otherwise all-alpha
    # e.g. "G00D" → "GOOD", "M0RNING" → "MORNING"
    def fix_digit_letters(m):
        word = m.group(0)
        fixed = word
        for digit, letter in DIGIT_AS_LETTER.items():
            fixed = fixed.replace(digit, letter)
        # Only accept the fix if the result is all-alpha (real word)
        if fixed.isalpha():
            return fixed
        return word

    result = re.sub(r'\b[A-Za-z0-9]{2,12}\b', fix_digit_letters, result)

    return result
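The digit-as-letter repair above can be exercised standalone. This is the same logic extracted into a top-level function for quick testing.

```python
# Standalone version of the digit-as-letter repair for quick testing.
import re

DIGIT_AS_LETTER = {'0': 'O', '1': 'I', '3': 'E', '4': 'A', '5': 'S', '8': 'B'}

def repair_digits(text):
    def fix(m):
        word = m.group(0)
        fixed = word
        for digit, letter in DIGIT_AS_LETTER.items():
            fixed = fixed.replace(digit, letter)
        # Accept the substitution only if it yields a purely alphabetic word
        return fixed if fixed.isalpha() else word
    return re.sub(r'\b[A-Za-z0-9]{2,12}\b', fix, text)
```

Note that genuine numbers survive: a token like `2024` still contains unmapped digits after substitution, so it is left unchanged.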
def is_valid_language(text: str, source_lang: str) -> bool:
@@ -1173,15 +1233,24 @@ def ocr_candidate_score(text: str) -> float:
    n = len(t)
    if n == 0:
        return 0.0

    alpha = sum(c.isalpha() for c in t) / n
    spaces = sum(c.isspace() for c in t) / n
    punct_ok = sum(c in ".,!?'-:;()[]\"¡¿" for c in t) / n
    bad = len(re.findall(r"[^\w\s\.\,\!\?\-\'\:\;\(\)\[\]\"¡¿]", t)) / n

    penalty = 0.0
    # FIX: Only penalise isolated single letters when the WHOLE token
    # is a single letter — not when a word like "I" or "A" appears
    # inside a longer sentence. Old pattern \b[A-Z]\b fired on "I"
    # inside "I CAN'T" which incorrectly penalised valid dialogue.
    if re.fullmatch(r"[A-Z]", t.strip()):
        penalty += 0.05

    if re.search(r"[0-9]{2,}", t):
        penalty += 0.08

    score = (0.62 * alpha) + (0.10 * spaces) + (0.20 * punct_ok) - (0.45 * bad) - penalty
    return max(0.0, min(1.0, score))
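For reference, here is a self-contained version of the scorer. The prefix elided by the hunk boundary is assumed to be `t = text.strip()`; otherwise the terms mirror the code above.

```python
# Sketch of ocr_candidate_score(): rewards letters, spaces, and normal
# punctuation; penalises junk characters, lone capitals, and digit runs.
import re

def ocr_candidate_score(text):
    t = text.strip()  # assumed elided prefix
    n = len(t)
    if n == 0:
        return 0.0
    alpha = sum(c.isalpha() for c in t) / n
    spaces = sum(c.isspace() for c in t) / n
    punct_ok = sum(c in ".,!?'-:;()[]\"¡¿" for c in t) / n
    bad = len(re.findall(r"[^\w\s\.\,\!\?\-\'\:\;\(\)\[\]\"¡¿]", t)) / n
    penalty = 0.0
    if re.fullmatch(r"[A-Z]", t):
        penalty += 0.05
    if re.search(r"[0-9]{2,}", t):
        penalty += 0.08
    score = (0.62 * alpha) + (0.10 * spaces) + (0.20 * punct_ok) - (0.45 * bad) - penalty
    return max(0.0, min(1.0, score))
```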
@@ -1,159 +0,0 @@
#!/usr/bin/env python3
"""
pipeline_render.py
───────────────────────────────────────────────────────────────
Standalone Rendering Pipeline

Usage:
    python pipeline-render.py /path/to/chapter/folder
"""

import os
import sys
import argparse
import zipfile
import importlib.util
from pathlib import Path
import cv2  # ✅ Added OpenCV to load the image

# ─────────────────────────────────────────────
# CONFIG
# ─────────────────────────────────────────────
DEFAULT_FONT_PATH = "fonts/ComicNeue-Regular.ttf"

# ─────────────────────────────────────────────
# DYNAMIC MODULE LOADER
# ─────────────────────────────────────────────
def load_module(name, filepath):
    spec = importlib.util.spec_from_file_location(name, filepath)
    if spec is None or spec.loader is None:
        raise FileNotFoundError(f"Cannot load spec for {filepath}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

# ─────────────────────────────────────────────
# HELPERS
# ─────────────────────────────────────────────
def sorted_pages(chapter_dir):
    exts = {".jpg", ".jpeg", ".png", ".webp"}
    pages = [
        p for p in Path(chapter_dir).iterdir()
        if p.is_file() and p.suffix.lower() in exts
    ]
    return sorted(pages, key=lambda p: p.stem)
def pack_rendered_cbz(chapter_dir, output_cbz, rendered_files):
    if not rendered_files:
        print("⚠️ No rendered pages found — CBZ not created.")
        return

    with zipfile.ZipFile(output_cbz, "w", compression=zipfile.ZIP_STORED) as zf:
        for rp in rendered_files:
            arcname = rp.name
            zf.write(rp, arcname)

    print(f"\n✅ Rendered CBZ saved → {output_cbz}")
    print(f"📦 Contains: {len(rendered_files)} translated pages ready to read.")

# ─────────────────────────────────────────────
# PER-PAGE PIPELINE
# ─────────────────────────────────────────────
def process_render(page_path, workdir, renderer_module, font_path):
    print(f"\n{'─' * 70}")
    print(f"🎨 RENDERING: {page_path.name}")
    print(f"{'─' * 70}")

    txt_path = workdir / "output.txt"
    json_path = workdir / "bubbles.json"
    out_img = workdir / page_path.name

    if not txt_path.exists() or not json_path.exists():
        print("   ⚠️ Missing output.txt or bubbles.json. Did you run the OCR pipeline first?")
        return None

    # ✅ FIX: Load the image into memory (as a NumPy array) before passing it
    img_array = cv2.imread(str(page_path.resolve()))
    if img_array is None:
        print(f"   ❌ Failed to load image: {page_path.name}")
        return None

    orig_dir = os.getcwd()
    try:
        os.chdir(workdir)

        # Pass the loaded image array instead of the string path
        renderer_module.render_translations(
            img_array,                 # 1st arg: image data (NumPy array)
            str(out_img.resolve()),    # 2nd arg: output image path
            str(txt_path.resolve()),   # 3rd arg: translations text
            str(json_path.resolve()),  # 4th arg: bubbles JSON
            font_path                  # 5th arg: font path
        )
        print("   ✅ Render complete")
        return out_img

    except Exception as e:
        print(f"   ❌ Failed: {e}")
        return None

    finally:
        os.chdir(orig_dir)
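The `os.chdir` plus `try`/`finally` pattern used above can be factored into a reusable context manager; on Python 3.11+ the standard library also ships `contextlib.chdir` with the same behavior.

```python
# Reusable equivalent of the chdir/try/finally pattern in process_render().
import contextlib
import os

@contextlib.contextmanager
def working_dir(path):
    """Temporarily change the working directory, restoring it on exit."""
    prev = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(prev)
```

Usage: `with working_dir(workdir): ...` guarantees the original directory is restored even if rendering raises.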
# ─────────────────────────────────────────────
# MAIN
# ─────────────────────────────────────────────
def main():
    parser = argparse.ArgumentParser(description="Manga Rendering Pipeline")
    parser.add_argument("chapter_dir", help="Path to the folder containing original manga pages")
    args = parser.parse_args()

    chapter_dir = Path(args.chapter_dir).resolve()
    output_cbz = chapter_dir.parent / f"{chapter_dir.name}_rendered.cbz"

    script_dir = Path(__file__).parent
    absolute_font_path = str((script_dir / DEFAULT_FONT_PATH).resolve())

    print("Loading renderer module...")
    try:
        renderer = load_module("manga_renderer", str(script_dir / "manga-renderer.py"))
    except Exception as e:
        print(f"❌ Could not load manga-renderer.py: {e}")
        sys.exit(1)

    pages = sorted_pages(chapter_dir)
    if not pages:
        print(f"❌ No images found in: {chapter_dir}")
        sys.exit(1)

    print(f"\n📖 Chapter : {chapter_dir}")
    print(f"   Pages   : {len(pages)}\n")

    succeeded, failed = [], []
    rendered_files = []

    for i, page_path in enumerate(pages, start=1):
        print(f"[{i}/{len(pages)}] Checking data for {page_path.name}...")
        workdir = Path(chapter_dir) / "translated" / page_path.stem

        out_file = process_render(page_path, workdir, renderer, absolute_font_path)
        if out_file:
            succeeded.append(page_path.name)
            rendered_files.append(out_file)
        else:
            failed.append(page_path.name)

    print(f"\n{'═' * 70}")
    print("RENDER PIPELINE COMPLETE")
    print(f"✅ {len(succeeded)} page(s) rendered successfully")
    if failed:
        print(f"❌ {len(failed)} page(s) skipped or failed:")
        for f in failed:
            print(f"   • {f}")
    print(f"{'═' * 70}\n")

    print("Packing final CBZ...")
    pack_rendered_cbz(chapter_dir, output_cbz, rendered_files)

if __name__ == "__main__":
    main()
@@ -1,390 +0,0 @@
#!/usr/bin/env python3
"""
pipeline-translator.py
───────────────────────────────────────────────────────────────
Translation OCR pipeline (Batch Processing Only)

Usage:
    python pipeline-translator.py /path/to/chapter/folder
    python pipeline-translator.py /path/to/chapter/folder --start 2 --end 5
    python pipeline-translator.py /path/to/chapter/folder --source en --target es
"""

import os
import sys
import argparse
import importlib.util
from pathlib import Path

# ─────────────────────────────────────────────────────────────
# PIPELINE CONFIGURATION
# Maps to the process_manga_page() signature in manga-translator.py
# ─────────────────────────────────────────────────────────────
PIPELINE_CONFIG = dict(
    source_lang = "en",
    target_lang = "ca",
)

# ─────────────────────────────────────────────────────────────
# DYNAMIC MODULE LOADER
# FIX: Always evicts stale sys.modules entry and deletes
#      __pycache__ for manga-translator.py before loading,
#      so edits are ALWAYS picked up on every run.
# ─────────────────────────────────────────────────────────────
def purge_bytecode_cache(filepath: str) -> None:
    """
    Delete the compiled .pyc file for the given .py path so Python
    cannot silently use a stale cached version of the module.
    """
    from importlib.util import cache_from_source

    try:
        pyc_path = cache_from_source(filepath)
        if os.path.exists(pyc_path):
            os.remove(pyc_path)
            print(f"🗑️ Purged bytecode cache: {pyc_path}")
    except Exception as e:
        # Non-fatal — just warn and continue
        print(f"⚠️ Could not purge bytecode cache: {e}")

def load_module(name: str, filepath: str):
    """
    Dynamically load a .py file as a module.

    FIX 1: Purge the .pyc cache so edits are always reflected.
    FIX 2: Evict any previously loaded version from sys.modules
           to prevent Python reusing a stale module object across
           multiple calls (e.g. when running in a REPL or test loop).
    """
    # FIX 1: delete stale bytecode
    purge_bytecode_cache(filepath)

    # FIX 2: evict from module registry
    if name in sys.modules:
        del sys.modules[name]

    spec = importlib.util.spec_from_file_location(name, filepath)
    if spec is None or spec.loader is None:
        raise FileNotFoundError(f"Cannot load module spec for: {filepath}")

    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module  # register before exec (handles self-refs)
    spec.loader.exec_module(module)
    return module
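To illustrate what `cache_from_source()` returns, and therefore exactly what `purge_bytecode_cache()` deletes: it maps a `.py` path to the `__pycache__` location Python would use for its compiled bytecode, without touching the filesystem.

```python
# cache_from_source() is a pure path computation from the stdlib;
# the source file does not need to exist.
from importlib.util import cache_from_source

def pyc_path_for(py_path):
    """Return the __pycache__ location Python would use for py_path."""
    return cache_from_source(py_path)
```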
# ─────────────────────────────────────────────────────────────
# HELPERS
# ─────────────────────────────────────────────────────────────
def sorted_pages(chapter_dir: Path):
    """Return all image files in chapter_dir sorted by filename stem."""
    exts = {".jpg", ".jpeg", ".png", ".webp"}
    pages = [
        p for p in chapter_dir.iterdir()
        if p.is_file() and p.suffix.lower() in exts
    ]
    return sorted(pages, key=lambda p: p.stem)

def make_page_workdir(chapter_dir: Path, page_stem: str) -> Path:
    """Create and return translated/<page_stem>/ inside chapter_dir."""
    workdir = chapter_dir / "translated" / page_stem
    workdir.mkdir(parents=True, exist_ok=True)
    return workdir

def verify_translator_api(module) -> bool:
    """
    Checks that the loaded module exposes process_manga_page() and
    that it accepts all keys defined in PIPELINE_CONFIG.
    Prints a clear warning for any missing parameter.
    """
    import inspect

    fn = getattr(module, "process_manga_page", None)
    if fn is None:
        print("❌ manga-translator.py does not expose process_manga_page()")
        return False

    sig = inspect.signature(fn)
    params = set(sig.parameters.keys())
    ok = True

    for key in PIPELINE_CONFIG:
        if key not in params:
            print(
                f"⚠️ PIPELINE_CONFIG key '{key}' not found in "
                f"process_manga_page() — update pipeline or translator."
            )
            ok = False

    return ok
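The signature check above can be reduced to a small reusable helper, sketched here with `inspect.signature` from the standard library.

```python
# Generic form of the verify_translator_api() parameter check.
import inspect

def missing_params(fn, required):
    """Return required parameter names absent from fn's signature."""
    params = set(inspect.signature(fn).parameters)
    return sorted(set(required) - params)
```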
|
||||
|
||||
|
||||
def sanity_check_fixes(module_path: Path) -> None:
    """
    Grep the translator source for key fix signatures and warn if
    any are missing. Helps catch cases where an edit was not saved.
    """
    checks = {
        "Fix A (gap_factor=4.0)": "gap_factor=4.0",
        "Fix B (_majority_contour_id)": "_majority_contour_id",
        "Fix C (median_inter adaptive gap)": "median_inter",
        "Fix D (merge_same_column_dialogue)": "merge_same_column_dialogue_boxes",
        "Fix E (lang_code from self.langs)": "lang_code = self.langs",
    }

    print("\n🔎 Sanity-checking fixes in manga-translator.py:")
    source = module_path.read_text(encoding="utf-8")
    all_ok = True

    for label, token in checks.items():
        found = token in source
        status = "✅" if found else "❌ MISSING"
        print(f" {status} {label}")
        if not found:
            all_ok = False

    if not all_ok:
        print(
            "\n⚠️ One or more fixes are missing from manga-translator.py.\n"
            " Save the file and re-run. Aborting.\n"
        )
        sys.exit(1)
    else:
        print(" All fixes present.\n")


# ─────────────────────────────────────────────────────────────
# PER-PAGE PIPELINE
# ─────────────────────────────────────────────────────────────
def process_page(page_path: Path, workdir: Path, translator_module) -> bool:
    print(f"\n{'─' * 70}")
    print(f" PAGE : {page_path.name}")
    print(f" OUT : {workdir}")
    print(f"{'─' * 70}")

    orig_dir = os.getcwd()
    try:
        os.chdir(workdir)

        # Use absolute paths so output always lands in workdir
        # regardless of any internal os.getcwd() calls.
        output_json = str(workdir / "bubbles.json")
        output_txt = str(workdir / "output.txt")
        debug_path = str(workdir / "debug_clusters.png")

        print(" ⏳ Extracting text and translating...")

        results = translator_module.process_manga_page(
            image_path=str(page_path.resolve()),
            output_json=output_json,
            output_txt=output_txt,
            **PIPELINE_CONFIG,
        )

        # ── Debug visualisation ───────────────────────────────
        # FIX: process_manga_page() already writes debug_clusters.png
        # internally with full OCR quad data.
        # We do NOT call draw_debug_clusters() here with ocr=[]
        # because that would OVERWRITE the correct debug image with
        # a degraded version that has no quad outlines.
        #
        # If process_manga_page() did not write a debug image
        # (e.g. older version), we do a minimal fallback draw.
        if results and not os.path.exists(debug_path):
            try:
                import cv2
                image_bgr = cv2.imread(str(page_path.resolve()))
                if image_bgr is not None:
                    vis_boxes: dict = {}
                    vis_lines: dict = {}
                    vis_indices: dict = {}

                    for bid_str, data in results.items():
                        bid = int(bid_str)
                        xywh = data["box"]
                        vis_boxes[bid] = (
                            xywh["x"],
                            xywh["y"],
                            xywh["x"] + xywh["w"],
                            xywh["y"] + xywh["h"],
                        )
                        vis_lines[bid] = data.get("lines", [])
                        vis_indices[bid] = []

                    # Fallback only — ocr=[] means no quad outlines
                    translator_module.draw_debug_clusters(
                        image_bgr=image_bgr,
                        out_boxes=vis_boxes,
                        out_lines=vis_lines,
                        out_indices=vis_indices,
                        ocr=[],
                        save_path=debug_path,
                    )
                    print(f" 🖼️ Fallback debug image written → {debug_path}")
            except Exception as e:
                print(f" ⚠️ Debug visualisation failed (non-fatal): {e}")

        # ── Sanity-check output files ─────────────────────────
        all_good = True
        for fname in ("output.txt", "bubbles.json"):
            fpath = workdir / fname
            if not fpath.exists():
                print(f" ⚠️ {fname} was NOT created.")
                all_good = False
            elif fpath.stat().st_size == 0:
                print(f" ⚠️ {fname} exists but is EMPTY.")
                all_good = False
            else:
                print(f" 📄 {fname} → {fpath.stat().st_size} bytes")

        if not results:
            print(" ⚠️ process_manga_page() returned no results.")
            return False

        print(f" ✅ Done — {len(results)} box(es) processed.")
        return True

    except Exception as e:
        import traceback
        print(f" ❌ Failed: {e}")
        traceback.print_exc()
        return False

    finally:
        os.chdir(orig_dir)


# ─────────────────────────────────────────────────────────────
# MAIN
# ─────────────────────────────────────────────────────────────
def main():
    parser = argparse.ArgumentParser(
        description="Manga Translation OCR Batch Pipeline",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  python pipeline-translator.py pages-for-tests
  python pipeline-translator.py pages-for-tests --start 2 --end 4
  python pipeline-translator.py pages-for-tests --source en --target es
"""
    )
    parser.add_argument(
        "chapter_dir",
        help="Path to the folder containing manga page images"
    )
    parser.add_argument(
        "--start", type=int, default=1,
        help="Start from this page number (1-based, default: 1)"
    )
    parser.add_argument(
        "--end", type=int, default=None,
        help="Stop after this page number inclusive (default: all)"
    )
    parser.add_argument(
        "--source", "-s", default=None,
        help=f"Override source language (default: {PIPELINE_CONFIG['source_lang']})"
    )
    parser.add_argument(
        "--target", "-t", default=None,
        help=f"Override target language (default: {PIPELINE_CONFIG['target_lang']})"
    )
    parser.add_argument(
        "--skip-sanity", action="store_true",
        help="Skip the fix sanity check (not recommended)"
    )
    args = parser.parse_args()

    # ── Apply CLI language overrides ─────────────────────────
    config = dict(PIPELINE_CONFIG)
    if args.source:
        config["source_lang"] = args.source
    if args.target:
        config["target_lang"] = args.target
    PIPELINE_CONFIG.update(config)

    # ── Resolve chapter directory ─────────────────────────────
    chapter_dir = Path(args.chapter_dir).resolve()
    if not chapter_dir.is_dir():
        print(f"❌ Not a directory: {chapter_dir}")
        sys.exit(1)

    # ── Locate manga-translator.py ────────────────────────────
    script_dir = Path(__file__).parent
    module_path = script_dir / "manga-translator.py"

    if not module_path.exists():
        print(f"❌ manga-translator.py not found in {script_dir}")
        sys.exit(1)

    # ── Sanity-check that all fixes are present ───────────────
    if not args.skip_sanity:
        sanity_check_fixes(module_path)

    # ── Load translator module ────────────────────────────────
    print(f"📦 Loading translator from: {module_path}")
    try:
        translator = load_module("manga_translator", str(module_path))
    except Exception as e:
        print(f"❌ Could not load manga-translator.py: {e}")
        sys.exit(1)

    # ── API compatibility check ───────────────────────────────
    if not verify_translator_api(translator):
        print("❌ Aborting — fix the parameter mismatch above first.")
        sys.exit(1)

    # ── Discover and slice pages ──────────────────────────────
    all_pages = sorted_pages(chapter_dir)
    if not all_pages:
        print(f"❌ No image files found in: {chapter_dir}")
        sys.exit(1)

    start_idx = max(0, args.start - 1)
    end_idx = args.end if args.end is not None else len(all_pages)
    pages = all_pages[start_idx:end_idx]

    if not pages:
        print(f"❌ No pages in range [{args.start}, {args.end}]")
        sys.exit(1)

    print(f"\n📚 Chapter : {chapter_dir.name}")
    print(f" Pages : {len(pages)} of {len(all_pages)} total")
    print(f" Source : {PIPELINE_CONFIG['source_lang']}")
    print(f" Target : {PIPELINE_CONFIG['target_lang']}")
    print(f" Output : {chapter_dir / 'translated'}\n")

    # ── Process each page ─────────────────────────────────────
    results_summary = []

    for page_num, page_path in enumerate(pages, start=start_idx + 1):
        workdir = make_page_workdir(chapter_dir, page_path.stem)
        success = process_page(page_path, workdir, translator)
        results_summary.append((page_num, page_path.name, success))

    # ── Final summary ─────────────────────────────────────────
    print(f"\n{'═' * 70}")
    print(" BATCH COMPLETE")
    print(f"{'═' * 70}")

    passed = sum(1 for _, _, ok in results_summary if ok)
    failed = len(results_summary) - passed

    for page_num, name, ok in results_summary:
        status = "✅" if ok else "❌"
        print(f" {status} [{page_num:>3}] {name}")

    print(f"\n Total: {passed} succeeded, {failed} failed")
    print(f"{'═' * 70}\n")

    if failed:
        sys.exit(1)


if __name__ == "__main__":
    main()