🎙 LEHJA - Urdu voice-model training (live)

Updated: 2026-06-12 23:55:01 UTC · cron every 5 min · page auto-refreshes every 60s

OVERALL PIPELINE: 55%
GPU: 0% busy · jobs: 9 · VRAM: 29.1 GB used / 63.9 GB free / 93.6 GB total

📋 Tasks (main + sub)

1. H100 Setup ✅ 100%
   ✅ SSH access + box verification - complete (took 2 min)
   ✅ Python deps + CosyVoice repo install - complete (took 6 min)
   ✅ CosyVoice2-0.5B model download (5.3 GB) - complete (took 5 min)
   ✅ Whisper CUDA-12 libs fix (a silent failure was caught) - complete (took 14 min)
2. Data Acquisition ✅ 100%
   ✅ Catalogued Urdu files VM101→H100 (2,667 files) - complete (took 3 min)
   ✅ Full pool transfer (93,166 recordings, 17 GB) - complete (took 9 min)
   ✅ Catalog language-map (45,185 entries) - complete (took 1 min)
   ✅ Public FLEURS Urdu+English, 3,637 clips - complete (took 26 min)
   ⊘ Common Voice via HF (was gated; obtained from MDC instead) - skipped
2b. Common Voice 80h: Whisper verification 🔄 11%
   🔄 Audio↔text cross-check of 64,475 validated clips (8 GPU workers) - 11% · running for 7 min · ~57 min remaining · 7,000/64,465 clips checked
3. In-domain Urdu Mining (real voices) ✅ 100%
   ✅ Known-Urdu transcription - 2,667 files, 6 GPU workers - complete (took 3 min) · 2,668/2,668 files · 17,605 clips · 1620.3 min clean Urdu
   ✅ Language scan - 53,054 unknown files, 16 GPU workers - complete (took 18 min) · 53,054/53,054 scanned · new Urdu candidates: 3,219
   ✅ Round 2: transcription of the newly found files - complete (took 14 min) · 3,219/3,219
4. Training-Data Prep ⬜ 0%
   ⬜ Manifests merge + dedup + train/dev split - pending
   ⬜ wav.scp / text / utt2spk / spk2utt - pending
   ⬜ Speaker embeddings (campplus) - pending
   ⬜ Speech tokens extraction - pending
   ⬜ Parquet packaging - pending
5. CosyVoice2 URDU TRAINING ⬜ 0%
   ⬜ LLM module fine-tune (Urdu phonetics) - pending · ~10 epochs planned
   ⬜ Checkpoint averaging (best-5) - pending
   ⬜ Inference smoke-test (does it speak Urdu?) - pending
6. Validation ⬜ 0%
   ⬜ Urdu test-set synthesis (50 sentences) - pending
   ⬜ Whisper-readback intelligibility gate - pending
   ⬜ Zero-shot voice-clone spot test (your voice) - pending
7. Delivery ⬜ 0%
   ⬜ Download all artifacts to the app server (model + data + logs) - pending
   ⬜ COMPLETION EMAIL → info@ifcondition.com → PLEASE SHUT DOWN THE H100 - pending
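
Note on the pending "wav.scp / text / utt2spk / spk2utt" step: these are the four Kaldi-style index files. A minimal sketch of how they could be written is shown here. The function name, the tuple layout, and the output directory are hypothetical and not taken from this pipeline; the actual prep script may differ.

```python
from collections import defaultdict
from pathlib import Path


def write_kaldi_files(clips, out_dir):
    """Write wav.scp, text, utt2spk and spk2utt for a list of clips.

    clips: iterable of (utt_id, wav_path, speaker_id, transcript) tuples
           (hypothetical layout, assumed for this sketch).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    clips = sorted(clips)  # keep utterance IDs in sorted order
    spk2utt = defaultdict(list)

    with open(out / "wav.scp", "w", encoding="utf-8") as wav_f, \
         open(out / "text", "w", encoding="utf-8") as text_f, \
         open(out / "utt2spk", "w", encoding="utf-8") as u2s_f:
        for utt, wav, spk, txt in clips:
            wav_f.write(f"{utt} {wav}\n")    # utterance -> audio path
            text_f.write(f"{utt} {txt}\n")   # utterance -> transcript
            u2s_f.write(f"{utt} {spk}\n")    # utterance -> speaker
            spk2utt[spk].append(utt)

    # spk2utt is the inverse mapping: one line per speaker.
    with open(out / "spk2utt", "w", encoding="utf-8") as s2u_f:
        for spk in sorted(spk2utt):
            s2u_f.write(f"{spk} {' '.join(spk2utt[spk])}\n")
    return out
```

Each file holds one record per line with the ID first, so the later steps (speaker embeddings, speech tokens, Parquet packaging) can look clips up by utterance ID.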

📊 Harvest

Clean Urdu (training-ready): 1620.3 min in-domain (17,605 clips) + 3,637 public clips
New Urdu-candidate files from the scan: 3,219 / 53,054 scanned · worker failures: 0
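
For scale, the harvest figures above can be turned into hours, average clip length, and scan yield. The sketch below uses only numbers shown on this page; the function and key names are hypothetical.

```python
def harvest_summary(clean_minutes, clips, candidates, scanned):
    """Derive simple ratios from the harvest counters."""
    return {
        "hours": clean_minutes / 60,               # total clean audio in hours
        "avg_clip_s": clean_minutes * 60 / clips,  # mean clip length in seconds
        "yield_pct": 100 * candidates / scanned,   # share of scanned files flagged as Urdu
    }


summary = harvest_summary(
    clean_minutes=1620.3, clips=17_605, candidates=3_219, scanned=53_054
)
print(f"{summary['hours']:.1f} h · {summary['avg_clip_s']:.2f} s/clip · "
      f"{summary['yield_pct']:.2f}% scan yield")
```

That works out to about 27 hours of in-domain audio, roughly 5.5 s per clip, and a scan yield of about 6.1%.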

Work in progress; a completion email will be sent to: info@ifcondition.com
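
The "~57 min remaining" figure for the running task 2b is consistent with a plain linear extrapolation from the clips checked so far. A minimal sketch, assuming a constant processing rate; the function name is hypothetical and the dashboard's actual estimator is not shown on this page.

```python
def eta_minutes(done, total, elapsed_min):
    """Linear ETA: remaining items divided by the observed rate."""
    if done <= 0 or elapsed_min <= 0:
        raise ValueError("need some progress before estimating")
    rate = done / elapsed_min      # items per minute so far
    return (total - done) / rate   # minutes left at that rate


done, total, elapsed = 7_000, 64_465, 7
pct = 100 * done / total
eta = eta_minutes(done, total, elapsed)
print(f"{pct:.0f}% done, about {eta:.0f} min remaining")
```

At 7,000 clips in 7 min the rate is 1,000 clips per minute, which leaves about 57 min for the remaining 57,465 clips.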