LEHJA - Urdu voice-model training (live)
Updated: 2026-06-12 19:55:01 UTC · cron every 5 min · page auto-refreshes every 60 s
OVERALL PIPELINE: 49%
GPU: 100% busy · jobs: 7
VRAM: 24.7 GB used / 68.4 GB free / 93.6 GB total
Tasks (main + sub)
SSH access + box verification - complete (took 2 min)
Python deps + CosyVoice repo install - complete (took 6 min)
CosyVoice2-0.5B model download (5.3 GB) - complete (took 5 min)
Whisper CUDA-12 libs fix (a silent failure was caught) - complete (took 14 min)
2. Data Acquisition - 100%
Catalogued Urdu files VM101 → H100 (2,667 files) - complete (took 3 min)
Full pool transfer (93,166 recordings, 17 GB) - complete (took 9 min)
Catalog language-map (45,185 entries) - complete (took 1 min)
Public FLEURS Urdu+English (3,637 clips) - complete (took 26 min)
CommonVoice Urdu (HF-gated, not needed) - skipped
3. In-domain Urdu Mining (real voices) - 84%
Known-Urdu transcription (2,667 files, 6 GPU workers) - 19% · running for 36 min · ~2.5 h remaining · 500/2,668 files · 17,459 clips · 966.6 min clean Urdu
Language-scan (53,054 unknown files, 16 GPU workers) - complete (took 18 min) · 53,054/53,054 scanned · new Urdu candidates: 3,219
Round 2: transcription of newly found files - complete (took 14 min) · 3,219/3,219
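The "~2.5 h remaining" figure for the Known-Urdu transcription job is consistent with a simple linear extrapolation from elapsed time and files completed. The sketch below shows that arithmetic. It is only an illustration: the function names are hypothetical, and how the dashboard actually computes its estimate is not stated.

```python
def percent_done(done: int, total: int) -> float:
    """Share of files processed so far, as a percentage."""
    return 100.0 * done / total


def eta_minutes(elapsed_min: float, done: int, total: int) -> float:
    """Linear extrapolation of remaining time.

    Assumes the remaining files take the same average time per file
    as the files already completed (an assumption, not a guarantee).
    """
    if done <= 0:
        raise ValueError("need at least one completed file to extrapolate")
    if done > total:
        raise ValueError("done cannot exceed total")
    return elapsed_min * (total - done) / done


if __name__ == "__main__":
    # Figures taken from the status line: 36 min elapsed, 500 of 2,668 files.
    remaining = eta_minutes(36, 500, 2668)
    print(f"{percent_done(500, 2668):.0f}% done, ~{remaining / 60:.1f} h remaining")
    # prints "19% done, ~2.6 h remaining"
```

With the numbers shown above, the extrapolation gives about 156 minutes, which is in line with the displayed estimate.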
4. Training-Data Prep - 0%
Manifests merge + dedup + train/dev split - pending
wav.scp / text / utt2spk / spk2utt - pending
Speaker embeddings (campplus) - pending
Speech tokens extraction - pending
Parquet packaging - pending
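The first two prep steps (dedup, train/dev split, and the four Kaldi-style index files) can be sketched roughly as follows. The file formats are the standard Kaldi ones: one `utt_id value` pair per line, with `spk2utt` listing all utterances per speaker. Everything else is an assumption made for illustration: the record fields (`utt_id`, `spk_id`, `wav`, `text`), dedup by utterance ID, the 5% dev fraction, and the hash-based split are not taken from the actual pipeline.

```python
import hashlib
from collections import defaultdict


def dedup(records):
    """Keep the first record seen for each utt_id (hypothetical dedup key)."""
    seen, unique = set(), []
    for rec in records:
        if rec["utt_id"] not in seen:
            seen.add(rec["utt_id"])
            unique.append(rec)
    return unique


def assign_split(utt_id: str, dev_fraction: float = 0.05) -> str:
    """Deterministic split: hash the ID so reruns give the same assignment.

    Note: this splits by utterance, so a speaker may appear in both sets.
    """
    bucket = int(hashlib.md5(utt_id.encode("utf-8")).hexdigest()[:8], 16)
    return "dev" if bucket / 0xFFFFFFFF < dev_fraction else "train"


def kaldi_tables(records):
    """Build the lines of wav.scp, text, utt2spk and spk2utt, sorted by ID."""
    records = sorted(records, key=lambda r: r["utt_id"])
    spk2utt = defaultdict(list)
    for rec in records:
        spk2utt[rec["spk_id"]].append(rec["utt_id"])
    return {
        "wav.scp": [f'{r["utt_id"]} {r["wav"]}' for r in records],
        "text": [f'{r["utt_id"]} {r["text"]}' for r in records],
        "utt2spk": [f'{r["utt_id"]} {r["spk_id"]}' for r in records],
        "spk2utt": [f'{spk} {" ".join(utts)}' for spk, utts in sorted(spk2utt.items())],
    }


if __name__ == "__main__":
    # Placeholder records; paths and transcripts are made up for the example.
    sample = [
        {"utt_id": "spk1_0002", "spk_id": "spk1", "wav": "/data/spk1_0002.wav", "text": "shukriya"},
        {"utt_id": "spk1_0001", "spk_id": "spk1", "wav": "/data/spk1_0001.wav", "text": "assalam o alaikum"},
        {"utt_id": "spk1_0001", "spk_id": "spk1", "wav": "/data/spk1_0001.wav", "text": "assalam o alaikum"},
    ]
    for name, lines in kaldi_tables(dedup(sample)).items():
        print(name, lines)
```

Hashing the utterance ID keeps the split stable across reruns without storing a separate assignment file. A speaker-level split would need the hash taken over `spk_id` instead.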
5. CosyVoice2 URDU TRAINING - 0%
LLM module fine-tune (Urdu phonetics) - pending · ~10 epochs planned
Checkpoint averaging (best-5) - pending
Inference smoke-test (does it speak Urdu?) - pending
Urdu test-set synthesis (50 sentences) - pending
Whisper-readback intelligibility gate - pending
Zero-shot voice-clone spot test (your voice) - pending
Download all artifacts to the app server (model + data + logs) - pending
COMPLETION EMAIL → info@ifcondition.com - PLEASE SHUT DOWN THE H100 - pending
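One plausible shape for the Whisper-readback intelligibility gate is to transcribe each synthesized test sentence, compare the transcript with the reference text, and pass only if the mean character error rate (CER) stays under a threshold. The sketch below covers the scoring part only. The function names and the 0.15 threshold are hypothetical, the transcripts are passed in as plain strings rather than produced by Whisper, and whether the real gate uses CER, WER, or something else is not stated.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    ref = " ".join(ref.split())  # collapse whitespace before comparing
    hyp = " ".join(hyp.split())
    if not ref:
        raise ValueError("reference text must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def passes_gate(pairs, max_mean_cer: float = 0.15):
    """Return (passed, mean CER) over (reference, transcript) pairs.

    max_mean_cer is an illustrative threshold, not a value from the pipeline.
    """
    scores = [cer(ref, hyp) for ref, hyp in pairs]
    mean = sum(scores) / len(scores)
    return mean <= max_mean_cer, mean


if __name__ == "__main__":
    # Placeholder pairs standing in for (reference sentence, Whisper transcript).
    pairs = [("abcd", "abcd"), ("abcd", "abed")]
    print(passes_gate(pairs))  # prints "(True, 0.125)"
```

CER is often preferred over word error rate for scripts where word boundaries and spelling variants make word-level matching harsh, but either metric fits the same gate structure.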
Harvest
Clean Urdu (training-ready): 966.6 min in-domain (17,459 clips) + 3,637 public clips
New Urdu-candidate files from the scan: 3,219 / 53,054 scanned · worker failures: 0
Work in progress - email on completion: info@ifcondition.com