LEHJA: Urdu voice-model training (live)
Updated: 2026-06-12 23:55:01 UTC · cron every 5 min · page auto-refreshes every 60 s
OVERALL PIPELINE: 55%
GPU: 0% busy · jobs: 9
VRAM: 29.1 GB used / 63.9 GB free / 93.6 GB total
Tasks (main + sub)
  SSH access + box verification: complete (took 2 min)
  Python deps + CosyVoice repo install: complete (took 6 min)
  CosyVoice2-0.5B model download (5.3 GB): complete (took 5 min)
  Whisper CUDA-12 libs fix (a silent failure had been caught): complete (took 14 min)
2. Data Acquisition: 100%
  Catalogued Urdu files, VM101 to H100 (2,667 files): complete (took 3 min)
  Full pool transfer (93,166 recordings, 17 GB): complete (took 9 min)
  Catalog language-map (45,185 entries): complete (took 1 min)
  Public FLEURS Urdu+English, 3,637 clips: complete (took 26 min)
  CommonVoice via HF (was gated; obtained from MDC instead): skipped
2b. Common Voice 80h, Whisper verification: 11%
  Audio-to-text cross-check of 64,475 validated clips (8 GPU workers): 11% · running for 7 min · ~57 min remaining · 7,000/64,465 clips checked
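The remaining-time figure above is consistent with a simple constant-throughput estimate. The sketch below reproduces that arithmetic from the numbers on the page; the function name `eta_minutes` is hypothetical, and it assumes the dashboard uses the 64,465 total and a linear extrapolation, neither of which the page confirms.

```python
def eta_minutes(done: int, total: int, elapsed_min: float) -> float:
    """Estimate minutes remaining, assuming throughput so far stays constant."""
    if done <= 0 or elapsed_min <= 0:
        raise ValueError("need some progress and elapsed time to estimate")
    rate = done / elapsed_min  # clips per minute observed so far
    return (total - done) / rate


# Figures taken from the status line above.
done, total, elapsed = 7_000, 64_465, 7.0
percent = 100 * done / total
remaining = eta_minutes(done, total, elapsed)
print(f"{percent:.0f}% done, ~{remaining:.0f} min remaining")
```

With these inputs the script prints `11% done, ~57 min remaining`, matching the dashboard.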
3. In-domain Urdu Mining (real voices): 100%
  Known-Urdu transcription, 2,667 files, 6 GPU workers: complete (took 3 min) · 2,668/2,668 files · 17,605 clips · 1,620.3 min clean Urdu
  Language scan, 53,054 unknown files, 16 GPU workers: complete (took 18 min) · 53,054/53,054 scanned · new Urdu candidates: 3,219
  Round 2, transcription of the newly found files: complete (took 14 min) · 3,219/3,219
4. Training-Data Prep: 0%
  Manifests merge + dedup + train/dev split: pending
  wav.scp / text / utt2spk / spk2utt: pending
  Speaker embeddings (campplus): pending
  Speech tokens extraction: pending
  Parquet packaging: pending
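For reference, the four Kaldi-style files listed above are plain text with one space-separated record per line. The sketch below writes them from a list of utterance tuples. The function name, the tuple layout, the utterance IDs, and the file paths are all hypothetical; this is an illustration of the file format, not the pipeline's actual preparation script.

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def write_kaldi_files(utts, out_dir):
    """Write wav.scp, text, utt2spk and spk2utt.

    utts: iterable of (utt_id, speaker_id, wav_path, transcript) tuples.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    utts = sorted(utts)  # Kaldi-style tools expect entries sorted by key
    spk2utt = defaultdict(list)
    with open(out / "wav.scp", "w", encoding="utf-8") as wav, \
         open(out / "text", "w", encoding="utf-8") as txt, \
         open(out / "utt2spk", "w", encoding="utf-8") as u2s:
        for utt_id, spk, path, transcript in utts:
            wav.write(f"{utt_id} {path}\n")        # utterance -> audio file
            txt.write(f"{utt_id} {transcript}\n")  # utterance -> transcript
            u2s.write(f"{utt_id} {spk}\n")         # utterance -> speaker
            spk2utt[spk].append(utt_id)
    with open(out / "spk2utt", "w", encoding="utf-8") as s2u:
        for spk in sorted(spk2utt):                # speaker -> its utterances
            s2u.write(f"{spk} {' '.join(spk2utt[spk])}\n")


# Hypothetical example rows (IDs and paths are made up for illustration).
rows = [
    ("spk01_0002", "spk01", "/data/clips/spk01_0002.wav", "دوسرا جملہ"),
    ("spk01_0001", "spk01", "/data/clips/spk01_0001.wav", "پہلا جملہ"),
    ("spk02_0001", "spk02", "/data/clips/spk02_0001.wav", "تیسرا جملہ"),
]
out_dir = Path(tempfile.mkdtemp())
write_kaldi_files(rows, out_dir)
print((out_dir / "spk2utt").read_text(encoding="utf-8"))
```

Building `spk2utt` in the same pass as `utt2spk` keeps the two files consistent with each other by construction.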
5. CosyVoice2 URDU TRAINING: 0%
  LLM module fine-tune (Urdu phonetics): pending · ~10 epochs planned
  Checkpoint averaging (best-5): pending
  Inference smoke-test (does it speak Urdu?): pending
  Urdu test-set synthesis (50 sentences): pending
  Whisper-readback intelligibility gate: pending
  Zero-shot voice-clone spot test (your voice): pending
  Download all artifacts to the app server (model + data + logs): pending
  COMPLETION EMAIL to info@ifcondition.com, THEN YOU SHUT DOWN THE H100: pending
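One way a readback intelligibility gate like the one listed above could work is to transcribe each synthesized sentence and compare the transcript with the input text by character error rate (CER). The sketch below shows only the scoring step. The function names, the 0.15 threshold, and the romanized sample pairs are assumptions for illustration; the page does not say which metric or threshold the pipeline uses.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalize(text: str) -> str:
    """Collapse runs of whitespace so spacing differences are not counted."""
    return " ".join(text.split())


def corpus_cer(pairs) -> float:
    """Character error rate over (reference, transcript) pairs."""
    edits = sum(edit_distance(normalize(r), normalize(h)) for r, h in pairs)
    chars = sum(len(normalize(r)) for r, _ in pairs)
    if chars == 0:
        raise ValueError("references are empty")
    return edits / chars


def passes_gate(pairs, max_cer: float = 0.15) -> bool:
    """max_cer is an assumed threshold, not a value taken from the pipeline."""
    return corpus_cer(pairs) <= max_cer


# Hypothetical (input text, readback transcript) pairs.
pairs = [("aap kaise hain", "aap kaise hain"), ("shukriya", "shukrya")]
score = corpus_cer(pairs)
print(f"CER = {score:.3f}, gate passed: {passes_gate(pairs)}")
```

Pooling edits and characters over the whole test set, rather than averaging per-sentence CER, keeps very short sentences from dominating the score.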
Harvest
Clean Urdu (training-ready): 1,620.3 min in-domain (17,605 clips) + 3,637 public clips
New Urdu-candidate files from the scan: 3,219 / 53,054 scanned · worker failures: 0
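A few derived figures make the harvest numbers easier to read. The short script below computes them directly from the values above; the variable names are illustrative and nothing here is measured beyond what the page already reports.

```python
# Values copied from the harvest summary above.
clean_minutes = 1620.3
clips = 17_605
candidates, scanned = 3_219, 53_054

hours = clean_minutes / 60                  # in-domain audio in hours
avg_clip_seconds = clean_minutes * 60 / clips
candidate_rate = 100 * candidates / scanned  # share of scanned files flagged

print(f"{hours:.1f} h clean Urdu, {avg_clip_seconds:.1f} s per clip on average, "
      f"{candidate_rate:.1f}% of scanned files flagged as Urdu candidates")
```

This works out to roughly 27.0 hours of clean in-domain audio, about 5.5 seconds per clip, and about 6.1% of scanned files flagged as Urdu candidates.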
Work in progress; a completion email will go to: info@ifcondition.com