The Fifth Elephant 2025 Annual Conference (18 & 19 July)
Less hype. More engineering.
Sat, 19 Jul 2025, 08:45 AM – 05:50 PM IST
Prakash Vanapalli
@prakashjay
Submitted May 29, 2025
We present a production-ready lip-sync model serving millions of creators, built on a novel Latent-GAN architecture that achieves superior identity preservation and audio-visual alignment compared to other approaches.
Our system is trained on 10,000+ hours of diverse audio-visual data, prepared with custom preprocessing pipelines that include audio diarization, vocal separation, and audio-visual (AV) synchronization. We demonstrate how GAN architectures combined with transformer attention mechanisms and VAEs can match diffusion-model quality while offering faster inference.
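To make the AV synchronization step more concrete, here is a minimal sketch of one common way to estimate the offset between an audio track and a video track: cross-correlating a per-frame audio energy signal with a per-frame mouth-openness signal and picking the lag with the highest score. This is an illustration only, not the pipeline described in the talk. The function name `estimate_av_offset`, the two input signals, and the `max_lag` parameter are all hypothetical, and the sketch assumes both signals have already been resampled to the video frame rate.

```python
from statistics import fmean, pstdev


def _zscore(xs):
    """Normalize a sequence to zero mean and (roughly) unit variance."""
    mu, sigma = fmean(xs), pstdev(xs)
    return [(x - mu) / (sigma + 1e-8) for x in xs]


def estimate_av_offset(audio_energy, mouth_open, max_lag):
    """Estimate how many frames the video trails the audio.

    Hypothetical helper: both inputs are assumed to be per-frame signals
    of equal length. A positive result means the mouth motion arrives
    later than the sound; a negative result means it arrives earlier.
    """
    n = len(audio_energy)
    if n != len(mouth_open):
        raise ValueError("signals must have the same length")
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must be in [0, len(signal))")

    a = _zscore(audio_energy)
    v = _zscore(mouth_open)

    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Pair a[t] with v[t + lag] over the frames where both exist.
        if lag >= 0:
            pairs = zip(a[: n - lag], v[lag:])
        else:
            pairs = zip(a[-lag:], v[: n + lag])
        # Average over the overlap so short overlaps are not penalized.
        score = sum(x * y for x, y in pairs) / (n - abs(lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Toy data: five "speech bursts" in 50 frames, video delayed by 3 frames.
audio = [0.0] * 50
for i in (5, 12, 30, 31, 44):
    audio[i] = 1.0
delayed_video = [0.0] * 3 + audio[:-3]

print(estimate_av_offset(audio, delayed_video, max_lag=10))
```

In a real pipeline the mouth-openness signal would come from a face landmark detector and the estimated lag would be used to shift or reject clips before training. Learned sync models are often used instead of raw cross-correlation, so treat this only as a baseline.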
Key technical contributions include:
Target audience: ML engineers, AI researchers, and developers building content creation tools, video processing systems, or scaling AI models for consumer applications.