Speak at The Fifth Elephant 2026 Annual Conference
Share your work with the community
Jul 31, 2026 (Fri), 09:00 AM – 06:00 PM IST
Himanshu Aggarwal
Submitted Jun 22, 2026
Generating thousands of polished short clips from long-form video — automatically, across multiple content genres — is a different problem from what most AI video demos show you. This talk walks through a production pipeline that does exactly that: automated clipping with LLM-based segment selection, an Intelligent Reframing Engine that detects live speakers vs. static faces using mouth movement, head motion, and emotion signals, and a final aesthetics layer that handles branded overlays and captions.
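To make the live-speaker idea concrete, here is a minimal sketch of how per-face signals might be combined into a liveness score. It is not the pipeline described in the talk: the function names, the track layout, the weights, and the threshold are all hypothetical, and the emotion signal is left out for brevity. It assumes an upstream face tracker already provides a per-frame mouth-openness value and a horizontal head position for each face.

```python
from statistics import pstdev


def liveness_score(mouth_openness, head_positions, w_mouth=0.7, w_head=0.3):
    """Weighted spread of mouth and head signals over a window of frames.

    A static face (poster, thumbnail, paused frame) shows almost no variation,
    so its score stays near zero. Weights are illustrative assumptions.
    """
    mouth_var = pstdev(mouth_openness) if len(mouth_openness) > 1 else 0.0
    head_var = pstdev(head_positions) if len(head_positions) > 1 else 0.0
    return w_mouth * mouth_var + w_head * head_var


def pick_active_speaker(tracks, min_score=0.02):
    """Return the id of the liveliest face track, or None if all look static.

    `tracks` maps a track id to {"mouth": [...], "head_x": [...]} (hypothetical layout).
    `min_score` is an assumed threshold that would need tuning on real data.
    """
    scored = {tid: liveness_score(t["mouth"], t["head_x"]) for tid, t in tracks.items()}
    if not scored:
        return None
    best = max(scored, key=scored.get)
    return best if scored[best] >= min_score else None


tracks = {
    "poster": {"mouth": [0.1] * 5, "head_x": [0.5] * 5},
    "host": {"mouth": [0.1, 0.4, 0.15, 0.5, 0.2], "head_x": [0.50, 0.52, 0.49, 0.51, 0.50]},
}
active = pick_active_speaker(tracks)
```

In this toy input the reframing crop would follow the "host" track, while a frame containing only the poster yields no active speaker and the crop could fall back to a default framing.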
The focus is on what breaks. We’ll cover six production failure modes, including clip-reframe mismatches, liveness false positives, speech-to-text (STT) hallucinations, and genre config drift, along with the mitigations that actually worked. The genre config pattern, which drives the entire pipeline without per-genre branching logic, is transferable to any multi-variant AI system.
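As a rough illustration of a genre config pattern, the sketch below keeps every genre-specific value in one lookup table so that the pipeline code never needs an `if genre == ...` branch. The field names, genres, and values are invented for this example and are not taken from the production system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenreConfig:
    # All fields are hypothetical examples of per-genre knobs.
    min_clip_seconds: float
    max_clip_seconds: float
    aspect_ratio: str
    caption_style: str
    selection_prompt: str


# One entry per genre; supporting a new genre means adding data, not code.
GENRE_CONFIGS = {
    "podcast": GenreConfig(20.0, 60.0, "9:16", "word_highlight",
                           "Pick self-contained insights."),
    "sports": GenreConfig(8.0, 30.0, "9:16", "minimal",
                          "Pick high-action moments."),
}


def plan_clip(genre, start, end):
    """Build a clip plan from a candidate segment using only config lookups."""
    cfg = GENRE_CONFIGS[genre]  # raises KeyError for an unconfigured genre
    # Clamp the candidate duration into the genre's allowed range.
    duration = min(max(end - start, cfg.min_clip_seconds), cfg.max_clip_seconds)
    return {
        "genre": genre,
        "start": start,
        "end": start + duration,
        "aspect_ratio": cfg.aspect_ratio,
        "caption_style": cfg.caption_style,
        "prompt": cfg.selection_prompt,
    }


sports_plan = plan_clip("sports", 10.0, 100.0)
podcast_plan = plan_clip("podcast", 0.0, 5.0)
```

Because every stage reads the same frozen config object, one plausible way to guard against config drift is to version the table and validate it in a single place rather than auditing scattered conditionals.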
For ML and data engineers building or evaluating AI content generation systems.
Speaker bio:
Himanshu Aggarwal is a Machine Learning Engineer at Glance, where he builds large-scale AI systems for content discovery and personalization, serving over 250 million users globally. His expertise spans recommender systems, semantic retrieval, knowledge graphs, and large language models, with a strong focus on designing scalable, production-grade architectures.
With experience across research and high-scale consumer platforms, Himanshu works on advancing content understanding and building intelligent systems that enhance how users discover and engage with digital experiences across domains.
Link to PPT (work ongoing): https://drive.google.com/file/d/1YSR__8PKY-r3HqHmLSG7_UA4AQURQce0/view?usp=sharing