Himanshu Aggarwal

From Script to Screen at Scale: Engineering an AI Short Video Generation Pipeline

Submitted Jun 22, 2026

Generating thousands of polished short clips from long-form video, automatically and across multiple content genres, is a different problem from what most AI video demos show you. This talk walks through a production pipeline that does exactly that: automated clipping with LLM-based segment selection, an Intelligent Reframing Engine that distinguishes live speakers from static faces using mouth movement, head motion, and emotion signals, and a final aesthetics layer that handles branded overlays and captions.
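
To make the live-speaker idea concrete, here is a minimal sketch of how per-face signals could be combined into a single liveness score. It is an illustration only, not the pipeline's actual implementation: the `FaceSignals` fields, the weights, and the threshold are all assumed values, and each signal is assumed to be pre-normalized to the range 0 to 1 by upstream detectors.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class FaceSignals:
    """Hypothetical per-face signals, each assumed normalized to [0, 1]."""
    mouth_movement: float
    head_motion: float
    emotion_change: float


# Assumed weights for illustration; real values would be tuned on labeled clips.
WEIGHTS = {"mouth_movement": 0.6, "head_motion": 0.25, "emotion_change": 0.15}


def liveness_score(signals: FaceSignals) -> float:
    """Weighted sum of the three signals; higher means more likely a live speaker."""
    return (
        WEIGHTS["mouth_movement"] * signals.mouth_movement
        + WEIGHTS["head_motion"] * signals.head_motion
        + WEIGHTS["emotion_change"] * signals.emotion_change
    )


def pick_live_speaker(
    faces: Dict[str, FaceSignals], threshold: float = 0.4
) -> Optional[str]:
    """Return the id of the highest-scoring face at or above the threshold.

    Returns None when every face looks static (for example a poster or a
    photo on screen), so the caller can fall back to a default crop.
    """
    best_id: Optional[str] = None
    best_score = threshold
    for face_id, signals in faces.items():
        score = liveness_score(signals)
        if score >= best_score:
            best_id, best_score = face_id, score
    return best_id
```

Returning `None` rather than the least-bad face is a deliberate choice in this sketch: a threshold gives the reframing step an explicit "no live speaker" outcome, which is one plausible way to limit liveness false positives.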

The focus is on what breaks. We’ll cover six production failure modes, including clip-reframe mismatches, liveness false positives, speech-to-text (STT) hallucinations, and genre config drift, along with the mitigations that actually worked. The genre config pattern, which drives the entire pipeline without branching logic, is transferable to any multi-variant AI system.
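
As a rough illustration of the genre config pattern, the sketch below shows a pipeline whose stages read every genre-specific value from a config object instead of branching on the genre name. The genre names, config fields, values, and stage functions are all hypothetical and chosen only to show the shape of the idea.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class GenreConfig:
    """Hypothetical per-genre settings; the fields are illustrative."""
    clip_seconds: int
    aspect_ratio: str
    caption_style: str


# Adding a genre means adding an entry here, not editing any stage.
GENRE_CONFIGS: Dict[str, GenreConfig] = {
    "news": GenreConfig(clip_seconds=45, aspect_ratio="9:16", caption_style="lower_third"),
    "sports": GenreConfig(clip_seconds=20, aspect_ratio="9:16", caption_style="bold_center"),
}

Job = Dict[str, Any]
Stage = Callable[[Job, GenreConfig], Job]


def select_segment(job: Job, cfg: GenreConfig) -> Job:
    # Stand-in for LLM-based segment selection; only records the target length.
    return {**job, "clip_seconds": cfg.clip_seconds}


def reframe(job: Job, cfg: GenreConfig) -> Job:
    # Stand-in for the reframing step; only records the target aspect ratio.
    return {**job, "aspect_ratio": cfg.aspect_ratio}


def add_overlays(job: Job, cfg: GenreConfig) -> Job:
    # Stand-in for the aesthetics layer; only records the caption style.
    return {**job, "caption_style": cfg.caption_style}


STAGES: Tuple[Stage, ...] = (select_segment, reframe, add_overlays)


def run_pipeline(source: str, genre: str) -> Job:
    """Run every stage with the genre's config; no stage checks the genre name."""
    cfg = GENRE_CONFIGS[genre]  # raises KeyError for an unknown genre
    job: Job = {"source": source}
    for stage in STAGES:
        job = stage(job, cfg)
    return job
```

Under these assumptions, `run_pipeline("episode.mp4", "sports")` and `run_pipeline("episode.mp4", "news")` execute the same code path and differ only in the values they read. Freezing the dataclass keeps a stage from mutating a shared config at runtime, which is one simple guard against config drift.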

For ML and data engineers building or evaluating AI content generation systems.

Speaker bio:
Himanshu Aggarwal is a Machine Learning Engineer at Glance, where he builds large-scale AI systems for content discovery and personalization, serving over 250 million users globally. His expertise spans recommender systems, semantic retrieval, knowledge graphs, and large language models, with a strong focus on designing scalable, production-grade architectures.

With experience across research and high-scale consumer platforms, Himanshu works on advancing content understanding and building intelligent systems that enhance how users discover and engage with digital experiences across domains.

Link to PPT (work ongoing): https://drive.google.com/file/d/1YSR__8PKY-r3HqHmLSG7_UA4AQURQce0/view?usp=sharing

