Multimodal AI is revolutionizing video analysis, but practical insights on pipeline design are scarce. Traditional computer vision pipelines often involve a complex web of specialized models. This leads to high costs, maintenance burdens, and difficulty in adapting to new tasks. This talk will dissect a real-world case study where multimodal models dramatically simplified a large-scale video analysis pipeline, leading to significantly reduced costs and improved agility.
In this talk, I will illustrate the breakthroughs we unlocked with multimodal models. I will demonstrate how we use simple, often older models like CLIP to build a streamlined pipeline that replaces complex, interdependent vision pipelines. I will also share the metrics we use to decide when to escalate to more expensive models like GPT-4-Vision and LLaVA, ensuring cost-efficient processing of each video file. Attendees will gain practical knowledge drawn from our experience refactoring a large-scale system, and learn how to avoid common pitfalls.
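The escalation idea can be sketched as a confidence-threshold router: score each frame with a cheap embedding model first, and only fall back to an expensive model when the cheap model is unsure. This is a minimal stdlib-only sketch; in practice `frame_emb` and `label_embs` would come from a model such as CLIP, and the `escalate_below` threshold is a hypothetical value you would tune against your own metrics.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def route_frame(frame_emb, label_embs, escalate_below=0.25):
    """Score a frame against candidate labels with the cheap model;
    escalate to an expensive model only when confidence is low.

    frame_emb: embedding of the video frame (e.g. from CLIP's image encoder)
    label_embs: {label: embedding} from the matching text encoder
    """
    scores = {label: cosine(frame_emb, emb) for label, emb in label_embs.items()}
    best_label, best_score = max(scores.items(), key=lambda kv: kv[1])
    if best_score < escalate_below:
        # Low confidence: hand the frame to a costlier model
        # (e.g. GPT-4-Vision or LLaVA) for a second opinion.
        return ("expensive-model", None)
    return ("cheap-model", best_label)
```

The design choice is that the expensive path is the exception, not the rule: most frames resolve on the first pass, so per-video cost stays close to the cheap model's cost.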
I work for a large media organization where we perform multiple tasks across our content library. We operate a generic vision pipeline to which specific components can be added or removed for various downstream tasks, as diverse as trailer generation, thumbnail selection, and content moderation. Multimodal models significantly simplified our pipelines and made them easy to extend to additional use cases.
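One way such a pipeline stays generic is to compute shared frame embeddings once and let each downstream task be a small pluggable function over them. The sketch below is illustrative only, not our production code: the task functions (`select_thumbnail`, `flag_unsafe`) and the `unsafe_threshold` value are hypothetical, and the embeddings would in practice come from a model like CLIP.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def select_thumbnail(frame_embs, prompt_emb):
    """Thumbnail selection: return the index of the frame whose
    embedding best matches a text prompt embedding."""
    return max(range(len(frame_embs)), key=lambda i: cosine(frame_embs[i], prompt_emb))

def flag_unsafe(frame_embs, unsafe_emb, unsafe_threshold=0.8):
    """Content moderation: return indices of frames too similar
    to an 'unsafe content' prompt embedding."""
    return [i for i, emb in enumerate(frame_embs) if cosine(emb, unsafe_emb) >= unsafe_threshold]
```

Both tasks consume the same frame embeddings, so adding a new use case means adding one function rather than a new model-specific pipeline.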