GenerativeAI July Meetup

Meet and Greet Humans Talking about Generation using AI

In the last 5 months, the Generative AI Community has organised:

  1. A Demo Day Meetup in February, a Deep Coffee Hack
  2. A Hackathon with 5L INR in cash prizes and 43 submissions from almost 100 participants!
  3. An April meetup with a stellar demo of a chatbot for farmers, a talk on evaluating question answering from a Llama Index contributor, and an Intro to Model Quantisation from Amod Malviya
  4. A May meetup with talks on the U-Net architecture by Vignesh, CTO of hexo.ai, and on composability with Stable Diffusion by Amogh Vaishampayam, Founding Team @ dashtoon.ai
  5. A June meetup featuring a discussion with Amod Malviya, Kailash Nadh and Charu Tak.

When? 29th July, Saturday

Speakers

Dr. Pratyush

Pratyush is a researcher at Microsoft Research and AI4Bharat (IIT M) with a focus on systems and deep learning for language technologies. He is deeply interested in realising AI as a force of social good.

Talk Theme: A brief story of AI4Bharat: the mission and what it has achieved. He will then move on to LLMs, provide some technical insight into intrinsic dimensionality in deep learning, and share his take on how innovation will develop in building custom LLMs.

Sachin Dharashivkar

Sachin Dharashivkar will speak about LLM finetuning and RLHF. Sachin is a founder who is exploring use cases of AI agents. He enjoys training Reinforcement Learning agents and exploring novel applications of Large Language Models.

Talk Theme: Introduction to Supervised and Reinforcement Finetuning.

The three steps of training ChatGPT-style models, how to perform supervised finetuning, why Reinforcement Learning from Human Feedback is important, and how to train reward and policy models.

Location: Bengaluru. Exact location is shared in the invite on approval.

Hosted by

📚 Resources Past Discussion Summaries Past discussion summaries Event announcements Please fill out this form: Generative AI Events. All events are published here: Generative AI Events Hiring and Jobs more

Supported by

Venue host

With a mission to celebrate and reward credit-worthy individuals, CRED is a transparent and fully digital platform for highly trusted individuals, brands, and institutions. CRED, with its empathetic approach to design, makes financial decisions visible, delightful, and rewarding for its members, fa… more

Akshat Gupta

@akshatg

Harmonising Art and AI: Crafting Jazzy and Juicy Video Snippets through AI

Submitted Jul 20, 2023

Abstract

In recent times, live streaming platforms, where live content is shown to users, have been gaining popularity. Typically, the videos created by creators range from 15 minutes to an hour. Our research found that a sizable chunk of users drop off within the first 30 seconds of a video. Another piece of research shows that, on average, a user has an attention span of only 30 seconds, and this number is even lower for Gen Z, our main target audience. To solve this problem, we want to identify the juiciest segments of videos and add external features that prompt a user to land on the base video, increasing overall user engagement and the jazziness of the video. We also want to build a customizable framework that caters not only to snippets but also to trailers, mashups, etc.

Literature Review

To solve this problem, we surveyed the tools that already exist on the market. We found no single tool or solution that addresses it end-to-end; several solutions tackle it in bits and pieces, but not fully. We then read research papers on how to do this end-to-end, which gave us a couple of ideas to try.

Approach

We broke our solution into two parts: how to get the base snippet (the juiciest part of a video) and which post-processing techniques to apply to it. To summarise our solution:

Base snippet:

  1. Used a state-of-the-art (SOTA) speech-to-text model to transcribe the video.
  2. Optimised this model using ctranslate for faster inference.
  3. Used Flan T5 XXL to generate a summary of the sentences.
  4. Used simple transformer-based models to calculate sentence similarity between each sentence and the summary.
  5. Used a moving average on the cosine scores to find the best timestamp for the summary.
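
The last two steps can be illustrated with a minimal sketch. It assumes each transcribed sentence already has a start time, an end time, and an embedding vector; the function names, the toy 2-D embeddings, and the window size are hypothetical and are not taken from the deployed system. It scores every sentence against the summary by cosine similarity, then slides a fixed-size window over the scores and keeps the window with the highest average.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_window(sentences, summary_vec, window=3):
    """Pick the run of `window` consecutive sentences closest to the summary.

    sentences: list of (start_sec, end_sec, embedding) tuples (assumed format).
    Returns (start_sec, end_sec, average_score) of the best-scoring run.
    """
    if not sentences:
        raise ValueError("no sentences to score")
    scores = [cosine(vec, summary_vec) for _, _, vec in sentences]
    window = min(window, len(scores))

    best_start, best_avg = 0, float("-inf")
    for i in range(len(scores) - window + 1):
        # Moving average of the cosine scores over the current window.
        avg = sum(scores[i:i + window]) / window
        if avg > best_avg:
            best_start, best_avg = i, avg

    start_sec = sentences[best_start][0]
    end_sec = sentences[best_start + window - 1][1]
    return start_sec, end_sec, best_avg


if __name__ == "__main__":
    # Toy data: the two middle sentences point the same way as the summary.
    toy = [
        (0.0, 5.0, [0.0, 1.0]),
        (5.0, 10.0, [1.0, 0.0]),
        (10.0, 15.0, [1.0, 0.0]),
        (15.0, 20.0, [0.0, 1.0]),
    ]
    print(best_window(toy, [1.0, 0.0], window=2))
```

In practice the embeddings would come from a sentence-embedding model, and the window size would be tuned to the desired snippet length.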

Post processing:

  1. Identified key moments in the video (using CLIP-based models, based on a prompt and user interactions)
  2. Used frame-level analysis (phasing) for shot detection (finding where sudden changes happen)
  3. Added stickers and GIFs (based on context from the Flan model)
  4. Created an in-house solution for memes (using Stable Diffusion)
  5. Used a Stable Diffusion-based model for artistic video generation
  6. Used ESRGANs to upsample videos and increase quality.
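
As a rough illustration of the shot detection idea in step 2, the sketch below flags a cut wherever consecutive frames differ sharply. This is a simple stand-in, not the frame-level method described above: it assumes each frame is a flat list of grayscale pixel values, and the function names and the threshold are hypothetical.

```python
def mean_abs_diff(frame_a, frame_b):
    """Average absolute per-pixel difference between two equal-size frames."""
    return sum(abs(x - y) for x, y in zip(frame_a, frame_b)) / len(frame_a)


def detect_shots(frames, threshold=30.0):
    """Return indices of frames that start a new shot.

    frames: list of flat grayscale frames (assumed format, values 0-255).
    A cut is reported at index i when frame i differs from frame i - 1
    by more than `threshold` on average (an assumed, untuned value).
    """
    cuts = []
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            cuts.append(i)
    return cuts


if __name__ == "__main__":
    # Two dark frames followed by two bright frames: one cut at index 2.
    toy_frames = [[10] * 4, [12] * 4, [200] * 4, [198] * 4]
    print(detect_shots(toy_frames))
```

A production version would likely compare compact per-frame signatures rather than raw pixels, to stay robust to noise and camera motion.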

Impact and Future Work

We deployed our solution at scale (500 video snippets per day) in India and saw a staggering increase of close to 80% in overall time spent and user engagement. As next steps, we plan to scale this solution to Indonesia and then to the US. We also aim to create a new feed just for these videos, and we will focus on further improvements to both base snippets and post-processing.

