The Fifth Elephant 2023 Winter edition will cover topics on the research, engineering, and business aspects of AI, exploring the practical implementation and economic implications of these systems.
In 2020, OpenAI released a Large Language Model (LLM) called GPT-3, which has 175 billion parameters. With the minimal, intuitive user interface released alongside it, GPT-3 caught the imagination and attention of AI communities and researchers all over the world.
One by one, domain use cases such as coding co-pilots, creative AI, and other downstream tasks were shown to be fast-tracked by generative AI models and LLMs. As a result, there is wide-ranging interest in large language models and the applications built around them across domains and use cases in the AI space. Experiments to find optimal hyperparameters, and to deal with underfitting and overfitting, are carried out regularly; more barriers are broken down every day.
The winter edition of The Fifth Elephant will showcase talks, discussions and demos across generative and multimodal AI, and other classic AI/ML/DL applications on the below themes.
Share approaches and case studies covering the following use cases:
- Products and platforms using LLMs, generative AI, ML, and deep learning techniques, and business models built around AI engineering.
- Conversational AI and search, automatic speech recognition, healthcare, e-commerce, fintech, media and OTT, and other verticals.
- Multilingual needs in India's digital products and platforms: model training, fine-tuning, RLHF, RAG, quantization techniques, dataset curation and augmentation, pipeline challenges, evaluation metrics, future roadmaps, and applications such as multilingual voice bots using ASR/STT and text-to-speech for accessibility.
Share case studies and experiential talks on data science operations: scaling and fine-tuning challenges, lessons learned, and best practices for incorporating ethics, safety, and bias mitigation.
Show demos of features/products that leverage AI and LLM-based APIs and models, whether from the creative AI and generative AI space or from other verticals with relevant use cases.
The December edition will be held in-person. Attendance is open to The Fifth Elephant members only. Pick a membership to attend the in-person conference, and to support The Fifth Elephant’s community activities.
- AI/ML/Data Science Ops engineers who want to learn about state-of-the-art tools and techniques, especially from domains such as healthcare, e-commerce, automobile, agri-tech and industrial verticals.
- Data scientists who want a deeper understanding of model deployment/governance.
- Architects who are building ML workflows that scale.
- Tech founders and CTOs who are building products and platforms that leverage AI, ML and LLMs.
- Product managers who want to learn about the process of building AI/ML products.
- Directors, VPs and senior tech leadership who are building AI/ML teams.
Sponsorship slots are open for:
- Infrastructure (GPU, CPU and cloud providers) and developer productivity tool makers who want to evangelise their offering to developers and decision-makers.
- Companies who want to do tech branding among AI and ML developers.
- Venture Capital (VC) firms and investors who want to scan the landscape of innovations and innovators in AI, and source leads for investment in the AI and ML space.
If you are interested in sponsoring The Fifth Elephant, email firstname.lastname@example.org.
Video Highlights Generation
Roposo is a live video platform with ~200 million end users and ~1,000 live videos uploaded every day, each lasting 15 minutes to 3 hours. To increase engagement and improve user experience, we are building a central video feed of assets that can be easily consumed. This requires converting our event and creator-led videos into shorter formats such as trailers and short clips, for which we process the videos with AI to extract the most important segments.
Videos can be very diverse, with content ranging from:
- people having arguments, dancing or singing (Bigg Boss, with Glance as the smart lock screen partner)
- people simply having conversations, as in an interview (creator-led shows, exclusive content for Roposo)
- a fashion show where a supermodel just walks a runway (Lakme Fashion Week, with Glance as a partner).
To handle this, we bifurcate videos based on the density of speech in them and built separate solutions for speech-heavy and visual-heavy videos.
For a speech-heavy video, we use transcription to select the most important segments, while for a visual-heavy video, we break the video into shots and generate visual descriptions of the shots to select the most important ones.
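The routing step above can be sketched as a simple heuristic: compare the fraction of the video covered by transcribed speech against a cutoff. This is an illustrative sketch, not the production logic; the `route_video` helper and the 0.5 threshold are assumptions.

```python
def speech_density(segments, video_duration):
    """Fraction of the video covered by transcribed speech.

    `segments` is a list of (start, end) times in seconds, as a
    transcription model such as Whisper would return them.
    """
    spoken = sum(end - start for start, end in segments)
    return spoken / video_duration

def route_video(segments, video_duration, threshold=0.5):
    """Route a video to the speech-heavy or visual-heavy pipeline."""
    if speech_density(segments, video_duration) >= threshold:
        return "speech-heavy"   # rank segments by transcript importance
    return "visual-heavy"       # split into shots, caption, then rank
```

For example, a 100-second video with speech in (0, 40) and (50, 90) has a speech density of 0.8 and would go down the speech-heavy path.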
We are leveraging the following for our use-case:
- Faster Whisper using CTranslate2 for audio transcription.
- BLIP and GIT for image captioning.
- ViLT for visual question answering.
- Color histograms for shot boundary detection.
- gpt-3.5-turbo for text highlights and summarisation.
- Sentence-BERT embeddings and cosine similarity for retrieval.
- All of the above models optimised to run on a single T4 GPU, using a custom dataloader for parallel processing.
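Histogram-based shot boundary detection can be illustrated with plain NumPy (a real pipeline would read frames with a video library such as OpenCV; the bin count and 0.5 threshold here are illustrative assumptions): compute a normalised colour histogram per frame and flag a boundary wherever the histogram intersection with the previous frame drops below the threshold.

```python
import numpy as np

def frame_histogram(frame, bins=8):
    """Normalised per-channel colour histogram of an RGB frame (H, W, 3)."""
    hists = [np.histogram(frame[..., c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
    hist = np.concatenate(hists).astype(float)
    return hist / hist.sum()

def shot_boundaries(frames, threshold=0.5):
    """Indices where a new shot starts, based on histogram intersection.

    Intersection of two normalised histograms is 1.0 for identical
    colour distributions and approaches 0.0 for disjoint ones.
    """
    boundaries = []
    prev = frame_histogram(frames[0])
    for i, frame in enumerate(frames[1:], start=1):
        cur = frame_histogram(frame)
        if np.minimum(prev, cur).sum() < threshold:
            boundaries.append(i)
        prev = cur
    return boundaries
```

A run of dark frames followed by a run of bright frames would yield a single boundary at the first bright frame.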
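Retrieval by embedding similarity can likewise be sketched with plain NumPy; the vectors below stand in for Sentence-BERT segment embeddings (the toy 2-D vectors and `top_k` helper are illustrative assumptions).

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_k(query, segment_embeddings, k=2):
    """Indices of the k segments most similar to the query embedding."""
    scores = [cosine_similarity(query, e) for e in segment_embeddings]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
```

Given a query embedding for "most important moment" and one embedding per candidate segment, the highest-scoring indices are the segments kept for the highlight reel.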
To enhance the viewer experience, we are post-processing our short videos with AI-generated music, custom transitions between shots, animations, stickers, subtitles and a lot more.
End-to-end processing takes 5-10 minutes for a 30-minute video.
- Increased content liquidity on our platform by 300%.
- Increased average play duration (APD) on short videos by 44%.
- Increased viewership for original content by 23%.
- Introduction of multi-modality for describing segments.
- Generalization across more diverse videos.
Data Scientists and ML Engineers