The Fifth Elephant 2024 Annual Conference (12th & 13th July)
Maximising the Potential of Data — Discussions around data science, machine learning & AI
13 Jul 2024 (Sat), 09:00 AM – 06:05 PM IST
We appreciate that many participants create submissions out of a genuine desire to share knowledge with our community, help solve common problems or contribute in a meaningful way. However, we find that they sometimes fall short of this objective because the written submissions fail to capture the attention of the community or to pass The Fifth Elephant’s peer review process. More often than not, this is because the submission does not explain its intent with sufficient clarity or detail.
The template (and example) is an attempt to help you write a better submission, one that is noticed and understood by your intended audience and not lost in the crowd of interesting proposals we receive. Please use this template as a guideline, while ensuring that it is in your own unique and authentic voice.
BEFORE you begin writing your submission, please give some thought to the following:
The most successful talks and sessions are those where presenters are able to abstract an actionable insight from a common pain area, enlighten the audience about something new, provide a fresh perspective, and/or demonstrate innovation.
Here’s a guide for speakers to draft their presentations.
You can view talks held at previous editions of The Fifth Elephant for reference:
The call for submissions will close on 3 June 2024. Talks will be selected on a rolling basis as submissions are made.
You can submit a session for:
GraphRAG: Powering Up LLMs with Knowledge Graphs
In the era of big data, large language models (LLMs) are becoming increasingly important for tasks like question answering, document analysis, and chatbot development. However, traditional LLMs can often struggle with factual accuracy, reasoning, and handling complex information.
Session type: 30 mins talk
|
Scaling Customer Delight at Zomato using AI
Introduction: In today’s rapidly evolving digital landscape, Generative AI is playing a pivotal role in transforming how our businesses interact with the customers.
Session type: 30 mins talk
|
Chat with Tables: Query tabular data in English using self-hosted Large Language Models
Business users and non-technical professionals often need to quickly analyse or transform tabular data in spreadsheets for ad hoc business intelligence. However, they might lack the necessary programming knowledge to do so themselves and therefore must reach out to a data analyst. Such unexpected delays have the potential to incur huge opportunity costs for time-sensitive business decisions which…
Session type: Workshop
|
A privacy preserving DPI to unfreeze data markets - to solve our data woes!
Clearly, there is a race to build larger and larger AI models these days trained on as much training data as possible. Indian developers are also trying to make their presence felt in this race to build home-grown models for our unique problems and situations. Owing to our large population, we probably generate more data than any other country. This sounds great, right, but much of this Indian co…
Session type: 30 mins talk
|
AI--By the People For the People
Introduction: Just as the steam engine, electricity and the Internet became integral to the first, second and third industrial revolutions, AI is going to be adopted sooner rather than later in all production and business processes. It is very rapidly going to change the way people conduct their businesses and how production processes are executed. While the Free/open and the proprietary/closed nature of the so…
Session type: 30 mins talk
|
Llama.lisp: design of an AI first compiler framework
Abstract: Compilers are the workhorses of performance behind all AI algorithms. Making algorithms work effectively on GPUs, known as kernel programming, is especially hard. The compiler ecosystem around GPUs is especially messy. Compilers are supposed to allow for performance portability across different hardware, but this is usually not the case. See the infographic below for the current state of AI compilers.
Session type: 30 mins talk
|
Fine-Tuning LLMs for Script-Writing: A Journey into the world of open source LLMs
Explore the emerging technology of open-source Large Language Models (LLMs) with a hands-on tutorial where we will fine-tune an LLM to build a script-writing assistant for a popular daytime soap. This session delves into the fundamentals of LLMs, including pre-training and fine-tuning with LoRA (Low-Rank Adaptation) or Direct Preference Optimization (DPO), offering a good understanding of these n…
Session type: Workshop
|
A new approach to building high-performance lakehouse compute engines for open table formats like Delta Lake, Apache Iceberg, and Apache Hudi
Introduction: Platform engineering and data architecture teams are increasingly adopting object-store backed data lakehouses as their central, unified platform for workloads across Analytics as well as AI.
Session type: 30 mins talk
|
Build a Data Product - A Roundtable Discussion
Session type: Workshop
|
Samvaadini - a telecalling voice bot for blue-collar hiring in India
In this talk, we introduce ‘Samvaadini’, a telecalling voice bot designed specifically for hiring blue-collar workers in India. Samvaadini leverages state-of-the-art AI to capture human responses adeptly, autonomously determine the next action, and respond in a voice indistinguishable from a human.
Session type: Demo - showcase of your work
|
AI-GOLD: Identifying Billion-Dollar AI Use Cases
It is a well-known fact that most (Gen)AI projects fail to deliver any return on investment (ROI) [1]. The reasons behind this are multifaceted. One fundamental reason is the pursuit of suboptimal, and at times entirely inappropriate, use cases.
Session type: 30 mins talk
|
Product Management for AI-first products
In-depth and exclusive content to help seasoned product managers transition into an AI-first world, with tons of case studies and examples.
Session type: Workshop
|
AI Product management: paradigm shift or old wine in new bottle?
In this talk, we discuss why traditional software product management skills fall drastically short when building (Gen)AI products, and why product management for (Gen)AI products requires a major upgrade.
Session type: 30 mins talk
|
Designing CoPilot: An AI-Driven Approach to Conversational Email Marketing Campaigns
In today’s digital marketing landscape, the demand for personalized and effective email campaigns is ever-growing. Our talk dives into this challenge, presenting an innovative solution through the creation of CoPilot, an AI-driven system tailored for Freshmarketer.
Session type: 30 mins talk
|
Revolutionizing D2C Marketing: Empowering with Product Recommendation Framework
Freshmarketer empowers D2C store owners with data-driven marketing solutions tailored to their unique needs. Our approach integrates product recommendation systems into marketing campaigns, addressing various marketing objectives. Unlike one-size-fits-all solutions, Freshmarketer develops decision engines customized to individual stores, ensuring flexibility across different categories and campai…
Session type: 30 mins talk
|
Unlock Data with NL2SQL: Building Low-code Data Assistant for Business using Code LLMs
This BoF session covers building a low-code data assistant for business using code LLMs. Generative AI (LLMs) for code has become a very popular and powerful tool for developers, with the rise of enterprise solutions like GitHub Copilot, Amazon CodeWhisperer, Google Duet etc., along with numerous open source code-assist models for generating code for hundreds of programming languages includ…
Session type: Birds of Feather (BOF) session
|
Leveraging the Power of Log Clustering Algorithms to Reduce Alert Noise in IT Operations
The ever-increasing volume of alerts generated by monitoring tools poses a significant challenge for IT Operations teams. A substantial portion of these alerts are duplicates or false positives, overwhelming ITOps practitioners and hindering the timely identification of critical issues. Traditional methods for managing alert floods, such as manual filtering, are ineffective, prone to human error,…
Session type: 30 mins talk
|
The Multimodal Revolution: Reshaping Video Analysis Pipelines
Multimodal AI is revolutionizing video analysis, but practical insights on pipeline design are scarce. Traditional computer vision pipelines often involve a complex web of specialized models. This leads to high costs, maintenance burdens, and difficulty in adapting to new tasks. This talk will dissect a real-world case study where multimodal models dramatically simplified a large-scale video anal…
Session type: 30 mins talk
|
Enterprise-Ready Data Lifecycle: Powering AI & Analytics at scale
In this session, we discuss Atlassian’s data architecture to help demystify the complexities of building a real-world, scalable Delta Lakehouse that meets data governance and compliance requirements, and how we enabled various teams to iterate fast on their data-driven initiatives.
Session type: Birds of Feather (BOF) session
|
Advancing TB Screening: Integrating Vision Language Models and Patient Metadata
Problem: TB claims over 1.3 million lives annually, with around 30% of cases missed by current screenings and diagnostics. The shortage of radiologists further complicates timely and accurate TB screenings, which often rely on subjective interpretations that can lead to missed diagnoses or unnecessary treatments, impacting patients’ health. There is a critical need for accurate detection and differen…
Session type: 30 mins talk
|
Vector databases Birds of Feather (BOF) session
Background: With the recent technological advancements in LLMs, embedding generation and Retrieval Augmented Generation (RAG), there is immense interest in using these technologies to solve problems involving Semantic Search, Chat Bots, Code Graph, Knowledge Graphs etc.
Session type: 30 mins talk
|
Unified Help in Jira Service Management using AI
Introduction: Atlassian’s Jira Service Management (JSM) has consistently strived to empower customers by delivering top-notch assistance to those in need. A primary objective has been to promote self-service within JSM, allowing users to promptly access help while reducing the workload on agents.
Session type: 30 mins talk
|
Nested Evolution and Schema Transformation (NEST) Framework for Managing Schema Evolution in Spark
Overview: The NEST Framework automates the handling of dynamic and nested schemas, making it easier for developers to manage schema changes and maintain accurate, deduplicated tables in Spark. We are excited to present this innovative solution at the Data Engineering Conference.
Session type: 30 mins talk
|
Need for new licenses in this age of Generative AI
Table of Contents: Introduction; The disruptive nature of AI technology
Session type: Birds of Feather (BOF) session
|
Unifying Senses: The Evolution, Technology, and Impact of Multimodal Fusion
Multimodal fusion has revolutionized the way we integrate and interpret diverse data sources, creating powerful insights from the synergy of visual, auditory, and textual information. In this talk, titled “Unifying Senses: The Evolution, Technology, and Impact of Multimodal Fusion,” we will explore the origins of multimodal fusion and trace its development over the years. We’ll delve into how thi…
Session type: 30 mins talk
|
Intent Prediction in Search at Myntra
Myntra is one of India’s leading fashion e-commerce companies, delivering a best-in-class shopping experience through advanced machine learning models. This session will delve into a key machine learning solution designed to enhance query understanding for product search flow. Our ML model accurately interprets user intent from all types of search queries, helping shoppers find exactly what they …
Session type: 30 mins talk
|
Vector Databases: A Bird's Eye View
This talk is focused on equipping the audience with an overall understanding of the current vector database landscape, and of how vector databases work internally, with a focus on a few common algorithms.
Session type: 30 mins talk
|
Triton, the hard way!
Abstract: A lot of engineers are interested in using LLMs nowadays. However, their efficient execution remains a challenge. Efficient execution is key to mainstream adoption. To run them efficiently, we need accelerated systems such as GPUs. This talk will explore the fundamentals of GPU architecture and its programming model, moving beyond model.to('cuda') to understand the inner workings of GPUs. A…
Session type: Workshop
|
Ephemeral data pipelines using Atlassian’s Lithium platform
There are numerous use cases that require moving large amounts of data between different systems and validating and transforming them in-flight. Platforms such as Apache Flink can be excellent choices for moving and transforming data at scale - effectively through streaming ETL. However, certain use cases within Atlassian ‘onprem to cloud data migration’, ‘cloud to cloud data migration’, ‘backup …
Session type: 30 mins talk
|
Democratizing AI: Harnessing Decentralized GPUs for AI Model Fine-Tuning and Deployment
Abstract: This session delves into the complexities involved in building a scalable, decentralized GPU cloud tailored for the efficient training, fine-tuning and deployment of AI models, more specifically large language models (LLMs). We will explore the significant technical hurdles our team overcame, including ensuring cost-effectiveness, optionality and accessibility of GPU resources. This inf…
Session type: 30 mins talk
|
Apache XTable (Incubating): Interoperability across table formats
Apache Hudi, Delta Lake, and Iceberg are leading open-source projects that offer decoupled storage with transactional and metadata layers, known as table formats in cloud storage. These formats store data in open columnar formats like Parquet and include metadata for schema, commit history, partitions, and column statistics. Selecting a table format can be challenging due to the unique features o…
Session type: 30 mins talk
|
Jira cloud data extraction @ scale
Cloud data extraction is a subset of the broader data engineering field that involves the process of retrieving or pulling data from cloud-based applications and services for analysis, reporting, or storage in a centralized data repository. Atlassian’s data extraction solution has evolved significantly over the years to meet the demands of enterprise-grade customers. Initially started with full t…
Session type: 30 mins talk
|
Improving search relevance in hyperlocal food delivery using (small) language models
Introduction: The ability to accurately understand and serve customer search queries is critical to Swiggy. This need is amplified in food delivery platforms operating in India due to the wide variety of languages, cuisines and tastes. Our platform alone offers millions of items from hundreds of thousands of restaurants across India. Not only do Indian dish names have a tremendous amount of region…
Session type: 30 mins talk
|
Deviations from the norm - anomaly detection with PerceptInsight
Abstract: In a metric-driven digital world, observability is important, but being able to understand anomalies in data streams and correlate them also gives organisations a significant advantage. In this talk I discuss how we went about building this with PerceptInsight, processing over 500 million events/day, and how different organisations are leveraging it to their benefit.
Session type: 30 mins talk
|
Design Patterns for Data Masking and Tokenization
Outline: In the era of big data, ensuring the privacy and security of sensitive information is more crucial than ever.
Session type: 30 mins talk
|
AI and Risk Mitigation Strategies in Key Indian Sectors
Abstract: As AI continues to revolutionize various sectors, it brings both unprecedented opportunities and significant risks. In India, sectors such as Agritech, Fintech, Edtech, public services, and Healthtech are rapidly adopting AI technologies. However, the lack of robust risk mitigation strategies can lead to unintended consequences, including data breaches, algorithmic biases, and systemic …
Session type: Birds of Feather (BOF) session
|
Book Discussion on Dream Machine: A Graphic Novel about AI
Join us for a candid chat with the artist Appupen, whose recently released graphic novel Dream Machine explores the implications of unleashing AI on the real world. Through a narrative centered around Hugo, an entrepreneur who dreams of being a superhero, the novel uncovers some crucial concerns around implementing AI at scale, such as Bias, Surveillance, Ethics, Trust & Creativity.
Session type: Birds of Feather (BOF) session
|
Securing big data environments
This BoF is about securing big data environments and learning about different controls from the security, data privacy, and compliance side. We will look at how to balance security, scale, and user experience while scaling a big data environment. There will be discussion around use cases and edge cases that data platform teams should be aware of while implementing security controls.
Session type: Birds of Feather (BOF) session
|
Solving the Data Platform Puzzle: Observability Meets Cost Optimization
Outline: This session is aimed at data platform engineers, data architects, and engineering leaders who are looking to significantly reduce costs while maintaining or improving platform performance and reliability. The content will be tailored to those with a strong technical background who are facing challenges around optimizing complex data pipelines and infrastructure.
Session type: 30 mins talk
|
Ensuring Data Quality with Data Contracts and OpenLineage
Abstract: In the modern data landscape, ensuring data quality and integrity is paramount. This conference will explore the concept of Data Contracts as a schema registry, incorporating data quality (DQ) checks and leveraging OpenLineage to capture compliance failures. By implementing Data Contracts, organizations can enforce strict data quality standards and track lineage to understand the impact …
Session type: Workshop
|
RAG Vs Fine-Tuning: Implementation Anecdotes from Data Catalog Enrichment Solution
Abstract: This talk will take the audience through our experience of building a content generation solution for a data catalog enrichment effort, from a modeling perspective (RAG-based pre-trained model and RAG-based fine-tuned model).
Session type: 30 mins talk
|
Digital Twin for Retail Shelf Optimization using Advanced ML and GenAI at AB InBev (world’s largest beer company)
Outline: AB InBev sells a significant share of its beer volume through retailers who take AB InBev’s assistance to configure the best shelf assortment that would maximize their revenues. Planogramming is a very important step that retailers perform to allocate their available shelf space to the right products. For large retailers that have 1000s of stores, this becomes a very time consuming and ted…
Session type: 30 mins talk
|
Building an AI Data Analyst
LLMs have transformed data analysis. With their ability to generate code to analyse data given appropriate prompts and instructions, LLMs are forming the bedrock of a new suite of data analysis tools.
Session type: 30 mins talk
|
Unlocking the power of Real Time Feature Stores
In today’s data-centric world, businesses rely on personalization now more than ever. Whether it’s personalizing user experiences, optimizing operations, or predicting market trends, data plays a pivotal role. To harness the full potential of data, organizations are turning to real-time feature stores. In this talk, we’ll explore what real-time feature stores are, why they matter, and how we at Z…
Session type: 15 mins talk
|
Practical tips for building AI applications using LLMs - Best practices and trade-offs
Overview: At KushoAI, we’ve built an AI agent that can autonomously perform API testing for you. While building this, we came across a lot of problems specific to AI applications built on top of LLMs that you don’t see anywhere else. Since this is a fairly new area of development, we had to spend a lot of time figuring out solutions for them on our own.
Session type: 30 mins talk
|
Establishing Causality using AI in Mental Health
This talk explores the forefront of artificial intelligence (AI) in establishing causality in mental health. By leveraging Graph Neural Networks (GNNs) and Spatio-Temporal Graph Neural Networks (STGNNs), we aim to uncover causal relationships in complex mental health causal effects. The session will cover fundamental concepts of causality, the transition from traditional GNNs to STGNNs, and the c…
Session type: 30 mins talk
|
Content Moderation Systems at Scale
We rely heavily on the Web to meet our information needs today. Examples include Wikipedia, Twitter, Instagram, YouTube, Google Maps etc. All of these are platforms where millions of users post billions of pieces of content every day on a wide range of topics. The content is consumed by hundreds of millions of users. While a rich source of information, these platforms are also easy targets fo…
Session type: 30 mins talk
|
LLM's Anywhere: Browser Deployment with Wasm & WebGPU
Description: In today’s interconnected world, deploying and accessing machine learning (ML) models efficiently poses major challenges. Traditional methods rely on cloud GPU clusters and constant internet connectivity. However, WebAssembly (Wasm) and WebGPU technologies are revolution…
Session type: 30 mins talk
|
Getting dimensions right! A sneak peek at entity resolution in the warehouse and datalake
Real world data contains multiple records belonging to the same customer. These records can be in single or multiple systems and they have variations across fields, which makes it hard to combine them together, especially with growing data volumes. This hurts customer analytics - establishing lifetime value, loyalty programs, or marketing channels is impossible when the base data is not linked. N…
Session type: 30 mins talk
|
From Foundation to the Future: The Evolution of Dream11's Data Platform
Outline: Introduction (5 minutes): Brief overview of Dream11
Session type: 30 mins talk
|
Building and Deploying LLM Applications: From Concept to Production - AMA with Mixture-of-Experts
Session Overview: An AMA with Mixture-of-Experts on Building and Deploying LLM Applications: From Concept to Production was held on 24th July at BIC, as a part of The Fifth Elephant 2024 Annual Conference.
Session type: Birds of Feather (BOF) session
|
Imagining the Future of AI in India
The rush to build AI is pushing every business organization to adapt to a new era of automation. The push for AI is going to take over the economy and society at large, with applications of AI in every sector. This raises many important questions about the production of AI, from large-scale data centers to the data sets required to train these complex mathematical models. In this context, how do we i…
Session type: Birds of Feather (BOF) session
|