General guidelines for conference submissions
We appreciate that many participants create submissions out of a genuine desire to share knowledge with our community, help solve common problems, or contribute in a meaningful way. However, we find that they sometimes fall short of achieving this objective because the written submissions fail to capture the attention of the community or to gain acceptance through The Fifth Elephant’s peer review process. More often than not, this is because the submission does not explain its intent with sufficient clarity or detail.
The template (and example) is an attempt to help you write a better submission, one that is noticed and understood by your intended audience and not lost in the crowd of interesting proposals we receive. Please use this template as a guideline, while ensuring that it is in your own unique and authentic voice.
BEFORE you begin writing your submission, please give some thought to the following:
Who is the audience for your session? Think about their interests, work roles, challenges, age or experience as you decide this.
What problem/pain are you trying to solve (for the audience)? This should be something that is communicated clearly so that they have a sense of your session’s importance.
What will be the scope of your session? This will help identify the central topic or theme and should describe the broad areas you plan to cover during the session.
How will participants benefit from your session? Think of practical and specific ways in which they will be able to apply the knowledge they gain, beyond just general awareness.
What is the appropriate format for your session, given the audience and objectives that you have in mind?
The most successful talks and sessions are those where presenters are able to abstract an actionable insight from a common pain area, enlighten the audience about something new, provide a fresh perspective, and/or demonstrate innovation.
Data engineering - data pipelines and dataset creation for AI; LLM ops; managing NLP pipelines.
Best practices for LLM training, inference, deployment; LLM and security - best practices on security while incorporating LLMs and SLMs in organizations; working with open-source LLMs; security, bias and risk mitigation.
GenAI - Generative AI-based use cases, products, platforms and research, such as multilingual models and use cases where GenAI is being used.
In the era of big data, large language models (LLMs) are becoming increasingly important for tasks like question answering, document analysis, and chatbot development. However, traditional LLMs can often struggle with factual accuracy, reasoning, and handling complex information.
Introduction: In today’s rapidly evolving digital landscape, Generative AI is playing a pivotal role in transforming how businesses interact with customers.
Business users and non-technical professionals often need to quickly analyse or transform tabular data in spreadsheets for ad hoc business intelligence. However, they might lack the necessary programming knowledge to do so themselves and therefore must reach out to a data analyst. Such unexpected delays have the potential to incur huge opportunity costs for time-sensitive business decisions which…
There is clearly a race these days to build larger and larger AI models trained on as much data as possible. Indian developers are also trying to make their presence felt in this race to build home-grown models for our unique problems and situations. Owing to our large population, we probably generate more data than any other country. This sounds great, right, but much of this Indian co…
Introduction: Just as the steam engine, electricity and the Internet became integral to the first, second and third industrial revolutions, AI is going to be adopted sooner rather than later in all production and business processes. It is very rapidly going to change the way people conduct their businesses and how production processes are executed. While the Free/open and the proprietary/closed nature of the so…
Abstract: Compilers are the workhorses of performance behind all AI algorithms. Making algorithms work effectively on GPUs, known as kernel programming, is especially hard. The compiler ecosystem around GPUs is especially messy. Compilers are supposed to allow for performance portability across different hardware, but this is usually not the case. See the infographic below for the current state of AI compilers.
Explore the emerging technology of open-source Large Language Models (LLMs) with a hands-on tutorial where we will fine-tune an LLM to build a script-writing assistant for a popular daytime soap. This session delves into the fundamentals of LLMs, including pre-training and fine-tuning with LoRA (Low-Rank Adaptation) or Direct Preference Optimization (DPO), offering a good understanding of these n…
Introduction: Platform engineering and data architecture teams are increasingly adopting object-store backed data lakehouses as their central, unified platform for workloads across analytics as well as AI.
In this talk, we introduce ‘Samvaadini’, a telecalling voice bot designed specifically for hiring blue-collar workers in India. Samvaadini leverages state-of-the-art AI to capture human responses adeptly, autonomously determine the next action, and respond in a voice indistinguishable from a human.
It is a well-known fact that most (Gen)AI projects fail to deliver any return on investment (ROI) [1]. The reasons behind this are multifaceted. One fundamental reason is the pursuit of suboptimal, and at times entirely inappropriate, use cases.
In this talk, we discuss why traditional software product management skills fall drastically short when building (Gen)AI products. Why Product Management for (Gen)AI Products Requires Major Upgrade
In today’s digital marketing landscape, the demand for personalized and effective email campaigns is ever-growing. Our talk dives into this challenge, presenting an innovative solution through the creation of CoPilot, an AI-driven system tailored for Freshmarketer.
Freshmarketer empowers D2C store owners with data-driven marketing solutions tailored to their unique needs. Our approach integrates product recommendation systems into marketing campaigns, addressing various marketing objectives. Unlike one-size-fits-all solutions, Freshmarketer develops decision engines customized to individual stores, ensuring flexibility across different categories and campai…
This BoF session discusses building a low-code data assistant for business using code LLMs. Generative AI (LLMs) for code has become a very popular and powerful tool for developers, with the rise of enterprise solutions like GitHub Copilot, AWS CodeWhisperer, Google Duet, etc., along with numerous open-source code-assist models for generating code in hundreds of programming languages includ…
The ever-increasing volume of alerts generated by monitoring tools poses a significant challenge for IT Operations teams. A substantial portion of these alerts are duplicates or false positives, overwhelming ITOps practitioners and hindering the timely identification of critical issues. Traditional methods for managing alert floods, such as manual filtering, are ineffective, prone to human error,…
Multimodal AI is revolutionizing video analysis, but practical insights on pipeline design are scarce. Traditional computer vision pipelines often involve a complex web of specialized models. This leads to high costs, maintenance burdens, and difficulty in adapting to new tasks. This talk will dissect a real-world case study where multimodal models dramatically simplified a large-scale video anal…
In this session, we discuss Atlassian’s data architecture to help demystify the complexities of building a real-world, scalable Delta Lakehouse that meets data governance and compliance requirements, and how we enabled various teams to iterate fast on their data-driven initiatives.
Problem: TB claims over 1.3 million lives annually, with around 30% of cases missed by current screenings and diagnostics. The shortage of radiologists further complicates timely and accurate TB screenings, which often rely on subjective interpretations that can lead to missed diagnoses or unnecessary treatments, impacting patients’ health. There is a critical need for accurate detection and differen…
With recent technological advances in LLMs, embedding generation and RAG, there is immense interest in using these technologies to solve problems like semantic search, chatbots, code graphs and knowledge graphs.
Introduction: Atlassian’s Jira Service Management (JSM) has consistently strived to empower customers by delivering top-notch assistance to those in need. A primary objective has been to promote self-service within JSM, allowing users to promptly access help while reducing the workload on agents.
Overview: The NEST Framework automates the handling of dynamic and nested schemas, making it easier for developers to manage schema changes and maintain accurate, deduplicated tables in Spark. We are excited to present this innovative solution at the Data Engineering Conference.
Introduction: In this rapidly evolving digital era, data acts as the fuel powering the relentless growth of artificial intelligence. As we stand on the brink of technological revolutions, it becomes crucial to understand not just how data drives AI, but also the ethical and legal frameworks that must evolve with it. We should try to look at licensing as a tool to make sure that we can level the pl…
Multimodal fusion has revolutionized the way we integrate and interpret diverse data sources, creating powerful insights from the synergy of visual, auditory, and textual information. In this talk, titled “Unifying Senses: The Evolution, Technology, and Impact of Multimodal Fusion,” we will explore the origins of multimodal fusion and trace its development over the years. We’ll delve into how thi…
Myntra is one of India’s leading fashion e-commerce companies, delivering a best-in-class shopping experience through advanced machine learning models. This session will delve into a key machine learning solution designed to enhance query understanding for the product search flow. Our ML model accurately interprets user intent from all types of search queries, helping shoppers find exactly what they …
This talk focuses on equipping the audience with an overall understanding of the current vector database landscape and of how vector databases work internally, with a focus on a few common algorithms.
Abstract: A lot of engineers are interested in using LLMs nowadays. However, running them efficiently remains a challenge, and efficient execution is key to mainstream adoption. To run them efficiently, we need accelerated systems such as GPUs. This talk will explore the fundamentals of GPU architecture and its programming model, moving beyond model.to('cuda') to understand the inner workings of GPUs. A…
There are numerous use cases that require moving large amounts of data between different systems and validating and transforming them in-flight. Platforms such as Apache Flink can be excellent choices for moving and transforming data at scale - effectively through streaming ETL. However, certain use cases within Atlassian ‘onprem to cloud data migration’, ‘cloud to cloud data migration’, ‘backup …
Abstract: This session delves into the complexities involved in building a scalable, decentralized GPU cloud tailored for the efficient training, fine-tuning and deployment of AI models, more specifically large language models (LLMs). We will explore the significant technical hurdles our team overcame, including ensuring cost-effectiveness, optionality and accessibility of GPU resources. This inf…
Apache Hudi, Delta Lake, and Iceberg are leading open-source projects that offer decoupled storage with transactional and metadata layers, known as table formats in cloud storage. These formats store data in open columnar formats like Parquet and include metadata for schema, commit history, partitions, and column statistics. Selecting a table format can be challenging due to the unique features o…
Cloud data extraction is a subset of the broader data engineering field that involves the process of retrieving or pulling data from cloud-based applications and services for analysis, reporting, or storage in a centralized data repository. Atlassian’s data extraction solution has evolved significantly over the years to meet the demands of enterprise-grade customers. Initially started with full t…
Introduction: The ability to accurately understand and serve customer search queries is critical to Swiggy. This need is amplified in food delivery platforms operating in India due to the wide variety of languages, cuisines and tastes. Our platform alone offers millions of items from hundreds of thousands of restaurants across India. Not only do Indian dish names have a tremendous amount of region…
Abstract: In a metric-driven digital world, not only is observability important, but the ability to understand anomalies in data streams and to draw correlations also adds significant advantages to organisations. In this talk, I discuss how we went about building this with PerceptInsight, which processes over 500 million events/day, and how different organisations are leveraging it to their benefit.
Abstract: As AI continues to revolutionize various sectors, it brings both unprecedented opportunities and significant risks. In India, sectors such as Agritech, Fintech, Edtech, public services, and Healthtech are rapidly adopting AI technologies. However, the lack of robust risk mitigation strategies can lead to unintended consequences, including data breaches, algorithmic biases, and systemic …
Join us for a candid chat with the artist Appupen, whose recently released graphic novel Dream Machine explores the implications of unleashing AI on the real world. Through a narrative centered around Hugo, an entrepreneur who dreams of being a superhero, the novel uncovers some crucial concerns around implementing AI at scale, such as Bias, Surveillance, Ethics, Trust & Creativity.
This BoF is about securing big data environments and learning about different controls from the security, data privacy, and compliance perspectives. We will look at how to balance security, scale, and user experience while scaling a big data environment. There will be discussion of certain use cases and edge cases that data platform teams should be aware of while implementing security controls.
Outline: This session is aimed at data platform engineers, data architects, and engineering leaders who are looking to significantly reduce costs while maintaining or improving platform performance and reliability. The content will be tailored to those with a strong technical background who are facing challenges around optimizing complex data pipelines and infrastructure.
Abstract: In the modern data landscape, ensuring data quality and integrity is paramount. This conference will explore the concept of Data Contracts as a schema registry, incorporating data quality (DQ) checks and leveraging OpenLineage to capture compliance failures. By implementing Data Contracts, organizations can enforce strict data quality standards and track lineage to understand the impact …
Abstract: This talk will take the audience through our experience of building a content generation solution for a data catalog enrichment effort, from a modeling perspective (RAG-based pre-trained model and RAG-based fine-tuned model).
Outline: AB InBev sells a significant share of its beer volume through retailers who take AB InBev’s assistance to configure the best shelf assortment that would maximize their revenues. Planogramming is a very important step that retailers perform to allocate their available shelf space to the right products. For large retailers that have thousands of stores, this becomes a very time-consuming and ted…
LLMs have transformed data analysis. With their ability to generate code to analyse data given appropriate prompts and instructions, LLMs are forming the bedrock of a new suite of data analysis tools.
In today’s data-centric world, businesses rely on personalization now more than ever. Whether it’s personalizing user experiences, optimizing operations, or predicting market trends, data plays a pivotal role. To harness the full potential of data, organizations are turning to real-time feature stores. In this talk, we’ll explore what real-time feature stores are, why they matter, and how we at Z…
Overview: At KushoAI, we’ve built an AI agent that can autonomously perform API testing for you. While building this, we came across a lot of problems specific to AI applications built on top of LLMs that you don’t see anywhere else. Since this is a fairly new area of development, we had to spend a lot of time figuring out solutions for them on our own.
This talk explores the forefront of artificial intelligence (AI) in establishing causality in mental health. By leveraging Graph Neural Networks (GNNs) and Spatio-Temporal Graph Neural Networks (STGNNs), we aim to uncover causal relationships in complex mental health causal effects. The session will cover fundamental concepts of causality, the transition from traditional GNNs to STGNNs, and the c…
We rely heavily on the Web to meet our information needs today. Examples include Wikipedia, Twitter, Instagram, YouTube, Google Maps, etc. All of these are platforms where millions of users post billions of pieces of content every day on a wide range of topics. The content is consumed by hundreds of millions of users. While a rich source of information, these platforms are also easy targets fo…
Description: In today’s interconnected world, deploying and accessing machine learning (ML) models efficiently poses major challenges. Traditional methods rely on cloud GPU clusters and constant internet connectivity. However, WebAssembly (Wasm) and WebGPU technologies are revolution…
Real-world data contains multiple records belonging to the same customer. These records can be in single or multiple systems and they have variations across fields, which makes it hard to combine them, especially with growing data volumes. This hurts customer analytics - establishing lifetime value, loyalty programs, or marketing channels is impossible when the base data is not linked. N…
Session Overview: This expert session will explore the cutting-edge world of Large Language Model (LLM) applications, focusing on real-world implementation strategies and best practices. Attendees will gain valuable insights into the entire lifecycle of LLM-based solutions, from initial concept to successful deployment in production environments.
The rush to build AI is driving every business organization to adapt to a new era of automation. The push for AI is going to take over the economy and society at large, with applications of AI in every sector. This raises many important questions about the production of AI, from large-scale data centers to the data sets required to train these complex mathematical models. In this context, how do we i…
Atlassian unleashes the potential of every team. Our agile & DevOps, IT service management and work management software helps teams organize, discuss, and compl
MonsterAPI is an easy and cost-effective GenAI computing platform designed for developers to quickly fine-tune, evaluate and deploy LLMs for businesses.