Sat, 13 Jul 2024, 09:00 AM – 06:05 PM IST

Bangalore International Centre, Bangalore

Building and Deploying LLM Applications: From Concept to Production - AMA with Mixture-of-Experts

Submitted Jul 4, 2024

Session type: Birds of Feather (BOF) session

Session Overview

AMA with Mixture-of-Experts on Building and Deploying LLM Applications: From Concept to Production was held on 24th July at BIC, as part of The Fifth Elephant 2024 Annual Conference.

This AMA with MoE explored Large Language Model (LLM) applications, focusing on real-world implementation strategies and best practices. Attendees gained insights into the entire lifecycle of LLM-based solutions, from initial concept and problem discovery to deployment and scaling in production environments. The topics were restricted strictly to LLMs and did not cover multilingual or multimodal models.

The MoE

Chintan Donda, Senior ML Engineer, Wadhwani AI
Sai Nikhilesh Reddy, Associate ML Scientist, Wadhwani AI
Prathamesh Saraf, Gen AI Backend Engineer, TrueFoundry
Rajaswa Patil, Applied AI, Postman
Praveen Pankajakshan, Chief Scientist, CropIn AI Lab, CropIn

Co-Hosts:

Bharat Shetty, Soma Dhavala

Target Audience

CTOs, technology leaders and executives in AI-focused startups
Software engineers and AI developers working on LLM integration and building LLM apps
AI researchers and data scientists exploring practical applications
Product managers overseeing AI-driven projects and delving into the LLM App Life Cycle
IT professionals responsible for deploying and maintaining AI systems

Topics

Production Insights: LLM Application Case Studies
Examine real-world examples of successful LLM deployments across various industries. Analyze key factors for success and common pitfalls to avoid.
LLM Application Lifecycle Management
Explore the entire process of developing LLM applications, from initial prototyping to scaling for production. Learn effective strategies for each stage of development.
Leveraging AI for Enhanced Productivity and Research
Discover innovative ways to utilize LLMs for accelerating research processes, boosting personal productivity, and facilitating continuous learning in organizations.
Specialized LLMs and Domain-Specific Models
Investigate the growing trend of smaller, more focused LLMs. Understand their advantages, use cases, and how they complement larger, general-purpose models.
Expert Panel: LLMs in Production Environments
Engage with industry leaders in an interactive Q&A session focused on practical challenges and solutions for implementing LLMs in real-world scenarios.

Key Discussion Points

Below are some important questions to ask when developing any LLM-based application. The responses are grouped and presented as best practices. Please note that specific responses may vary as the technology matures or as application flavors change. Due to time constraints, many planned topics could not be explored.

Key Takeaways

AI Products
- Start simple. Iterate
- Identify a champion on the customer side, who has seen the hype cycle and can assess the reality.
- Pilot/Deploy early - that feedback is important to get the problem right (solution follows).
LLMs: RAGs
- RAG is a search problem at its core - the interface happens to be with LLMs (for query understanding and response generation).
- But RAG helps you deal with data locality and recency, which pretrained LLMs cannot handle.
- RAG is the tiny part. Context is the beast to focus on, at both the usability level and the engineering level.
- Think about UX: what do users want to know or ask, and how do they want that information? Users won’t comply with the typical QA data seen in benchmark datasets.
- Data ingestion is the critical piece of the puzzle: representation, quality and cost all determine the downstream attributes.
- Create data with whatever help is available: actual users are best; if not, involve domain experts who can represent the actual users. Generate synthetic data if required. But solve the data challenges first.
- Evaluations are hard. Begin here: define the metrics, and create data to design and test prompts.
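The "RAG is a search problem" point can be illustrated with a minimal sketch, assuming a toy in-memory corpus and a simple TF-IDF overlap score. The corpus, the function names (`build_index`, `retrieve`, `build_prompt`) and the prompt wording are all hypothetical and not from the session. No LLM is called; the sketch stops at the prompt that would be sent to one.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus; in practice this comes from the data ingestion step.
DOCS = {
    "d1": "Paddy sowing in Karnataka usually starts with the monsoon in June.",
    "d2": "Cotton pests such as pink bollworm are monitored with pheromone traps.",
    "d3": "The monsoon forecast for this year was updated last week.",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Return per-document term counts and inverse document frequencies."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    df = Counter(term for c in counts.values() for term in c)
    idf = {term: math.log(1 + len(docs) / n) for term, n in df.items()}
    return counts, idf

def retrieve(query, counts, idf, k=2):
    """Rank documents by a TF-IDF overlap score: the 'search' part of RAG."""
    q_terms = tokenize(query)
    scores = {
        doc_id: sum(c[t] * idf.get(t, 0.0) for t in q_terms)
        for doc_id, c in counts.items()
    }
    # Sort by descending score, breaking ties by document id.
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return [d for d in ranked if scores[d] > 0][:k]

def build_prompt(query, doc_ids, docs):
    """Assemble the context and question that would be passed to an LLM."""
    context = "\n".join(f"[{d}] {docs[d]}" for d in doc_ids)
    return (
        "Answer using only the context.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

counts, idf = build_index(DOCS)
query = "When does paddy sowing start?"
top = retrieve(query, counts, idf)
prompt = build_prompt(query, top, DOCS)
```

Swapping the scoring function for embeddings or a classical IR engine changes only `retrieve`; ingestion and prompt assembly stay the same, which is one reason retrieval itself tends to be the small part of the system.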
ML Infra
- Profile user load. Cost and tech choices can vary vastly between pilot and scale.
- For an enterprise user with many different verticals, focus on reusability and extensibility. RAG is again a tiny part. Data management and observability can become a real pain to deal with later. Choose the tech stack wisely.
RAG is not everything
- Agentic AI is the evolution of CLI to UI to NLI (Natural Language Interface) between humans and systems, and between systems and systems. Agents are useful in orchestrating disparate tasks like calling tools.
- A subset of this emerging paradigm uses LLMs to orchestrate and execute a sequence of tasks, each of which can itself be handled by an LLM. For example, use one LLM as task router, another as task planner, another as task executor, yet another as verifier, and so on. But one should be careful about the propagation of failures.
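The router, planner, executor and verifier chain can be sketched as below. This is a minimal illustration under stated assumptions: each stage is a plain Python stub standing in for an LLM call, and `StepResult`, `TOOLS` and all function names are hypothetical. The point of the sketch is the failure handling: the chain stops at the first failed stage and reports which stage failed, instead of passing a bad result downstream.

```python
from dataclasses import dataclass

@dataclass
class StepResult:
    ok: bool
    value: object = None
    error: str = ""

# Each stage below is a hypothetical stub standing in for an LLM call.
def route(request):
    """Router: map a user request to a known task type."""
    if "weather" in request.lower():
        return StepResult(True, "weather")
    return StepResult(False, error=f"no route for: {request!r}")

def plan(task):
    """Planner: turn a task type into an ordered list of tool names."""
    plans = {"weather": ["lookup_city", "fetch_forecast"]}
    if task in plans:
        return StepResult(True, plans[task])
    return StepResult(False, error=f"no plan for: {task!r}")

# Hypothetical tools; each takes the current state and returns a new one.
TOOLS = {
    "lookup_city": lambda state: {**state, "city": "Bangalore"},
    "fetch_forecast": lambda state: {**state, "forecast": "rain"},
}

def execute(steps):
    """Executor: run each planned tool in order, threading state through."""
    state = {}
    for step in steps:
        tool = TOOLS.get(step)
        if tool is None:
            return StepResult(False, error=f"unknown tool: {step}")
        state = tool(state)
    return StepResult(True, state)

def verify(state):
    """Verifier: check that the final state has the fields we expect."""
    missing = [key for key in ("city", "forecast") if key not in state]
    if missing:
        return StepResult(False, error=f"missing fields: {missing}")
    return StepResult(True, state)

def run(request):
    """Chain the stages, stopping at the first failure."""
    stages = [("router", route), ("planner", plan),
              ("executor", execute), ("verifier", verify)]
    result = StepResult(True, request)
    for name, stage in stages:
        result = stage(result.value)
        if not result.ok:
            return StepResult(False, error=f"{name}: {result.error}")
    return result

good = run("What is the weather tomorrow?")
bad = run("Book a flight")
```

Tagging the error with the stage name keeps a failure attributable to one component, which matters once each stage is a separate model with its own prompts and error modes.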
Future is full of surprises
- It won’t be what it is today. We’ll see new/better alternatives to transformers (and maybe RNNs will make a comeback in new forms), so factor that in while committing to solutions.
- New and creative uses of Gen AI will continue to emerge.

In conclusion, adopt a problem-first approach. Good old design principles will give better mileage in the long run than chasing the next shiny thing.

Note: Other important aspects related to Data and Evaluations could not be covered.

Key Questions To Ask

Problem:
- What is the problem?
- Why Gen AI?
- Who is the customer? This product canvas will be useful in giving it a structure
Data Strategy
- What kind of data is needed? Where to get it (available on the internet, or to be collected on your own)?
- How much is needed? Quality matters over quantity.
- Build the app and then collect data, or collect data and then build?
- How much to pay for it?
- How long can this data live (data can die due to environment drift, concept drift, label drift, covariate drift)?
- What about feedback to improve the offering?
Evaluation
- How to evaluate the overall system?
- How to evaluate the sub-systems?
- What is a successful outcome?
- What are the business metrics?
- What are the ML metrics? Can they scale?
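For a retrieval sub-system, the "ML metrics" question could be made concrete with recall@k and mean reciprocal rank (MRR). The sketch below is a minimal illustration, not a metric set prescribed in the session. The document ids and relevance labels in `RUNS` are made up, and the function names are hypothetical.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant documents found in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)

def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

def evaluate(runs, k=3):
    """Average both metrics over (retrieved list, relevant set) pairs."""
    n = len(runs)
    return {
        f"recall@{k}": sum(recall_at_k(r, rel, k) for r, rel in runs) / n,
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in runs) / n,
    }

# Made-up evaluation set: ranked retrieval output and labelled relevant ids.
RUNS = [
    (["d1", "d4", "d2"], {"d1", "d2"}),  # both relevant docs in the top 3
    (["d7", "d3", "d9"], {"d3"}),        # relevant doc at rank 2
    (["d5", "d6", "d8"], {"d2"}),        # relevant doc never retrieved
    (["d5", "d6", "d8", "d2"], {"d2"}),  # relevant doc at rank 4, outside top 3
]

metrics = evaluate(RUNS, k=3)
```

Metrics like these cover only the retrieval sub-system. The overall system still needs its own outcome and business metrics, and the labelled relevance sets are themselves a data-creation cost.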
Design & Development
- What is an MVP? Is there a simpler alternative?
- Is it a fully automated system or a co-pilot?
- Can you test in a controlled setting? How will you scale?
- How to handle occasional errors?
- What is the cost of communicating and fixing those errors (or addressing them) during product usage?
LLMs - RAGs?
- What types of questions/interactions to expect from users?
- Do the users know what to ask and how to ask it?
- Why not plain/classical IR?
- Is it transactional or session-level (need to maintain history)?
- How much domain knowledge is needed and available for narrowing the problem? Is Graph-RAG more suitable?
- Is hallucination a bug or a feature? If a bug, how to tame it? If a feature, how to exploit it?
Commercial or Open Weights (open source)? Self-host or Managed service?

Resources

Papers/Tutorials/Courses
- Applied LLMs: a collection of resources, courses and blogs from experts in the space of LLMs
- Prompt Report: a taxonomical review of various prompting strategies
- RAG Survey Paper: Retrieval-Augmented Generation for Large Language Models: A Survey
- Data Flywheels: an architecture (and pipelines) to build LLM applications which can continuously improve
- Evaluations: Task-Specific LLM Evals that Do & Don’t Work
- LLM course: hear from practitioners on a wide range of topics on LLMs, including RAG, evaluation, applications, fine-tuning and prompt engineering.
Tools/Libraries/Frameworks
- RAGs
  - GraphRAG: convert unstructured data into structured data (graphs), then reason over your private documents
- Prompt Engineering
  - DSPy: a paradigm shift in prompt engineering. Get optimized prompts based on training data, as opposed to trial and error
  - inspect-ai: provides many built-in components, including facilities for prompt engineering, tool usage, multi-turn dialog, and model graded evaluations.
- Agents
  - AutoGen: an agentic framework from Microsoft. Many others have come up since then
- Synthetic Data Generation
  - DataDreamer: a tool for Synthetic Data Generation and Reproducible LLM Workflows
- Constrained Language Generation
  - instructor: a library that makes it a breeze to work with structured outputs from large language models
  - outlines: another library to generate “constrained” outputs. Can use regex, FSMs and context-free grammars to enforce constraints
- Evaluations
  - ragas: a framework that helps you evaluate your Retrieval Augmented Generation (RAG) pipelines