The Fifth Elephant 2023 Monsoon

On AI, industrial applications of ML, and MLOps



Navin Pai


Bespoke LLM Architectures for a K8s World

Submitted Jun 29, 2023


With the advent of Large Language Models (LLMs) and Generative AI (GAI), there is growing interest in building LLM-driven applications. Across the industry, a growing number of companies are exploring whether they can run state-of-the-art open-source LLMs within their existing K8s clusters and build applications on top of these bespoke, in-cluster models.

Motivations may include security (sending live customer data to OpenAI is a big no-no in most regulated industries), R&D (the ability to build custom LLMs for specific use cases and tweak their behaviour accordingly), or even just cost (OpenAI pricing can become a bottleneck at non-trivial scale).

In this talk, we’ll explore techniques and best practices for deploying LLMs within K8s environments, and how workflows can be built to simplify LLM training/tuning, deployment, and management, based on our experience building an LLM-driven platform, Aiden.

Note: Given the nature and speed of development in the world of LLMs and GAI, this talk is intended to be an experience-driven account rather than a one-size-fits-all guide: it covers best practices, mistakes we made while building out LLM-serving architectures, and the tradeoffs between different approaches to building GAI-driven applications.


