As organizations race to integrate large language models into their products and workflows, a new requirement is emerging: the need to host private LLMs in a scalable, secure, and operationally manageable way.
This talk presents a practical, cloud-agnostic architecture for hosting private LLMs with strong security isolation and efficient AI operations at scale.
We’ll explore how to enforce isolation, establish secure private network boundaries, and build a hardened control plane to manage LLM lifecycle and infrastructure state.
Central to this architecture is a model-agnostic access layer, or gateway, which decouples downstream systems from specific model APIs. It provides a consistent interface across model types and versions, while enabling operational features like request authentication, batching, standard and semantic caching, and routing.
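As a rough illustration of that decoupling (not the talk's actual gateway implementation), the sketch below maps a requested model name to a backend base URL and forwards an OpenAI-style chat call; the model names and internal URLs are placeholders.

```python
# Minimal gateway routing sketch: callers speak one OpenAI-style interface,
# and the gateway picks the backend that serves the requested model.
import requests

MODEL_BACKENDS = {                      # placeholder model names and URLs
    "llama-3-8b": "http://vllm-llama.internal:8000/v1",
    "mistral-7b": "http://vllm-mistral.internal:8000/v1",
}

def route_chat(model: str, messages: list[dict], api_key: str) -> dict:
    """Forward a chat completion to whichever backend serves `model`."""
    base_url = MODEL_BACKENDS[model]    # unknown models fail fast with KeyError
    resp = requests.post(
        f"{base_url}/chat/completions",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"model": model, "messages": messages},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()

# Downstream code never needs to know which backend answered:
# route_chat("llama-3-8b", [{"role": "user", "content": "hello"}], api_key="...")
```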
In addition to architecture, we’ll explore the operational challenges of managing private LLMs in production, such as GPU resource scaling, long-tail latency under concurrent load, scaling under unpredictable traffic, and cost optimization techniques.
Attendees will learn:
- How to design a secure isolation layer for private LLMs using cloud-native constructs.
- How to implement private, low-latency access using cloud-native networking primitives.
- The role of a model-agnostic AI Gateway (Access Layer) in:
  - Unifying access across different LLM backends
  - Managing API key auth and RBAC
  - Implementing standard, semantic and conversational caching
  - Aggregating requests for efficient batching
- Operational strategies for:
  - Orchestration and Upgrades
  - Reducing long-tail latency
  - Controlling cost under bursty traffic
  - Autoscaling Strategies
  - Performance and Cost Tradeoffs
This session is designed for:
- Platform Engineers building secure AI infrastructure
- MLOps / DevOps Engineers managing the deployment and scaling of LLMs
- Cloud Infra and SRE Teams responsible for performance, availability, and cost control
- AI Engineers deploying private models in enterprise, internal, or regulated settings
- Anyone designing or running LLM infrastructure beyond prototypes
- Deploying LLMs effectively for multiple customers (tenants) goes beyond simple model hosting.
- Key challenges: Ensuring scalability, robust security, tenant data isolation, cost management, and integrating value-added features.
- Existing solutions often lack integrated, enterprise-grade capabilities, forcing organizations to build complex frameworks themselves for:
  - Performance/Cost Optimization (Caching, Batching)
  - Model Agnosticism & Upgrades
  - Operational Needs (Auth, RBAC, Monitoring, Secure Networking)
- Presenting a robust, multi-tenant LLM platform architecture built on AWS.
- Designed for scalability, security, cost-efficiency, and ease of use for tenants.
- Strategy: AWS Account-per-Customer.
- Benefits:
  - Strict data separation and isolation.
  - Simplified per-tenant billing and cost tracking.
  - Enables secure, customer-specific networking (PrivateLink).
  - Facilitates meeting compliance requirements.
- Managed via a central control plane using cross-account IAM roles.
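A minimal sketch of that cross-account pattern, assuming a dedicated role in each tenant account (the account ID and role name below are placeholders, not the platform's actual values):

```python
# Control plane reaching into one tenant account: assume a cross-account IAM
# role via STS and build a boto3 session from the temporary credentials.
import boto3

def tenant_session(account_id: str, role_name: str = "LLMPlatformControlPlane") -> boto3.Session:
    """Return a boto3 session scoped to a single tenant account."""
    sts = boto3.client("sts")
    creds = sts.assume_role(
        RoleArn=f"arn:aws:iam::{account_id}:role/{role_name}",
        RoleSessionName="control-plane",
        DurationSeconds=3600,
    )["Credentials"]
    return boto3.Session(
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )

# e.g. tenant_session("123456789012").client("ec2").describe_instances()
```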
- Model Serving Layer:
  - Leverages optimized toolkits (e.g., vLLM, NVIDIA NIM) for standardized inference APIs and performance.
  - Model Serving Agent (on EC2): Manages model lifecycle (deploy, start/stop, update), reports health, collects metrics (for CloudWatch/Prometheus), and routes requests.
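A rough sketch of the agent's health/metrics probe, assuming a vLLM-style server that exposes /health and a Prometheus /metrics endpoint locally; how the agent ships the snapshot upstream is left out.

```python
# Model Serving Agent probe: check the local inference server and collect its
# Prometheus metrics so they can be forwarded to CloudWatch/Prometheus.
import requests

SERVER_URL = "http://localhost:8000"    # local vLLM / NIM-style server

def probe() -> dict:
    """Collect a minimal health/metrics snapshot from the local model server."""
    snapshot = {"healthy": False, "metrics": ""}
    try:
        snapshot["healthy"] = requests.get(f"{SERVER_URL}/health", timeout=5).ok
        snapshot["metrics"] = requests.get(f"{SERVER_URL}/metrics", timeout=5).text
    except requests.RequestException:
        pass                            # leave healthy=False on any failure
    return snapshot

# The agent would run this on a timer and report the result to the control
# plane and the metrics pipeline.
```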
- Networking & Secure Access:
  - Public Access: Standard ALB + ASG + Route53 setup for stable public endpoints.
  - Private Access (Preferred): AWS PrivateLink for secure, private connectivity from customer VPCs to the LLM service (avoids CIDR conflicts, simplifies security).
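A hedged sketch of the PrivateLink wiring with boto3; all ARNs and IDs are placeholders, and in practice the two calls run in different accounts.

```python
# Provider side: publish an endpoint service backed by the LLM service's NLB.
import boto3

ec2 = boto3.client("ec2")

service = ec2.create_vpc_endpoint_service_configuration(
    NetworkLoadBalancerArns=["arn:aws:elasticloadbalancing:us-east-1:111111111111:loadbalancer/net/llm-nlb/abc"],
    AcceptanceRequired=True,            # provider approves each tenant connection
)["ServiceConfiguration"]

# Consumer side (runs in the customer's account/VPC): create an interface
# endpoint to that service, so traffic never leaves the AWS network.
ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",
    ServiceName=service["ServiceName"],
    SubnetIds=["subnet-0123456789abcdef0"],
    SecurityGroupIds=["sg-0123456789abcdef0"],
)
```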
The AI Gateway is an intermediary service that provides crucial features before requests reach the model server.
- Authentication & Authorization: API Key management, Role-Based Access Control (RBAC).
- Intelligent Caching:
  - Standard Caching: Key/Value store for identical prompts.
  - Semantic Caching: Vector DB lookup for similar/paraphrased prompts.
- Request Batching: Aggregates requests for improved throughput and cost-efficiency (especially if not native to the toolkit).
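The standard and semantic caching above might look roughly like the following; the embed() call, the 0.92 similarity threshold, and the in-memory stores are stand-ins for a real embedding model, a key/value store such as Redis, and a vector database.

```python
# Two-tier cache lookup: exact match first, then semantic similarity search.
import hashlib
import numpy as np

exact_cache: dict[str, str] = {}                   # prompt hash -> cached response
semantic_cache: list[tuple[np.ndarray, str]] = []  # (prompt embedding, cached response)

def embed(text: str) -> np.ndarray:
    """Placeholder: call a real embedding model here."""
    raise NotImplementedError

def cached_response(prompt: str, threshold: float = 0.92) -> str | None:
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key in exact_cache:                         # standard caching: identical prompt
        return exact_cache[key]
    query = embed(prompt)
    for vec, response in semantic_cache:           # semantic caching: similar prompt
        sim = float(np.dot(query, vec) / (np.linalg.norm(query) * np.linalg.norm(vec)))
        if sim >= threshold:
            return response
    return None                                    # miss: forward the request to the model
```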
- Challenge: Standard metrics (CPU/Network) don’t accurately reflect LLM load (GPU is the bottleneck).
- Solution: GPU-Centric Auto Scaling using AWS Auto Scaling Groups (ASGs); see the sketch after this list.
  - Collect GPU Utilization (%) via nvidia-smi on instances.
  - Publish as Custom CloudWatch Metrics.
  - Configure ASG Scaling Policies (Target Tracking/Step Scaling) based on these custom GPU metrics.
- Benefits: Accurate scaling, better performance, cost optimization by avoiding over/under-provisioning.
- User Control: Allow tenants to enable/disable auto-scaling and set min/max instance limits.
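The sketch referenced above: read GPU utilization with nvidia-smi and publish it as a custom CloudWatch metric that an ASG target-tracking policy can scale on. The namespace, metric name, and 70% target are illustrative choices, not the platform's actual configuration.

```python
# Runs on each GPU instance (e.g. every 60s via cron or a systemd timer).
import subprocess
import boto3

cloudwatch = boto3.client("cloudwatch")

def publish_gpu_utilization(asg_name: str) -> None:
    # Query per-GPU utilization as plain numbers, one line per GPU.
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=utilization.gpu", "--format=csv,noheader,nounits"],
        text=True,
    )
    utilizations = [float(line) for line in out.strip().splitlines()]
    cloudwatch.put_metric_data(
        Namespace="LLMPlatform/GPU",
        MetricData=[{
            "MetricName": "GPUUtilization",
            "Dimensions": [{"Name": "AutoScalingGroupName", "Value": asg_name}],
            "Value": sum(utilizations) / len(utilizations),
            "Unit": "Percent",
        }],
    )

# A target-tracking policy on the ASG then keeps average GPUUtilization near a
# chosen target, e.g. 70%, scaling instances out and in accordingly.
```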
With the current set of databases, customers can only access data in a structured format. If they want to leverage that data with LLMs, they typically need to write a custom client that retrieves data from the database and sends it to the LLM.
Couchbase Server already supports User-Defined Functions (UDFs) via SQL++. We utilized UDFs to directly invoke LLMs and return the responses to the user.
However, we encountered a challenge: UDFs do not natively support authentication. While we had firewalls in place for our AI functions, relying on firewalls alone is not sufficient for robust security. To address this, we implemented AWS STS to generate temporary tokens, providing an additional layer of secure access.
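One way such a check could look, as a hedged sketch rather than Couchbase's actual implementation (the expected role name is an assumption): the AI-function endpoint rebuilds an STS client from the caller-supplied temporary credentials and verifies the identity they resolve to, so passing the firewall alone is not enough.

```python
# Validate caller-supplied temporary STS credentials before serving a request.
import boto3

def is_authorized(access_key: str, secret_key: str, session_token: str,
                  expected_role: str = "CouchbaseAIFunctionsRole") -> bool:
    sts = boto3.client(
        "sts",
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        aws_session_token=session_token,
    )
    try:
        arn = sts.get_caller_identity()["Arn"]   # fails if the token is invalid or expired
    except Exception:
        return False
    # Assumed-role ARNs look like arn:aws:sts::<account>:assumed-role/<role>/<session>.
    return f":assumed-role/{expected_role}/" in arn
```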
Now that we support embedding models, we wanted to provide customers with a way to vectorize their existing data in Couchbase Server.
Couchbase Server includes its own protocol, DCP (Database Change Protocol), which streams document mutations to clients. One such client is the Eventing Service—an existing feature in Couchbase that allows users to write custom JavaScript logic to handle document mutations.
To deliver a seamless experience without reinventing the wheel, we chose to leverage the Eventing Service (already a DCP consumer) to vectorize customer data efficiently.
We also provided customers a way to add data to the database from PDFs, text documents, and other file formats. We built our own service, UDS, which extracts JSON documents from these files and inserts them into the database.
We also wanted to give customers a way to query data using natural language via the Agent Catalog. The Agent Catalog manages their queries, and customers can integrate their own agents with it. When a natural-language request comes in, the Agent Catalog performs a vector search to find the most relevant query, which the agents can then execute.
- Building a successful multi-tenant LLM platform requires thoughtful architecture beyond basic deployment.
- Combining AWS best practices (Account-per-Tenant, PrivateLink) with custom components (Model Serving Agent, AI Gateway) and intelligent scaling (GPU metrics) delivers a powerful solution.
- Empowers customers to leverage LLMs securely and efficiently without managing the underlying complexity.