Sat, Jul 19, 2025 · 08:45 AM – 05:55 PM IST
Submitted Apr 30, 2025
As organizations race to integrate large language models into their products and workflows, a new requirement is emerging: the need to host private LLMs in a scalable, secure, and operationally manageable way.
This talk presents a practical, cloud-agnostic architecture for hosting private LLMs with strong security isolation and efficient AI operations at scale.
We’ll explore how to enforce isolation, establish secure private network boundaries, and build a hardened control plane to manage LLM lifecycle and infrastructure state.
Central to this architecture is a model-agnostic access layer, or gateway, which decouples downstream systems from specific model APIs. It provides a consistent interface across model types and versions, while enabling operational features such as request authentication, batching, standard and semantic caching, and routing.
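A minimal sketch of such a gateway, assuming hypothetical backend callables and a made-up `complete()` interface (not any vendor's API). Only the standard, exact-match cache is shown; authentication, batching, and semantic caching would layer onto the same chokepoint.

```python
import hashlib

class ModelGateway:
    """Model-agnostic gateway sketch: one interface over many backends,
    with an exact-match response cache. Backend names and the completion
    signature are illustrative assumptions, not a specific model API."""

    def __init__(self, backends):
        self.backends = backends      # model name -> callable(prompt) -> str
        self.cache = {}               # cache key -> cached response

    def _cache_key(self, model, prompt):
        return hashlib.sha256(f"{model}:{prompt}".encode()).hexdigest()

    def complete(self, model, prompt):
        key = self._cache_key(model, prompt)
        if key in self.cache:
            return self.cache[key]    # standard (exact-match) cache hit
        if model not in self.backends:
            raise KeyError(f"unknown model: {model}")
        response = self.backends[model](prompt)
        self.cache[key] = response
        return response

# Downstream code calls complete() and never sees backend-specific details,
# so models can be swapped or versioned behind the gateway.
gw = ModelGateway({"small": lambda p: f"echo:{p}"})
first = gw.complete("small", "hi")    # backend call
second = gw.complete("small", "hi")   # served from cache
```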
In addition to architecture, we’ll explore the operational challenges of managing private LLMs in production — for example, long-tail latency under concurrent load — and how to build applications on top of these models that keep the customer experience smooth.
Attendees will learn:
This session is designed for:
With the current set of databases, customers can only access data in a structured format. If they want to leverage that data with LLMs, they typically need to write a custom client that retrieves data from the database and sends it to the LLM.
Couchbase Server already supports User-Defined Functions (UDFs) via SQL++. We utilized UDFs to directly invoke LLMs and return the responses to the user.
However, we encountered a challenge: UDFs do not natively support authentication. While we had firewalls in place for our AI functions, relying on firewalls alone is not sufficient for robust security. To address this, we implemented AWS STS to generate temporary tokens, providing an additional layer of secure access.
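The shape of that temporary-token flow can be sketched as follows. This is a self-contained illustration in which an HMAC-signed, short-lived token stands in for the AWS STS credentials described above; the function names and signing scheme are assumptions for the sketch, not the STS API.

```python
import hashlib
import hmac
import secrets
import time

SIGNING_KEY = secrets.token_bytes(32)  # stand-in for a control-plane secret

def issue_token(principal, ttl_seconds=900):
    """Issue a short-lived token (STS-style) binding a principal to an expiry."""
    expiry = int(time.time()) + ttl_seconds
    payload = f"{principal}:{expiry}"
    sig = hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}:{sig}"

def validate_token(token):
    """Reject tampered or expired tokens before the AI function runs."""
    principal, expiry, sig = token.rsplit(":", 2)
    payload = f"{principal}:{expiry}"
    expected = hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False                  # signature mismatch: token was altered
    return int(expiry) > time.time()  # still within its lifetime?

token = issue_token("ai-functions-user")
print(validate_token(token))          # True while unexpired
print(validate_token(token + "x"))    # False: signature mismatch
```

Because the token expires on its own, a leaked credential has a bounded blast radius — the property that makes this a stronger layer than the firewall alone.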
Now that we support embedding models, we wanted to provide customers with a way to vectorize their existing data in Couchbase Server.
Couchbase Server includes its own protocol, DCP (Database Change Protocol), which streams document mutations to clients. One such client is the Eventing Service—an existing feature in Couchbase that allows users to write custom JavaScript logic to handle document mutations.
To deliver a seamless experience without reinventing the wheel, we chose to leverage the Eventing Service (already a DCP consumer) to vectorize customer data efficiently.
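Eventing handlers are written in JavaScript; the sketch below mirrors the same flow in Python for illustration only. The OnUpdate-style handler, `fake_embed`, and the dict standing in for a bucket are all assumptions of the sketch, not Couchbase APIs — the point is the pattern: react to a DCP-delivered mutation, compute an embedding, write it back.

```python
def fake_embed(text):
    """Stand-in for a real embedding-model call."""
    return [float(len(word)) for word in text.split()]

def on_update(doc, bucket):
    """Mimics an Eventing OnUpdate handler fed by DCP: on each document
    mutation, vectorize the text field and persist the embedding."""
    if "text" not in doc or "embedding" in doc:
        return doc                    # skip non-text docs and re-processing
    doc["embedding"] = fake_embed(doc["text"])
    bucket[doc["id"]] = doc           # write back, as the handler would
    return doc

bucket = {}
on_update({"id": "d1", "text": "hello couchbase eventing"}, bucket)
```

The `"embedding" in doc` guard matters in practice: the write-back itself produces a mutation, and without the guard the handler would loop on its own output.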
We also provided customers with a way to add data to the database from PDFs, text documents, and other file formats. We created our own service, UDS, which extracts JSON documents from these files and inserts them into the database.
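The extraction step can be sketched as below. Real PDF and text parsing is elided (`raw_text` stands in for the parsed file content), and `extract_documents` and its chunking scheme are hypothetical names for the sketch — the point is turning file content into insertable JSON documents.

```python
def extract_documents(filename, raw_text, chunk_size=200):
    """UDS-style extraction sketch: split a file's text into fixed-size
    chunks and wrap each chunk as a JSON-shaped document ready for insert."""
    chunks = [raw_text[i:i + chunk_size]
              for i in range(0, len(raw_text), chunk_size)]
    return [
        {"id": f"{filename}#{n}", "source": filename, "text": chunk}
        for n, chunk in enumerate(chunks)
    ]

# 450 characters of input -> three documents of at most 200 characters each.
docs = extract_documents("notes.txt", "a" * 450)
```

Keeping the source filename and chunk index in the document ID makes it possible to trace any retrieved chunk back to its original file.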
We wanted to give customers a way to query data in natural language using Agent Catalog.
Agent Catalog manages their queries, and customers can integrate their own agents with it. When a request comes in, Agent Catalog performs a vector search to find the most relevant stored query, which the agents then execute.
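A toy illustration of that lookup: hand-made two-dimensional embeddings and hypothetical stored queries stand in for Agent Catalog's real catalog, but the retrieval logic — nearest stored query by cosine similarity, handed to an agent to execute — is the same shape.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical catalog: intent embeddings mapped to stored SQL++ queries.
catalog = [
    {"vec": [1.0, 0.0], "query": "SELECT * FROM hotels WHERE city = $city"},
    {"vec": [0.0, 1.0], "query": "SELECT * FROM flights WHERE dest = $dest"},
]

def find_query(question_vec):
    """Vector search over the catalog: return the most relevant stored
    query, which an agent would then execute against the database."""
    return max(catalog, key=lambda entry: cosine(entry["vec"], question_vec))["query"]

# A question embedded near the "hotels" intent retrieves the hotel query.
best = find_query([0.9, 0.1])
```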