LLMs and generative AI have made their way into our day-to-day operations. While wrappers around GPT are a good starting point, I was intrigued by what it takes for an SRE to understand the domain, identify its operational aspects, and build runbooks around running self-hosted LLMs.
Today, many models are being built, but very few make it to production. While many companies are trying to streamline the toolchain, it is still nascent. The body of work I will discuss is an experiment in building an understanding of the LLMOps ecosystem.
We built an internal server setup and explored deploying models on our own GPUs instead of relying on OpenAI.
Goals:
- Learn the domain from first principles.
- Build practices around running models on Kubernetes with GPUs.
- Know what it takes to run and manage vector databases for storing embeddings.
- Use the above knowledge to build and run RAG applications in production.
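To make the Kubernetes-with-GPUs goal concrete, here is a minimal pod spec that requests a GPU through the NVIDIA device plugin's `nvidia.com/gpu` resource. The pod name and serving image are illustrative placeholders, not the setup from our lab:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: llm-inference            # hypothetical name
spec:
  containers:
    - name: model-server
      image: vllm/vllm-openai:latest   # example inference-server image
      resources:
        limits:
          nvidia.com/gpu: 1      # schedules this pod onto a node with a free GPU
```

The scheduler will only place this pod on a node that advertises a GPU, which is the core operational primitive everything else builds on.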
What our learning curve looked like:
- Take the basic concepts of the domain and build a mental model of the toolchain and ecosystem.
- Explore platforms like Ray/KubeRay and model repositories like Hugging Face, and understand them from an operational standpoint.
- Start with smaller models like Phi-3 and move up to more capable models like Llama 3.1.
- Understand the pipeline from a developer's perspective: using frameworks like LangChain, finding their limitations, and shifting our codebase to LlamaIndex.
- Start with toy applications to explore each tool individually; after gaining a basic understanding, move on to building a RAG application for internal use - we built a resume-filtering application.
- Dogfood it internally and learn more about prompt engineering, vector embeddings, and databases like Qdrant.
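The retrieval half of a RAG pipeline like the one described above can be sketched in miniature. This toy substitutes bag-of-words token counts for real model embeddings and brute-force cosine similarity for a vector database such as Qdrant; the documents and query are made up for illustration:

```python
import math
from collections import Counter

def embed(text):
    # Toy "embedding": bag-of-words token counts.
    # A real pipeline would call an embedding model here instead.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Made-up documents standing in for indexed chunks.
docs = [
    "Kubernetes schedules GPU workloads across nodes",
    "Qdrant stores vector embeddings for similarity search",
    "Llama models generate text from prompts",
]
index = [(d, embed(d)) for d in docs]

def retrieve(query, k=1):
    # Brute-force nearest neighbours; a vector DB does this at scale.
    q = embed(query)
    ranked = sorted(index, key=lambda dv: cosine(q, dv[1]), reverse=True)
    return [d for d, _ in ranked[:k]]

# Retrieval-augmented prompt: retrieved context plus the user's question.
question = "where are vector embeddings stored?"
context = retrieve(question)[0]
prompt = f"Answer using this context:\n{context}\n\nQuestion: {question}"
```

Swapping the toy pieces for a real embedding model and a Qdrant collection changes the scale, but not the shape, of this pipeline.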
What you will gain from this talk:
- How to approach this domain if you are managing backend systems - our learnings come less from a pure development view and more from a "How will I run this in production?" lens.
- What it takes to build your own home lab, and where you can save costs - is the public cloud cheaper, or, if you are an org thinking of investing in the domain, does buying your own hardware make sense?
- LLMOps is a developing domain, and we are learning it through experimentation. You will gain a perspective on, and an approach to, experimenting your way through this landscape.
This talk benefits software engineers at all levels but is especially relevant for SRE and DevOps practitioners.