Running LLM Infra Locally: A Practical Guide

Submitted May 30, 2025

Choose the topic your submission falls under: Experimentation
I am submitting for: Speaking at the Fifth Elephant 2025 Annual Conference
Type of submission: 30 mins talk

Overview:

LLMs don’t have to live in the cloud. You can self-host them, even at home. Self-hosting gives you control and is also cost-effective, and you can evaluate the trade-offs yourself by experimenting.

In this talk, we’ll discuss how to host LLMs locally using Ollama and how to squeeze maximum performance out of the hardware you have. We’ll try out models with different parameter counts and quantization levels, and see how they fare against frontier models on a benchmark like Bird-Bench for text-to-SQL tasks.
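
To give a flavour of what this looks like in practice, here is a minimal sketch of sending a text-to-SQL prompt to a locally running Ollama server over its HTTP API. It assumes Ollama is listening on its default port (11434). The prompt wording and the helper names `build_request` and `generate` are illustrative assumptions, not the setup used in the talk.

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot text generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, question: str, schema: str) -> dict:
    """Build the JSON body for a single, non-streaming generation request.

    The prompt template here is a made-up example, not a tuned one.
    """
    prompt = (
        "Given this database schema:\n"
        f"{schema}\n\n"
        f"Write a SQL query that answers: {question}\n"
        "Return only the SQL."
    )
    # "stream": False asks Ollama for one JSON object instead of a chunk stream.
    return {"model": model, "prompt": prompt, "stream": False}


def generate(model: str, question: str, schema: str,
             url: str = OLLAMA_URL, timeout: float = 300.0) -> str:
    """Send the request to a running Ollama server and return the generated text."""
    payload = json.dumps(build_request(model, question, schema)).encode("utf-8")
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.loads(resp.read())
    return body["response"]
```

Calling `generate("llama3.2:3b", "How many users signed up in 2024?", "CREATE TABLE users (id INT, signup_date DATE);")` would then return the model's answer, provided that model has already been pulled. The tag is only an example; any model shown by `ollama list` can be swapped in, which is what makes it easy to compare parameter counts and quantization levels.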

We’ll also compare the costs involved in running a task on cloud-hosted frontier models versus running the same task locally.
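
As a rough sketch of how such a comparison can be set up, the arithmetic below prices a cloud run by token count and a local run by the electricity it consumes. The function names and every number passed to them are hypothetical placeholders, and the local figure deliberately ignores the cost of buying the hardware.

```python
def cloud_cost_usd(input_tokens: int, output_tokens: int,
                   usd_per_million_input: float,
                   usd_per_million_output: float) -> float:
    """Cost of a task on a hosted model billed per million tokens."""
    return (input_tokens / 1_000_000 * usd_per_million_input
            + output_tokens / 1_000_000 * usd_per_million_output)


def local_energy_cost_usd(avg_power_watts: float, runtime_seconds: float,
                          usd_per_kwh: float) -> float:
    """Electricity cost of running the same task on local hardware.

    Assumption: only energy is counted; hardware purchase and upkeep are not.
    """
    kwh = (avg_power_watts / 1000) * (runtime_seconds / 3600)
    return kwh * usd_per_kwh


# Placeholder inputs, NOT measured values: substitute your provider's
# pricing, your machine's power draw, and your electricity tariff.
cloud = cloud_cost_usd(2_000_000, 500_000,
                       usd_per_million_input=3.0, usd_per_million_output=15.0)
local = local_energy_cost_usd(avg_power_watts=300, runtime_seconds=2 * 3600,
                              usd_per_kwh=0.10)
print(f"cloud: ${cloud:.2f}, local electricity: ${local:.2f}")
```

A fuller comparison would also spread the hardware cost over its expected lifetime, which can change the picture for light workloads.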

Takeaways:

You’ll find that self-hosting LLMs with Ollama is surprisingly simple.
You’ll get an idea of how local models fare against frontier models on benchmarks.
You’ll understand the trade-offs across different parameter counts, quantization levels, and response times.
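
One way to build intuition for the parameter count versus quantization trade-off is a back-of-the-envelope estimate of how much memory the weights alone need: parameters times bits per weight, divided by eight. The sketch below is an approximation under that assumption. Real quantized files use somewhat more than the nominal bits per weight, and the context cache and runtime add overhead on top. The function name is illustrative.

```python
def approx_weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights in (decimal) gigabytes.

    1 billion parameters at 8 bits per weight is roughly 1 GB.
    """
    return params_billions * bits_per_weight / 8


# Compare a few hypothetical model sizes at common precision levels.
for params in (3, 7, 14):
    for bits in (16, 8, 4):
        gb = approx_weight_memory_gb(params, bits)
        print(f"{params}B parameters at {bits}-bit: ~{gb:.1f} GB")
```

By this estimate, a 7B model needs roughly 14 GB at 16-bit precision but only about 3.5 GB at 4-bit, which is often the difference between fitting on a consumer GPU or not.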

Target Audience:

People who’d like to run models locally for cost/compliance reasons
People who’d like to experiment with different LLMs to gauge how the performance changes as we adjust different parameters

Bio

Yogi is a backend engineer at Nilenso. There, he has worked on building a job orchestration platform and an IoT-based telemetry system. Outside of programming, he enjoys traveling, science fiction, astronomy, and self-hosting open source software.

Links

Slides: https://docs.google.com/presentation/d/1H2pGCq35wirBWaGhwZQ0ASOV_hVq7AitqYPWRg0Z5qk/edit?usp=sharing

Blog covering the same topic:
https://blog.nilenso.com/blog/2025/05/27/experimenting-with-self-hosted-llms-for-text-to-sql/

The Fifth Elephant 2025 Annual Conference CfP