Apr 10–12, 2026 · Fri 04:00 PM – 08:30 PM IST · Sat and Sun 09:00 AM – 06:00 PM IST
Sridhar Pillai (@sri_1030) · Submitted Apr 7, 2026
Large Language Models can review code, but deploying a 70B model behind every pull request is neither practical nor cost-effective. What if a 1.5B-parameter Small Language Model could deliver reviewer-quality comments on Go, Python, and Kubernetes diffs — running on a single GPU at inference time?
In this talk, we present a production-grade iterative distillation pipeline that trains a compact code review SLM to match — and in targeted domains, outperform — models 5–50x its size. Starting from Qwen2.5-Coder-1.5B-Instruct, we apply multi-stage knowledge distillation: a 7B teacher model generates structured review training data, Supervised Fine-Tuning (SFT) teaches the student to speak like a reviewer, and Direct Preference Optimization (DPO) teaches it what good reviews look like versus lazy “no issues found” responses. Crucially, each pipeline run retrains from the previous run’s checkpoint — the model gets smarter with every iteration.
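To make the distillation step concrete, here is a minimal sketch of what one teacher-generated SFT record might look like: the 7B teacher's structured review of a diff becomes the target the 1.5B student imitates. The chat-message schema and field contents below are illustrative assumptions, not the talk's actual data format.

```python
import json

def make_sft_record(diff: str, teacher_review: str) -> str:
    """Serialize one (diff, teacher review) pair as a JSONL training line."""
    record = {
        "messages": [
            {"role": "system", "content": "You are a senior code reviewer."},
            {"role": "user", "content": f"Review this diff:\n{diff}"},
            # The teacher's review is the completion the student learns to produce.
            {"role": "assistant", "content": teacher_review},
        ]
    }
    return json.dumps(record)

line = make_sft_record(
    "+ go processItems(ch)  // channel never closed",
    "Potential goroutine leak: `ch` is never closed, so the worker blocks forever.",
)
```

One such line per example, appended to a JSONL file, is the usual shape SFT trainers consume.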
The entire workflow runs end-to-end on Red Hat OpenShift AI: Kubeflow Pipelines orchestrates a 7-step DAG, PyTorchJobs distribute QLoRA training across 20 GPUs on 5 nodes, KServe deploys the model with zero-downtime upgrades, MLflow tracks metrics across runs, and MinIO provides S3-compatible artifact storage. No notebook-driven one-offs — this is a repeatable, versioned, self-improving training loop.
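Because the model is served through KServe with the vLLM runtime, clients can talk to it over vLLM's OpenAI-compatible chat-completions API. The sketch below builds such a request; the service URL and model name are placeholders, not the talk's actual deployment.

```python
import json
from urllib import request

# Placeholder endpoint; a real KServe + vLLM deployment exposes
# /v1/chat/completions on the InferenceService URL.
ENDPOINT = "http://code-review-slm.example.svc/v1/chat/completions"

def build_review_request(diff: str) -> dict:
    """Build a chat-completion payload asking the SLM to review a diff."""
    return {
        "model": "code-review-slm",
        "messages": [{"role": "user", "content": f"Review this diff:\n{diff}"}],
        "temperature": 0.2,  # low temperature for consistent review output
        "max_tokens": 512,
    }

def post_review(diff: str) -> bytes:
    """Send the request (requires a live endpoint; not executed here)."""
    req = request.Request(
        ENDPOINT,
        data=json.dumps(build_review_request(diff)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return resp.read()
```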
We’ll walk through the architecture, show real before-and-after examples of the model catching goroutine leaks and password logging in Kubernetes operator code, share the hard-won lessons from scaling distributed training on ephemeral cloud GPUs, and demonstrate how DPO preference learning eliminates the “model collapse” failure mode that plagues naive fine-tuning.
SLMs can be domain-specialized to rival LLMs — a 1.5B model fine-tuned on 8K+ curated code reviews produces structured, actionable feedback that generic 7B+ models miss.
Iterative distillation is a force multiplier — each pipeline run retrains from the previous checkpoint (N-1 model), compounding improvements without human intervention.
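The "resolve version" idea can be sketched in a few lines: each run asks for the previous run's checkpoint and falls back to the upstream base model on the first run. The directory layout and naming here are assumptions for illustration, not the pipeline's real storage scheme.

```python
from pathlib import Path

BASE_MODEL = "Qwen/Qwen2.5-Coder-1.5B-Instruct"

def resolve_base(checkpoint_root: Path, run: int) -> str:
    """Return the N-1 run's final checkpoint, or the base model for run 1."""
    prev = checkpoint_root / f"run-{run - 1:03d}" / "dpo-final"
    return str(prev) if prev.exists() else BASE_MODEL
```

Run 1 trains from the stock base model; run N trains from run N-1's DPO checkpoint, which is what makes the improvements compound.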
DPO fixes what SFT breaks — without preference optimization, fine-tuned models collapse to safe, empty responses. DPO teaches the model to prefer detailed analysis over “LGTM.”
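A DPO preference pair makes this concrete: the detailed review is the "chosen" response and the lazy one is "rejected". The sketch below uses the TRL-style prompt/chosen/rejected field names; the example contents are illustrative, not from the talk's dataset.

```python
def make_preference_pair(diff: str, detailed_review: str) -> dict:
    """Build one DPO preference record penalizing the collapse mode."""
    return {
        "prompt": f"Review this diff:\n{diff}",
        "chosen": detailed_review,             # the behavior to reinforce
        "rejected": "No issues found. LGTM!",  # the collapse mode to penalize
    }

pair = make_preference_pair(
    '+ log.Info("user login", "password", password)',
    "Security: logging the plaintext password leaks credentials; remove it.",
)
```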
OpenShift AI provides a production MLOps backbone — Kubeflow Pipelines + PyTorchJob + KServe + MLflow is a complete, Kubernetes-native stack for training, deploying, and monitoring SLMs at scale.
Multi-node distributed training is table stakes — we’ll show how to go from a single-GPU 2-hour training run to a 20-GPU 18-minute run using PyTorchJob with DDP, and the pitfalls (NCCL, /dev/shm, node scheduling) you’ll hit along the way.
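The numbers quoted above imply sub-linear scaling, which is exactly why the pitfalls matter: communication overhead and per-step synchronization in DDP eat into the ideal 20x. A quick back-of-envelope check:

```python
def speedup(single_gpu_min: float, multi_gpu_min: float) -> float:
    """Wall-clock speedup of the distributed run over the single-GPU run."""
    return single_gpu_min / multi_gpu_min

def scaling_efficiency(s: float, n_gpus: int) -> float:
    """Fraction of ideal linear scaling actually achieved."""
    return s / n_gpus

s = speedup(120, 18)             # 2 h -> 18 min is roughly 6.7x
eff = scaling_efficiency(s, 20)  # ~0.33: far from linear, hence the pitfalls
```

Roughly 6.7x wall-clock on 20 GPUs, consistent with the ~6x figure cited, but only about a third of linear scaling.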
| Time | Section | Content |
|---|---|---|
| 5 min | The Problem | LLMs are too expensive for per-PR review. SLMs are too dumb out of the box. Can we close the gap? |
| 5 min | Data Pipeline | Mining 200 real reviews from kubeflow/trainer, supplementing with 8K HuggingFace examples, teacher enrichment via Ollama, data quality traps (poison templates, lazy negatives) |
| 8 min | The 7-Step Pipeline | Resolve Version → Extract Gold → SFT (QLoRA) → Deploy → Extract Preferences → DPO → Evaluate. Live demo of the Kubeflow DAG. |
| 5 min | Iterative Training | N-1 model as base, how compounding SFT+DPO cycles improve scores across runs, MLflow metric comparisons |
| 5 min | Scaling to 20 GPUs | PyTorchJob multi-node setup, node selectors, GPU scheduling wars, NCCL debugging, 6x speedup results |
| 5 min | DPO & Model Collapse | Why the model learned to say “no issues found” for everything, how we diagnosed it (data poisoning), how DPO preference pairs fixed it |
| 5 min | Live Demo | Submit a buggy Kubernetes operator diff → watch the SLM catch the goroutine leak, compare with teacher model output |
| 2 min | What’s Next | GRPO with reward functions, GitHub Action integration, expanding to Rust/Java |
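The 7-step pipeline in the outline is a simple dependency chain. In the talk it is a Kubeflow Pipelines v2 DAG; the stdlib sketch below expresses the same structure as step-to-prerequisites, with step names kebab-cased from the table for illustration.

```python
from graphlib import TopologicalSorter

# Step -> set of prerequisite steps, mirroring the 7-step DAG above.
STEPS = {
    "resolve-version": set(),
    "extract-gold": {"resolve-version"},
    "sft-qlora": {"extract-gold"},
    "deploy": {"sft-qlora"},
    "extract-preferences": {"deploy"},
    "dpo": {"extract-preferences"},
    "evaluate": {"dpo"},
}

# static_order() yields a valid execution order (here, the unique chain order).
order = list(TopologicalSorter(STEPS).static_order())
```

An orchestrator like Kubeflow runs each step as a container once its prerequisites finish; the chain structure is why the DPO stage can consume preferences extracted from the freshly deployed SFT model.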
Sridhar Pillai — Software Engineer at Red Hat, working on AI/ML platform tooling for OpenShift AI. He builds production SLM training pipelines and MLOps infrastructure on Kubernetes, and contributes to the Kubeflow Training Operator ecosystem.
| Component | Technology |
|---|---|
| Base Model | Qwen2.5-Coder-1.5B-Instruct |
| Teacher Model | qwen2.5-coder:7b-instruct (Ollama) |
| Training Method | QLoRA (4-bit) SFT + DPO |
| Distributed Training | PyTorchJob, DDP, 5 nodes × 4 T4 GPUs |
| Orchestration | Kubeflow Pipelines v2 (Argo Workflows) |
| Serving | KServe + vLLM runtime |
| Experiment Tracking | MLflow |
| Artifact Storage | MinIO (S3-compatible) |
| Platform | Red Hat OpenShift AI on AWS (g4dn.12xlarge) |
| Training Data | 8,296 curated code review examples (Go, Python, YAML) |
| Languages Reviewed | Go, Python, Kubernetes YAML |
MLOps · Small Language Models · Knowledge Distillation · Code Review · Kubernetes · OpenShift AI · DPO · Distributed Training · Kubeflow