Platform Engineering meet-up - Jan 8

Real systems. Real engineers. Real lessons.

Mustajib Mohammed Khan

@mustajib

Scaling microservices with Halyard at Confluent

Submitted Dec 20, 2025

This presentation introduces Halyard, a system designed to manage the complexity of deploying and releasing approximately 1,000 internal microservices across around 5,000 Kubernetes clusters at Confluent. It highlights the scale of the deployment infrastructure, which is organized into deployment groups (e.g., devel, prod-canary, prod), and sets the stage by outlining the challenges service owners face: managing configurations across different environments, controlling which clusters receive specific deployments, and implementing progressive rollout strategies.

Halyard is presented as a collection of services and tools for managing the release and deployment of Helm-based microservices, allowing users to define targeting rules and control deployment strategies with features like error budgets and percentage rollouts. The core of the system is built around key entities like Service, Deployment Group (DG), Version, and Cluster. The presentation also details the halctl command-line interface, which service owners use to query deployment status and perform operations, and concludes with a look at future developments, including moving toward a Helm-native agent approach, leveraging Flux Controllers, and integrating with internal LLM toolsets for easier investigations.
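To make the percentage-rollout and error-budget ideas concrete, here is a minimal Python sketch. It is not Halyard's implementation or API: the `Cluster` and `Rollout` classes, the hash-based bucketing, and the function names are all assumptions made for illustration, loosely modeled on the Service, Deployment Group, Version, and Cluster entities named above.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    name: str
    deployment_group: str  # e.g. "devel", "prod-canary", "prod"


@dataclass(frozen=True)
class Rollout:
    """Hypothetical rollout request; field names are illustrative only."""
    service: str
    version: str
    deployment_group: str
    percentage: int    # share of the group's clusters that should get the version
    error_budget: int  # failed clusters tolerated before the rollout halts


def bucket(service: str, cluster_name: str) -> int:
    """Map a (service, cluster) pair to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(f"{service}/{cluster_name}".encode()).hexdigest()
    return int(digest, 16) % 100


def target_clusters(rollout: Rollout, clusters: list[Cluster]) -> list[Cluster]:
    """Select the clusters in the rollout's deployment group whose bucket
    falls below the requested percentage. Because buckets are stable,
    raising the percentage only ever adds clusters."""
    return [
        c for c in clusters
        if c.deployment_group == rollout.deployment_group
        and bucket(rollout.service, c.name) < rollout.percentage
    ]


def may_continue(rollout: Rollout, failed_clusters: int) -> bool:
    """Gate the next rollout step on the remaining error budget."""
    return failed_clusters <= rollout.error_budget


if __name__ == "__main__":
    clusters = [Cluster(f"prod-{i}", "prod") for i in range(20)]
    clusters += [Cluster(f"devel-{i}", "devel") for i in range(5)]
    rollout = Rollout("billing", "1.4.2", "prod", percentage=25, error_budget=2)
    print([c.name for c in target_clusters(rollout, clusters)])
    print(may_continue(rollout, failed_clusters=3))
```

Hashing the service and cluster name, rather than sampling at random, keeps the selection deterministic: the clusters chosen at 25% remain selected when the rollout moves to 50%. That is one common way to get monotonic progressive rollouts; the talk does not say how Halyard itself picks clusters.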


  • Halyard manages over 1,000 microservices across approximately 5,000 Kubernetes clusters at Confluent, and both numbers are growing continuously.
  • The system allows service owners to control version deployment and progressive rollouts across various groups.

  • Software Engineers and Service Owners interested in solving complex microservice deployment and release management challenges at high scale.
  • DevOps and Site Reliability Engineers (SREs) looking for insights into large-scale, automated, Kubernetes-based deployment architectures and tooling (like Helm and Flux).
  • Engineering Leaders and Architects exploring strategies for managing versioning, progressive rollouts, and configuration across thousands of clusters and services.

I’m Mustajib Mohammed Khan, a senior software developer at Confluent, where I work on the cloud platform team.


The initial draft of the slides I intend to present: https://docs.google.com/presentation/d/1rd5fkeA18_uFrg_bPxgWYqaOsO8CfSjADZf6F9kF0bw/edit?usp=sharing

Hosted by

We care about site reliability, cloud costs, security and data privacy

Supported by

Venue Host

Quintype is a SaaS platform for digital publishers, helping newsrooms build, manage, and scale high-performance content products.