Rootconf Mini 2024 (on 22nd & 23rd Nov)

Geeking out on systems and security since 2012

Nidhi Agarwal

@nidhiag

Enhancing resiliency through CI/CD at Zomato: Advanced Automation and Real-Time Safeguards

Submitted Oct 30, 2024

Abstract

At Zomato’s scale, the CI/CD pipeline must support 700+ engineers and handle 600+ daily deployments across 300+ services. Efficient CI/CD pipelines are critical for streamlining the development process and ensuring secure deployments.

In this session, we’ll cover how we revamped our CI/CD setup using self-hosted GitHub Actions to overcome the limitations of our previous setup.

We’ll explore:

Problems with the existing setup on AWS CodePipeline and CodeBuild

  • Dependency on the SRE team: Adding a new CI check or deployment pipeline requires spawning AWS resources, making the process dependent on SRE intervention.
  • Fragmented workflow: Developers need to switch between GitHub and AWS to trigger or monitor CI checks and deployments while also managing separate access controls for AWS.
  • Steep learning curve: The complexity of the setup makes onboarding difficult for newcomers.
  • Lack of trigger visibility and traceability. It was also difficult to customize common pipelines or accept override inputs from individual services.
  • No canary deployments: This was a major limitation that restricted us to deploying only during low-traffic periods.
  • Missing features such as Revert, Auto Abort, and Manual Approval.

How we orchestrated our self-hosted GitHub Actions infrastructure

  • The full job lifecycle, from placing jobs on runners to monitoring and alerting on failures.
  • Avoiding resource wastage with a controller that maintains the pool of runners, and leveraging spot instances without affecting the developer experience.
  • A fully private architecture with internal service communication and integration test support.
  • Observability: runner-, workflow-, and job-level monitoring.
  • Custom features we built to improve job runtimes and developer experience, e.g. package caching, proto caching, notifications on failures, and dynamic resource allocation for jobs.
  • Next Steps:
    • Docker Image Caching
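
The abstract does not say how the controller sizes the runner pool. As a rough illustration only, the sketch below assumes a reconciliation step that keeps a small buffer of idle runners ahead of the queued jobs and only ever retires idle capacity. Every name and default here (`PoolState`, `desired_runners`, `scaling_delta`, `min_idle`, `max_runners`) is hypothetical and not taken from Zomato's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolState:
    """Snapshot of the runner pool (hypothetical shape, for illustration)."""
    queued_jobs: int    # jobs waiting for a runner
    busy_runners: int   # runners currently executing a job
    idle_runners: int   # runners registered but not executing anything


def desired_runners(state: PoolState, min_idle: int = 2, max_runners: int = 50) -> int:
    """Target pool size: running jobs + queued jobs + a warm buffer, capped."""
    target = state.busy_runners + state.queued_jobs + min_idle
    return max(min_idle, min(target, max_runners))


def scaling_delta(state: PoolState, min_idle: int = 2, max_runners: int = 50) -> int:
    """Positive: launch this many runners. Negative: retire this many idle ones."""
    current = state.busy_runners + state.idle_runners
    delta = desired_runners(state, min_idle, max_runners) - current
    # Never retire busy runners: scale-in is limited to idle capacity.
    return max(delta, -state.idle_runners)


if __name__ == "__main__":
    # 5 queued jobs, 3 busy runners, 1 idle runner: the pool needs to grow.
    print(scaling_delta(PoolState(queued_jobs=5, busy_runners=3, idle_runners=1)))
```

A real controller would call something like `scaling_delta` on a timer and translate the result into instance launches or terminations. Handling spot interruptions would need additional logic that is not shown here.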

An overview of the new CI/CD flow and how the developer experience has improved, covering these features:

  • Unified Platform: No more tool-switching between GitHub and AWS CodeBuild.
  • Reusable workflows that run customizable linters for services written in different languages.
  • Overview of merge and release workflows, including developer forks, branch management (dev, master branches), and the pull request flow.
  • Canary Deployment support
  • Continuous monitoring of services during deployment, with automated rollback or a manual decision to abort or continue when errors occur.
  • One-click Revert
  • Deploying services to multiple accounts and regions within the same deployment.
  • Dynamic deployment access management
  • Hotfix deployments
  • Next Steps:
    • Standardize build spec
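
The monitoring step above (automated rollback versus a manual abort-or-continue decision) could be approximated by a check like the one below. This is a minimal sketch under stated assumptions: it compares the canary's error rate with the baseline's, and the thresholds (`warn_ratio`, `abort_ratio`, `min_error_rate`) as well as the names `Decision` and `evaluate_canary` are hypothetical. The abstract does not state which signals or thresholds are actually used.

```python
from enum import Enum


class Decision(Enum):
    CONTINUE = "continue"    # keep shifting traffic to the canary
    ASK_HUMAN = "ask_human"  # pause and wait for a manual abort/continue choice
    ROLLBACK = "rollback"    # abort automatically and roll back


def evaluate_canary(
    canary_error_rate: float,
    baseline_error_rate: float,
    warn_ratio: float = 1.5,       # assumed threshold for manual review
    abort_ratio: float = 3.0,      # assumed threshold for automatic rollback
    min_error_rate: float = 0.001, # assumed noise floor
) -> Decision:
    """Classify a canary by how much worse its error rate is than the baseline."""
    # Below the noise floor the canary is treated as healthy regardless of ratio.
    if canary_error_rate <= min_error_rate:
        return Decision.CONTINUE
    # Clamp the baseline to the floor to avoid dividing by zero.
    ratio = canary_error_rate / max(baseline_error_rate, min_error_rate)
    if ratio >= abort_ratio:
        return Decision.ROLLBACK
    if ratio >= warn_ratio:
        return Decision.ASK_HUMAN
    return Decision.CONTINUE


if __name__ == "__main__":
    # Canary at 2% errors against a 1% baseline: worse, but not conclusively so.
    print(evaluate_canary(0.02, 0.01))
```

Keeping a middle band between "continue" and "rollback" is one way to model the manual decision mentioned in the list. A production check would likely also consider latency and saturation signals over a time window.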

Key Takeaways

  • Building orchestration for self-hosted GitHub Actions and observability around it.
  • Enhancing the CI/CD experience for developers while ensuring robustness.

Target Audience

  • DevOps and SRE engineers interested in CI/CD automation.
  • Anyone planning to implement self-hosted GitHub Actions orchestration for their workflows.
