Building a CI/CD pipeline that supports 500+ engineers and manages 600+ deployments across 300+ services daily is essential at Zomato’s scale. Efficient CI/CD pipelines are critical for streamlining the development process and ensuring secure deployments.
In this session, we’ll cover how we revamped our CI/CD setup using self-hosted GitHub Actions to overcome these challenges.
- Dependency on the SRE team: Adding a new CI check or deployment pipeline requires spawning AWS resources, making the process dependent on SRE intervention.
- Fragmented workflow: Developers need to switch between GitHub and AWS to trigger or monitor CI checks and deployments while also managing separate access controls for AWS.
- Newcomers face a challenging learning curve due to the complexity of the setup.
- Lack of trigger visibility and traceability. Additionally, it was difficult to customize common pipelines or accept override inputs from services.
- The absence of canary deployments was a major limitation, requiring us to deploy only during low-traffic periods.
- Missing features like Revert, Auto Abort, Manual Approval, etc.
- The full job lifecycle, from placing jobs on runners to monitoring and alerting on failures.
- Ensuring no resource wastage by having a controller maintain the pool of runners, and leveraging spot instances without affecting the developer experience.
- Fully private architecture with internal service communication and integration test support.
- Observability: runner-, workflow-, and job-level monitoring.
- Custom features we built to improve job runtimes and developer experience, for example package caching, proto caching, notifications on failures, and dynamic resource allocation to jobs.
- Next Steps:
- Unified Platform: No more tool-switching between GitHub and AWS CodeBuild.
- How we used reusable workflows across services written in different languages to run customizable linters.
- Overview of merge and release workflows, including developer forks, branch management (dev, master branches), and the pull request flow.
- Canary Deployment support
- Continuous monitoring of services during deployment, with automated rollback or a manual decision to abort or continue when errors occur.
- One-click Revert
- Deploying services to multiple accounts and regions within the same deployment.
- Dynamic deployment access management
- Hotfix deployments
- Next Steps:
- Building orchestration for self-hosted GitHub Actions and observability around it.
- Enhancing the CI/CD experience for developers while ensuring robustness.
- DevOps and SRE engineers interested in CI/CD automation.
- Anyone planning to implement self-hosted GitHub Actions orchestration for their workflows.
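
To make the runner-pool controller idea from the outline concrete, here is a minimal sketch of the kind of scaling decision such a controller might make. It is not Zomato's implementation: the `PoolState` fields, the `warm_pool` buffer, and the `max_runners` cap are all hypothetical names and values chosen for illustration. The sketch only ever removes idle runners, so running jobs are never interrupted.

```python
from dataclasses import dataclass


@dataclass
class PoolState:
    """Hypothetical snapshot of the self-hosted runner pool."""
    queued_jobs: int   # jobs waiting for a runner
    idle_runners: int  # runners registered but not running a job
    busy_runners: int  # runners currently running a job


def desired_change(state: PoolState, warm_pool: int = 5, max_runners: int = 100) -> int:
    """Return how many runners to add (positive) or remove (negative).

    warm_pool and max_runners are assumed tuning knobs, not values from the talk.
    """
    # Aim for enough idle runners to absorb the queue plus a small warm buffer.
    target_idle = state.queued_jobs + warm_pool
    # Respect the overall fleet cap; busy runners already count against it.
    max_idle = max(max_runners - state.busy_runners, 0)
    target_idle = min(target_idle, max_idle)
    # A negative result means surplus idle runners can be terminated.
    return target_idle - state.idle_runners


if __name__ == "__main__":
    print(desired_change(PoolState(queued_jobs=10, idle_runners=2, busy_runners=20)))
    print(desired_change(PoolState(queued_jobs=0, idle_runners=12, busy_runners=3)))
```

A real controller would run this decision in a loop against live queue and runner data, and would also need to handle spot-instance interruptions, which this sketch leaves out.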