Automatic rollbacks for MLOps deployments in Kubernetes
While various tools automate the deployment of ML models, most of them require manually written rules for verifying deployments in production.
I would love to show the community how to enable automatic rollbacks in their model deployment pipelines without needing any log regexes or monitoring rules to verify the health of the deployments. This is enabled via HybridK8s Droid; to learn more, see https://docs.hybridk8s.tech
I will also talk about how to embrace an SRE mindset while managing production models, and how to ensure maximum uptime without needing too many SRE/DevOps tools.
Key Takeaways:
- How to reduce production downtime/MTTR using progressive deployments?
- How to reduce the lead time for productionising ML applications?
- How to reduce the effort required to implement/maintain deployment pipelines?
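To make the first takeaway concrete, here is a minimal sketch of the control flow behind a progressive deployment with automatic rollback. It is not how HybridK8s Droid works internally. The function names `progressive_rollout`, `set_traffic_weight` and `is_healthy` are hypothetical, and the health check is just a callback you pass in, standing in for whatever automated verification your pipeline uses.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class RolloutResult:
    promoted: bool        # True if every traffic step passed the health check
    final_weight: int     # traffic percentage left on the new version
    steps_completed: int  # number of steps that passed before stopping


def progressive_rollout(
    steps: Sequence[int],
    set_traffic_weight: Callable[[int], None],  # hypothetical: shifts N% of traffic to the new model
    is_healthy: Callable[[int], bool],          # hypothetical: verdict on the new model at that weight
) -> RolloutResult:
    """Shift traffic step by step; roll back to 0% on the first unhealthy step."""
    if not steps:
        raise ValueError("at least one traffic step is required")
    completed = 0
    for weight in steps:
        set_traffic_weight(weight)
        if not is_healthy(weight):
            # Automatic rollback: send all traffic back to the previous version.
            set_traffic_weight(0)
            return RolloutResult(False, 0, completed)
        completed += 1
    return RolloutResult(True, steps[-1], completed)


# Usage: record the weights applied, and pretend the model degrades at 50% traffic.
applied_weights = []
result = progressive_rollout([10, 50, 100], applied_weights.append, lambda w: w < 50)
```

Because a bad release is caught while it serves only part of the traffic, fewer users are affected and recovery is a single traffic shift, which is where the MTTR reduction comes from.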
Target audience: anyone concerned about the stability of production models, including Data/DataOps Engineers and Engineering Management.