Rootconf Mini 2024

Geeking out on systems and security since 2012

Sandesh Kumar Gupta

@sandeshgupta59

Building Intelligence and resilience for highly available managed DbaaS platforms on hybrid cloud.

Submitted Oct 28, 2024

Objective

At Flipkart, home-grown managed platforms running on hybrid cloud have seen huge adoption by engineering teams working at massive scale, and DbaaS platforms are the protagonists of this story. It is paramount that these platforms maintain high resilience and high availability, deliver sustained performance, and are continuously optimised to handle adoption at scale. In this talk, we delve into:

  • What it takes to measure availability in real time, along with the platform resiliency aspects of a disaster recovery strategy for a hybrid cloud topology.
  • How these platforms intend to deliver further value through continuous optimisation and recommendations that improve the productivity of platform maintainers.
  • Our attempts to capitalize on language models to extract intelligence for multiple platform-level optimisations.

What will be the scope of this session? Key problems, challenges being addressed and lessons learned

Appropriate format for this session - 40 mins talk

A managed DbaaS platform is not merely a set of running databases, but a stitched and tuned ecosystem of various components working in harmony.

DbaaS resiliency trade-offs on a hybrid cloud setup during important sale events such as Big Billion Days.

Productivity improvements, on-call reduction and opex tightening.

Target audience and Takeaways for this session. Problem/pain to be solved for the audience.

  • This talk will particularly pique the interest of engineers who are fed up with maintaining various self-managed stacks and are keen on evaluating a platform-first approach. [(1) Building blocks]
  • Backend engineers evaluating systemic availability computation, DR use cases and productivity measures. [(2) Availability computation]
  • Architects looking to design resilient multi-cloud systems and optimise opex when working with datastores. [(3) Resiliency and (4) Optimisations]
  • AI-for-DB or AI-for-Platforms connoisseurs. [(5) Intelligence in-house]
  • It is also highly relevant for developers working on large-scale distributed systems that require extreme backend scaling with high-throughput, low-latency performance. [Across]
