Lessons learned from building a globally distributed database service from the ground up
Submitted by Dharma Shukla (@dharmashukla) on Friday, 26 May 2017
Dharma and his team have spent the past seven years building Azure Cosmos DB (http://cosmosdb.com) - a massively scalable, multi-tenant, globally distributed database service - from the ground up. The system currently operates across more than thirty-four geographical regions, manages hundreds of petabytes of indexed data, and serves hundreds of trillions of requests every day from thousands of customers worldwide. The database system allows developers to elastically scale both throughput and storage across any number of geographical regions on a single table. The service offers guaranteed single-digit millisecond latency at the 99th percentile, 99.99% high availability, predictable throughput, and multiple well-defined consistency models. The system offers comprehensive SLAs for latency, availability, throughput, and consistency. It is used extensively within Microsoft and has been available to external Azure customers since 2015. In this session, Dharma will describe the internals of the system design and the design trade-offs the team had to make. He will also share his experiences from operating a globally distributed database service worldwide and maintaining comprehensive Service Level Agreements (SLAs).
The lessons I have learnt from building a globally distributed database can be applied to many distributed systems.
Some of the takeaways are:
1. Well-defined, relaxed consistency models are really powerful in solving real-world scenarios
2. A system designed for the cloud can be made to run very cheaply if it is designed with resource governance in mind
4. What does it mean to build multi-tenant applications? What are the challenges?
5. Applications running in the cloud deserve a globally distributed database.
6. A globally distributed database != database with DR
and many more.
Application developers of all types, distributed systems practitioners, data engineers, system integrators and consultants.
- What does it mean to build a database that leverages the strengths of the cloud?
- Horizontal partitioning
- Elastically scaling throughput (vs. storage) worldwide
- Resource governance and fine-grained multi-tenancy
- Global distribution of data for low latency
- Global distribution of data for high availability
- Navigating the speed of light
- Navigating the CAP theorem
- Consistency Models - finding the right shade of grey!
- Why can hosting on-premises databases (SQL or NoSQL) not offer the lowest TCO and best SLAs?
- What does it take to offer and maintain comprehensive SLAs for consistency, latency, throughput, and availability?
- Operating a globally distributed database service, worldwide
- Insights from the production workloads
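To make the "Navigating the speed of light" item concrete, here is a minimal back-of-the-envelope sketch. It is not taken from the talk. The function name, the sample distances, and the assumption that light in optical fiber travels at roughly two-thirds of its vacuum speed are illustrative only.

```python
# Hypothetical illustration: best-case round-trip time between two regions.
SPEED_OF_LIGHT_KM_PER_S = 299_792.458  # speed of light in vacuum
FIBER_FACTOR = 2 / 3  # assumption: light in fiber travels at about two-thirds of c


def min_round_trip_ms(distance_km: float) -> float:
    """Return a lower bound, in milliseconds, on the round trip over a fiber path of this length."""
    one_way_seconds = distance_km / (SPEED_OF_LIGHT_KM_PER_S * FIBER_FACTOR)
    return 2 * one_way_seconds * 1000


if __name__ == "__main__":
    # Sample path lengths are arbitrary, not measured distances between real regions.
    for km in (1_000, 5_000, 10_000):
        print(f"{km:>6} km: at least {min_round_trip_ms(km):.1f} ms round trip")
```

Under these assumptions, a request that must wait for an acknowledgement from a replica 10,000 km away takes on the order of 100 ms at best, so it cannot fit within a single-digit millisecond budget. This is one plausible reading of why the outline pairs the speed of light with the CAP theorem and relaxed consistency models.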
Familiarity with databases, the cloud, and the challenges of building scalable applications.
Dharma Shukla is a Distinguished Engineer at Microsoft. Dharma is also the founder of Azure Cosmos DB (http://cosmosdb.com) - a globally distributed, multi-tenant database service on Azure. Prior to working on the current system, his work spanned a range of distributed systems and databases at Microsoft and other places.