Saturday, 28 Sep 2024, 11:00 AM – 01:00 PM IST
An advanced exploration of war stories from building caching systems at a decacorn.
There is a reason why the following quote is famous:
“There are only two hard things in Computer Science: cache invalidation and naming things.” — Phil Karlton
Caching is everywhere. It can be as simple as a bookshop keeping only its most popular titles on the shelves and the rest in the backroom, or as sophisticated as the caching we build into software systems. That ranges from a simple cache layer on top of a database to probabilistic data structures such as Bloom filters, used as a cache in front of more expensive checks against the storage layer.
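To make the Bloom filter idea concrete, here is a minimal sketch rather than anything taken from the talk itself. The `BloomFilter`, `Storage`, and `lookup` names, the sizes, and the `book:` keys are all assumptions made for illustration. The filter can return false positives but never false negatives, so a "definitely absent" answer lets the caller skip the expensive storage call.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for readability

    def _positions(self, key: str):
        # Derive num_hashes positions by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(key))


class Storage:
    """Hypothetical stand-in for a slow storage layer; it counts lookups."""

    def __init__(self, rows: dict) -> None:
        self.rows = rows
        self.lookups = 0

    def get(self, key: str):
        self.lookups += 1
        return self.rows.get(key)


def lookup(key: str, bloom: BloomFilter, storage: Storage):
    # A "definitely absent" answer skips the expensive storage call.
    if not bloom.might_contain(key):
        return None
    # "Possibly present" still has to be confirmed by the storage layer.
    return storage.get(key)
```

In a real deployment the filter would have to be sized for the expected key count and rebuilt or replaced when keys are deleted, since a plain Bloom filter cannot remove entries.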
This is a low to medium engagement talk on war stories in building caching systems.
Exploring the various forms caching systems take and the characteristics they share.
Examples: Multiple cache layers, efficiency gains, eviction policies, metrics, consistency.
While a cache seems like a simple concept, there are numerous innovative ways a caching system gets deployed at scale.
Examples: Saving on latency, saving on cost, serving as a fallback, predictable response times, improved resiliency.
Despite the benefits, caching systems can introduce complexities and challenges that require careful consideration.
Examples: Cache Stampede, Bi-Modal Behavior, Timeout Propagation Issues, Cache Poisoning, Key Collisions
Unusual and unexpected behaviors that emerged when working with caching systems at scale.
Examples: Caching Increasing Response Time, Infinite Loops on Cache Invalidation, Negative Caching, Self-Immolating Caches, Cache Misses Triggering Batch Jobs
Best practices and lessons learned from building and maintaining caching systems.
Examples: Effective Key Management, Monitoring and Profiling, Handling Cache Errors Gracefully, Avoiding Misuse of Caching Systems, Data Serialization Strategies
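As one illustration of the challenges listed above, a cache stampede happens when many callers miss on the same key at once and all hit the backing store together. The sketch below shows one common mitigation, a per-key lock so that only one caller loads a missing value while the others wait. It is a simplified assumption-laden example, not the approach described in the talk: `SingleFlightCache`, `demo`, and the `"menu"` key are hypothetical, and TTLs, eviction, and loader errors are deliberately left out.

```python
import threading
import time


class SingleFlightCache:
    """In-memory cache where only one caller per key runs the loader."""

    def __init__(self, loader):
        self._loader = loader
        self._values = {}
        self._key_locks = {}
        self._guard = threading.Lock()  # protects the two dicts above

    def _lock_for(self, key):
        with self._guard:
            return self._key_locks.setdefault(key, threading.Lock())

    def get(self, key):
        with self._guard:
            if key in self._values:
                return self._values[key]
        # Miss: serialize loaders for this key only.
        with self._lock_for(key):
            # Re-check, since another caller may have filled it while we waited.
            with self._guard:
                if key in self._values:
                    return self._values[key]
            value = self._loader(key)
            with self._guard:
                self._values[key] = value
            return value


def demo(num_threads: int = 8):
    """Fire concurrent requests for one cold key; return (loader calls, results)."""
    calls = []

    def slow_loader(key):
        calls.append(key)
        time.sleep(0.05)  # simulate an expensive backend call
        return key.upper()

    cache = SingleFlightCache(slow_loader)
    results = []

    def worker():
        results.append(cache.get("menu"))

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(calls), results
```

Calling `demo()` starts eight threads against a cold key, and the loader should run only once. Without the per-key lock, every thread that missed would call the loader, which is the stampede.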
The talk targets backend engineers with some experience (1–2+ years), as well as those early in their careers who have spent some time on systems thinking.
As prerequisites, AWS maintains a good set of resources on caching: https://aws.amazon.com/caching/. The System Design Roadmap also offers a path for further research into caching: https://roadmap.sh/system-design.
For those interested in a more advanced perspective, this paper is a very good entry point into the patterns and behaviors observed when managing large-scale caching systems, both in terms of infrastructure and across use cases: https://www.usenix.org/conference/osdi20/presentation/yang
Srinivas is the founder of Opti Owl, a cloud cost optimization startup dedicated to enhancing system performance and reducing expenses. Formerly an SRE Team Lead at Zomato—a decacorn processing over 3 million orders daily with more than 300 microservices and over 1,000 engineers—Srinivas built systems and processes that helped Zomato grow from 10,000 orders per day to over 3 million orders per day in a resilient way. Passionate about sharing deep insights into the complexities of caching, Srinivas brings valuable real-world experience to the discussion.