Dead Simple Scalability Patterns
Submitted by Vedang Manerikar (@vedang) on Monday, 15 June 2015
Everyone dreams of being ‘Web Scale’, but we start out small. Most of us don’t launch a service expecting it to serve millions of requests from Day 1. As a result, we don’t think about the ways in which our stack will blow up when the number of requests does start climbing. This talk lists simple patterns and checks that Development and Operations teams should implement from Day 1 in order to ensure a robust distributed system.
This talk will highlight development patterns that are easy to catch in code review and go a long way toward extending the life of your system. For example,
- Do not make an unbounded number of DB calls in any request
Bad Idea: For each person who retweeted “Ellen’s Oscar Selfie”, fetch their avatar from the DB
- Do not fetch an unbounded amount of data from the DB
Bad Idea: Fetch all users who retweeted “Ellen’s Oscar Selfie”.
- Build timeouts into every network call made by the system
Bad Idea: Wait forever for the list of RT users to load, and don’t render the page until it does
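As a rough illustration of the three checks above, here is a minimal Python sketch. It is not taken from the talk: the `users` and `retweets` tables, the `PAGE_SIZE` cap, and the function names are all hypothetical, and an in-memory SQLite database stands in for the real DB. A single JOIN replaces the per-user avatar lookups, `LIMIT` bounds the rows returned, and the HTTP helper passes an explicit timeout.

```python
import sqlite3
import urllib.request

PAGE_SIZE = 50  # hypothetical cap on rows fetched per request


def make_demo_db():
    """Build a throwaway in-memory DB with 200 retweeters of tweet 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, avatar_url TEXT)")
    conn.execute("CREATE TABLE retweets (tweet_id INTEGER, user_id INTEGER)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [(i, f"https://example.invalid/avatars/{i}.png") for i in range(1, 201)],
    )
    conn.executemany(
        "INSERT INTO retweets VALUES (1, ?)", [(i,) for i in range(1, 201)]
    )
    return conn


def retweeter_avatars(conn, tweet_id, after_user_id=0, limit=PAGE_SIZE):
    """Fetch one page of (user_id, avatar_url) pairs with a single query.

    The JOIN avoids one DB call per retweeter; LIMIT bounds the result size.
    """
    return conn.execute(
        "SELECT u.id, u.avatar_url "
        "FROM retweets r JOIN users u ON u.id = r.user_id "
        "WHERE r.tweet_id = ? AND u.id > ? "
        "ORDER BY u.id LIMIT ?",
        (tweet_id, after_user_id, limit),
    ).fetchall()


def fetch_with_timeout(url, timeout_s=2.0):
    """Make a network call that gives up after timeout_s seconds."""
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return resp.read()
```

A caller would request the next page by passing the last `user_id` it saw as `after_user_id`, so no single request ever loads the full list of retweeters.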
Slides will list out a large number of “obvious” (and some not-so-obvious) strategies that all distributed systems engineers should follow. For example,
- Data Projections - Fetch the absolute minimum amount of data required to satisfy a request from the DB
- Simple Profiling - Count the number of DB calls you make to serve a request end-to-end
- Essential Monitoring - Measure statistics that show whether a component is useful. Do you know your cache hit vs cache miss ratio?
- Awareness of Limits - What is the throughput limit of an Amazon EBS volume?
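The first three strategies can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the speaker's implementation: `CountingDB`, `MeasuredCache`, `display_name`, and the `users` table are hypothetical names, and in-memory SQLite stands in for the real DB. The query selects only the `name` column (projection), the wrapper counts DB calls per request (profiling), and the cache tracks hits and misses (monitoring).

```python
import sqlite3


def make_demo_db():
    """Throwaway in-memory DB with a deliberately wide users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, bio TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?, ?)",
        [(1, "asha", "long bio text"), (2, "ravi", "another long bio")],
    )
    return conn


class CountingDB:
    """Wraps a connection and counts every query issued through it."""

    def __init__(self, conn):
        self.conn = conn
        self.calls = 0

    def query(self, sql, params=()):
        self.calls += 1
        return self.conn.execute(sql, params).fetchall()


class MeasuredCache:
    """Dict-backed cache that records hits and misses."""

    def __init__(self):
        self.data = {}
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key, loader):
        if key in self.data:
            self.hits += 1
        else:
            self.misses += 1
            self.data[key] = loader(key)
        return self.data[key]

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def display_name(db, cache, user_id):
    """Fetch only the column the request needs, going through the cache."""

    def load(uid):
        # Projection: select `name` alone rather than the whole row.
        rows = db.query("SELECT name FROM users WHERE id = ?", (uid,))
        return rows[0][0] if rows else None

    return cache.get_or_load(user_id, load)
```

Logging `db.calls` and `cache.hit_ratio()` at the end of each request is one simple way to spot a request path whose DB call count grows with its input.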
I will also talk about architectural patterns that should be baked in from Day 1. For example,
- Separation of concerns using Message Queues
- LRU caching for permanent, unchanging data
- Version numbers in the schema for feature rollouts
… and more.
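Two of the architectural patterns listed above can be sketched with the Python standard library alone. This is a toy, in-process sketch, and every name in it (`COUNTRY_NAMES`, `country_name`, `start_worker`) is hypothetical. `functools.lru_cache` stands in for an LRU cache over data that never changes, and a bounded `queue.Queue` with a worker thread stands in for a real message queue that separates the request path from background work.

```python
import queue
import threading
from functools import lru_cache

# Stand-in for a slow lookup of permanent, unchanging data.
COUNTRY_NAMES = {"IN": "India", "US": "United States"}
lookups = []  # records which codes reached the slow path


@lru_cache(maxsize=1024)
def country_name(code):
    """Resolve a country code; repeated codes are served from the LRU cache."""
    lookups.append(code)
    return COUNTRY_NAMES.get(code, "Unknown")


def start_worker(jobs, results):
    """Consume jobs from the queue on a background thread.

    A job of None is a shutdown signal. Real systems would use a durable
    broker; the in-process queue only illustrates the separation.
    """

    def run():
        while True:
            job = jobs.get()
            try:
                if job is None:
                    return
                results.append(job.upper())  # placeholder for real work
            finally:
                jobs.task_done()

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker
```

A producer would create the queue with a size cap, for example `queue.Queue(maxsize=100)`, so that a slow consumer applies back-pressure instead of letting the backlog grow without bound.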
Vedang Manerikar is a Platform Architect at Helpshift and has helped the Helpshift SDK go from 0 installs to 1 Billion+ installs. Along the way, he has stayed up long nights, refactored multiple systems, and learned everything in this talk the hard way. He is also terrible at Markdown.