High availability is critical for tech companies to meet user expectations, maintain business operations, honor SLAs, gain a competitive edge, and scale their services or products.
Maintaining high availability requires a robust architecture. This is a challenge for every multi-tenant tech company that has grown in recent years in number of customers, size, and features. Multi-tenancy means that multiple customers of a cloud vendor share the same compute resources. In a multi-tenant architecture, an issue affecting a single customer can easily cascade and impact all of its neighbors. In fact, there have been incidents where a simple failure for one customer at the database level threatened to take the entire web cluster down.
For Freshdesk to be a highly available product, the blast radius had to be minimized. Blast radius is the maximum impact that might be sustained in the event of a failure. To minimize the blast radius and increase availability, shell architecture was introduced in Freshdesk. Shell architecture is a logical grouping of compute resources. Each shell is a full stack that serves only a smaller bucket of customers, grouped by the shards they belong to. A shard is a horizontal partition of data in the database. A set of customers is mapped to a particular shard, and a set of shards is mapped to a shell. For observability of each stack and tenant, shell information is exposed as a metric and shell-specific alerts are configured.
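The customer-to-shard-to-shell mapping described above can be sketched roughly as follows. This is a minimal illustration, not Freshdesk's implementation: the tenant, shard, and shell names, the lookup tables, and the metric label keys are all hypothetical.

```python
from typing import Dict, Set

# Hypothetical mappings; the real tenant, shard, and shell names are assumptions.
TENANT_TO_SHARD: Dict[str, str] = {
    "acme": "shard_1",
    "globex": "shard_2",
    "initech": "shard_3",
}
SHARD_TO_SHELL: Dict[str, str] = {
    "shard_1": "shell_a",
    "shard_2": "shell_a",
    "shard_3": "shell_b",
}


def shell_for_tenant(tenant: str) -> str:
    """Resolve a tenant to its shell by going through its shard."""
    shard = TENANT_TO_SHARD[tenant]
    return SHARD_TO_SHELL[shard]


def blast_radius(failing_tenant: str) -> Set[str]:
    """Return every tenant sharing a shell with the failing tenant."""
    shell = shell_for_tenant(failing_tenant)
    return {
        tenant
        for tenant, shard in TENANT_TO_SHARD.items()
        if SHARD_TO_SHELL[shard] == shell
    }


def metric_labels(tenant: str) -> Dict[str, str]:
    """Build labels that could be attached to a metric (label keys are assumed)."""
    shard = TENANT_TO_SHARD[tenant]
    return {"tenant": tenant, "shard": shard, "shell": SHARD_TO_SHELL[shard]}
```

In this sketch, a failure caused by `initech` can only affect tenants in `shell_b`, while tenants in `shell_a` keep their own compute resources. Tagging metrics with the shell is what would make shell-specific alerts possible.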
Key Benefits of shell architecture
Flexibility to increase compute capacity for specific use cases.
Flexibility to isolate noisy tenants and help reduce blast radius.
Any company can adopt this architecture to maintain high availability as it scales, gaining the same flexibility to isolate noisy tenants and limit the blast radius.
To move to this architecture, companies need an approach for grouping customers into different stacks based on one or more criteria, for example the nature of requests (light or heavy impact on the DB), the rate of requests (rpm), or the business priority of requests. Companies must also have the infrastructure to support on-demand scaling of stacks and adopt autoscaling solutions to scale within stacks.
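One way to picture the grouping step is a small rule-based classifier over the criteria listed above. This is a sketch under stated assumptions: the `TenantProfile` fields, the stack names, the rule ordering, and the 10,000 rpm threshold are hypothetical and would need to be chosen from real traffic data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantProfile:
    """Hypothetical per-tenant traffic profile used for stack assignment."""
    name: str
    requests_per_minute: int
    heavy_db_usage: bool
    business_priority: str  # assumed values: "high" or "standard"


def assign_stack(profile: TenantProfile, rpm_threshold: int = 10_000) -> str:
    """Pick a stack for a tenant; rule order and threshold are assumptions."""
    # Business priority wins first, so critical tenants get a dedicated stack.
    if profile.business_priority == "high":
        return "priority"
    # Tenants that are heavy on the DB or send many requests are kept apart
    # from the general pool to contain noisy-neighbor effects.
    if profile.heavy_db_usage or profile.requests_per_minute > rpm_threshold:
        return "heavy"
    return "general"
```

For example, `assign_stack(TenantProfile("acme", 25_000, False, "standard"))` would place the tenant in the `heavy` stack under these assumed rules. Scaling stacks on demand and autoscaling within a stack are separate concerns that this sketch does not cover.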
This talk will focus on the primary motivations for shell architecture, the architecture itself, its benefits, and the challenges companies should prepare for if they decide to adopt it.