Scalability truths and serverless architectures - why it is harder with stateful, data-driven systems

Jul 2017

27 Thu 08:15 AM – 10:00 PM IST

28 Fri 08:15 AM – 06:25 PM IST

MLR Convention Centre, Whitefield, Bengaluru


Submitted May 22, 2017

Section: Full talk for data engineering track
Technical level: Intermediate

Building scalable systems is not easy. It is not as simple as deploying on a cloud and expecting the system to scale along with the cloud’s elasticity. Many systems and solutions that claim elastic scale implicitly limit those claims to stateless services.
Serverless architecture is a recent addition to the developer's programming and deployment toolset. It offers the convenience of deployments with no servers to manage while preserving elastic scaling.
Building and scaling stateless systems poses far fewer challenges than building and scaling stateful ones. Stateless services are limited mainly by data centre infrastructure, which demands attention only at very large footprints, on the order of tens of millions of requests per second. It is therefore seemingly easier to scale stateless services, which adds credence to claims of almost limitless elastic scale.
In reality, few services are truly stateless; in most cases the state has simply been pushed to another service or system. The challenges therefore shift to scaling stateful services, which is harder to achieve.
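
To make the "state pushed elsewhere" point concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not code from any real system: `CounterStore` and `make_handler` are hypothetical names, and the in-memory store stands in for an external database or cache.

```python
import threading


class CounterStore:
    """Hypothetical stand-in for an external stateful service (e.g. a database)."""

    def __init__(self):
        self._counts = {}
        self._lock = threading.Lock()  # every caller serializes on this lock

    def increment(self, key):
        with self._lock:
            self._counts[key] = self._counts.get(key, 0) + 1
            return self._counts[key]


def make_handler(store):
    """Return a request handler that keeps no state of its own between calls."""

    def handle_request(user_id):
        # All state lives in the store; the handler only forwards to it.
        visits = store.increment(user_id)
        return {"user": user_id, "visits": visits}

    return handle_request


store = CounterStore()
# Two interchangeable "stateless" instances, as a load balancer might see them.
instance_a = make_handler(store)
instance_b = make_handler(store)

first = instance_a("u1")
second = instance_b("u1")
```

Either instance can serve any request, so adding more handlers is easy. Every request still passes through the single shared store, and that is where the scaling problem now sits.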

In this talk I will give an overview of typical application workloads (online vs offline, interactive vs batch, sync vs async, etc.) and the commonly used patterns and libraries for building these systems. We will evaluate each of these examples to identify the critical stateful services/systems and the challenges in scaling them. We will then take the Flipkart Flux open source project as an example to understand the design of a highly scalable stateful system that offers serverless computing for deployed applications, similar to AWS Lambda. The talk will cover the design and technology choices that enable millions of stateful, data-driven workflows/computes to run on the Flux system.

Outline

Defining scalability - as applied to stateless and stateful systems
Stateless service - case of state pushed to a stateful layer
Database/data store for stateful systems. Choices of such stores - relational, append-only, etc.
Distributing stateful compute, things to take care of
Introduction to serverless architecture, what to expect. Services available
Building your own stateful serverless compute engine - the Flux example
Data engineering for stateful systems - scaling from single node to multi-node cluster on the network
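
As a rough illustration of the "distributing stateful compute" and "single node to multi-node" items, the Python sketch below partitions state by key so that each node owns a slice. It is an assumption made for illustration and does not describe how Flux distributes work; `owner_node`, `Cluster`, and `moved_keys` are hypothetical names, and the "nodes" are just dictionaries in one process.

```python
import hashlib


def owner_node(key, nodes):
    """Pick the node that owns `key` using a stable hash (not Python's salted hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


class Cluster:
    """Hypothetical in-process model: one dict of state per node."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.state = {node: {} for node in self.nodes}

    def put(self, key, value):
        self.state[owner_node(key, self.nodes)][key] = value

    def get(self, key):
        return self.state[owner_node(key, self.nodes)].get(key)


def moved_keys(keys, old_nodes, new_nodes):
    """Keys whose owner changes when the node list changes (state that must migrate)."""
    return [k for k in keys if owner_node(k, old_nodes) != owner_node(k, new_nodes)]


three_nodes = ["node-0", "node-1", "node-2"]
four_nodes = three_nodes + ["node-3"]

cluster = Cluster(three_nodes)
cluster.put("workflow-42", "RUNNING")

keys = ["workflow-%d" % i for i in range(1000)]
to_migrate = moved_keys(keys, three_nodes, four_nodes)
```

With plain modulo placement like this, adding a node typically changes the owner of most keys, so their state has to move. That migration cost is one reason resizing a stateful cluster is harder than adding stateless instances.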

Speaker bio

Regunath is an open source developer and engineer who built Aadhaar and was later responsible for Flipkart platform services. He is currently at HealthFace, building data-driven decision systems for healthcare and personal health records. He is also a core contributor to the Flux project discussed in this talk.

The Fifth Elephant 2017