The Fifth Elephant 2017

On data engineering and application of ML in diverse domains

Regunath Balasubramanian

@regunathb

Scalability truths and serverless architectures - why it is harder with stateful, data-driven systems

Submitted May 22, 2017

Building scalable systems is not easy. It is not as simple as deploying to a cloud and expecting the system to scale along with the cloud’s elasticity. Many systems and solutions that claim elastic scale implicitly limit those claims to stateless services.
Serverless architecture is a recent addition to the developer's programming and deployment toolset. It offers the convenience of deploying without managing servers while preserving elastic scaling.
Building and scaling stateless systems poses far fewer challenges than building and scaling stateful ones. That said, stateless services are still limited by data centre infrastructure, which demands attention at large footprints - at tens of millions of requests per second. Stateless services are therefore seemingly easier to scale, which lends credence to claims of almost limitless elastic scale.
In reality, few services are truly stateless; in most cases the state has simply been pushed to another service or system. The challenge therefore shifts to scaling stateful services, which is harder to achieve.
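
The point about state being pushed elsewhere can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: `CounterStore` and `StatelessHandler` are hypothetical names, and the in-memory store stands in for whatever external database or data store a real deployment would use.

```python
from threading import Lock


class CounterStore:
    """Hypothetical stand-in for an external data store (the stateful layer)."""

    def __init__(self):
        self._counts = {}
        self._lock = Lock()

    def increment(self, key):
        # All coordination happens here, so this is the part that is hard to scale.
        with self._lock:
            self._counts[key] = self._counts.get(key, 0) + 1
            return self._counts[key]


class StatelessHandler:
    """Holds no per-request state, so any replica can serve any request."""

    def __init__(self, store):
        self.store = store

    def handle(self, user_id):
        # The handler only delegates; the state lives in the shared store.
        return self.store.increment(user_id)


store = CounterStore()
replicas = [StatelessHandler(store) for _ in range(3)]

# Round-robin six requests for the same user across three replicas.
results = [replicas[i % 3].handle("user-42") for i in range(6)]
print(results)  # [1, 2, 3, 4, 5, 6]
```

Adding more handler replicas is trivial in this sketch, but every request still lands on the single shared store, which is where the scaling difficulty now sits.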

In this talk I will give an overview of typical application workloads - online vs offline, interactive vs batch, sync vs async, etc. - and the commonly used patterns and libraries for building these systems. We will evaluate each of these examples to identify the critical stateful services/systems and the challenges in scaling them. We will then take the Flipkart Flux open source project as an example to understand the design of a highly scalable stateful system that offers serverless computing for deployed applications, similar to AWS Lambda. The talk will cover the design and technology choices that enable millions of stateful, data-driven workflows/computes to run on the Flux system.

Outline

  • Defining scalability - as applied to stateless and stateful systems
  • Stateless service - case of state pushed to a stateful layer
  • Database/Data store for stateful systems. Choices of such stores - Relational, Append-only, etc.
  • Distributing stateful compute, things to take care of
  • Introduction to serverless architecture, what to expect. Services available
  • Building your own stateful serverless compute engine - the Flux example
  • Data engineering for stateful systems - scaling from single node to multi-node cluster on the network

Speaker bio

Regunath is an open source developer and engineer who built Aadhaar and was later responsible for Flipkart platform services. He is currently at HealthFace, building data-driven decision systems for healthcare and personal health records. He is also a core contributor to the Flux project discussed in this talk.

Hosted by

Jump starting better data engineering and AI futures