The Fifth Elephant 2018

The seventh edition of India's best data conference

Incremental transform of transactional data models to analytical data models in near real time

Submitted by Govind Pandey (@govind-pandey) on Monday, 26 March 2018

Technical level

Intermediate

Section

Full talk

Status

Confirmed & Scheduled

Total votes:  +14

Abstract

Transactional systems are designed with data models that maximize write throughput across multiple parallel business flows. They evolve iteratively with the business and must react quickly to a changing business landscape to minimize time to market.
Analytical systems, on the other hand, require data models to maximize query throughput over broad, deep and large data volumes.
The need for a platform that transforms a transactional data model into an analytical data model is well established in the industry. This is currently achieved through two different paradigms: stream processing at lower latencies and batch processing at higher latencies.
We have solved the same problem through a third paradigm of incremental processing for intermediate latencies (5 minutes to 1 hour).
We considered and dropped implementations of the streaming paradigm either because of a lack of completeness guarantees or the absence of complex join capabilities across a large number of entities.
Our incremental processing platform transforms transactional data models to analytical data models. It provides the expressiveness needed for complex joins across multiple entities (30 in our live deployment) through a Transformation Definition Language (TDL). These complex joins are evaluated incrementally as transactional data changes, and the results periodically update the analytical data model. For near real time use cases, this is done every 5-10 minutes.
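To make the idea of incremental join evaluation concrete, here is a minimal sketch in Python. It is not the platform's implementation and does not use the TDL, whose syntax is not shown in this proposal. The entity names (`shipment`, `hub`), the field names, and the `IncrementalJoin` class are all hypothetical. The sketch keeps a reverse index from the joined entity to the rows that reference it, so that each batch of changes recomputes only the affected rows of the analytical view rather than the full join.

```python
from collections import defaultdict


class IncrementalJoin:
    """Hypothetical view: shipments left-joined to hubs on hub_id."""

    def __init__(self):
        self.shipments = {}               # shipment_id -> latest shipment row
        self.hubs = {}                    # hub_id -> latest hub row
        self.by_hub = defaultdict(set)    # hub_id -> shipment_ids that reference it
        self.view = {}                    # shipment_id -> joined (analytical) row

    def _refresh(self, shipment_id):
        # Recompute a single joined row from the current state of both entities.
        shipment = self.shipments[shipment_id]
        hub = self.hubs.get(shipment["hub_id"], {})
        self.view[shipment_id] = {
            "shipment_id": shipment_id,
            "status": shipment["status"],
            "hub_id": shipment["hub_id"],
            "hub_city": hub.get("city"),  # None until the hub row arrives
        }

    def apply_batch(self, changes):
        """Apply one batch of (entity, key, row) upserts.

        Returns the set of view keys that were recomputed.
        """
        dirty = set()
        for entity, key, row in changes:
            if entity == "shipment":
                old = self.shipments.get(key)
                if old is not None:
                    # Drop the stale reverse-index entry before re-indexing.
                    self.by_hub[old["hub_id"]].discard(key)
                self.shipments[key] = row
                self.by_hub[row["hub_id"]].add(key)
                dirty.add(key)
            elif entity == "hub":
                self.hubs[key] = row
                # A hub change affects every shipment that joins to it.
                dirty.update(self.by_hub[key])
            else:
                raise ValueError(f"unknown entity: {entity}")
        for shipment_id in dirty:
            self._refresh(shipment_id)
        return dirty


if __name__ == "__main__":
    join = IncrementalJoin()
    join.apply_batch([
        ("hub", "H1", {"city": "Bengaluru"}),
        ("shipment", "S1", {"status": "packed", "hub_id": "H1"}),
    ])
    # Only S1 is recomputed when its hub changes.
    print(join.apply_batch([("hub", "H1", {"city": "Mysuru"})]))
    print(join.view["S1"])
```

In a periodic setup, `apply_batch` would be called once per interval with the mutations captured since the previous run. A join across many entities would need one reverse index per join edge, which this two-entity sketch does not attempt.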
Changes to transactional data models are handled through version support in the TDL. These changes are absorbed with a pause and resume of the transformations.
The Flipkart Fulfillment Services Group serves over a million shipments a day at its peak. Customer delight through reliable and fast delivery of orders is our primary goal. To succeed in this endeavour, our ground operations depend on live and accurate visibility into the journey of every shipment across India. Overall data volumes run into tens of TBs, with a change frequency of over 25k QPS at peak. All our transactional systems combined generate mutations with volumes close to 200 GB every second. Our incremental platform is built to handle this scale.
With this platform, we have achieved analytics at low latencies with high completeness without compromising on business agility.
In this talk, I will cover the specifics of our evaluations and our learnings from the journey of building the platform.

Outline

  1. Business and technical need
    a. 100% Completeness
    b. 5 minutes to 1 hour latencies
    c. Business Agility
  2. Evaluation and results of existing solutions
    a. Existing stream processing implementations
    b. Existing incremental processing implementations
  3. Our approach to solving the problem
    a. Incremental Transforms at scale for lower latencies
    b. Metadata
    c. Processing
    d. Learnings
  4. Results
    a. Live use-cases and Impact

Requirements

None

Speaker bio

Govind focuses on Supply Chain Automation, Predictive Optimizations and Actionable Insights, leveraging Artificial Intelligence and in-house Big Data platforms at Flipkart. His engineering experience is primarily in building platforms. In the past, he has worked on the in-house stream processing platform at 247.ai and on the Business Activity Monitoring, Business Rules and Process Orchestration platforms at Oracle. He is an Advanced Communicator Silver and Advanced Leader Silver, as certified by Toastmasters International.

Links

Slides

https://www.slideshare.net/GovindPandey27/fifth-elephant-2018-incremental-processing

Preview video

https://youtu.be/Zp3qHylQe2g

Comments

  • 1
    Rishu Mehrotra (@gadgetmnky) 7 months ago

    Please update the slides with the visuals/schematics that have placeholders now. It gives the presentation more context.

    • 1
      Govind Pandey (@govind-pandey) Proposer 7 months ago

      Hello Rishu. Thanks for the review and comment. I agree that visuals will add more context for the overall audience. There is a plan to invest the time required for visuals that the talk deserves.
      The current slide deck is intended as a draft for the reviewers to understand my thought process and how I intend to orchestrate the talk. In case there are specific questions about the content which will help you decide the relevance of the talk, I am happy to answer them directly.
