The Fifth Elephant 2016

India's most renowned data science conference

Timely Dataflow

Submitted by Bharani (@bharanisub) on Tuesday, 22 March 2016

Technical level

Advanced

Section

Crisp talk

Status

Confirmed & Scheduled

Total votes:  +28

Abstract

Many data processing tasks require low-latency interactive access to results, iterative sub-computations, and consistent intermediate outputs so that sub-computations can be nested and composed. Timely Dataflow is a computational model that addresses these challenges in a single unified system, as opposed to bolting batch and stream processing systems together. It was first presented as part of Naiad (SOSP 2013).

Outline

What are the challenges faced in stream processing: Imagine a system where the data is continuously updated and you need to support both historical data and the recent stream while avoiding costly recomputation.

How does timely dataflow fit into the stream processing model: I will cover what timely dataflow offers, namely cyclic computation, a notification mechanism, and a concept of time in stream processing.

Why is it different from other stream processing systems like Spark/Storm/Flink: Not all computations can be easily expressed as directed acyclic graphs (DAGs), which is the model most stream processing systems offer. One such example is cyclic (iterative) computation, which can be elegantly modelled in timely dataflow.

Pros & Cons: I will take a practical example of an aggregation and showcase the pros and cons of the timely dataflow model, with code and timings.
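
To make the outline's three ideas (cyclic computation, notifications, and a concept of time) concrete, here is a minimal single-threaded sketch in Python. It is not the Naiad or timely dataflow API: the class ToyLoop, its methods, and the doubling example are all hypothetical names invented for illustration. It assumes every input has been supplied before run() is called, so an epoch counts as complete as soon as no pending message carries it. Each message is stamped with (epoch, iteration); the feedback edge keeps the epoch and increments the iteration, and a per-epoch aggregate is emitted only when the epoch's notification fires.

```python
from collections import defaultdict, deque


class ToyLoop:
    """Toy model of one loop in a dataflow graph (hypothetical, not a real API)."""

    def __init__(self, step, done):
        self.step = step                  # applied on every trip around the loop
        self.done = done                  # decides when a value leaves the loop
        self.queue = deque()              # pending ((epoch, iteration), value) messages
        self.outputs = defaultdict(list)  # epoch -> values that have left the loop

    def send(self, epoch, value):
        # New input enters the loop at iteration 0 of its epoch.
        self.queue.append(((epoch, 0), value))

    def run(self):
        """Drain all messages; return (epoch, sum) pairs in notification order."""
        notifications = []
        while self.queue:
            (epoch, iteration), value = self.queue.popleft()
            if self.done(value):
                self.outputs[epoch].append(value)
            else:
                # Feedback edge: same epoch, loop counter advanced by one.
                self.queue.append(((epoch, iteration + 1), self.step(value)))
            # Notification: nothing stamped with this epoch is still in flight
            # (valid only under the assumption that input is already closed).
            if not any(t[0] == epoch for t, _ in self.queue):
                notifications.append((epoch, sum(self.outputs[epoch])))
        return notifications


def demo():
    # Double each value until it reaches at least 100, then sum per epoch.
    loop = ToyLoop(step=lambda v: v * 2, done=lambda v: v >= 100)
    loop.send(0, 30)
    loop.send(0, 60)
    loop.send(1, 100)
    return loop.run()
```

In this sketch, calling demo() yields the aggregate for epoch 1 before the one for epoch 0, because epoch 0 still has values circulating in the loop when epoch 1 has already drained. The real model tracks progress across distributed workers with partially ordered timestamps; the toy only mirrors the idea that an aggregate is released once no more data for its time can arrive.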

Speaker bio

I am a passionate developer and speaker. I regularly speak at the monthly Geeknight meetup in Chennai, and I spoke at GIDS in both 2014 and 2015 on systems that handle large volumes of data with the unique challenges of near-real-time processing. I have built and maintained systems for the banking, media, and retail domains. I continuously challenge the status quo and constantly strive to improve on the solutions I have built in the past. This journey has led me to build and rebuild real-time analytics solutions that crunch large volumes of data while carefully balancing throughput and low latency.

Links

Slides

https://docs.google.com/presentation/d/1mtGyIWsdEEHvcCnMVOobMp78S6FvBlgoZ58JRMSeZ-c/edit?usp=sharing
