Many data processing tasks require low-latency interactive access to results, iterative sub-computations, and consistent intermediate outputs so that sub-computations can be nested and composed. Timely Dataflow is a computational model that addresses these challenges in one unified system, as opposed to bolting batch and stream processing systems together. It was first presented as part of Naiad (SOSP 2013).
What are the challenges faced in stream processing: Imagine a system where the data is continuously updated and you need to support both historical data and the recent stream while avoiding costly recomputation.
How does timely dataflow fit in the stream processing model: I will cover what timely dataflow offers, namely cyclic computation, a notification mechanism, and a concept of time in stream processing.
Why is it different from other stream processing systems like Spark/Storm/Flink: Not all computations can be easily expressed as the directed acyclic graphs (DAGs) that most stream processing systems offer. One such example is cyclic computation, which can be modelled elegantly in timely dataflow.
Pros & Cons: I will take a practical example of an aggregation and showcase the pros and cons of the timely dataflow model, with code and the time taken.
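To give a flavour of the cyclic computation mentioned in the outline, here is a minimal sketch in plain Python. It does not use the timely dataflow API; the function name `reachable` and the sample graph are hypothetical and chosen only for illustration. It shows a loop in which newly discovered records are fed back as input to the next iteration until nothing changes, which is the kind of feedback edge a pure DAG cannot express directly.

```python
from collections import defaultdict


def reachable(edges, roots):
    """Compute the nodes reachable from `roots` by iterating to a fixed point.

    Illustrative only: this imitates a dataflow loop with a feedback edge,
    it is not an implementation of timely dataflow itself.
    """
    graph = defaultdict(set)
    for src, dst in edges:
        graph[src].add(dst)

    reached = set(roots)
    frontier = set(roots)  # records currently circulating around the loop
    iteration = 0          # loop counter, analogous to the loop part of a timestamp

    while frontier:
        next_frontier = set()
        for node in frontier:
            for neighbour in graph[node]:
                if neighbour not in reached:
                    reached.add(neighbour)
                    next_frontier.add(neighbour)
        # Feedback edge: the output of this round becomes the input of the next.
        frontier = next_frontier
        iteration += 1

    return reached, iteration


if __name__ == "__main__":
    sample_edges = [(1, 2), (2, 3), (3, 1), (4, 5)]
    print(reachable(sample_edges, [1]))
```

The loop stops once an iteration produces no new records. In a timely dataflow system, knowing that a given round is complete is what the notification mechanism provides; here it is approximated by the empty-frontier check.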
I am a passionate developer and speaker. I regularly speak at the monthly Geeknight meetup in Chennai and spoke at GIDS in both 2014 and 2015 on dealing with systems that handle large volumes of data with the unique challenges of near-real-time processing. I have built and maintained systems for the Banking, Media, and Retail domains. I continuously challenge the status quo and constantly strive to improve on the solutions I have built in the past. This journey has led me to build and rebuild real-time analytics solutions that crunch large volumes of data while carefully balancing throughput and low latency.