The Fifth Elephant round the year submissions for 2019

Submit a talk on data, data science, analytics, business intelligence, data engineering and ML engineering

Real-Time Data Quality on Flink

Submitted by Jaydeep Vishwakarma on Jun 17, 2019

Session type: Full talk of 40 mins
Status: Rejected


My use case is to monitor and improve overall search data quality, to detect unusual patterns in users' search behavior, and to notify the respective business stakeholders of users' on-site intent. To achieve this, I explored various big data processing engines that can process huge volumes of data with complex business logic in real time. Eventually, I chose Flink stream processing. This talk will show how I used Flink to accomplish this goal.


What is Real-Time Aggregation?
System Requirement
Flink vs Spark
Flink Cluster setup
Flink on Yarn
100% data completeness
Batch vs Real-Time
Open Items
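The first outline item, real-time aggregation, can be illustrated with a small sketch. This is not Flink code and not the speaker's implementation. It is a plain-Python illustration, assuming a hypothetical event shape of `(event_time_ms, query)`, of counting searches per tumbling (fixed, non-overlapping) event-time window and emitting each window as soon as it closes, instead of waiting for a batch job. In Flink this role is played by window assigners such as `TumblingEventTimeWindows` together with watermarks. The names `stream_window_counts`, `flag_unusual`, and the fixed threshold are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

# Hypothetical event shape: (event_time_ms, normalized_query)
SearchEvent = Tuple[int, str]
# Emitted result: (window_start_ms, count of each query in that window)
WindowResult = Tuple[int, Dict[str, int]]


def stream_window_counts(events: Iterable[SearchEvent], window_ms: int) -> Iterator[WindowResult]:
    """Incrementally count queries per tumbling event-time window.

    Assumes events arrive in timestamp order: a window is closed and emitted as
    soon as an event belonging to a later window is seen. Late events are rejected
    (a simplification; Flink uses watermarks and allowed lateness instead).
    """
    current_start: Optional[int] = None
    counts: Dict[str, int] = defaultdict(int)
    for ts, query in events:
        start = ts - ts % window_ms  # align the timestamp to its window boundary
        if current_start is not None and start < current_start:
            raise ValueError(f"out-of-order event at {ts} ms")
        if current_start is not None and start != current_start:
            yield current_start, dict(counts)  # previous window is complete; emit it
            counts = defaultdict(int)
        current_start = start
        counts[query] += 1
    if current_start is not None:
        yield current_start, dict(counts)  # flush the last open window at end of input


def flag_unusual(counts: Dict[str, int], threshold: int) -> List[str]:
    """Return queries whose count within one window exceeds a fixed (hypothetical) threshold."""
    return sorted(q for q, n in counts.items() if n > threshold)


sample_events: List[SearchEvent] = [
    (0, "tv"), (1500, "tv"), (2500, "phone"), (4000, "tv"), (4200, "tv"), (4900, "tv"),
]
for window_start, window_counts in stream_window_counts(sample_events, window_ms=2000):
    print(window_start, window_counts, flag_unusual(window_counts, threshold=2))
```

With `window_ms=2000`, the sample yields windows starting at 0, 2000 and 4000 ms, and only the last one flags `"tv"` (3 searches, above the threshold of 2). Windows with no events are simply not emitted. A fixed threshold is only a placeholder; a real data-quality check would more likely compare each window against a historical baseline.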

Speaker bio

I am a Staff Software Engineer at Walmart and an Apache Oozie committer. I am currently working on search problems. I have been working in the Big Data space for the last 10 years.



