The Fifth Elephant round the year submissions for 2019

Submit a talk on data, data science, analytics, business intelligence, data engineering and ML engineering

Real-Time Data Quality on Flink

Submitted by Jaydeep Vishwakarma on Jun 17, 2019

Session type: Full talk of 40 mins
Status: Rejected

Abstract

My use case is to monitor and improve overall search data quality, to detect unusual patterns in users' search behavior, and to report on-site intent back to the respective business stakeholders. To achieve this, I explored various big data processing engines that can process huge volumes of data with complex business logic in real time. Eventually, I chose Flink stream processing. This talk will showcase how I used Flink to accomplish my goal.

Outline

What is Real-Time Aggregation?
System Requirements
Flink vs Spark
Flink Cluster setup
Flink on YARN
Architecture
100% data completeness
Batch vs Real-Time
Open Items

Speaker bio

I am a Staff Software Engineer at Walmart and an Apache Oozie committer. I am currently working on search problems. I have been working in the big data space for the last 10 years.

Slides

https://www.slideshare.net/jaydeepmail/real-time-data-quality-on-flink

Preview video

https://www.youtube.com/watch?v=1iqFRvW4wrQ
