Real-Time Data Quality on Flink
Jaydeep Vishwakarma
My use case is to monitor and improve overall search data quality, to detect unusual patterns in users' search behavior, and to notify the respective business stakeholders of the on-site intent. To achieve this, I explored various big data processing engines that can process huge volumes of data with complex business logic in real time. Eventually, I chose Flink stream processing. This talk will showcase how I used Flink to accomplish my goal.
Outline
What is Real-Time Aggregation?
System Requirements
Flink vs Spark
Flink Cluster setup
Flink on YARN
Architecture
100% data completeness
Batch vs Real-Time
Open Items
Speaker bio
I am a Staff Software Engineer at Walmart and an Apache Oozie committer. I am currently working on search problems. I have been in the big data space for the last 10 years.
Slides
https://www.slideshare.net/jaydeepmail/real-time-data-quality-on-flink