The Fifth Elephant round the year submissions for 2019

Submit a talk on data, data science, analytics, business intelligence, data engineering and ML engineering

Real-Time Data Quality on Flink

Jaydeep Vishwakarma

My use case is to monitor and improve overall search data quality, to detect unusual patterns in users' search behavior, and to notify the respective business stakeholders of on-site intent. To achieve this, I explored various big data processing engines that can process huge volumes of data with complex business logic in real time. Eventually, I chose Flink stream processing. This talk will showcase how I used Flink to accomplish my goal.

Outline

What is Real-Time Aggregation?
System Requirements
Flink vs Spark
Flink Cluster setup
Flink on YARN
Architecture
100% data completeness
Batch vs Real-Time
Open Items

Speaker bio

I am a Staff Software Engineer at Walmart and an Apache Oozie committer. I am currently working on search problems. I have been working in the big data space for the last 10 years.

Slides

https://www.slideshare.net/jaydeepmail/real-time-data-quality-on-flink
