Building a Generic but Highly Customizable and Scalable Anomaly Detection System @ Badoo
Submitted by Akash Mishra (@sleepythread) on Tuesday, 30 May 2017
Full talk for data engineering track
Badoo is a data-driven company with 340 million users across 190 countries. It provides a number of apps and white-label services across multiple platforms. Badoo crunches through around 23 billion events per day, spanning 600 different event types. Automatically tracking a large number of events and reporting observations that do not conform to an expected pattern is an essential part of our data-driven methodology.
Badoo had an Anomaly Detection prototype, built on the strong expertise of our data scientists and involving complex algorithms. The prototype taught us many of the requirements that need to be fulfilled to have a scalable and robust system.
Based on what we learned from the prototype, the Badoo Data Engineering team decided to build a new Anomaly Detection system with the following requirements:
- Extensible: Users should be able to add any new anomaly report without manual intervention.
- Scalable: Users should be able to track and process thousands of metrics.
- Customizable: Users should be able to define their own notification conditions and notification delivery medium.
- Accurate: The system should have a low false-positive rate.
- Secure: The system must be able to specify the level of access to data.
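As an illustration only, the requirements above could translate into a report definition along the following lines. This is a minimal Python sketch, not Badoo's implementation: the `Report` class, its fields, the `evaluate` helper and the z-score rule are all hypothetical names and choices made for this example.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Callable, List


@dataclass
class Report:
    """Hypothetical report definition; every field name is an assumption."""
    metric: str                             # name of the tracked event metric
    should_notify: Callable[[float], bool]  # user-defined notification condition
    medium: str                             # user-chosen delivery medium, e.g. "email"


def z_score(history: List[float], latest: float) -> float:
    """Distance of the latest value from the historical mean, in standard deviations."""
    sigma = stdev(history)
    if sigma == 0:
        return 0.0
    return (latest - mean(history)) / sigma


def evaluate(report: Report, history: List[float], latest: float) -> List[str]:
    """Return the notifications (as strings) that this report would send."""
    score = z_score(history, latest)
    if report.should_notify(score):
        return [f"[{report.medium}] {report.metric}: z-score {score:.1f}"]
    return []


# Example: a user registers a report with their own condition and medium.
report = Report(metric="signups", should_notify=lambda z: abs(z) > 3.0, medium="email")
history = [100.0, 102.0, 98.0, 101.0, 99.0]
quiet = evaluate(report, history, 100.0)   # value close to the historical mean
alerts = evaluate(report, history, 160.0)  # value far from the historical mean
```

In a sketch like this, tightening the threshold inside `should_notify` is one simple way to trade sensitivity for a lower false-positive rate.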
In this talk, I will share what we learned and the architecture of our Anomaly Detection system, built using Hadoop, Spark and other big data technologies.
- Introduction to Badoo.
- Details of Badoo’s in-house event tracking system.
- Learnings from the Anomaly Detection prototype.
- New Anomaly Detection system.
- The architecture of the system.
- Anomaly Detection Module.
- Ranking Module.
- Delivery Module.
- Flow for adding a new report to the Anomaly Detection system.
- Future work.
Akash Mishra currently works as a Data Engineer at Badoo Trading Limited. He has more than 6 years of experience building large-scale big data applications for various clients of ThoughtWorks Technologies. He has production experience with various big data technologies such as Spark, Hadoop and Mesos. He is a passionate developer with a deep interest in distributed systems. He has co-organised Big Data Meetups for Pune & NCR, has given various talks at meetups and Geek Nights, and has contributed to the Apache Spark project.