Anomaly Detection Using Apache Spark
Submitted by Kiran Veigas (@kiranveigas) on Jun 1, 2015
A walk-through of how we used Spark's scalable K-means algorithm to detect anomalies for our cyber analytics platform.
Apache Spark has proved itself to be the next-generation big data processing tool and has become a favourite of data scientists and data engineers. Its machine learning component, MLlib, provides well-tested, scalable algorithms.
It runs 10-100x faster than traditional MapReduce, and its high-level APIs make development easy. Since Spark exposes APIs in Java, Scala, Python and R (coming soon), data scientists can use their favourite language to build data products.
In this session we will walk through how we used Spark's scalable K-means algorithm to detect anomalies for our cyber analytics platform. It will offer a taste of Scala (Spark's native language), RDDs, and K-means clustering, and show how to improve clustering within a Spark session. Finally, we demonstrate how to use the K-means model in real time to detect anomalies.
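The workflow described above can be sketched with Spark's MLlib RDD-based K-means API (the 1.x API current at the time of writing). The input path, number of clusters, iteration count and anomaly threshold below are illustrative assumptions, not values from the talk; a common approach is to score each point by its distance to the nearest cluster centre and flag the farthest points:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.{Vector, Vectors}

object AnomalyDetectionSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("kmeans-anomaly"))

    // Parse numeric feature vectors from CSV records (path is a placeholder)
    val data = sc.textFile("hdfs:///events.csv")
      .map(line => Vectors.dense(line.split(',').map(_.toDouble)))
      .cache()

    // Train K-means; k = 10 clusters and 20 iterations are untuned example values
    val model = KMeans.train(data, 10, 20)

    // Euclidean distance from a point to its nearest cluster centre
    def distToCentroid(v: Vector): Double = {
      val centroid = model.clusterCenters(model.predict(v))
      math.sqrt(centroid.toArray.zip(v.toArray)
        .map { case (c, x) => (c - x) * (c - x) }.sum)
    }

    // Treat the 0.1% of points farthest from any centroid as anomalies
    // (the percentile cut-off is an assumption; tune it for your data)
    val distances = data.map(distToCentroid)
    val threshold = distances.top((distances.count() * 0.001).toInt).last
    val anomalies = data.filter(v => distToCentroid(v) > threshold)

    anomalies.take(10).foreach(println)
    sc.stop()
  }
}
```

For real-time use, the same `distToCentroid`-plus-threshold check can be applied to each incoming event against the trained model, so scoring requires no further clustering passes.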
Vishnu Subramanian works as a solution architect for Happiest Minds, with years of experience in building distributed systems using Hadoop, Spark, Elasticsearch, Cassandra and machine learning. He is a Databricks-certified Spark developer with experience in building data products. His interests are in IoT, data science and big data security.