The Fifth Elephant 2015

A conference on data, machine learning, and distributed and parallel computing

Proposed by Kiran Veigas (@kiranveigas)

Anomaly Detection Using Apache Spark

Submitted Jun 1, 2015

A walkthrough of how we used Spark's scalable K-means algorithm to detect anomalies for our cyber analytics platform.

Outline

Apache Spark has established itself as a next-generation big data processing engine and has become a favourite of data scientists and data engineers. Its machine learning library, MLlib, provides well-tested, scalable algorithms.

The Spark project reports that it runs programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk. Its high-level APIs also make development easier. Spark exposes APIs in Java, Scala and Python, with R support coming soon, so data scientists can build data products in their favourite language.

In this session we will walk through how we used Spark's scalable K-means algorithm to detect anomalies for our cyber analytics platform. The session will give a taste of Scala (Spark's native language), RDDs and K-means clustering, and will show how to improve the clustering with Spark. Finally, we will demonstrate how to use the trained K-means model in real time to detect anomalies.
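The following is a minimal, dependency-free Python sketch of the general technique described above, written under our own assumptions. It is neither Spark's API nor necessarily the approach the talk uses. The sketch works in three steps:

- Cluster feature vectors that represent normal activity with K-means.
- Score each new event by its distance to the nearest centroid.
- Flag events whose score exceeds a threshold derived from the training data.

In Spark, MLlib's `KMeans` would do the training over an RDD. The plain Lloyd's iteration below only stands in for it. The function names, the 0.99-quantile threshold policy and the toy data are all hypothetical.

```python
import math
import random

# Hypothetical helper names; a toy stand-in for Spark MLlib's KMeans, not its API.

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_kmeans(points, k, iterations=20, seed=42):
    """Plain Lloyd's k-means: assign points to the nearest centre, then re-average."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # New centre = mean of its members; an empty cluster keeps its old centre.
        centers = [
            [sum(dim) / len(members) for dim in zip(*members)] if members else centers[i]
            for i, members in enumerate(clusters)
        ]
    return centers


def clustering_cost(points, centers):
    """Sum of squared distances to the nearest centre (lower means tighter clusters)."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


def anomaly_score(point, centers):
    """Euclidean distance from a point to its nearest centroid."""
    return math.sqrt(min(squared_distance(point, c) for c in centers))


def fit_threshold(points, centers, quantile=0.99):
    """Assumed policy: the given quantile of training-set scores becomes the cut-off."""
    scores = sorted(anomaly_score(p, centers) for p in points)
    return scores[min(len(scores) - 1, int(quantile * len(scores)))]


def is_anomaly(point, centers, threshold):
    """Per-event check at scoring time: farther than the threshold means anomalous."""
    return anomaly_score(point, centers) > threshold


# Toy "normal" data: two tight groups of 2-D feature vectors.
offsets = (-0.5, 0.0, 0.5)
normal = [[cx + dx, cy + dy] for cx, cy in ((0.0, 0.0), (10.0, 10.0))
          for dx in offsets for dy in offsets]

centers = train_kmeans(normal, k=2)
threshold = fit_threshold(normal, centers)
print(is_anomaly([5.0, 5.0], centers, threshold))  # → True (far from both groups)
print(is_anomaly([0.2, 0.1], centers, threshold))  # → False (inside a group)
```

In a real-time setting, only `is_anomaly` runs per event, against centroids computed offline. Comparing `clustering_cost` across several values of k is one common way to tune the clustering. MLlib's `KMeansModel` reports a similar figure through `computeCost`.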

Speaker bio

Vishnu Subramanian works as a solution architect at Happiest Minds and has years of experience building distributed systems with Hadoop, Spark, Elasticsearch, Cassandra and machine learning. He is a Databricks-certified Spark developer with experience building data products. His interests include IoT, data science and big data security.

