Leveraging Streaming Systems for Machine Learning

Jul 2016

28 Thu 08:30 AM – 06:25 PM IST

29 Fri 08:30 AM – 06:15 PM IST

30 Sat 08:45 AM – 05:00 PM IST

31 Sun 08:15 AM – 06:00 PM IST

NIMHANS Convention Centre

Submitted Jun 9, 2016

Section: Crisp talk
Technical level: Intermediate

Larger datasets generally lead to better prediction models. However, experimenting with larger datasets in a test environment to measure a model's accuracy is not always feasible, primarily because of limited resources such as main memory and CPU power. This talk will show how such experiments can be run on small nodes (such as a modern laptop) by leveraging streaming systems like Spark, and how streaming systems can be applied to machine learning problems in test environments with limited resources.

The audience can expect to understand the benefits of using streaming systems for setting up, training, and testing their models in a smaller environment (even a single node), and eventually deploying those models in production environments with abundant system resources.

Outline

Testing machine learning algorithms requires large amounts of training data, which is essential for finding a good prediction model
Large data sets: far greater than the memory of a single node
Streaming systems to the rescue, for example Spark
A logistic regression or k-means clustering example to demonstrate the concept
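
The idea behind the last outline item can be sketched without a cluster. The following is a minimal, illustrative Python sketch, not the speaker's demo and not Spark code: it trains logistic regression by mini-batch gradient descent on batches pulled from a generator, so only one batch is held in memory at a time. The data source `stream_batches`, the synthetic labelling rule, the batch sizes, and the learning rate are all assumptions made for illustration.

```python
import math
import random


def stream_batches(n_batches, batch_size, seed=0):
    """Yield small synthetic batches one at a time (hypothetical data source).

    Labels follow a noisy linear rule: y = 1 when 2*x1 - x2 + noise > 0.
    """
    rng = random.Random(seed)
    for _ in range(n_batches):
        batch = []
        for _ in range(batch_size):
            x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
            y = 1 if 2 * x1 - x2 + rng.gauss(0, 0.1) > 0 else 0
            batch.append(((x1, x2), y))
        yield batch


def sigmoid(z):
    # Clamp z so math.exp cannot overflow on extreme inputs.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_streaming(batches, n_features=2, lr=0.5):
    """Update logistic regression weights one batch at a time."""
    w = [0.0] * n_features
    b = 0.0
    for batch in batches:
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in batch:
            err = predict_proba(w, b, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        # Average-gradient step; the batch can be discarded afterwards.
        w = [wi - lr * g / len(batch) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(batch)
    return w, b


def accuracy(w, b, batches):
    """Score the model on a stream of held-out batches."""
    correct = total = 0
    for batch in batches:
        for x, y in batch:
            pred = 1 if predict_proba(w, b, x) >= 0.5 else 0
            correct += int(pred == y)
            total += 1
    return correct / total


# Train on 50,000 points without ever holding more than 100 in memory.
w, b = train_streaming(stream_batches(n_batches=500, batch_size=100))
acc = accuracy(w, b, stream_batches(n_batches=20, batch_size=100, seed=1))
print("weights:", w, "bias:", b, "held-out accuracy:", acc)
```

In a Spark setting, the generator would roughly correspond to micro-batches arriving from a stream, and the per-batch update would be handled by the library rather than written by hand. The total dataset size affects only how long training runs, not how much memory a single node needs.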