The Fifth Elephant 2016

India's most renowned data science conference

Leveraging Streaming Systems for Machine Learning

Submitted by Vipul Gupta (@vipulgupta) on Wednesday, 8 June 2016


Technical level

Intermediate

Section

Crisp talk

Status

Submitted

Total votes:  +6

Abstract

Larger datasets generally lead to better-quality prediction models. However, experimenting with larger datasets in a test environment to check a model's accuracy is not always feasible, primarily because of limited resources such as main memory and CPU power. This talk will show how such experiments can be conducted on small nodes (such as a modern laptop) by leveraging streaming systems like Spark, and how streaming systems can be applied to machine learning problems in test environments with limited resources.
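A minimal sketch of the core idea, written in plain Python rather than Spark so that it runs on a single laptop with no cluster: examples are pulled from a stream in fixed-size mini-batches, and a logistic regression model takes one stochastic gradient descent (SGD) step per batch, so peak memory is bounded by the batch size rather than by the dataset size. The data source `synthetic_stream`, the batch size of 256 and the learning rate are hypothetical choices for illustration only. In Spark itself, the RDD-based `pyspark.mllib` API offers `StreamingLogisticRegressionWithSGD` for a comparable setup.

```python
import math
import random
from typing import Iterator, List, Tuple

Example = Tuple[List[float], int]  # (feature vector, label in {0, 1})


def synthetic_stream(n: int, seed: int = 0) -> Iterator[Example]:
    """Hypothetical data source: yields one labeled example at a time,
    so the full dataset never has to fit in memory."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        yield x, int(x[0] + x[1] > 0)


def mini_batches(stream: Iterator[Example], size: int) -> Iterator[List[Example]]:
    """Group a stream into batches of at most `size` examples."""
    batch: List[Example] = []
    for example in stream:
        batch.append(example)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly shorter, batch
        yield batch


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineLogisticRegression:
    """Logistic regression updated with one SGD step per mini-batch."""

    def __init__(self, n_features: int, lr: float = 0.5) -> None:
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x: List[float]) -> float:
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def partial_fit(self, batch: List[Example]) -> None:
        grad_w = [0.0] * len(self.w)
        grad_b = 0.0
        for x, y in batch:
            err = self.predict_proba(x) - y  # d(log-loss)/d(logit)
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        n = len(batch)
        self.w = [wi - self.lr * g / n for wi, g in zip(self.w, grad_w)]
        self.b -= self.lr * grad_b / n


model = OnlineLogisticRegression(n_features=2)
for batch in mini_batches(synthetic_stream(100_000), size=256):
    model.partial_fit(batch)  # only one batch is held in memory at a time

# Evaluate on an independently seeded stream, again one example at a time.
n_test = 1_000
correct = sum(
    (model.predict_proba(x) >= 0.5) == (y == 1)
    for x, y in synthetic_stream(n_test, seed=1)
)
accuracy = correct / n_test
print(f"held-out accuracy: {accuracy:.3f}")
```

Because training and evaluation both consume generators, the same code path works whether the stream holds a thousand examples or far more than fit in RAM; only the run time grows.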

The audience can expect to understand the benefits of using streaming systems to set up, train, and test their models in a smaller environment (even a single node), and then to deploy those models in production environments with abundant system resources.

Outline

Testing machine learning algorithms requires large amounts of training data; this is essential for finding a good prediction model
Large datasets: far larger than the memory of a single node
Streaming systems to the rescue: for example, Spark
A logistic regression or k-means clustering example to demonstrate the concept
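The k-means example can be sketched in plain Python, without a cluster. Each mini-batch is assigned to the nearest current centers, and each center then moves to a weighted average of its old position (weight n·α, where n is the cluster's accumulated weight and α a decay factor) and the mean of its newly assigned points (weight m); the new weight is n·α + m. This is similar in spirit to the update used by Spark MLlib's `StreamingKMeans`, but it is an illustrative sketch, not Spark's implementation. The point source, the three true centers, the initial centers and α = 0.9 are all hypothetical.

```python
import random
from typing import Iterator, List, Sequence, Tuple

Point = Tuple[float, float]


def point_stream(centers: Sequence[Point], n: int, spread: float = 0.1,
                 seed: int = 0) -> Iterator[Point]:
    """Hypothetical source: noisy points around fixed centers, one at a time."""
    rng = random.Random(seed)
    for _ in range(n):
        cx, cy = rng.choice(centers)
        yield (rng.gauss(cx, spread), rng.gauss(cy, spread))


def mini_batches(stream: Iterator[Point], size: int) -> Iterator[List[Point]]:
    """Group a stream into batches of at most `size` points."""
    batch: List[Point] = []
    for p in stream:
        batch.append(p)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def sq_dist(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


class StreamingKMeans:
    """k-means whose centers are updated from each mini-batch, which is then discarded."""

    def __init__(self, initial_centers: Sequence[Point], decay: float = 1.0) -> None:
        self.centers: List[Point] = list(initial_centers)
        self.weights = [0.0] * len(self.centers)  # effective points per cluster
        self.decay = decay  # alpha: 1.0 keeps all history, < 1.0 forgets old batches

    def nearest(self, p: Point) -> int:
        return min(range(len(self.centers)), key=lambda i: sq_dist(p, self.centers[i]))

    def update(self, batch: List[Point]) -> None:
        k = len(self.centers)
        sums = [[0.0, 0.0] for _ in range(k)]
        counts = [0] * k
        for p in batch:  # assign each point to its nearest current center
            i = self.nearest(p)
            sums[i][0] += p[0]
            sums[i][1] += p[1]
            counts[i] += 1
        for i in range(k):
            old_w = self.weights[i] * self.decay  # discount past evidence
            m = counts[i]
            self.weights[i] = old_w + m
            if m == 0:
                continue  # no new points: center unchanged, weight only decays
            cx, cy = self.centers[i]
            # Weighted average of old center (weight n*alpha) and batch mean (weight m).
            self.centers[i] = ((cx * old_w + sums[i][0]) / self.weights[i],
                               (cy * old_w + sums[i][1]) / self.weights[i])


TRUE_CENTERS = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
model = StreamingKMeans(initial_centers=[(1.0, 1.0), (4.0, 4.0), (1.0, 4.0)], decay=0.9)
for batch in mini_batches(point_stream(TRUE_CENTERS, 20_000), size=500):
    model.update(batch)

for c in model.centers:
    print(f"({c[0]:.2f}, {c[1]:.2f})")
```

With `decay=1.0` each center is the running mean of every point ever assigned to it; values below 1.0 let the model follow data whose distribution drifts over time, at the cost of somewhat noisier centers.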

Speaker bio

-NA-