The Fifth Elephant 2016

India's most renowned data science conference

Vipul Gupta


Leveraging Streaming Systems for Machine Learning

Submitted Jun 9, 2016

Larger datasets generally lead to better prediction models. However, experimenting with large datasets in a test environment to evaluate a model's accuracy is not always feasible, mainly because of limited resources such as main memory and CPU power. This talk will show how such experiments can be run on small nodes (such as a modern laptop) by leveraging streaming systems like Spark, and how streaming systems can be applied to machine learning problems in test environments with limited resources.

The audience can expect to understand the benefits of using streaming systems to set up, train, and test models in a small environment (even a single node), and then to deploy those models in production environments where system resources are abundant.


Testing machine learning algorithms requires large amounts of training data, which is essential for finding a good prediction model
Large datasets: far larger than the memory of a single node
Streaming systems to the rescue: Spark as an example
A logistic regression or k-means clustering example to demonstrate the concept
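
The idea behind the outline above can be sketched without a cluster. The following is a minimal illustration in plain Python, not Spark and not the speaker's implementation: a logistic regression model is updated one mini-batch at a time, so only a single batch needs to sit in memory. The function names (stream_batches, train_on_stream, accuracy), the synthetic data, and the learning rate are all assumptions made for this sketch.

```python
import math
import random


def stream_batches(n_batches, batch_size, seed=0):
    """Yield synthetic mini-batches one at a time (hypothetical data source).

    Each example is ([x0, x1], label) with label 1 when x0 + x1 > 0.
    A real stream would read from files, a socket, or a message queue.
    """
    rng = random.Random(seed)
    for _ in range(n_batches):
        batch = []
        for _ in range(batch_size):
            x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
            batch.append((x, 1 if x[0] + x[1] > 0 else 0))
        yield batch


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_on_stream(batches, n_features, lr=0.5):
    """Mini-batch gradient descent for logistic regression.

    The model is updated after every batch, and the batch is then
    discarded, so memory use does not grow with the dataset size.
    """
    w = [0.0] * n_features
    b = 0.0
    for batch in batches:
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in batch:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        n = len(batch)
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, examples):
    """Fraction of examples classified correctly at a 0.5 threshold."""
    correct = sum(
        1
        for x, y in examples
        if (sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5) == (y == 1)
    )
    return correct / len(examples)


if __name__ == "__main__":
    w, b = train_on_stream(stream_batches(200, 50, seed=1), n_features=2)
    held_out = next(stream_batches(1, 500, seed=2))
    print("weights:", w, "bias:", b)
    print("held-out accuracy:", accuracy(w, b, held_out))
```

Because stream_batches is a generator, the training loop never holds more than one batch at once. A streaming system such as Spark applies the same pattern at scale, with the batches arriving from a distributed source instead of a local generator.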

Speaker bio


