Leveraging Streaming Systems for Machine Learning
Submitted by Vipul Gupta (@vipulgupta) on Wednesday, 8 June 2016
Larger datasets generally lead to better prediction models. However, experimenting with larger datasets in a test environment to measure a model's accuracy is not always feasible, primarily because of limited resources such as main memory and CPU power. This talk will show how such experiments can be run on small nodes (such as a modern laptop) by leveraging streaming systems like Spark, and how streaming systems can be applied to machine learning problems in resource-constrained test environments.
The audience can expect to understand the benefits of using streaming systems to set up, train and test their models in a smaller environment (even a single node), and eventually to deploy such models in production environments with abundant system resources.
Testing machine learning algorithms requires large amounts of training data. This is essential to finding a good prediction model.
Large data sets - far greater than the memory of a single node
Streaming systems to the rescue - for example, Spark
A logistic regression or k-means clustering example to demonstrate the concept
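
The core idea in the outline above, training on data that never has to fit in memory at once, can be sketched on a single machine. The following is a minimal illustration in plain Python, not Spark and not the talk's actual demo. It assumes a hypothetical `stream_batches` generator as a stand-in for a real streaming source, a synthetic two-feature dataset, and an arbitrarily chosen learning rate. Each mini-batch is used for one gradient step and then discarded, so memory use depends on the batch size rather than on the size of the dataset.

```python
import math
import random


def stream_batches(n_rows, batch_size, seed=0):
    """Yield mini-batches of ((x1, x2), label) rows.

    Hypothetical stand-in for a streaming source: rows are generated
    lazily, so only one batch is held in memory at a time.
    """
    rng = random.Random(seed)
    batch = []
    for _ in range(n_rows):
        x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        label = 1 if x1 + x2 > 0 else 0  # synthetic, linearly separable
        batch.append(((x1, x2), label))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def sigmoid(z):
    # Split on the sign of z to avoid overflow in math.exp.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, bias, x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_streaming(batches, n_features, lr=0.5):
    """Logistic regression fitted with one gradient step per mini-batch."""
    weights = [0.0] * n_features
    bias = 0.0
    for batch in batches:
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in batch:
            err = predict(weights, bias, x) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        n = len(batch)
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
        # The batch goes out of scope here; nothing is accumulated.
    return weights, bias


def accuracy(batches, weights, bias):
    """Fraction of streamed rows classified correctly at threshold 0.5."""
    correct = total = 0
    for batch in batches:
        for x, y in batch:
            correct += int((predict(weights, bias, x) >= 0.5) == (y == 1))
            total += 1
    return correct / total


if __name__ == "__main__":
    w, b = train_streaming(stream_batches(20000, 100, seed=1), n_features=2)
    held_out = stream_batches(2000, 100, seed=2)
    print("weights:", w, "bias:", b)
    print("held-out accuracy:", accuracy(held_out, w, b))
```

In a Spark setting the generator would be replaced by a stream of micro-batches, with the model update applied to each batch as it arrives; the single-pass, bounded-memory structure stays the same.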