The Fifth Elephant 2012

Finding the elephant in the data.

Survey of Machine Learning tools as the data grows (Weka, R, Octave, Mahout)

Submitted by Vivek Mehta (@vivekmehta) on Sunday, 24 June 2012

Section: Data Analytics
Technical level: Intermediate
Session type: Lecture
Status: Submitted


To review the machine learning tools available for data of different sizes and scales.


The size and scale of an organization's data change as it grows, and so do the tools needed for machine learning (ML). A large team and gigabytes of data are not prerequisites for ML techniques to be useful and relevant: with appropriate tools, ML can be applied to small amounts of data at an early stage of an organization. As the organization grows and its data grows with it, the tools need to change, and eventually one needs to look at distributed ML systems.

In this talk we will explore, with examples, how to choose a specific tool based on the requirement. We will look at various practical requirements and the intelligent use of tools such as Weka, R, Octave, Hadoop, and Mahout. We will also compare the advantages and limitations of these tools based on the nature of the ML algorithm (clustering, regression, etc.), the type of data, and the modeling of the problem.

Speaker bio

Vivek Mehta is a Senior Research Engineer at Flipkart and works on various ML-related projects. Vivek has several years of experience in machine learning, statistical modeling, probability models, NLP, and big data analytics. After completing an MS at CMU, Vivek worked at Read-Ink, PubMatic, and TouchMagix before joining Flipkart. Vivek's experience spans various domains, such as handwriting recognition, optimization of online ad revenue, and e-commerce.


  • Abhishek Shivkumar (@abhisheksgumadi) 7 years ago

    Is it possible for you to share your slides at please?

    Abhishek S

  • Sreedhar Nali (@sreedharnali) 6 years ago

    I am very interested in learning about these machine learning tools: Weka vs. R vs. Mahout. Can you please share the doc/slides? Thanks.
