The Fifth Elephant 2012

Finding the elephant in the data.

What are your users doing on your website or in your store? How do you turn the piles of data your organization generates into actionable information? Where do you get complementary data to make yours more comprehensive? What tech, and what techniques?

The Fifth Elephant is a two-day conference on big data.

Early Geek tickets are available from

The proposal funnel below lets you submit a session and vote on proposed sessions. When proposing a session, it is good practice to introduce yourself and share details about your work as well as the subject of your talk.

Each community member can vote for or against a talk. A vote from each member of the Editorial Panel is equivalent to two community votes. Both types of votes will be considered for final speaker selection.
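The weighting described above can be sketched in a few lines of Python. This is only an illustration, not the site's actual selection logic: the `Vote` class, the `weighted_score` function, and the idea of netting votes for against votes against are assumptions. The page states only that one Editorial Panel vote counts as two community votes.

```python
from dataclasses import dataclass

# Assumed weights: the page says only that one Editorial Panel vote
# is equivalent to two community votes.
COMMUNITY_WEIGHT = 1
EDITORIAL_WEIGHT = 2


@dataclass
class Vote:
    """Hypothetical record of a single vote on a proposal."""
    in_favour: bool          # True = vote for, False = vote against
    editorial: bool = False  # True if cast by an Editorial Panel member


def weighted_score(votes):
    """Return weighted votes for minus weighted votes against."""
    score = 0
    for vote in votes:
        weight = EDITORIAL_WEIGHT if vote.editorial else COMMUNITY_WEIGHT
        score += weight if vote.in_favour else -weight
    return score
```

For example, two community votes for, one community vote against, and one Editorial Panel vote for would net out to 1 + 1 - 1 + 2 = 3 under these assumptions. How the final schedule is actually chosen from such totals is not specified on this page.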

It’s useful to keep a few guidelines in mind while submitting proposals:

  1. Describe how to use something that is available under a liberal open source license. Participants can use this without having to pay you anything.

  2. Tell a story of how you did something. If it involves commercial tools, please explain why they made sense.

  3. Buy a slot to pitch whatever commercial tool you are backing.

Speakers will get a free ticket to both days of the event. Proposers whose talks are not on the final schedule will be able to purchase tickets at the Early Geek price of Rs. 1800.

Hosted by

All about data science and machine learning

Vivek Mehta


Survey of Machine Learning tools as the data grows (Weka, R, Octave, Mahout)

Submitted Jun 25, 2012

To review the machine learning tools suited to data of different sizes and scales.


At different stages of an organization's growth, the size and scale of its data differ, and so does the tool needed for machine learning (ML). It is not necessary to build a huge team or collect GBs of data for ML techniques to be useful and relevant. ML can be applied to a small amount of data with appropriate tools at an early stage of the organization. As the organization grows and its data grows with it, the tools need to change, and eventually one needs to look at distributed ML systems.

In this talk we will explore, with examples, which tool to use for which requirement. We will look at various practical requirements and the intelligent use of tools like Weka, R, Octave, Hadoop and Mahout. We will also compare the advantages and limitations of these tools based on the nature of the ML algorithm (clustering, regression, etc.), the type of data and the modeling of the problem.

Speaker bio

Vivek Mehta is a Senior Research Engineer at Flipkart and works on various ML-related projects. Vivek has several years of experience in machine learning, statistical modeling, probability models, NLP and big data analytics. After completing an MS at CMU, Vivek worked at Read-Ink, PubMatic and TouchMagix before joining Flipkart. Vivek’s experience spans domains such as handwriting recognition, optimization of online ad revenue and e-commerce.


