The Fifth Elephant 2013

An Event on Big Data and Cloud Computing


In 2013, commodity hardware and computing capacity for storing and processing large and small volumes of data are easily available on demand. The bigger issues pertain to questions of how to scale data processing, handle data diversity, manage infrastructure costs, decide which technologies work best for different contexts and problems, and build products from the insights and intelligence that the data is presenting to you.

The Fifth Elephant 2013 is a three-day workshop and conference on big data, storage and analytics, with product demos and hacker corners.

Event format, themes and submission guidelines

The Fifth Elephant 2013 invites proposals on use cases and real-life examples. Tell us what specific problem you faced, which technology/tools worked for your use case and why, how you have developed business intelligence on the data you are collecting, and which analytics tools and techniques you employ. Our preference is for showcasing original work with clear take-aways for the audience. Please emphasize these in your proposal.

The conference will have two parallel tracks on 12th and 13th July:

  1. Storage: OLTP, messaging and notifications, databases and big data, NoSQL
  2. Analytics: Metrics and tools, cloud computing, mathematical modelling and statistical analysis, visualization


This year we are adding a preliminary day of workshops, on 11th July, to give attendees more in-depth, hands-on training on open source frameworks and tools (Pig, Hadoop, Hive, etc.), commercial solutions (sponsored), programming languages such as R, and visualization techniques and tricks, among others.

Product demos and sponsored sessions

We have a demo track for startups and companies who want to showcase their product to customers at The Fifth Elephant 2013 and get feedback. Slots are also open for 4-6 sponsored sessions for companies who want to talk about their technologies and reach out to developers, CTOs, CIOs and product managers at The Fifth Elephant. For more information on demo and sponsored session proposals, write to

Commitment to open source

HasGeek believes in open source as the foundation of the internet. Our aim is to strengthen these foundations for future generations. If your talk describes a codebase for developers to work with, we require that it is available under a license that does not impose itself on subsequent work. This is typically a permissive open source license (almost anything that is listed at and is not GPL or AGPL), but restrictive and commercial licenses are also considered depending on how they affect the developer’s relationship with the user.

If you’d like to showcase commercial work that makes money for you, please consider supporting the event with a sponsorship.

Proposal selection process

Voting is open to attendees who have purchased event tickets. If there is a proposal you find notable, please vote for it and leave a comment to initiate discussions. Your vote will be reflected immediately, but will be counted towards selections only if you purchase a ticket. Proposals will also be evaluated by a program committee consisting of:

Emphasis will be placed on original work and talks which present new insights to the audience.

The program committee will interview proposers who have received the most votes from attendees and the committee. Proposers must submit presentation drafts as part of the selection process to ensure the talk is in line with the original proposal and to help the program committee build a coherent line-up for the event.

There is only one speaker per session. Attendance is free for selected speakers. HasGeek will cover your travel to and accommodation in Bangalore from anywhere in the world. As our budget is limited, we will prefer speakers from locations closer to home, but will do our best to cover anyone exceptional. If you are able to raise support for your trip, we will count that towards an event sponsorship.

If your proposal is not accepted, you can buy a ticket at the same rate as was available on the day you proposed. We’ll send you a code.

Discounted tickets are available from


The program committee will announce the first round of selected proposals by the end of April and a second round by the end of May, and will finalize the schedule by 20th June. The funnel will close on 5th June. The event is on 11th-13th July 2013.

Hosted by

The Fifth Elephant - known as one of the best data science and machine learning conferences in Asia - has transitioned into a year-round forum for conversations about data and ML engineering, data science in production, and data security and privacy practices.

Bharath Mohan


What is Multi-Stream Retrieval?

Submitted Apr 30, 2013

Multi-stream retrieval is about humans querying, exploring and discovering from streams of information.

Information retrieval systems have been evolving to keep up with human needs and with the data being produced. The nature of the web is evolving away from the public web towards a personal and social web, where we hang out with things closer to us. The way we interact with this web is different. The systems we need to build to explore this closer web also need to be different.

The talk outlines how multi-stream retrieval differs from traditional models of information retrieval, such as flat web indexes, and brings to light the challenges and opportunities of interacting with streams.


The ways you interact with the web have changed. The full web is accessed through large directories, or a search engine. Your web is more of a stream – a stream of life events that happen to people and things you care about. Streams are aggregated and presented to you. The same systems that powered your access to the full web are not good enough when you interact with “your web”. Your web is not just documents. Your web is organized around people, time, context and thoughts. They keep happening. How should you be interacting with “your web”?

What do I even mean by “interacting with your web”?

Let me ask you a few questions that are all pertinent when you interact with “your web”.

  1. Tell me all the places visited by my friends this year.

  2. Look at Sony’s Android product line, and then Samsung’s Android product line. What’s common in their feature evolution, and what’s different?

  3. What are the top interests among people who work at Intel?

  4. What industries and sectors get the most attention from the VCs of India?

  5. What’s similar between what I read yesterday and what I am reading today?

  6. Now that I am looking at this resume, what have I read in the recent past that may be interesting to this person?

Do you get the idea? The nature of these searches is very different. They all work on data that is very “close” to you. The objects of these questions are of interest to you. They all have their streams – and they are all emitting data into the wild. You want to gather interesting aspects of these streams – that is not as straightforward as just searching the web for some document. The results often lie across documents, and across streams!

This is multi-stream retrieval. For the uninitiated, multi-stream retrieval is about humans querying, exploring and discovering from streams of information.
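To make the idea concrete, here is a minimal Python sketch of question 1 (places visited by my friends this year), answered across several streams at once. It is only an illustration under stated assumptions: the `Event` schema, the `merge_streams` and `places_visited` helpers, and the sample names and places are all hypothetical, and this is not a description of how any real multi-stream retrieval system is built.

```python
from dataclasses import dataclass
from datetime import datetime
from heapq import merge
from typing import Dict, Iterable, Iterator, List, Set


@dataclass(frozen=True)
class Event:
    """One item emitted by a stream (hypothetical schema for this sketch)."""
    source: str          # whose stream the event came from
    timestamp: datetime  # when it happened
    kind: str            # e.g. "checkin", "post", "article"
    value: str           # e.g. a place name or a piece of text


def merge_streams(*streams: Iterable[Event]) -> Iterator[Event]:
    """Interleave streams that are each already sorted by time."""
    return merge(*streams, key=lambda e: e.timestamp)


def places_visited(streams: List[Iterable[Event]],
                   friends: Set[str],
                   year: int) -> Dict[str, List[str]]:
    """Map each place to the friends who checked in there in `year`."""
    visitors: Dict[str, Set[str]] = {}
    for event in merge_streams(*streams):
        if (event.kind == "checkin"
                and event.source in friends
                and event.timestamp.year == year):
            visitors.setdefault(event.value, set()).add(event.source)
    return {place: sorted(who) for place, who in visitors.items()}


# Made-up sample streams, one per friend, each sorted by time.
asha = [
    Event("asha", datetime(2013, 1, 5), "checkin", "Mysore"),
    Event("asha", datetime(2013, 3, 9), "post", "Loved the trip"),
]
ravi = [
    Event("ravi", datetime(2012, 12, 30), "checkin", "Goa"),
    Event("ravi", datetime(2013, 2, 14), "checkin", "Mysore"),
]

result = places_visited([asha, ravi], {"asha", "ravi"}, 2013)
print(result)
```

The answer is not a single document: it is assembled from events in more than one stream, filtered by person, time and kind of event.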

Speaker bio

My LinkedIn profile:

I am an information dissemination enthusiast, having worked for more than 10 years studying how information spreads, the channels it uses and the way it is explored and absorbed. I earned a PhD on this topic (mining nurturers in evolving social networks), and later worked at Google as part of Google News, ranking news sources for authoritative reporting.

At Insieve, a startup I co-founded, we work on multi-stream retrieval - the ability to explore, analyse and discover from the streams that are dear to you. Whether you are an information geek, or a marketer, our systems help you gather insights from the living streams of data.

