The Fifth Elephant 2012

Finding the elephant in the data.

Submissions are closed for this project

Nimhans Convention Centre, Bangalore

What are your users doing on your website or in your store? How do you turn the piles of data your organization generates into actionable information? Where do you get complementary data to make yours more comprehensive? What tech, and what techniques?

The Fifth Elephant is a two-day conference on big data.

Early Geek tickets are available from

The proposal funnel below lets you submit a session and vote on proposed sessions. When proposing a session, it is good practice to introduce yourself and share details about your work as well as the subject of your talk.

Each community member can vote for or against a talk. A vote from each member of the Editorial Panel is equivalent to two community votes. Both types of votes will be considered for final speaker selection.

It’s useful to keep a few guidelines in mind while submitting proposals:

  1. Describe how to use something that is available under a liberal open source license. Participants can use this without having to pay you anything.

  2. Tell a story of how you did something. If it involves commercial tools, please explain why they made sense.

  3. Buy a slot to pitch whatever commercial tool you are backing.

Speakers will get a free ticket to both days of the event. Proposers whose talks are not on the final schedule will be able to purchase tickets at the Early Geek price of Rs. 1800.

Hosted by

The Fifth Elephant - known as one of the best #datascience and #machinelearning conferences in Asia - is transitioning into a year-round forum for conversations about data and ML engineering, data science in production, and data security and privacy practices.

Anand Chitipothu


How the Internet Archive preserves petabytes of data

Submitted Jun 20, 2012

Using the Internet Archive as a case study, this talk presents aspects of big data in the context of long-term preservation.


The Internet Archive has been archiving the internet since 1996. It also archives and makes available a vast collection of data including films, audio and books.

The Internet Archive is one of the earliest organizations to work with petabytes of data. It built its own infrastructure to store, process and manage its data reliably, long before the cloud. Because it is an archive, preservation of data is its primary concern, and that concern shapes its engineering decisions.

This talk is an introduction to the Internet Archive and its infrastructure.

Speaker bio

This talk will be presented by Anand Chitipothu and Noufal Ibrahim. Both of them are employees of the Archive, working remotely from Bangalore.

Anand is a software consultant and trainer. He has been working with the Archive since 2007. He is the coordinator of the PyCon India 2012 conference.

Noufal is a freelance trainer and consultant based in Bangalore. He is the founder of PyCon India and organiser of the first two conferences in India.





Supreet Sethi

Start with step 0 -> Approaching big data for small enterprises

Big data promises business gains. But a small or medium enterprise finds itself in a dilemma over where to start. This session provides refreshingly simple steps from the familiar territory of Excel sheets and database tables to the high-rolling game of big data.

21 Jun 2012