The Fifth Elephant 2012

Finding the elephant in the data.

Anand Chitipothu


How the Internet Archive preserves petabytes of data

Submitted Jun 20, 2012

Using the Internet Archive as a case study, this talk presents aspects of big data in the context of long-term preservation.


The Internet Archive has been archiving the internet since 1996. It also archives and makes available a vast collection of data including films, audio and books.

The Internet Archive is one of the earliest organizations to work with petabytes of data. It built its own infrastructure to store, process and manage that data reliably, long before cloud services were available. Because it is an archive, preserving data is its primary concern, and that concern shapes its engineering decisions.

This talk is an introduction to the Internet Archive and its infrastructure.

Speaker bio

This talk will be presented by Anand Chitipothu and Noufal Ibrahim. Both of them are employees of the Archive, working remotely from Bangalore.

Anand is a software consultant and trainer. He has been working with the Archive since 2007. He is the coordinator of the PyCon India 2012 conference.

Noufal is a freelance trainer and consultant based in Bangalore. He is a founder of PyCon India and an organiser of the first two conferences in India.



