How the Internet Archive preserves petabytes of data
Submitted by Anand Chitipothu (@anandology) on Wednesday, 20 June 2012
Big Data Infrastructure & Processing
Using Internet Archive as a case study, this talk presents aspects of big data in the context of long-term preservation.
The Internet Archive is one of the earliest organizations to work with petabytes of data. It built its own infrastructure to store, process and manage that data reliably, long before cloud services existed. As an archive, its primary concern is preserving data, and that concern shapes its engineering decisions.
This talk is an introduction to the Internet Archive and its infrastructure.
Anand is a software consultant and trainer. He has been working with the Archive since 2007 and is the co-ordinator of the PyCon India 2012 conference.
Noufal is a freelance trainer and consultant based in Bangalore. He is the founder of PyCon India and organised its first two conferences.