How the Internet Archive preserves petabytes of data

27 Jul 2012 (Fri), 09:30 AM – 05:30 PM IST

28 Jul 2012 (Sat), 09:30 AM – 05:00 PM IST

Nimhans Convention Centre, Bangalore

This submission has been added to the schedule

Submitted Jun 20, 2012

Section: Big Data Infrastructure & Processing

Technical level: Beginner

Session type: Lecture

Using the Internet Archive as a case study, this talk presents aspects of big data in the context of long-term preservation.

Outline

The Internet Archive has been archiving the internet since 1996. It also archives and makes available a vast collection of data including films, audio and books.

The Internet Archive is one of the earliest organizations to work with petabytes of data. It built its own infrastructure to store, process and manage its data reliably, long before cloud services were available. Because it is an archive, preserving data is its primary concern, and that shapes its engineering decisions.

This talk is an introduction to the Internet Archive and its infrastructure.