The Fifth Elephant 2012

Finding the elephant in the data.

What are your users doing on your website or in your store? How do you turn the piles of data your organization generates into actionable information? Where do you get complementary data to make yours more comprehensive? What tech, and what techniques?

The Fifth Elephant is a two-day conference on big data.

Early Geek tickets are available from fifthelephant.doattend.com.

The proposal funnel below lets you submit a session and vote on proposed sessions. When proposing a session, it is good practice to introduce yourself and share details about your work as well as the subject of your talk.

Each community member can vote for or against a talk. A vote from each member of the Editorial Panel is equivalent to two community votes. Both types of votes will be considered for final speaker selection.

It’s useful to keep a few guidelines in mind while submitting proposals:

  1. Describe how to use something that is available under a liberal open source license. Participants can use this without having to pay you anything.

  2. Tell a story of how you did something. If it involves commercial tools, please explain why they made sense.

  3. Buy a slot to pitch whatever commercial tool you are backing.

Speakers will get a free ticket to both days of the event. Proposers whose talks are not on the final schedule will be able to purchase tickets at the Early Geek price of Rs. 1800.

Hosted by

The Fifth Elephant - known as one of the best data science and Machine Learning conferences in Asia - has transitioned into a year-round forum for conversations about data and ML engineering; data science in production; data security and privacy practices.

Regunath Balasubramanian

@regunathb

Aadhaar - world's largest biometric identity platform (200 trillion biometric matches per day, 2 PB of data)

Submitted Jul 6, 2012

Describe the technology needs and solutions behind Aadhaar - the world’s largest biometric identity platform.

Explain the use of Big Data to implement:

  • 200 trillion biometric matches per day
  • 2 petabytes of raw data stored
  • 100 million authentication requests per day
  • Terabyte-scale data warehouse of 200 million records
  • 50 million messages per day
  • 100 million database transactions per day
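To put these daily figures in perspective, the short sketch below converts them to average per-second rates. It assumes load is spread evenly over 24 hours, which real traffic will not be, so peak rates would be higher. The last calculation rests on a further assumption that the proposal does not state: that each new enrolment is compared against every one of the 200 million stored records. The names `daily_volumes`, `average_per_second` and `implied_enrolments_per_day` are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Daily volumes as quoted in the proposal.
daily_volumes = {
    "biometric matches": 200 * 10**12,
    "authentication requests": 100 * 10**6,
    "messages": 50 * 10**6,
    "database transactions": 100 * 10**6,
}


def average_per_second(per_day: int) -> float:
    """Average rate, assuming load is uniform across the day."""
    return per_day / SECONDS_PER_DAY


rates = {name: average_per_second(v) for name, v in daily_volumes.items()}

# Assumption (not stated in the proposal): every new enrolment is matched
# against each of the 200 million stored records for de-duplication.
STORED_RECORDS = 200 * 10**6
implied_enrolments_per_day = daily_volumes["biometric matches"] // STORED_RECORDS

if __name__ == "__main__":
    for name, rate in rates.items():
        print(f"{name}: about {rate:,.0f} per second on average")
    print(f"implied enrolments per day: {implied_enrolments_per_day:,}")
```

Under these assumptions the matching workload averages a little over 2.3 billion comparisons per second, while authentication averages roughly 1,150 requests per second. The two are different problems: the first is throughput-bound batch computation, the second a latency-sensitive online service.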

Outline

Aadhaar has unique compute and data challenges that exhibit all characteristics of Big Data - Volume, Variety and Velocity. The challenge is to derive Value from these attributes.

A number of technologies have been used to handle massively parallel processing, streaming data reads, data-locality computing, low-latency reads, data integrity and the challenges of dealing with distributed data, which are best explained by the CAP theorem.

Big Data solutions are used in building both OLTP and Analytics systems in Aadhaar.

Get to know how the following technologies are used:

  • Hadoop stack : HDFS, HBase, Hive, Pig, ZooKeeper
  • MySQL : sharded, partitioned, distributed
  • SEDA : Mule, RabbitMQ
  • Search : MongoDB, sharded Solr
  • Compute Grid : Spring, GridGain
  • Monitoring : Custom built, Nagios
  • Analytics & Visualization
  • Deployment footprint : Thousands of CPU cores
  • Extensive Data archival, DR

Requirements

An appreciation of the challenges involved in building a biometric database of 1.2 billion people, supporting multilingual applications, deploying to every village and city in the country (27,000 installations to date), and managing the logistics of enrolments, letter delivery, online authentication and financial transactions numbering in the millions.

Speaker bio

Dr. Pramod Varma is currently Chief Architect at UIDAI. He joined UIDAI in 2009 and has been pivotal in ensuring that an open, scalable, and secure architecture is built to meet the needs of the Aadhaar project. He leads the overall technology and application architecture and application development within the UIDAI Technology Unit and is based in Bangalore.

Before joining UIDAI in July 2009, he was the Chief Technology Architect and Vice President of Research at Sterling Commerce, now part of IBM. He joined Sterling in 2005 when Sterling Commerce acquired Yantra Corporation, a leading supply chain software company based in Boston. He was one of the founders of Yantra and was the Vice President of Technology.
Pramod holds a Master's degree and a Ph.D. in Computer Science, along with a second Master's degree in Applied Mathematics.

Regunath is Principal Architect of Aadhaar and Chief Architect at MindTree. He has created IP-based solutions on SOA. He is passionate about Open Source and is a committer on the MindTree Insight project. Regunath has been an invited speaker at forums such as OSI Days and the iCMG architecture summit. He is a guest columnist at CIOUpdate and blogs frequently on technology subjects.

Slides

http://www.slideshare.net/regunathbalasubramanian/aadhaar-at-5thelephantv3
