The Fifth Elephant 2012

Finding the elephant in the data.

What are your users doing on your website or in your store? How do you turn the piles of data your organization generates into actionable information? Where do you get complementary data to make yours more comprehensive? What tech, and what techniques?

The Fifth Elephant is a two-day conference on big data.

Early Geek tickets are available from fifthelephant.doattend.com.

The proposal funnel below lets you submit a session and vote on proposed sessions. When proposing a session, it is good practice to introduce yourself and share details about your work as well as the subject of your talk.

Each community member can vote for or against a talk. A vote from each member of the Editorial Panel is equivalent to two community votes. Both types of votes will be considered for final speaker selection.

It’s useful to keep a few guidelines in mind while submitting proposals:

  1. Describe how to use something that is available under a liberal open source license. Participants can use this without having to pay you anything.

  2. Tell a story of how you did something. If it involves commercial tools, please explain why they made sense.

  3. Buy a slot to pitch whatever commercial tool you are backing.

Speakers will get a free ticket to both days of the event. Proposers whose talks are not on the final schedule will be able to purchase tickets at the Early Geek price of Rs. 1800.

Hosted by

The Fifth Elephant, known as one of the best data science and machine learning conferences in Asia, has transitioned into a year-round forum for conversations about data and ML engineering, data science in production, and data security and privacy practices.

Ramesh Hariharan

Your Genome on the Cloud: Big Data Challenges in Personalized Medicine

Submitted Jun 12, 2012

We are at the threshold of a major revolution in health care. Thanks to two decades of explosive research into tools and techniques that interrogate living cells at the molecular level, doctors will soon have an invaluable new tool to help diagnose and cure disease: the patient's genome. Several success stories have already emerged. In one, a little boy underwent several futile operations before sequencing his genome revealed an immune-system defect, which was then treated with a bone-marrow transplant.

The genome and its associated data are very large, which naturally calls for Big Data techniques to manage and deliver genomic information to clinicians, consumers, and researchers. To give a sense of scale, sequencing machines generate upwards of 150GB of compressed data for a single individual, and analysing this data is like sifting through 30 finely shredded copies of a 200,000-page telephone directory!
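As a rough sanity check on that scale, the sketch below works through the arithmetic behind the shredded-directory analogy. It assumes, as illustrative values not taken from the talk, a haploid human genome of about 3.1 billion bases, 30-fold sequencing coverage (presumably what the "30 copies" stand for), and a hypothetical short-read length of 100 bases; `sequencing_scale` is a hypothetical helper.

```python
# Illustrative assumptions (not figures from the talk):
GENOME_SIZE_BASES = 3_100_000_000  # approximate haploid human genome length
COVERAGE = 30                      # each base sequenced ~30 times on average
READ_LENGTH = 100                  # hypothetical short-read length in bases
DIRECTORY_PAGES = 200_000          # page count from the telephone-directory analogy


def sequencing_scale(genome_size: int, coverage: int, read_length: int) -> tuple[int, int]:
    """Return (total bases sequenced, number of reads) for a whole-genome run."""
    total_bases = genome_size * coverage
    num_reads = total_bases // read_length
    return total_bases, num_reads


total_bases, num_reads = sequencing_scale(GENOME_SIZE_BASES, COVERAGE, READ_LENGTH)
# How many genome letters one "page" of the imaginary directory would hold.
chars_per_page = GENOME_SIZE_BASES // DIRECTORY_PAGES

print(f"bases sequenced: {total_bases:,}")             # 93,000,000,000
print(f"reads to place: {num_reads:,}")                # 930,000,000
print(f"letters per directory page: {chars_per_page:,}")  # 15,500
```

Under these assumptions a single run yields roughly 930 million short fragments, each of which must be placed back at its original position in a 3.1-billion-letter text.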

Over the next few years, all of this will move from the research lab to the hospital and will eventually affect all our lives. The goal of this session is to introduce attendees to this area and to share the excitement of what the next few years hold in store.

Outline

The session will have two parts.

The first part will trace the evolution of genomic measurement over the last two decades, survey the current state of the field, describe how steadily falling costs and growing understanding are leading to a significant impact on disease diagnosis and cure, and discuss how the world will look in the next five years, when large numbers of people will have had their genomes sequenced.

This leads to the second part, which will describe the Big Data techniques and challenges involved in handling large volumes of genomic data: what computations need to be run, what queries need to be handled, how data needs to flow from the site of generation to the site of consumption, and so on. Topics will include clever text-indexing schemes, fast string-matching algorithms, the use of special hardware paradigms (SIMD/GPUs), Hadoop-based pipelines for large-volume processing, and visualization methods.
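To make text indexing and fast string matching concrete, here is a minimal sketch of one classic technique: a suffix array over a reference sequence, searched with binary search to find every position where a short read occurs exactly. It is a toy illustration and not necessarily the approach the talk covers; real aligners typically use compressed indexes, tolerate mismatches, and build the index in near-linear time rather than by naive sorting. The function names and the reference string are hypothetical.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return start positions of all suffixes of text, in lexicographic order.

    Naive construction by sorting suffixes: fine for a toy reference,
    far too slow and memory-hungry for a real genome.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return sorted start positions where pattern occurs exactly in text.

    Because suffixes are sorted, their len(pattern)-character prefixes are
    sorted too, so all matches form one contiguous block of sa that two
    binary searches can locate in O(m log n) character comparisons.
    """
    m = len(pattern)
    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


reference = "ACGTACGTTACG"  # hypothetical toy reference sequence
sa = build_suffix_array(reference)
print(find_occurrences(reference, sa, "ACG"))  # [0, 4, 9]
print(find_occurrences(reference, sa, "TTT"))  # []
```

The index is built once per reference and then reused for every read, which is what makes this style of indexing attractive when hundreds of millions of reads must be looked up against the same genome.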

Requirements

NA

Speaker bio

I am a computer scientist and entrepreneur, a founder of Strand Life Sciences, and an Adjunct Faculty member at the Indian Institute of Science. My work spans research on algorithmic problems, building handheld devices (the Simputer) and speech synthesis systems (Dhvani), and, as part of Strand Life Sciences, various software platforms for biological data processing (GeneSpring, Avadis NGS).

Slides

http://hariharan-ramesh.com/talks/emtech.pptx
