MapReduce and the "Art of Thinking Parallel"

Jul 2013

8 Mon

9 Tue

10 Wed

11 Thu 09:30 AM – 04:30 PM IST

12 Fri 10:15 AM – 05:30 PM IST

13 Sat 10:15 AM – 05:30 PM IST

14 Sun

Make a submission

Nimhans Convention Centre

Event format, themes and submission guidelines

The Fifth Elephant 2013 invites proposals on use cases and real-life examples. Tell us what specific problem you faced, which technology/tools worked for your use case and why, how you have developed business intelligence on the data you are collecting, and analytics tools and techniques you employ. Our preference is for showcasing original work with clear take-aways for the audience. Please emphasize these in your proposal.

The conference will have two parallel tracks on 12th and 13th July:

Storage: OLTP, messaging and notifications, databases and big data, NoSQL

Analytics: Metrics and tools, cloud computing, mathematical modelling and statistical analysis, visualization

This year we are adding a preliminary day of workshops, on 11th July, to provide attendees more in-depth, hands-on training on open source frameworks and tools (Pig, Hadoop, Hive, etc), commercial solutions (sponsored), programming languages such as R, and visualization techniques and tricks, among others.

We have a demo track for startups and companies who want to showcase their product to customers at The Fifth Elephant 2013 and get feedback. Slots are also open for 4-6 sponsored sessions for companies who want to talk about their technologies and reach out to developers, CTOs, CIOs and product managers at The Fifth Elephant. For more information on demo and sponsored session proposals, write to info@hasgeek.com.

HasGeek believes in open source as the foundation of the internet. Our aim is to strengthen these foundations for future generations. If your talk describes a codebase for developers to work with, we require that it is available under a license that does not impose itself on subsequent work. This is typically a permissive open source license (almost anything that is listed at opensource.org/licenses and is not GPL or AGPL), but restrictive and commercial licenses are also considered depending on how they affect the developer’s relationship with the user.

If you’d like to showcase commercial work that makes money for you, please consider supporting the event with a sponsorship.

Proposal selection process

Voting is open to attendees who have purchased event tickets. If there is a proposal you find notable, please vote for it and leave a comment to initiate discussions. Your vote will be reflected immediately, but will be counted towards selections only if you purchase a ticket. Proposals will also be evaluated by a program committee consisting of:

Emphasis will be placed on original work and talks which present new insights to the audience.

The programme committee will interview proposers who have received maximum votes from attendees and the committee. Proposers must submit presentation drafts as part of the selection process to ensure the talk is in line with the original proposal and to help the program committee build a coherent line-up for the event.

There is only one speaker per session. Attendance is free for selected speakers. HasGeek will cover your travel to and accommodation in Bangalore from anywhere in the world. As our budget is limited, we will prefer speakers from locations closer home, but will do our best to cover for anyone exceptional. If you are able to raise support for your trip, we will count that towards an event sponsorship.

If your proposal is not accepted, you can buy a ticket at the same rate as was available on the day you proposed. We’ll send you a code.

The program committee will announce the first round of selected proposals by end of April, a second round by end-May, and will finalize the schedule by 20th June. The funnel will close on 5th June. The event is on 11th-13th July 2013.

MapReduce and the "Art of Thinking Parallel"

Submitted Apr 23, 2013

Section: Analytics and Visualization Technical level: Advanced

The goal of the session is to take the audience from the “MECHANICS of using MapReduce” (to do simple slicing and dicing of BigData) to the “ART of using MapReduce” to solve more complex problems that at first glance look “unnatural” for MapReduce!

In this session we will:

Introduce MapReduce framework from scratch (if needed)
Highlight some limitations of MapReduce via types of parallelizations that cannot be done “naturally” in MapReduce (e.g. Joins)
Develop insights into how to transform such problems so they can be solved using MapReduce
Solve some “beautiful” problems in MapReduce (e.g. finding all maximal cliques in a graph)

The objective is to explore deeper insights in using MapReduce framework.

Outline

MapReduce is a ubiquitously used framework for largescale number crunching in BigData analytics. While it is quite general, it is not universal. There are a lot of analytics problems that cannot be ported to the MapReduce framework “naturally” (e.g. finding similarity between all pairs of documents in their Bag-of-Words representation).

In this talk, through a series of such problems, we will highlight both the limitations of MapReduce and how to overcome those limitations by being “smart” about “transforming those problems” to be more “amenable to MapReduce”.

As a concrete example we will develop an end-to-end solution in MapReduce for a very important and NP-hard Graph Theory problem - finding all Maximal Cliques in a graph.

Requirements

Basic understanding of MapReduce and Complexity of Algorithms would be helpful but not required.

Speaker bio

Dr. Shailesh Kumar is a Member of Technical Staff at Google, Hyderabad where he works on large scale data mining problems for various Google products. Prior to joining Google, he has worked as a Principal Dev. Manager at Microsoft (Bing) Hyderabad, Sr. Scientist at Yahoo! Labs Bangalore, and Principal Scientist at Fair Isaac Research in San Diego, USA.

Dr. Kumar has over fifteen years of experience in applying and innovating machine learning, statistical pattern recognition, and data mining algorithms to hard prediction problems in a wide variety of domains including information retrieval, web analytics, text mining, computer vision, retail data mining, risk and fraud analytics, remote sensing, and bioinformatics. He has published over 20 conference papers, journal papers, and book chapters and holds over a dozen patents in these areas.

He has two keen passions - first creating “magic from data” and second understanding functionally how the brain works!

Dr. Kumar received his PhD in Computer Engineering in 2000 (with a specialization in statistical pattern recognition and data mining) and Masters in Computer Science in 1997 (with a specialization in artificial intelligence and machine learning), both from the University of Texas at Austin, USA. He received his B.Tech. in Computer Science and Engineering from the Institute of Technology, Banaras Hindu University in 1995.

The Fifth Elephant 2013

Event format, themes and submission guidelines

Workshops

Product demos and sponsored sessions

Commitment to open source

Proposal selection process

Dates