In 2013, commodity hardware and computing capacity for storing and processing large and small volumes of data are easily available on demand. The bigger issues pertain to questions of how to scale data processing, handle data diversity, manage infrastructure costs, decide which technologies work best for different contexts and problems, and build products from the insights and intelligence that the data is presenting to you.
The Fifth Elephant 2013 is a three-day workshop and conference on big data, storage and analytics, with product demos and hacker corners.
Event format, themes and submission guidelines
The Fifth Elephant 2013 invites proposals on use cases and real-life examples. Tell us what specific problem you faced, which technology/tools worked for your use case and why, how you have developed business intelligence on the data you are collecting, and which analytics tools and techniques you employ. Our preference is for showcasing original work with clear takeaways for the audience. Please emphasize these in your proposal.
The conference will have two parallel tracks on 12th and 13th July:
- Storage: OLTP, messaging and notifications, databases and big data, NoSQL
- Analytics: Metrics and tools, cloud computing, mathematical modelling and statistical analysis, visualization
This year we are adding a preliminary day of workshops, on 11th July, to provide attendees more in-depth, hands-on training on open source frameworks and tools (Pig, Hadoop, Hive, etc), commercial solutions (sponsored), programming languages such as R, and visualization techniques and tricks, among others.
Product demos and sponsored sessions
We have a demo track for startups and companies who want to showcase their product to customers at The Fifth Elephant 2013 and get feedback. Slots are also open for 4-6 sponsored sessions for companies who want to talk about their technologies and reach out to developers, CTOs, CIOs and product managers at The Fifth Elephant. For more information on demo and sponsored session proposals, write to email@example.com.
Commitment to open source
HasGeek believes in open source as the foundation of the internet. Our aim is to strengthen these foundations for future generations. If your talk describes a codebase for developers to work with, we require that it is available under a license that does not impose itself on subsequent work. This is typically a permissive open source license (almost anything that is listed at opensource.org/licenses and is not GPL or AGPL), but restrictive and commercial licenses are also considered depending on how they affect the developer’s relationship with the user.
If you’d like to showcase commercial work that makes money for you, please consider supporting the event with a sponsorship.
Proposal selection process
Voting is open to attendees who have purchased event tickets. If there is a proposal you find notable, please vote for it and leave a comment to initiate discussions. Your vote will be reflected immediately, but will be counted towards selections only if you purchase a ticket. Proposals will also be evaluated by a program committee consisting of:
- Gopal Vijayraghavan, Hortonworks
- Govind Kanshi, Microsoft
- Joydeep Sen Sarma, Qubole
- Srinivasan Seshadri (Sesh), Boltell
Emphasis will be placed on original work and talks which present new insights to the audience.
The program committee will interview proposers who have received the most votes from attendees and the committee. Proposers must submit presentation drafts as part of the selection process to ensure the talk is in line with the original proposal and to help the program committee build a coherent line-up for the event.
There is only one speaker per session. Attendance is free for selected speakers. HasGeek will cover your travel to and accommodation in Bangalore from anywhere in the world. As our budget is limited, we will give preference to speakers from locations closer to home, but will do our best to cover costs for anyone exceptional. If you are able to raise support for your trip, we will count that towards an event sponsorship.
If your proposal is not accepted, you can buy a ticket at the same rate as was available on the day you proposed. We’ll send you a code.
Discounted tickets are available from http://fifthelephant.doattend.com/
The program committee will announce the first round of selected proposals by the end of April and a second round by the end of May, and will finalize the schedule by 20th June. The funnel will close on 5th June. The event is on 11th-13th July 2013.
Co-occurrence Analytics: A versatile framework for finding interesting needles in crazy haystacks!
In this session we will learn about a new way of thinking about data mining and big data analytics: “Co-occurrence Analytics”, a unified framework for mining latent insights in a wide variety of data of the form “relationships between entities”. We will show how the framework can be used to discover:
- Logical product bundles in retail market-basket data, a significant departure from traditional frequent item-set mining
- Meaningful multi-word units, or phrases, in text data, a significant departure from traditional n-gram language models
- Semantic concepts in tag networks, a significant departure from traditional modularity-based community detection algorithms
- A hierarchy of visual objects in images, a significant departure from the traditional bag-of-visual-words approach to understanding images
Most data around us can be thought of as “things co-occurring with other things in certain contexts”: products co-occurring with other products in retail market baskets, words occurring before or after other words in unstructured text, tags co-occurring with other tags in social tagging systems, people co-occurring with other people in various social networking scenarios, or objects occurring in various 2-D geometric juxtapositions with other objects in images.
While each research community, whether retail, text, social networking, or vision, has made siloed efforts to deal with “its” data, there has been no unifying framework to tame such a wide variety of co-occurrence data systematically. That unification is the theme of this session.
We will present a simple, intuitive, yet powerful co-occurrence analytics framework to deal with a wide variety of data of the form “things co-occurring with other things in some context”. After describing the framework we will demonstrate how to adapt and apply its core principles to a variety of large real-world datasets to find novel and actionable insights even in the presence of significant noise in the data.
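The abstract does not spell out the scoring method, so as an illustrative assumption here is a minimal sketch of the general idea using pointwise mutual information (PMI), one common way to measure how much more often two things co-occur than chance would predict; the function and dataset are hypothetical, not the speaker's implementation:

```python
from collections import Counter
from itertools import combinations
from math import log

def pmi_scores(baskets):
    """Score item pairs by pointwise mutual information (PMI).

    PMI(a, b) = log( P(a, b) / (P(a) * P(b)) ), with probabilities
    estimated from basket counts. Scores above zero suggest the pair
    co-occurs more often than chance; below zero, less often.
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = set(basket)  # ignore duplicates within one basket
        item_counts.update(items)
        pair_counts.update(combinations(sorted(items), 2))
    scores = {}
    for (a, b), c_ab in pair_counts.items():
        p_ab = c_ab / n
        p_a = item_counts[a] / n
        p_b = item_counts[b] / n
        scores[(a, b)] = log(p_ab / (p_a * p_b))
    return scores

# Toy market-basket data (hypothetical).
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
    ["bread", "butter", "eggs"],
]
scores = pmi_scores(baskets)
# ("bread", "butter") scores highest: the two always appear together.
```

The same counting-and-scoring pattern carries over to the other domains in the abstract by changing what a "basket" is: a sentence window for phrases, a tagged item for tag networks, an image neighbourhood for visual objects.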
What makes this approach attractive is that it is:
(1) Unsupervised: No cost of acquiring labeled data. Just point it at the data and crunch.
(2) Unbiased: No prior assumptions about data distributions, etc.
(3) High Precision: Generates very high quality insights.
(4) High Recall: Generates exhaustively many insights.
(5) Parameter Poor: Very few parameters to play with.
(6) Scalable: Highly parallelizable in the MapReduce sense.
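The scalability claim above can be sketched concretely: co-occurrence counting decomposes naturally into independent map and reduce steps. The following single-process simulation of the map/shuffle/reduce stages is an illustrative assumption about the shape of such a job, not the speaker's actual pipeline:

```python
from collections import defaultdict
from itertools import combinations

def map_phase(basket):
    """Mapper: emit (item_pair, 1) for every pair in one basket.

    Each basket is processed independently, which is what makes the
    counting step embarrassingly parallel across mappers.
    """
    for pair in combinations(sorted(set(basket)), 2):
        yield pair, 1

def reduce_phase(key, values):
    """Reducer: sum the counts for one item pair."""
    return key, sum(values)

def run(baskets):
    # Shuffle step: group mapper output by key. In a real cluster the
    # framework does this; here we simulate it with a dict.
    grouped = defaultdict(list)
    for basket in baskets:
        for key, value in map_phase(basket):
            grouped[key].append(value)
    return dict(reduce_phase(k, vs) for k, vs in grouped.items())

counts = run([["a", "b"], ["a", "b", "c"], ["b", "c"]])
# counts[("a", "b")] == 2, counts[("b", "c")] == 2, counts[("a", "c")] == 1
```

Because mappers see one basket at a time and reducers see one pair at a time, the same job scales out across machines with no change to the per-record logic.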
While the session will use some basic concepts from probability theory, information theory, graph theory, visualization, and data mining, it will be self-contained; no prior background in any of these areas is assumed.
Dr. Shailesh Kumar is a Member of Technical Staff at Google, Hyderabad, where he works on large-scale data mining problems for various Google products. Prior to joining Google, he worked as a Principal Development Manager at Microsoft (Bing) in Hyderabad, a Senior Scientist at Yahoo! Labs in Bangalore, and a Principal Scientist at Fair Isaac Research in San Diego, USA.
Dr. Kumar has over fifteen years of experience in applying and innovating machine learning, statistical pattern recognition, and data mining algorithms to hard prediction problems in a wide variety of domains including information retrieval, web analytics, text mining, computer vision, retail data mining, risk and fraud analytics, remote sensing, and bioinformatics. He has published over 20 conference papers, journal papers, and book chapters and holds over a dozen patents in these areas.
He has two keen passions: creating “magic from data”, and understanding, functionally, how the brain works.
Dr. Kumar received his PhD in Computer Engineering in 2000 (with a specialization in statistical pattern recognition and data mining) and Masters in Computer Science in 1997 (with a specialization in artificial intelligence and machine learning), both from the University of Texas at Austin, USA. He received his B.Tech. in Computer Science and Engineering from the Institute of Technology, Banaras Hindu University in 1995.
- LinkedIn: http://www.linkedin.com/in/shaileshk