Machine Learning, Distributed and Parallel Computing, and High-performance Computing are the themes for this year’s edition of Fifth Elephant.
The deadline for submitting a proposal is 15th June 2015.
We are looking for talks and workshops from academics and practitioners who are in the business of making sense of data, big and small.
Track 1: Discovering Insights and Driving Decisions
This track is about general, novel, fundamental, and advanced techniques for making sense of data and driving decisions from data. This could encompass applications of the following ML paradigms:
- Statistical Visualizations
- Unsupervised Learning
- Supervised Learning
- Semi-Supervised Learning
- Active Learning
- Reinforcement Learning
- Monte Carlo techniques and probabilistic programming
- Deep Learning
These paradigms may be applied across various data modalities, including multivariate, text, speech, time series, images, video, transactions, etc.
Track 2: Speed at Scale
This track is about tools and processes for collecting, indexing, and processing vast amounts of data. The theme includes:
- Distributed and Parallel Computing
- Real Time Analytics and Stream Processing
- MapReduce and Graph Computing frameworks
- Kafka, Spark, Hadoop, MPI
- Stories of parallelizing sequential programs
- Cost/Security/Disaster Management of Data
Commitment to Open Source
HasGeek believes in open source as the binding force of our community. If you are describing a codebase for developers to work with, we’d like it to be available under a permissive open source license. If your software is commercially licensed or available under a combination of commercial and restrictive open source licenses (such as the various forms of the GPL), please consider picking up a sponsorship. We recognize that there are valid reasons for commercial licensing, but ask that you support us in return for giving you an audience. Your session will be marked on the schedule as a sponsored session.
If you are interested in conducting a hands-on session on any of the topics falling under the themes of the two tracks described above, please submit a proposal under the workshops section. We also need you to tell us about your past experience in teaching and/or conducting workshops.
Automating news discovery in real-time
The breaking news segment is an intensely competitive market with players from the TV, radio, online, mobile and print space competing for attention. The ability to discover trends early and “break” them is an edge.
This session talks through some of the techniques used in an ongoing media engagement to automatically source real-time news items, cluster them, filter the relevant ones, and build a storyline around them.
The session will cover:
- How to source news in real time from social media (Twitter, Facebook, Google), online news media, financial markets and other sources
- How to filter these based on their relative level of news-worthiness
- How to cluster them based on similarity
- How to identify related news to build a story around the topic
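As a rough illustration of the clustering step listed above, the sketch below groups headlines by the cosine similarity of their word counts. It is not the method used in the engagement: the sample headlines, the `threshold` value, and the function names are all hypothetical, and it uses only the Python standard library.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a headline and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_headlines(headlines, threshold=0.4):
    """Greedy single-pass clustering: each headline joins the first cluster
    whose seed headline is similar enough, otherwise it starts a new one."""
    clusters = []  # list of (seed_vector, [member headlines])
    for headline in headlines:
        vector = Counter(tokenize(headline))
        for seed, members in clusters:
            if cosine(seed, vector) >= threshold:
                members.append(headline)
                break
        else:
            # No existing cluster was close enough: start a new one.
            clusters.append((vector, [headline]))
    return [members for _, members in clusters]


# Hypothetical sample headlines, for illustration only.
headlines = [
    "Central bank cuts interest rates",
    "Interest rates cut by central bank",
    "Monsoon arrives early in Kerala",
    "Early monsoon hits Kerala coast",
]
groups = cluster_headlines(headlines)
for group in groups:
    print(group)
```

A single-pass approach like this suits a stream of incoming items because it never needs the full set up front, at the cost of being sensitive to arrival order and to the chosen threshold.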
The session will focus more on technique than technology. While I will be sharing code, you can inspect that later. The talk itself will be layman-friendly.
But to get the most out of the code, you’d need:
- a working knowledge of REST APIs
- enough Python knowledge to build a scraper
- enough HTML/CSS/JS knowledge to build a Chrome plugin
- enough stats to understand k-means clustering
- whatever natural language processing you’ve learnt from writing a few NLTK programs
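For a sense of the level of k-means understanding assumed above, here is a minimal sketch of the algorithm (Lloyd's iteration) on 2-D points. It is an illustration only, not code from the talk; the sample points, the starting centroids, and the `kmeans` function name are hypothetical, and the initial centroids are passed in explicitly so the result is deterministic.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means on 2-D points, starting from the given centroids."""
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: label each point with its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its points.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Hypothetical sample data: two visibly separate groups of points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels)
print(centroids)
```

In a text setting, the points would be document vectors (for example, word counts) rather than 2-D coordinates, but the assignment and update steps are the same.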
Anand is the Chief Data Scientist at Gramener.com. He has advised and designed IT systems for media organizations such as the Times Group, the India Today Group, The Guardian, CNN-IBN, etc.
Anand and his team explore insights from data and communicate these as visual stories. Anand also builds the Gramener Visualisation Server – Gramener’s flagship product.
Anand has an MBA from IIM Bangalore and a B.Tech from IIT Madras. He has worked at IBM, Lehman Brothers, The Boston Consulting Group and Infosys Consulting. He blogs at s-anand.net.