Machine Learning, Distributed and Parallel Computing, and High-performance Computing are the themes for this year’s edition of Fifth Elephant.
The deadline for submitting a proposal is 15th June 2015.
We are looking for talks and workshops from academics and practitioners who are in the business of making sense of data, big and small.
Track 1: Discovering Insights and Driving Decisions
This track is about general, novel, fundamental, and advanced techniques for making sense of data and driving decisions from data. This could encompass applications of the following ML paradigms:
- Statistical Visualizations
- Unsupervised Learning
- Supervised Learning
- Semi-Supervised Learning
- Active Learning
- Reinforcement Learning
- Monte Carlo techniques and probabilistic programming
- Deep Learning
These techniques may be applied across various data modalities, including multivariate data, text, speech, time series, images, video, transactions, etc.
Track 2: Speed at Scale
This track is about tools and processes for collecting, indexing, and processing vast amounts of data. The theme includes:
- Distributed and Parallel Computing
- Real Time Analytics and Stream Processing
- MapReduce and Graph Computing frameworks
- Kafka, Spark, Hadoop, MPI
- Stories of parallelizing sequential programs
- Cost/Security/Disaster Management of Data
Commitment to Open Source
HasGeek believes in open source as the binding force of our community. If you are describing a codebase for developers to work with, we’d like it to be available under a permissive open source license. If your software is commercially licensed or available under a combination of commercial and restrictive open source licenses (such as the various forms of the GPL), please consider picking up a sponsorship. We recognize that there are valid reasons for commercial licensing, but ask that you support us in return for giving you an audience. Your session will be marked on the schedule as a sponsored session.
If you are interested in conducting a hands-on session on any of the topics falling under the themes of the two tracks described above, please submit a proposal under the workshops section. We also need you to tell us about your past experience in teaching and/or conducting workshops.
Recommendation Systems beyond Traditional Collaborative Filtering
I will share my thoughts and experiences at Snapdeal in building a more personalized and relevant recommendation system for the e-commerce industry, covering the mathematical, technological, machine learning, and various other aspects involved.
While collaborative filtering works quite well for companies like Netflix, at Snapdeal we cater to a huge catalog of 12M products spanning more than 100 categories, each comprising 20-30 subcategories. Collaborative filtering alone does not work well for us, because of this wide catalog and because we capture implicit rather than explicit feedback. We therefore developed a recommendation system that considers various factors beyond collaborative filtering.
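As a minimal sketch of what item-to-item collaborative filtering over an implicit-feedback matrix looks like (the data and function here are illustrative assumptions, not the actual Snapdeal implementation):

```python
import numpy as np

# Hypothetical implicit-feedback matrix: rows = users, columns = products,
# entries = interaction counts (views, carts, purchases) rather than ratings.
interactions = np.array([
    [3, 0, 1, 0],
    [0, 2, 0, 1],
    [1, 0, 4, 0],
    [0, 1, 0, 2],
], dtype=float)

def item_similarity(m):
    """Cosine similarity between the product columns of an interaction matrix."""
    norms = np.linalg.norm(m, axis=0, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for unseen products
    normalized = m / norms
    return normalized.T @ normalized

sim = item_similarity(interactions)
# Products 0 and 2 co-occur for the same users, so they score as similar,
# while products 0 and 1 share no users at all.
print(sim[0, 2] > sim[0, 1])  # True
```

With implicit feedback, a co-occurrence-based similarity like this captures "users who interacted with X also interacted with Y" without needing explicit star ratings, but it degrades on a long-tail catalog where most product pairs share few or no users.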
In this session I will discuss these other factors (listed below) and the mathematical models we considered while building a custom recommendation system to generate more personalized and relevant recommendations:
- User-category affinity (at a more granular level)
- Content-based product similarity
- Products that go well with already-bought products
- Predicting the repurchase of already-purchased products
- Suggesting trending products based on the user's affinity
- Capturing the user's implicit feedback on our served recommendations and using it to improve relevancy
- Collaborative filtering (also used, but with reduced weightage)
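A blend of signals like the factors above can be sketched as a simple weighted score (the signal names and weights here are hypothetical, chosen only to illustrate combining collaborative filtering with other factors at reduced weightage):

```python
# Hypothetical per-signal weights; the real model and weights are not public.
WEIGHTS = {
    "category_affinity": 0.30,
    "content_similarity": 0.25,
    "complementary": 0.15,   # goes well with already-bought products
    "repurchase": 0.10,
    "trending": 0.10,
    "collaborative": 0.10,   # CF still contributes, but with lower weightage
}

def blended_score(signals):
    """Combine per-signal scores (each in [0, 1]) into one ranking score.

    Missing signals default to 0, so candidates are comparable even when
    only some factors fire for them.
    """
    return sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)

candidate = {"category_affinity": 0.8, "content_similarity": 0.6, "collaborative": 0.4}
print(round(blended_score(candidate), 3))  # 0.3*0.8 + 0.25*0.6 + 0.1*0.4 = 0.43
```

In practice the weights would be tuned against held-out engagement data rather than fixed by hand, but a linear blend makes it easy to add or drop signals independently.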
Finally, I will conclude the session with the technical challenges of building a scalable recommendation system over massive datasets and of serving these recommendations in real time.
Gagan Deep Juneja is a Lead Engineer at Snapdeal, where he leads several initiatives related to user personalization. He has close to 7 years of experience in the software industry and has worked on several projects using Java/J2EE and Hadoop as the primary technologies. For the past couple of years he has been working with Big Data technologies such as Hadoop, Spark, Cascading, Pig, Hive, and Blur. He has a great interest in machine learning and in applying supervised and unsupervised algorithms to get value out of data. He has an inclination towards open source technologies and likes to delve into new frameworks; he is a committer and PPMC member of Apache Blur (incubating). He has spoken at various meetup groups in the past and is an active blogger. In his leisure time he loves exploring new technologies and keeping up with the latest trends.