The Fifth Elephant 2015
A conference on data, machine learning, and distributed and parallel computing
July 2015
Thu 16 Jul, 08:30 AM – 06:35 PM IST
Fri 17 Jul, 08:30 AM – 06:30 PM IST
Sat 18 Jul, 09:00 AM – 06:30 PM IST
Tackling ML's black boxes with probabilistic programming
While machine learning has become a wildly popular approach to many analysis problems, it has also become a major black box. This talk showcases probabilistic programming as a feasible alternative in such scenarios.
Section: Full Talk
Technical level: Advanced
|
Networks and Network Analysis
This talk covers various issues related to networks and ways to leverage social network analysis techniques to draw inferences and insights from them.
Section: Full Talk
Technical level: Advanced
|
Big data analysis with Apache Spark
Apache Spark is an up-and-coming big data processing engine. It is gaining popularity for its ease of use and its unification of different big data workloads. The objective of this workshop is to get your hands dirty with it.
Section: Workshop
Technical level: Beginner
|
Anatomy of RDD: A Deep Dive into the Spark RDD Data Structure
RDD is the core abstraction of Apache Spark, so understanding RDDs in depth is crucial to using Spark effectively. This talk takes the audience on a deep dive into RDDs to show why they are so powerful.
Section: Full Talk
Technical level: Advanced
|
Scrap Your MapReduce - Introduction to Apache Spark
An introduction to Apache Spark: compare and contrast it with the MapReduce programming model, see what Spark has to offer and where it shines, and learn how to use it through real-life examples.
Section: Full Talk
Technical level: Beginner
|
Deprecating MapReduce Patterns with Apache Spark
A live-coding demonstration of how Apache Spark can solve non-trivial problems with concise code, deprecating some established MapReduce patterns. The result is a significant performance gain and a developer-friendly programming model, while keeping the other strengths of the MapReduce world intact.
Section: Full Talk
Technical level: Intermediate
|
Instrumenting your Kafka & Storm pipeline
Tips for designing your stream processing setup: what can go wrong, and how to instrument it.
Section: Full Talk
Technical level: Intermediate
|
Two Years Wiser: The Nilenso Experiment
Attendees will hear how nilenso has overcome a series of challenges present in running a technology co-operative. This story will be informative for anyone who wants their team to be more involved, not just for employee-owned companies. Understanding decision-making, execution, and delivery is essential for any business. By describing the structural and procedural challenges we’ve faced over the …
Section: Full Talk
Technical level: Beginner
|
Building Data Products for Small / Mid-Sized Data
Understand the process Kevin Gates and I went through in building www.seeingtheair.com, a hackathon data product to compare air quality in various cities. The audience will gain an appreciation for the data extraction and exploration phases, along with building a web app and some intuition for data visualization. I intend to show the Python code behind the app in this talk.
Section: Full Talk
Technical level: Intermediate
|
Introduction to Deep Learning
In fields like computer vision, speech recognition and natural language processing, deep learning has produced state-of-the-art results, and it is showing a lot of promise in other fields too.
Section: Workshop
Technical level: Intermediate
|
Visualising Multi-Dimensional Data
Understand techniques to effectively visualise multi-dimensional data to aid exploratory data analysis.
Section: Full Talk
Technical level: Intermediate
|
Building Recommender Systems
This talk covers classical and state-of-the-art recommender systems. The audience will also get a flavour of the mathematical computations that go into recommender systems.
Section: Crisp Talk
Technical level: Beginner
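Many classical recommenders start from a similarity measure between items. The following is a minimal sketch of item-to-item cosine similarity over sparse user ratings. It is not the speaker's material. The item names, user ids and ratings are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two sparse rating vectors given as {user: rating}."""
    common = set(a) & set(b)
    dot = sum(a[u] * b[u] for u in common)  # only co-rated users contribute
    norm_a = math.sqrt(sum(r * r for r in a.values()))
    norm_b = math.sqrt(sum(r * r for r in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar(item, ratings):
    """Rank every other item by cosine similarity to `item`, best first."""
    scores = [(other, cosine_similarity(ratings[item], ratings[other]))
              for other in ratings if other != item]
    return sorted(scores, key=lambda s: s[1], reverse=True)

ratings = {  # hypothetical item -> {user: rating}
    "book_a": {"u1": 5, "u2": 3, "u3": 4},
    "book_b": {"u1": 4, "u2": 3, "u3": 5},
    "book_c": {"u2": 1, "u4": 5},
}
print(most_similar("book_a", ratings)[0][0])  # book_b
```

A real system would precompute these similarities and combine them with the user's own history. The sketch only shows the core arithmetic.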
|
On building a cloud-based black-box predictive modeling system
Data analytics platforms, with predictive models at their core, are the buzzword in enterprise analytics. Having been on both sides, as a consultant providing analytics and as a consumer of analytics, I’ve realized that there are few, if any, runaway winners. Rightly so: it is one of the hottest growth areas. This talk goes over some of the ingredients of building a successful data analytics platf…
Section: Full Talk
Technical level: Beginner
|
Big Data Benchmarking
Participants will learn benchmarking techniques for big data.
Section: Full Talk
Technical level: Intermediate
|
Processing large data with Apache Spark
An overview of Apache Spark functionality with a detailed look at its architecture. We will also touch upon Spark Streaming for near-real-time processing.
Section: Full Talk
Technical level: Intermediate
|
Understanding supervised machine learning hands on!
If you have ever been in a “black box” operating mode, throwing more data or more complex models at a machine learning problem without a clue about why it is or isn’t working, this workshop is for you! The workshop will primarily focus on understanding supervised machine learning.
Section: Workshop
Technical level: Beginner
|
Building Spark as a Service in the Cloud using YARN
Apache Spark is rapidly gaining popularity as a new data processing framework. However, it can be daunting to install and run. In this talk we will discuss the challenges of running Spark in the cloud using YARN and how we built Spark as a Service. We will also discuss our learnings from building and operating this service in the AWS cloud, and future directions.
Section: Full Talk
Technical level: Intermediate
|
Securing your Enterprise Hadoop Cluster
Hadoop was originally developed for crawling and indexing the Internet, where security was not a concern. But we have come a long way since then. Major banks and organizations are adopting Hadoop as their preferred big data platform, and there is a growing emphasis on securing the data and the cluster components and resources. In a complicated, distributed system like Hadoop, there are several attack …
Section: Full Talk
Technical level: Intermediate
|
Critical pipe fittings: What every data pipeline requires
The talk aims to give data builders the key aspects that will help them build their own frameworks and tools, add some transparency to their data pipelines, and ship faster.
Section: Full Talk
Technical level: Intermediate
|
Making a contextual recommendation engine using Python and Deep Learning at ParallelDots
ParallelDots (paralleldots.com) is a recommendation engine for publishers to increase engagement and monetization on their websites. For the end user, it solves the problem of information overload by providing a set of relevant stories and history about whatever he or she is reading. ParallelDots provides a set of recommendation engines which include the most accurate related posts widget, automated tim…
Section: Crisp Talk
Technical level: Beginner
|
A review of important results in distributed systems
The key objective is to give attendees a more nuanced appreciation of the constraints involved in designing distributed, fault-tolerant systems. By the end of this talk, attendees should be conversant with some of the key theorems, ideas and common solutions to distributed problems.
Section: Full Talk
Technical level: Intermediate
|
Leveraging Cloud for BigData Analytics - Patterns, Options and Practical Next Steps
This talk covers in depth how to leverage public clouds for big data analytics. It will also describe the next steps to get you started on your cloud-based big data analytics initiative.
Section: Full Talk
Technical level: Intermediate
|
Squirrel – Enabling Accessible Analytics for All
Simplify and widen the scope of the software developer to create smart tools that enable easy access and actionable insights for all.
Section: Crisp Talk
Technical level: Intermediate
|
Anomaly Detection Using Apache Spark
A walkthrough of how we used Spark’s scalable KMeans algorithm to detect anomalies for our cyber analytics platform.
Section: Crisp Talk
Technical level: Advanced
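One common way to use k-means for anomaly detection is to cluster the data and then flag points that lie far from every centroid. The abstract does not say exactly how the speakers scored anomalies, so this is an assumption. The sketch below is plain Python, not Spark MLlib's API. It uses Lloyd's algorithm with deterministic seeding (the first k points) rather than Spark's own initialisation. The data and the `threshold` value are hypothetical.

```python
import math

def kmeans(points, k, iterations=20):
    """Lloyd's k-means; the first k points seed the centroids (deterministic)."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties out
                centroids[i] = tuple(sum(d) / len(members) for d in zip(*members))
    return centroids

def anomalies(points, centroids, threshold):
    """Points whose distance to the nearest centroid exceeds `threshold`."""
    return [p for p in points
            if min(math.dist(p, c) for c in centroids) > threshold]

# Hypothetical 2-D feature vectors: two dense groups plus one outlier
points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10), (5, 30)]
centroids = kmeans(points, k=2)
print(anomalies(points, centroids, threshold=10.0))  # [(5, 30)]
```

Note that the outlier still pulls its cluster's centroid towards itself. Fitting the clusters on mostly normal data before scoring new points is a natural refinement.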
|
High Performance Computing in R
This is a hands-on workshop focused on the high-performance aspects of R programming. Attendees will learn how to identify performance issues and address them using various R packages. This workshop is targeted at an audience with basic familiarity with R.
Section: Workshop
Technical level: Intermediate
|
HawkEye: A Real-Time Anomaly Detection System
In this talk, I will present the details of the HawkEye system, with insights on the selection of algorithms and parameter tuning. I intend to share our mistakes and learnings while developing HawkEye.
Section: Crisp Talk
Technical level: Beginner
|
IT Operations Analytics: Using Text Analytics and Statistical Modeling in IT Operations Data
Attendees will be exposed to the emerging area of IT Operations Analytics. Attendees will learn how text mining and statistical modeling techniques can be used to extract insights out of IT Operations Data.
Section: Full Talk
Technical level: Intermediate
|
Building tiered data stores using Aesop to bridge SQL and NoSQL systems
Understand how to build and use tiered data stores with Aesop using best-in-class SQL and NoSQL systems. Also relate this to a number of real-world requirements where this technology and these patterns can be applied while scaling to millions of data records.
Section: Full Talk
Technical level: Intermediate
|
Search at Petabyte scale
Learnings from how we scaled and ran our search infrastructure, which crunches 25+ PB of data every day, in a SaaS world.
Section: Crisp Talk
Technical level: Intermediate
|
Running natural language queries against NoSQL schema
Demonstrate and discuss advanced text parsing and processing techniques on unstructured data.
Section: Crisp Talk
Technical level: Advanced
|
Recommendation System beyond traditional Collaborative filtering
I will share my thoughts and experiences at Snapdeal in building a more personalized and relevant recommendation system for the e-commerce industry, covering the mathematical, technological, machine learning and other aspects involved.
Section: Full Talk
Technical level: Intermediate
|
Escher - democratizing beautiful visualizations
I aim to introduce the audience to Escher.jl, a new tool for web-based interactive visualizations that is wholly programmable in a single, data-friendly, fast language: Julia. Hopefully, the pleasant ergonomics of the library will encourage data scientists to create more explorable, beautiful, and insightful presentations of data, and also to create user interfaces without an army of front-end developers.
Section: Crisp Talk
Technical level: Beginner
|
Aerospike: High Performance NoSQL store with flash optimization
High-performance databases are a need for most widely used real-time internet services. Low latency and high throughput have always been of utmost importance in bringing traffic to a site. Aerospike is one such NoSQL store, designed to maintain response times under 1 millisecond even under peak load, with billions of records spanning terabytes. Optimized for flash storage, Aerospike can …
Section: Full Talk
Technical level: Intermediate
|
Ensemble Learning
To understand the most basic and convenient approaches to ensembling.
Section: Full Talk
Technical level: Beginner
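One of the most basic ensembling approaches is hard majority voting: each model predicts a label and the most common label wins. The sketch below assumes three hypothetical classifiers whose predictions are already aligned on the same samples. Ties go to the label encountered first, which follows from `Counter.most_common` ordering equal counts by first occurrence.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model label lists (aligned by sample) by hard voting."""
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")
    combined = []
    for votes in zip(*predictions):  # one tuple of votes per sample
        label, _count = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined

# Hypothetical predictions from three models on four samples
model_a = ["spam", "ham", "spam", "ham"]
model_b = ["spam", "spam", "ham", "ham"]
model_c = ["ham", "spam", "spam", "ham"]
print(majority_vote([model_a, model_b, model_c]))
# ['spam', 'spam', 'spam', 'ham']
```

Averaging predicted probabilities (soft voting), bagging and stacking build on the same idea.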
|
Benchmarks from JVM to Big Data
An explanation of various benchmarks related to the JVM and big data.
Section: Full Talk
Technical level: Intermediate
|
Big Data Engineering made easy
Switching databases to scale up, and then porting all the algorithms and reporting functionality already implemented to the new database, is a challenge. At Sokrati we have eased this pain by implementing proprietary APIs (for internal use).
Section: Full Talk
Technical level: Intermediate
|
The many ways of parallel computing with Julia
Introduce Julia for those who haven’t heard of it, and focus on parallel computing with Julia. I will try to do some fun stuff with 1000 processors in a demo.
Section: Full Talk
Technical level: Beginner
|
The many ways of parallel computing with Julia
Introduce Julia for those who haven’t heard about it, and focus on parallel computing with Julia. Do some demos with hundreds of processors. The audience will get a feel for parallel computing with Julia and is strictly advised to “Try it at home.”
Section: Full Talk
Technical level: Beginner
|
Deconstructing Linear Regression
This short talk aims to “deconstruct” linear regression and explain the steps performed by library functions before they return the intercept and slope.
Section: Crisp Talk
Technical level: Beginner
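For simple (one-variable) linear regression, the steps hidden inside a library call reduce to a closed-form ordinary least squares calculation. The slope is cov(x, y) / var(x), and the intercept makes the line pass through the point of means. This is a minimal sketch of that calculation, not the speaker's code. The toy data is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of co-deviations / sum of squared x deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    # The fitted line passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(intercept, slope)  # 1.0 2.0
```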
|
POC: How to slice, dice & search billions of user events in seconds (from scratch)
Results from a proof-of-concept business intelligence tool, where each bit in a multi-billion-bit bitmap represented a user performing an event. A minimal 100-LOC implementation gave encouraging results, and also revealed areas that could improve: caveats, and ideas for rolling out your own BI tool.
Section: Crisp Talk
Technical level: Beginner
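In the bitmap approach described above, the bit at position `uid` is set when user `uid` performed an event. Slicing and dicing then becomes bitwise AND, OR and NOT, followed by a population count. The sketch below uses Python's arbitrary-precision integers as bitmaps, which is an illustrative choice, not the speaker's 100-LOC implementation. The event names and user ids are hypothetical.

```python
def bitmap_from_users(user_ids):
    """Build a bitmap (a Python int) with bit `uid` set for each user id."""
    bitmap = 0
    for uid in user_ids:
        bitmap |= 1 << uid
    return bitmap

def count_users(bitmap):
    """Population count: how many users are represented in the bitmap."""
    return bin(bitmap).count("1")

# Hypothetical events, each given as the set of user ids who performed it
viewed = bitmap_from_users([1, 2, 3, 5, 8])
purchased = bitmap_from_users([2, 3, 8, 13])

both = viewed & purchased          # viewed AND purchased
either = viewed | purchased        # viewed OR purchased
viewed_only = viewed & ~purchased  # viewed but did NOT purchase
print(count_users(both), count_users(either), count_users(viewed_only))  # 3 6 2
```

At billions of users, an uncompressed bitmap per event costs hundreds of megabytes. Compressed bitmap formats such as Roaring address that cost.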
|
CAP Theorem: You don’t need CP, you don’t want AP, and you can’t have CA
CAP Theorem is everywhere: “Consistency, Availability, Partition tolerance: choose any two!” But it is oversimplified and misunderstood more often than not. CAP’s consistency isn’t what most people think it is; CAP’s availability isn’t what most people think it is; and what does partition tolerance even mean?
Section: Full Talk
Technical level: Intermediate
|
Static & Interactive Exploratory Data Analysis in R
Learn to quickly do static and interactive visual exploration of large datasets in R.
Section: Workshop
Technical level: Intermediate
|
Approximate algorithms for summarizing streaming data
Introduce two approximate algorithms that are considered cornerstones of big data infrastructure.
Section: Full Talk
Technical level: Intermediate
|
Apache Tez - Present and Future
A talk about the present and future of Apache Tez.
Section: Full Talk
Technical level: Intermediate
|
Automating news discovery in real-time
The breaking news segment is an intensely competitive market with players from the TV, radio, online, mobile and print space competing for attention. The ability to discover trends early and “break” them is an edge.
Section: Full Talk
Technical level: Beginner
|
Using Modes for Time Series Classification
To present methods for time series analysis other than ARIMA and similar models.
Section: Crisp Talk
Technical level: Beginner
|
Getting Started with IoT
How an IoT solution can be delivered with ease, the different options available for building IoT solutions, and an understanding of the solution architecture.
Section: Full Talk
Technical level: Intermediate
|
Anatomy of Decision Trees using an example from Kaggle
Decision trees are among the most popular predictive modelling techniques in the analytics industry. Attendees will learn how to effectively apply decision trees to predict survival in Kaggle’s “Titanic: Machine Learning from Disaster” problem.
Section: Full Talk
Technical level: Intermediate
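At each node, a decision tree chooses the split that makes the child nodes as pure as possible. The following is a minimal sketch of that step for one numeric feature, using Gini impurity, which is one common criterion (entropy is another). The ages and survival labels are a hypothetical toy set in the spirit of the Titanic problem, not the real Kaggle data.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Threshold on one numeric feature minimising weighted child Gini."""
    n = len(labels)
    best = (None, gini(labels))  # baseline: no split at all
    for threshold in sorted(set(values)):
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        if not left or not right:
            continue  # a split must produce two non-empty children
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:  # strict: ties keep the earlier threshold
            best = (threshold, score)
    return best

# Hypothetical toy data: passenger age and whether they survived (1) or not (0)
ages = [4, 8, 25, 30, 40, 60]
survived = [1, 1, 0, 1, 0, 0]
print(best_split(ages, survived))  # (8, 0.25)
```

A full tree applies this search over every feature, splits on the best one, and recurses into each child until a stopping rule (depth, minimum samples, zero impurity) is met.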
|
Building a real-time solution within 30 minutes
Understand the features available to build a solution within 30 minutes and how easy the technology is to use: how to build a real-time solution quickly even if you are not a hard-core developer.
Section: Crisp Talk
Technical level: Beginner
|
Are these the same pair of shoes? - Matching retail products at scale
Matching identical products from different retail websites is one of the hardest and the most impactful problems in the space of product intelligence. This talk will cover the breadth of algorithms and models we use for matching products across customer catalogs. It will also cover some practical aspects of taking these algorithms and models to production.
Section: Full Talk
Technical level: Intermediate
|
An Integrated Weblog Processing and Machine Learning Workflow for Building and Deploying Intent Prediction Models at Scale
To share with the audience our experiences in setting up a scalable infrastructure for weblog processing and machine learning leveraging several technologies such as Hadoop, Vertica, R and Python. The talk will focus on implementing scalable data models for dynamic intent predictions on web/mobile channels and machine learning best practices.
Section: Full Talk
Technical level: Intermediate
|
Practical Approach to Python based Supervised Machine Learning: User Generated Text Classification Techniques
In e-commerce, we handle a large volume of user-generated content in the form of reviews, ratings, questions/answers, chat etc. This user-generated content has a lot of value for making the right organization-wide business decisions. This large volume of user-generated text also poses classification and moderation problems, because the data is mostly unstructured. Combination of various Machi…
Section: Full Talk
Technical level: Intermediate
|
Building an E-commerce search engine: Challenges, insights and approaches
The objective of the talk is to motivate the problems and challenges of e-commerce search and to provide insights and approaches for building a world-class product search engine.
Section: Sponsored
Technical level: Beginner
|
postgres clusters and their nuances
We built a Postgres cluster using repmgr to serve 2k requests per second and store 5 GB of data per day. You’ll learn about Postgres’ WAL replication and archival, how repmgr works, how we leveraged it for our needs and hooked it up to our application, and how we built multiple lines of defence in case something bad happens. And oh, we’ll also compare it with RDS for good measure.
Section: Full Talk
Technical level: Intermediate
|
Revolutionizing travel with ML & Analytics – An insight into business optimization using Machine Learning and Advanced Analytics
At Orbitz, Big Data technologies have helped transform the way we let people travel. In this talk we elaborate on how we at Orbitz have leveraged intelligence derived from more than 2 PB of semi-structured and unstructured data to optimize various facets of our business such as content optimization, search personalization and channel optimization.
Section: Full Talk
Technical level: Intermediate
|
Hardware Accelerated Big Data Processing
Attendees can expect to gain a clear understanding of FPGAs (Field Programmable Gate Arrays) and their pros and cons compared with software on microprocessors for big data.
Section: Crisp Talk
Technical level: Intermediate
|
Solr compute cloud - An elastic Solr infrastructure
Go over various challenges in scaling the Solr search platform to serve hundreds of millions of documents with low latency and high throughput in a multi-tenant architecture.
Section: Full Talk
Technical level: Advanced
|
Joining data streams at scale for fun and profit
Understand how to derive more value out of real-time data streams by joining them using a stream processing system to derive deeper insights. We’ll walk through our experience of building a platform for such use-cases at Flipkart, and describe the design patterns we have evolved within it; we have scaled this platform to process billions of events a day across hundreds of streaming data applicati…
Section: Crisp Talk
Technical level: Beginner
|
Developing a Hybrid Recommender System for Some of Life’s Most Important Choices
Recommender Systems are both an old and an active area of research. Advances in Recommender Systems can emerge from developing applications in new contexts and for new use cases. In this session we will describe the unique challenges associated with building a recommender system for real estate and we will present the work we are doing to develop a hybrid recommender system for real estate at Hou…
Section: Full Talk
Technical level: Intermediate
|
Data Infrastructure for Real Time Analysis of User Click Stream Data
India is churning out a large number of service oriented startups by the day. They need to build customized views for users based on those users’ previous sessions and interactions with the product. Most startups can’t afford to design, build and maintain a custom Data Analytics Pipeline let alone do real-time data analysis and refine user interactions with the product. Most startups have few dev…
Section: Full Talk
Technical level: Beginner
|
Designing distributed components in a multi tenant architecture
The objective of this talk is to go over the design of distributed search components in a multi-tenant architecture that spans geographies and deals with challenges around custom ranking, tenant-specific configurations and dynamic ranking elements.
Section: Full Talk
Technical level: Intermediate
|
Deep Learning for Natural Language Processing
This talk is about how we applied deep learning techniques to achieve state-of-the-art results in various NLP tasks like sentiment analysis and aspect identification, and how we deployed these models at Flipkart.
Section: Full Talk
Technical level: Intermediate
|
Map Tile Server
Problem statement: Our previous maps at CommonFloor used JavaScript to show listings on Google Maps. The user experience would break at high listing density, on zoom/pan, and when filters were applied. To solve the problem, we overlaid our own tiles (transparent 256 x 256 PNG images) on Google Maps. These tiles are generated on the backend and improve the maps experience significantly by making the whole …
Section: Crisp Talk
Technical level: Intermediate
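To render listings onto backend-generated 256 x 256 tiles, a server needs to know which tile a listing's latitude and longitude fall in, and at which pixel inside that tile. The sketch below uses the standard Web Mercator projection used by Google Maps style tiles. It is an assumption about how such a tile server could locate points, not CommonFloor's actual code, and the function names are hypothetical.

```python
import math

TILE_SIZE = 256        # pixels per tile edge, as in the abstract
MAX_LAT = 85.05112878  # Web Mercator latitude limit

def lat_lon_to_pixel(lat, lon, zoom):
    """Global Web Mercator pixel coordinates of a point at a zoom level."""
    world = TILE_SIZE * 2 ** zoom  # width and height of the whole map in pixels
    lat = max(min(lat, MAX_LAT), -MAX_LAT)
    siny = math.sin(math.radians(lat))
    x = (lon + 180.0) / 360.0 * world
    y = (0.5 - math.log((1 + siny) / (1 - siny)) / (4 * math.pi)) * world
    return x, y

def tile_and_offset(lat, lon, zoom):
    """Return (tile_x, tile_y, px, py): the tile holding the point and its pixel in it."""
    x, y = lat_lon_to_pixel(lat, lon, zoom)
    last = TILE_SIZE * 2 ** zoom - 1  # clamp points on the far edge into the grid
    px = min(max(int(x), 0), last)
    py = min(max(int(y), 0), last)
    return px // TILE_SIZE, py // TILE_SIZE, px % TILE_SIZE, py % TILE_SIZE
```

A renderer could group listings by `(zoom, tile_x, tile_y)` and draw each one at `(px, py)` on a transparent PNG for that tile. The browser would then fetch one image per tile instead of placing every marker in JavaScript.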
|
Stream Processing in production: Metrics that matter
Understand some useful metrics for monitoring the health of stream processing jobs (such as Apache Storm topologies) deployed in production. Also get some ideas on how to capture these metrics (including suggestions for libraries and tools), and how to proactively keep problems from escalating.
Section: Crisp Talk
Technical level: Intermediate
|
Keeping Moore's law alive: Neuromorphic computing
This talk explores the implications of Neuromorphic Engineering, or ‘building brains in silicon’, on the development of extremely parallel compute techniques such as deep learning.
Section: Full Talk
Technical level: Beginner
|
Exploratory data analysis using Apache Lens and Apache Zeppelin
Apache Lens is an analytics platform that aims to cut across data analytics silos by providing a single view of data across multiple tiered data stores and an optimal execution environment for analytical queries. It seamlessly integrates Hadoop with traditional data warehouses so that they appear as one. Zeppelin is a web-based notebook that enables interactive data analytics.
Section: Crisp Talk
Technical level: Intermediate
|
Holistic Security Process for Humanitarian Projects
Humanitarian projects usually contain sensitive information and are more prone to risks. Hence it is important to include security holistically in project planning. The objective of the session is to present good data security practices to follow while working on a humanitarian project.
Section: Full Talk
Technical level: Intermediate
|
Harnessing the power of the Erlang VM at Housing
RoR and Django have kept us productive in the face of rapidly changing product requirements at Housing. However, we ran into memory and speed issues when we had to scale throughput and interface with other services in our SOA. This talk describes how we rewrote some core parts of our infrastructure to ride on the coattails of the awesome Erlang VM.
Section: Crisp Talk
Technical level: Intermediate
|
Graph Algorithms and Computer Vision
Discover some of the interesting connections between various sub-areas of machine learning, analytics and computer vision: specifically, how a random walk on a graph can cluster data, how clustering can help in segmentation of images and video, and how many of these problems boil down to the eigendecomposition of a specially crafted matrix of graph data.
Section: Full Talk
Technical level: Intermediate
|
How to stop admiring and start using Deep Learning
Deep Learning results look fascinating, but it seems to require huge infrastructure to start using it. In this talk, we present how to approach it incrementally in order to make real use of Deep Learning.
Section: Full Talk
Technical level: Intermediate
|
Scalable real-time personalized recommendation system
This talk goes over some challenges in scaling a real time personalized recommendation system that can dynamically adapt to user actions and incorporate these signals into various applications like search, recommendations, predictive suggestions etc.
Section: Full Talk
Technical level: Intermediate
|
From Search to Discovery at Housing
The objective of this session is to introduce a framework and models for search recommendations through real-time user click stream analysis. We will talk about various architectural challenges, the challenges in modeling the expert system, and how it can be used in different domains.
Section: Full Talk
Technical level: Beginner
|
Call me maybe: Jepsen and flaky networks
Network partitions happen often enough that it is worth caring about how your distributed data stores respond in such situations.
Section: Full Talk
Technical level: Advanced
|
Dead Simple Scalability Patterns
Everyone dreams of being ‘Web Scale’, but we start out small. Most of us don’t launch a service and expect it to serve millions of requests from Day 1. This means that we don’t think about the ways in which our stack will blow up when the number of requests does start climbing. This talk lists simple patterns and checks that Development and Operations teams should implement from Day 1 in o…
Section: Crisp Talk
Technical level: Beginner
|
Building a distributed cache system with Redis, Clojure and math
Learn how consistent hashing, CRDTs and Clojure protocols can be used to build a distributed cache.
Section: Full Talk
Technical level: Intermediate
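Consistent hashing places cache nodes at many points on a hash ring and maps each key to the next node clockwise. Adding or removing a node therefore only moves the keys adjacent to it. The following is a minimal Python sketch of the idea. It is not the speakers' Redis/Clojure implementation, and the node names, the MD5 hash and the replica count are illustrative assumptions.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Hash ring with virtual nodes; a key maps to the next node clockwise."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas  # virtual nodes per physical node
        self._ring = []           # sorted list of (hash, node)
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get_node(self, key):
        if not self._ring:
            raise LookupError("ring is empty")
        h = self._hash(key)
        # First virtual node at or after the key's hash, wrapping around
        idx = bisect.bisect_left(self._ring, (h, ""))
        return self._ring[idx % len(self._ring)][1]

# Hypothetical cache nodes and keys
ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.get_node(k) for k in keys}
ring.remove_node("cache-c")
after = {k: ring.get_node(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(all(before[k] == "cache-c" for k in moved))  # True: only cache-c's keys moved
```

With naive `hash(key) % n` placement, removing one of three nodes would remap most keys. Here only the keys that lived on the removed node move.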
|
AB testing: What, Why & How
Understand what A/B testing is, why it is a great tool, how to experiment correctly, and the lessons learned.
Section: Full Talk
Technical level: Beginner
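One standard way to check whether two variants' conversion rates really differ is the two-proportion z-test. The abstract does not say which test the talk uses, so treat this as one common option. The sketch below uses only Python's standard library. The two-sided p-value comes from the normal distribution via `math.erfc`. The conversion counts are hypothetical.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical experiment: 5000 users per variant
z, p = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

"Experimenting correctly" also means fixing the sample size and significance level before looking at the data. Repeatedly peeking at this p-value and stopping early inflates false positives.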
|
When Apache ZooKeeper is a good fit
This talk focuses on how well ZooKeeper fits various use cases. Attendees will learn how to use ZooKeeper effectively in distributed clusters.
Section: Crisp Talk
Technical level: Intermediate
|
Introduction to MaelStorm and Performance Engineering
Learn to build a highly performant and scalable backend using Java.
Section: Workshop
Technical level: Advanced
|
Data Comes in Shapes
Data comes in shapes. The study of shape is geometry, in as many dimensions as you have variables. You can’t visualise them all, but you can see in 2D and 3D why the algebraic tools work the way they do.
Section: Keynote
Technical level: Beginner
|
Real Time Bid Modification @ Million Requests per second...
Learnings from building high-performance software systems capable of handling a million requests per second while keeping response time under 10 ms.
Section: Crisp Talk
Technical level: Intermediate
|
Deploying Batch and Streaming Architectures on AWS
To learn about the key Big Data and Analytics services on AWS and how they can be used for both batch and streaming workloads.
Section: Sponsored
Technical level: Intermediate
|
Igniting your data with Apache Spark
Introduce the audience to Spark and its API with hands-on exercises. The workshop will also cover deploying and configuring Spark. Finally, the workshop will lead into building data applications on top of Spark, with some lessons from Shopify.
Section: Workshop
Technical level: Beginner
|
Future patterns in data ecosystem
Understand emerging patterns of data consumption and processing to devise better data systems.
Section: Sponsored Keynote
Technical level: Intermediate
|
"Thinking Machines"
To explore the key building blocks of Artificial Intelligence: “Understanding”, “Learning”, “Thinking”, and “Creativity”.
Section: Keynote
Technical level: Advanced
|