Submissions

Jul 2014

21 Mon

22 Tue

23 Wed 09:30 AM – 05:00 PM IST

24 Thu 09:45 AM – 05:00 PM IST

25 Fri 08:30 AM – 07:15 PM IST

26 Sat 08:30 AM – 07:15 PM IST

27 Sun

NIMHANS Convention Centre, Bangalore

Visualizing large data sets

I’m going to showcase how to visualize large data sets, i.e. those with thousands to millions of data points. This goes beyond standard techniques such as bar plots and requires tools like d3, Processing, ggplot2, circos and more. I will demonstrate working samples that I have created using open source tools. Folks will gain an understanding of concepts, techniques and tools to create larg… more

1 comment
Cancelled
30 Jan 2014

Section: Full talk Technical level: Intermediate

Serving user intent : Facebook style notifications using HBase and Event streams

This talk is about building a low-latency, near real-time Notifications platform for serving user intent using Event based architecture, Complex Event Processing and a data store like HBase. Will also cover how millisecond response times are achieved when accessing data from 100 million rows by interpreting change from immutable events and organizing data as LSM trees. more

2 comments
Confirmed & scheduled
31 Jan 2014

Section: Full talk Technical level: Intermediate

Engineering custom visualisations with advanced d3.js

d3.js is a very complex library with a lot of functionality. That said, there are a lot of ready examples available on the Internet, which in turn promotes a culture of copy-paste code. Hence, one keeps seeing the same charts: Sankey, Chord, Matrix, Force layout, etc. The objective of this workshop is to help a d3 developer truly harness the power of d3.js to make cus… more

0 comments
Rejected
03 Feb 2014

Section: Workshops Technical level: Advanced

ANALYTICS ON BIG FAST DATA USING REAL TIME STREAM DATA PROCESSING ARCHITECTURE

To understand big data real time processing challenges, technology maturity on real-time/near-time analytics and modern big data architecture built with Hadoop more

3 comments
Rejected
02 Mar 2014

Section: Full talk Technical level: Intermediate

Circuitscape - A Case Study on Scientific Computing

In this talk I will discuss some of the challenges faced in typical scientific computing applications and how to address them, taking Circuitscape as a case study. I will briefly walk the audience through what is possible with modern scientific computing platforms built on Python and Julia. more

0 comments
Confirmed & scheduled
03 Mar 2014

Section: Full talk Technical level: Intermediate Session type: Lecture

How to Make Big Data Real and Valuable ...

The objective of the session is to give participants a very quick overview of the data landscape and the journey from Legacy Systems/Applications -> Data Integration -> ETL -> Data Warehouse -> Real Time Streaming, and how all of this culminates in Big Data Architecture. more

1 comment
Rejected
27 Mar 2014

Section: Crisp talk Technical level: Intermediate

Developing Real-Time Data Pipelines with Apache Kafka

The audience will benefit from understanding Kafka, “a high-throughput distributed messaging system” developed and used at LinkedIn. more

1 comment
Rejected
27 Mar 2014

Section: Full talk Technical level: Advanced

Big Data in Telecom - Case studies

Cover real world examples of Big Data Analytics in Telecom and how this is impacting current IT landscape. It would touch upon concepts of Digital Telco and Opportunities of Data Monetization by Telco companies with some real world examples. more

0 comments
Rejected
27 Mar 2014

Section: Full talk Technical level: Intermediate

What chemistry can teach us about designing better NLP algorithms

The main idea behind this talk is how context is formed in language and how the location, time, and order of words also affect it. more

4 comments
Rejected
27 Mar 2014

Section: Crisp talk Technical level: Beginner

Crafting Visual Stories with Data

Data visualisation has enabled us to compress data and express them visually in many interesting new ways. It is often cited that we are trying to tell stories through them. But, the science of data-visual-stories is still very nascent and developing. On the other hand, the art of storytelling through spoken and written words, pictures, comics and movies is very well developed and understood. Let… more

0 comments
Confirmed & scheduled
29 Mar 2014

Section: Full talk Technical level: Beginner

Scaling with Queues

Share the experience of using a queue-based backend infrastructure architecture for scalability, failover and data accuracy. more

2 comments
Cancelled
02 Apr 2014

Section: Full talk Technical level: Intermediate

Curating A Hundred Thousand Online Stores Using Storm, ElasticSearch and Etcd

Igor is a platform to curate hundreds of thousands of online stores comprising millions of products while processing billions of product updates. I’ll explore the challenges faced and the architectural decisions that addressed them. I’ll further reveal how Storm, Elasticsearch and Etcd were leveraged to overcome some weaknesses of traditional queue-based architectures to deliver low latency event p… more

1 comment
Submitted
09 Apr 2014

Section: Full talk Technical level: Intermediate

What would you recommend?

This workshop will provide the audience with a quick overview of recommendation systems & how to build one from scratch. We shall build user-user collaborative filter (CF) based recommendation engines as well as item-item CF recsys. The audience will get a flavour of the range of statistical & mathematical computations that go into a recsys. more

1 comment
Rejected
11 Apr 2014

Section: Workshops Technical level: Intermediate
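
The user-user and item-item collaborative filters named in the abstract above can be illustrated with a minimal sketch. This is not the workshop's material: the toy ratings, the function names and the choice of cosine similarity are all assumptions made for illustration.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}
ratings = {
    "asha": {"a": 5, "b": 3, "c": 4},
    "ravi": {"a": 4, "b": 3, "d": 5},
    "mira": {"b": 2, "c": 5, "d": 4},
}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in set(u) & set(v))
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def invert(ratings):
    """Turn user -> {item: rating} into item -> {user: rating}."""
    by_item = {}
    for user, items in ratings.items():
        for item, r in items.items():
            by_item.setdefault(item, {})[user] = r
    return by_item

def recommend_user_user(ratings, user):
    """Predict unseen items from ratings of similar users."""
    scores, weights = {}, {}
    for other, items in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], items)
        for item, r in items.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * r
                weights[item] = weights.get(item, 0.0) + sim
    ranked = [(scores[i] / weights[i], i) for i in scores if weights[i] > 0]
    return sorted(ranked, reverse=True)

def recommend_item_item(ratings, user):
    """Predict unseen items from their similarity to items the user rated."""
    by_item = invert(ratings)
    seen = ratings[user]
    ranked = []
    for candidate, raters in by_item.items():
        if candidate in seen:
            continue
        sims = [(cosine(raters, by_item[item]), r) for item, r in seen.items()]
        total = sum(s for s, _ in sims)
        if total > 0:
            ranked.append((sum(s * r for s, r in sims) / total, candidate))
    return sorted(ranked, reverse=True)

print(recommend_user_user(ratings, "asha"))
print(recommend_item_item(ratings, "asha"))
```

Both functions return (predicted rating, item) pairs, highest first. The user-user variant weights other users' ratings by how similar those users are; the item-item variant weights the user's own ratings by how similar each rated item is to the candidate.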

BDAS, the Berkeley Data Analytics Stack

This talk is an introduction to the features of the next-generation, open source data analysis stack developed by UC Berkeley AMPLab. more

0 comments
Rejected
15 Apr 2014

Section: Crisp talk Technical level: Beginner

'Know Your Customer!' - Advanced Data Science for Audience Segmentation

Have you ever wondered how Cisco does Customer Segmentation? What is Cisco’s technology stack to deal with Big Data? What tools and technologies are adopted to bring best-of-breed algorithms from data science to bear on the problem of identifying segments in the audience? How do supervised and semi-supervised machine learning along with Bayesian predictive analytics combine to produce a very … more

3 comments
Confirmed & scheduled
21 Apr 2014

Section: Full talk Technical level: Advanced

Why we built the most adopted Polyglot Object Mapper for NoSQL?

The talk would narrate the story of building the most adopted Polyglot Object Mapper for NoSQL, Kundera (https://github.com/impetus-opensource/Kundera). Kundera is a High Level Client / Object Mapper with a JPA interface for working with RDBMS and NoSQL Datastores. It can be considered as a Hibernate equivalent for NoSQL Datastore. more

1 comment
Confirmed & scheduled
25 Apr 2014

Section: Full talk Technical level: Intermediate

Apache Pig Power tools

The objective of this workshop tutorial is to bring Apache Pig users from the beginner/intermediate stage to the advanced/expert stage. more

8 comments
Rejected
28 Apr 2014

Section: Workshops Technical level: Intermediate

Building distributed search applications using Apache SOLR

The objective of this workshop is to introduce attendees to the most common features of a search application and how to implement them using Apache Solr. The workshop will also cover how to scale the application by leveraging SolrCloud. more

6 comments
Confirmed & scheduled
29 Apr 2014

Section: Workshops Technical level: Beginner

Spot the model hiding in the Big Data

This talk is intended to help businesses avoid expensive incorrect decisions based on poor understanding of the underlying models. In this talk I shall discuss ways to understand a phenomenon by triangulating across visualizations, underlying model understanding and experimentation. more

7 comments
Cancelled
30 Apr 2014

Section: Full talk Technical level: Beginner

Extending Vega - A visualisation grammar to create interactive visualisations

I want to present the work I am doing in extending Vega, a visualization grammar (http://trifacta.github.io/vega/) more

4 comments
Rejected
03 May 2014

Section: Crisp talk Technical level: Beginner

Realizing Large-scale Distributed Deep Learning Networks over GraphLab

The main objective is to give an overview of our cutting edge work on realizing distributed deep learning networks over GraphLab. The objectives of the talk can be summarized as below: more

1 comment
Confirmed & scheduled
07 May 2014

Section: Full talk Technical level: Intermediate

Storing relationships in large data-sets using Graphs

Problem Statement - Fast Programmatic/self-serve analytics on linked data in an ad system by indexing it across all cuts, especially for traversals like - more

3 comments
Confirmed & scheduled
11 May 2014

Section: Crisp talk Technical level: Advanced

Unified analytics platform for Bigdata

This talk is about a system developed at InMobi to support OLAP data cubes on top of the Hive metastore. With this abstraction, users can reference a single schema for data stored across diverse storage engines, and can query the logical tables without knowing schema details like relationships, rollup levels, data location and data types. more

3 comments
Confirmed & scheduled
12 May 2014

Section: Full talk Technical level: Intermediate

Experimentation to Productization : developing a Dynamic Bidding system for a location aware Mobile landscape

This session will help structure a hypothesis-based approach to engineering problems and show how to quickly translate and implement algorithms on weblog (mobile footprint) data. more

0 comments
Confirmed & scheduled
12 May 2014

Section: Full talk Technical level: Intermediate

Extracting and Employing Domain-Specific Knowledge Graphs (DKGraphs)

Assume that you get an opportunity to work with a vast amount of unstructured and semi-structured text data in a specific domain, e.g. automobiles, agriculture, medical, internet, etc. Your task is to derive business value out of this textual data by extracting a domain-specific knowledge graph (DKGraph) and employing it for various business use cases. In this problem, there are several key challeng… more

0 comments
Rejected
13 May 2014

Section: Full talk Technical level: Beginner

Using Elasticsearch for Analytics

At Wingify, we have built a system to process and store analytics data for our customers, which they can use to slice and dice the data to make more meaningful reports. This talk is about how we solved this problem and how we used Elasticsearch to solve this problem at our scale rather quickly. Audience will take away some of the data problems they can quickly solve with Elasticsearch. more

4 comments
Submitted
18 May 2014

Section: Full talk Technical level: Intermediate

Scaling real time visualisations for Elections 2014

How does one go about creating interactive real-time visualisations with rapidly changing data? This talk is about our experiences in designing the CNN-IBN and Bing election results page. more

0 comments
Confirmed & scheduled
19 May 2014

Section: Full talk Technical level: Intermediate

Hive and Presto for Big Data Analytics in the Cloud

The objective of this talk is to conceptualize the use of Hive and Presto for big data analytics. We will contrast their architecture and use cases, and describe how to take advantage of both these technologies in the cloud. more

2 comments
Submitted
20 May 2014

Section: Full talk Technical level: Intermediate

How to build a Data Stack from scratch

This talk will cover a framework for thinking about the analytics data stack. What are the things to consider when building a data stack from scratch? How do you choose the right software for your stack, whether it is visualisation, analytics or storage? It will talk about the relations between different techniques for extracting insights out of raw data. I will draw upon examples from my experience… more

1 comment
Confirmed & scheduled
22 May 2014

Section: Full talk Technical level: Intermediate

De-dup @ Scale : Experiments with DynamoDB

What should you know if you want to integrate DynamoDB into your Big Data application? more

3 comments
Cancelled
22 May 2014

Section: Full talk Technical level: Intermediate

Lambda Architecture

Educate on and discuss the principles and best practices for building large scale data processing architectures. Introduction to the “Lambda Architecture” proposed by Nathan Marz (Storm guy) more

1 comment
Rejected
23 May 2014

Section: Full talk Technical level: Intermediate

Apache Tez: Accelerating Hadoop Data Pipelines

Apache Tez is a DAG execution engine which exists as a superset of traditional MapReduce. Tez is designed as a replacement computational model for nearly everything that currently uses MapReduce. more

5 comments
Confirmed & scheduled
23 May 2014

Section: Full talk Technical level: Beginner

Live analytical dashboards at scale - SQL style

How do you build real-time analytical dashboards that enable a business to take decisions at scale? There are various technologies out there that fill one or the other use case - right from horizontally scalable queues such as Kafka, stream processing systems such as Storm, data stores such as OpenTSDB and Druid that can provide dimensional lookup on large amounts of data, and visualisation libraries … more

7 comments
Confirmed & scheduled
26 May 2014

Section: Full talk Technical level: Intermediate

Tailor made stores at myntra or how to personalize your search results

This will showcase a unique way of personalization that combines search and recommendations. We will not go into the details of the algorithms that decide whether a product is suitable for a user; rather, given that a product has been shortlisted on certain criteria, we will look at how to showcase it. The talk will give some idea of Cassandra and Solr. more

2 comments
Rejected
31 May 2014

Section: Crisp talk Technical level: Intermediate

Machine learning at scale with Spark

Take the audience through my journey of learning machine learning from scratch using various freely available resources, and of building applications with it on big data using Apache Spark and MLlib. more

0 comments
Confirmed
31 May 2014

Section: Full talk Technical level: Beginner

Machine Learning using R : Crash course in Classification Methods

The aim is to provide the attendees with an overview (implementation-wise) of some of the major classification methods using R. The focus of the workshop will be on breadth rather than depth. A lot of methods will be introduced, but their mathematical properties won’t be discussed in detail. more

2 comments
Confirmed & scheduled
01 Jun 2014

Section: Workshops Technical level: Beginner

Machine learning + Interactive visualization: A pragmatic approach to fixing knowledge bases

We wish to explore how the use of recommenders and visualization can help in fixing problems inherent to knowledge bases. We will tackle one such problem which is incorrect/missing assignment of tags to articles in a knowledge base. We will also demonstrate how off-the-shelf software in the Hadoop ecosystem could be used to improve the richness of this data through processing and visualization. W… more

0 comments
Rejected
01 Jun 2014

Section: Full talk Technical level: Beginner

Advanced Big Data Analytics using Apache Mahout and Giraph

It is difficult to address graph and machine learning problems using the MapReduce framework. These problems mostly need multiple iterations of complex algorithms, which can be tricky and difficult to implement in MapReduce. However, there are two frameworks in the Hadoop ecosystem that address such problems, i.e. graph and machine learning problems. Apache Giraph is a graph-proces… more

6 comments
Rejected
02 Jun 2014

Section: Workshops Technical level: Advanced

Scaling Spatial Data - OpenStreetMap as Infrastructure.

For the success of any location service, the length and breadth of geographic relationships have to be recorded with enough room for frequent verification and classification. This talk will introduce the infrastructure behind the largest open geographic data repository - OpenStreetMap - and how you can leverage the complete geospatial stack for independent data collection, verification, and build… more

0 comments
Confirmed & scheduled
03 Jun 2014

Section: Full talk Technical level: Intermediate

The ART of Data Mining - Practical Learnings from Real-world Data Mining applications

Machine Learning and data mining is part SCIENCE (ML algorithms, optimization), part ENGINEERING (large scale modeling, real-time decisions), part PROCESS (data understanding, feature engineering, modelling, evaluation, and deployment), and part ART. In this talk we will focus more on the “ART of data mining” - the little things that make the big difference in the quality and sophistication of ma… more

8 comments
Confirmed & scheduled
03 Jun 2014

Section: Full talk Technical level: Intermediate

Run Predictive Machine Learning algorithms on Hadoop without even knowing Mapreduce.

In this talk I will introduce some new concepts that help data scientists run their predictive algorithms on Hadoop with the help of PMML and Cascading. more

1 comment
Rejected
03 Jun 2014

Section: Full talk Technical level: Intermediate

Fast Elephant - the Cheeliphant (Cheetah-Elephant)!

In this talk I shall share the spectrum of technologies and the evolution of the Big Data and Analytics space and its associated infrastructures. I shall also touch upon the tips and traps of using these infrastructures and useful rules of thumb for designing systems. more

0 comments
Cancelled
04 Jun 2014

Section: Full talk Technical level: Beginner

Migrating traditional warehouse and its applications to a Big-data platform

Understanding the capabilities/limitations of Hadoop platform for efficient migration more

0 comments
Waitlisted
04 Jun 2014

Section: Full talk Technical level: Intermediate

Real Time Secure API delivering data @ scale

At ThoughtWorks, we have used a hybrid approach for designing a real-time secure API, which provides various ad hoc querying capabilities on large amounts of data. more

1 comment
Rejected
04 Jun 2014

Section: Crisp talk Technical level: Beginner

Filtering the noise from an avalanche of Google Analytics Metrics : Anomaly Detection

At Tatvic, we have built an Anomaly Detection Engine that alerts the analyst about sporadic changes in Google Analytics metrics. Additionally, the analyst can also drill down into the possible root causes of the anomaly which enables him to take quicker business decisions. more

0 comments
Rejected
07 Jun 2014

Section: Crisp talk Technical level: Intermediate
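
To make the idea of flagging sporadic changes in a metric concrete, here is a minimal sketch. It uses a trailing-window z-score, which is an assumption chosen for illustration and not a description of the engine built at Tatvic; the function name, window size, threshold and sample data are all hypothetical.

```python
from statistics import mean, stdev

def find_anomalies(series, window=7, threshold=3.0):
    """Return indices whose value lies more than `threshold` standard
    deviations away from the mean of the preceding `window` values."""
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Perfectly flat history: any change at all is unusual.
            if series[i] != mu:
                anomalies.append(i)
            continue
        if abs(series[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

# Hypothetical daily session counts with one spike at index 8.
daily_sessions = [100, 102, 98, 101, 99, 103, 100, 101, 250, 102, 99]
print(find_anomalies(daily_sessions))
```

A spike inflates the standard deviation of the windows that follow it, so the values right after an anomaly are judged leniently. A production system would likely need seasonality handling and a more robust spread estimate.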

Using Cascalog and Clojure to make the elephant move!

The intent is to highlight the benefits gained by using Clojure, a functional language which runs on the JVM, and Cascalog, a data processing library for Hadoop. The participants will be exposed to, more

1 comment
Confirmed & scheduled
08 Jun 2014

Section: Crisp talk Technical level: Intermediate

Analytics on Large Scale, Unstructured, Dynamic Data using Lambda Architecture

In this talk, I will focus on our experience in using Lambda Architecture at Indix, to build a large scale analytics system on unstructured, dynamically changing data sources using Hadoop, HBase, Scalding, Spark and Solr. more

4 comments
Confirmed & scheduled
09 Jun 2014

Section: Full talk Technical level: Intermediate

Latest trends in Market Mix Modeling & a unique way of making measurement & optimization more effective

Learn a new way of doing MMX modeling and challenge the traditional way of doing it in your organizations. Apply the same principles to all your other analytics problems. more

1 comment
Rejected
10 Jun 2014

Section: Crisp talk Technical level: Advanced

Data sciences (is) in fashion @ Myntra

Ever dreamt that you can walk into a store which has been designed just for you? A store where the shelves have been stacked keeping in mind your fashion preferences only. A sales rep who understands what you wear and what’s missing in your wardrobe. Myntra is fast transforming itself into such a hyper-personalized (1:1) store and this transformation is being powered solely through analytics over… more

2 comments
Confirmed & scheduled
10 Jun 2014

Section: Full talk Technical level: Intermediate

Lessons from Elasticsearch in production

This talk is for people who are planning to use Elasticsearch in their next project. more

1 comment
Confirmed & scheduled
11 Jun 2014

Section: Full talk Technical level: Intermediate

The state of Julia - a fast language for technical computing

Last year at the Fifth Elephant, I gave a talk introducing the Julia programming language (http://www.julialang.org/). This year, I propose to give a short talk on the current state of Julia. more

1 comment
Confirmed & scheduled
11 Jun 2014

Section: Crisp talk Technical level: Intermediate

Real world machine learning

We will become familiar with real world machine learning in a hands on, intuitive way. Rather than taking the algorithm and its results as a black box provided by a library and learning in a cookbook style, we will try to understand the why of the problem. Participants will also appreciate the importance of each phase (data exploration, data cleaning and extraction, modeling, evaluation) of machi… more

0 comments
Confirmed & scheduled
11 Jun 2014

Section: Workshops Technical level: Intermediate

Twitter data collection framework for dummies.

This talk is about how I got 200-odd GB of tweets over a 45 day period to build a Trend Summarizer. I chose to build this as part of the dissertation for my MS Programme. The main objective here was to fetch tweets belonging to a trend in different locations. Additionally, I wanted this to be scalable out of the box, i.e. if I increased the number of locations to look for, it shouldn’t run into pr… more

0 comments
Rejected
11 Jun 2014

Section: Full talk Technical level: Beginner

Interactive analytics on event streams with complexly nested schemas

In this talk, I will share the lessons that we learnt while building an application for interactively analyzing data from event streams like twitter firehose, click streams, and application logs with complexly nested schemas. I will discuss the challenges faced while implementing the whole analytics stack that has Kafka for data collection, Elasticsearch for realtime search, and Apache Drill for … more

0 comments
Submitted
12 Jun 2014

Section: Full talk Technical level: Intermediate

big data analytics with machine learning

We crunch the numbers and turn your data into accessible and intuitive visuals that you can use for presentations, information sharing, and overall company transparency and most importantly for analysis. Predictive analytics has been closely linked to big data. Based on the patterns of how your variables have behaved over time, our machine learning algorithms can predict how the same variables ar… more

0 comments
Rejected
12 Jun 2014

Section: Crisp talk Technical level: Beginner

De-dup on Hadoop

In this talk, I wish to share the experiences we had at Intuit in building a Master Data Management (MDM) solution on the Hadoop platform. At its core, an MDM solution consists of fuzzy matching, entity resolution and de-duplication. Solving these patterns on a Big Data platform like Hadoop is the focus of this discussion. more

0 comments
Confirmed & scheduled
12 Jun 2014

Section: Crisp talk Technical level: Beginner
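
Fuzzy matching and de-duplication, as named in the abstract above, can be sketched on a single machine as follows. This is an illustrative assumption, not the approach used at Intuit: it relies on Python's standard `difflib.SequenceMatcher`, and the helper names, threshold and sample records are hypothetical.

```python
from difflib import SequenceMatcher

def normalise(name):
    """Lower-case, drop basic punctuation and collapse whitespace."""
    cleaned = name.lower().replace(".", " ").replace(",", " ")
    return " ".join(cleaned.split())

def similar(a, b, threshold=0.85):
    """Fuzzy match: True when the normalised strings are close enough."""
    ratio = SequenceMatcher(None, normalise(a), normalise(b)).ratio()
    return ratio >= threshold

def dedup(records, threshold=0.85):
    """Greedily group records into clusters of likely duplicates.

    Each record joins the first cluster whose representative (its first
    member) it matches; otherwise it starts a new cluster.
    """
    clusters = []
    for rec in records:
        for cluster in clusters:
            if similar(rec, cluster[0], threshold):
                cluster.append(rec)
                break
        else:
            clusters.append([rec])
    return clusters

# Hypothetical company names with formatting variations.
records = ["Acme Corp.", "ACME Corp", "Globex Inc", "Globex, Inc."]
print(dedup(records))
```

Comparing every record against every cluster representative does not scale to large data. A distributed version would typically first group records by a blocking key (for example a name prefix) and run the fuzzy comparison only within each group.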

Dr. Hadoop – Diagnose your Hadoop Jobs

Have you faced a problem where you run a job or query on Hadoop, it runs very slowly, and you have no clue why? You look at your job details on the JobTracker and get confused by hundreds of counters and configurations? You really don’t know how to make sense of it. This is a very common challenge for Hadoop beginners, especially analysts or people coming from the RDBMS world. This talk is a… more

4 comments
Confirmed & scheduled
13 Jun 2014

Section: Crisp talk Technical level: Intermediate

Big data in finance

The talk will cover a case study of solving a research problem in algorithmic trading using high frequency data from a stock exchange. more

1 comment
Confirmed & scheduled
13 Jun 2014

Section: Full talk Technical level: Intermediate

Supercharge Application I/O Performance with SSD caching

Storage I/O performance plays a significant role in determining overall application end-user response times and perceived user latency. How can you leverage solid state drives (SSD) to boost OLTP application I/O performance (e.g., MySQL, MongoDB) in a holistic, non-disruptive and cost-effective manner, without throwing away your hard disk but utilizing it for capacity? Through this talk, I’ll… more

0 comments
Submitted
15 Jun 2014

Section: Full talk Technical level: Intermediate

Overcoming problems that you will face when trying to break speed limit

It is everyone’s continuous quest to improve the speed at which we do things. What was fast in the past is no longer fast. We need to continuously improve things. In those efforts, we face problems. We also get new opportunities because of evolving technologies. This talk is to share our knowledge about the opportunities we used and how we overcame some of the problems. more

1 comment
Rejected
15 Jun 2014

Section: Full talk Technical level: Intermediate

Scaling SolrCloud to a large number of collections

The objective of this talk is to share the challenges and learnings from setting up a large SolrCloud installation running on hundreds of nodes with thousands of collections and millions of users. This talk will also help people understand the guts of SolrCloud’s architecture. more

0 comments
Confirmed & scheduled
15 Jun 2014

Section: Full talk Technical level: Advanced

How to deploy a 50 node SolrCloud cluster on AWS in 15 minutes

The objective of this short talk is to demonstrate the newly open sourced Solr Scale Toolkit which makes setting up and managing a SolrCloud cluster on AWS a snap. more

0 comments
Rejected
15 Jun 2014

Section: Crisp talk Technical level: Beginner

Using Data for Art

Why listen to a talk on art in a conference focused on technologies that power analytics? Because beyond the world of functional, need-based & user-centered applications is a much more diverse world of data art: a field that has fewer constraints and more opportunity for creative expression. Not only would it be exciting for any data enthusiast to see how technology & data are being used by the… more

0 comments
Confirmed & scheduled
16 Jun 2014

Section: Crisp talk Technical level: Beginner

Getting your hands dirty with Aerospike

Aerospike is the new open-source NoSQL database. It is easily the fastest clustered NoSQL database solution. It is also well known for its operational ease of use. The objective of the workshop is to get your hands dirty with it. more

0 comments
Confirmed & scheduled
01 Jul 2014

Section: Sponsored workshop Technical level: Intermediate

Real Time User-Scoring for Bidding in Display Retargeting

Retargeting online customers to a retail website via Display Ads has become an incredible avenue to drive traffic back to the website. Especially with the advent of Real Time Bidding (RTB), advertisers now have access to an efficient and transparent mechanism to buy from this huge volume of available ad inventory. It allows an advertiser to optimize their ad spend down to the exact user they are … more

0 comments
Confirmed & scheduled
03 Jul 2014

Section: Crisp talk Technical level: Beginner

Large Scale Modelling and Analytics Challenges at a Payments Company

This talk first presents a broad overview of the Big Data challenges in a payments company. Then it discusses in detail an application around modelling the spend behavior of credit card holders. Through the application, the talk demonstrates how various machine learning and data mining techniques are utilized to glean insights from petabyte-scale data, and how one builds practical models to solve real… more

0 comments
Confirmed & scheduled
04 Jul 2014

Section: Full talk Technical level: Intermediate

Hosted by

The Fifth Elephant

Jumpstart better data engineering and AI futures