Submissions

The Fifth Elephant 2014

A conference on big data and analytics

In 2014, infrastructure components such as Hadoop, Berkeley Data Stack and other commercial tools have stabilized and are thriving. The challenges have moved higher up the stack from data collection and storage to data analysis and its presentation to users. The focus for this year’s conference is on analytics – the infrastructure that powers analytics and how analytics is done.

Talks will cover various forms of analytics including real-time and opportunity analytics, and technologies and models used for analyzing data.

Proposals will be reviewed using the following criteria:
Domain diversity – proposals will be selected from different domains – medical, insurance, banking, online transactions, retail. If there is more than one proposal from a domain, the one which meets the editorial criteria will be chosen.
Novelty – what has been done beyond the obvious.
Insights – what insights does the proposal share with the audience that they did not know earlier.
Practical versus theoretical – we are looking for applied knowledge. If the proposal covers material that can be looked up online, it will not be considered.
Conceptual versus tools-centric – tell us why, not how. Tell the audience what philosophy underlay your use of an application, not how the application was used.
Presentation skills – proposer’s presentation skills will be reviewed carefully and assistance provided to ensure that the material is communicated in the most precise and effective manner to the audience.

Tickets: http://fifthel.doattend.com

Website: https://fifthelephant.in/2014

For queries about proposals / submissions, write to info@hasgeek.com

Theme

  1. Data Collection and Transport – e.g., Opendatatoolkit, Scribe, Kafka, RabbitMQ, etc.

  2. Data Storage, Caching and Management – Distributed storage (such as Gluster, HDFS) or hardware-specific (such as SSD or memory) or databases (PostgreSQL, MySQL, Infobright) or caching/storage (Memcache, Cassandra, Redis, etc).

  3. Data Processing, Querying and Analysis – Oozie, Azkaban, scikit-learn, Mahout, Impala, Hive, Tez, etc.

  4. Real-time analytics

  5. Opportunity analytics

  6. Big data and security

  7. Big data and internet of things

  8. Data Usage and BI (Business Intelligence) in different sectors.

Please note: the technology stacks mentioned above indicate the latest technologies that will be of interest to the community. Talks should not be on the technologies per se, but on how these have been used and implemented in various sectors, enterprises and contexts.

Hosted by

The Fifth Elephant - known as one of the best data science and Machine Learning conferences in Asia - has transitioned into a year-round forum for conversations about data and ML engineering; data science in production; data security and privacy practices.

Accepting submissions

Not accepting submissions

Puneet Mohan Sangal

Visualizing large data sets

I’m going to showcase how to visualize large data sets, i.e. those that have thousands to millions of data points. This goes beyond standard techniques like bar plots, and requires using tools like d3, Processing, ggplot2, circos and more. I will demonstrate working samples that I have created using open source tools. Folks will gain an understanding of concepts, techniques and tools to create larg…
  • 1 comment
  • Cancelled
  • 30 Jan 2014
Section: Full talk Technical level: Intermediate

Regunath Balasubramanian

Serving user intent : Facebook style notifications using HBase and Event streams

This talk is about building a low-latency, near real-time Notifications platform for serving user intent using Event based architecture, Complex Event Processing and a data store like HBase. It will also cover how millisecond response times are achieved when accessing data from 100 million rows by interpreting change from immutable events and organizing data as LSM trees.
  • 2 comments
  • Confirmed & scheduled
  • 31 Jan 2014
Section: Full talk Technical level: Intermediate

Chirag Gehlot

Engineering custom visualisations with advanced d3.js

d3.js is a very complex library with a lot of functionality. That said, there are a lot of ready examples available on the Internet, which in turn promotes a culture of copy-paste-code. Hence, one ends up seeing recurring themes of the same charts - Sankey, Chord, Matrix, Force layout, etc. repeatedly. The objective of this workshop is to help a d3 developer truly harness the power of d3.js to make cus…
  • 0 comments
  • Rejected
  • 03 Feb 2014
Section: Workshops Technical level: Advanced

Arvind Gopinath

ANALYTICS ON BIG FAST DATA USING REAL TIME STREAM DATA PROCESSING ARCHITECTURE

To understand big data real time processing challenges, technology maturity on real-time/near-time analytics and modern big data architecture built with Hadoop
  • 3 comments
  • Rejected
  • 02 Mar 2014
Section: Full talk Technical level: Intermediate

Viral B. Shah

Circuitscape - A Case Study on Scientific Computing

In this talk I would talk about some of the challenges faced in typical scientific computing applications and how to address them, taking Circuitscape as a case study. It would walk the audience briefly through what is possible through modern scientific computing platforms built on Python and Julia.
  • 0 comments
  • Confirmed & scheduled
  • 03 Mar 2014
Section: Full talk Technical level: Intermediate Session type: Lecture

Mayur Shah

How to Make Big Data Real and Valuable ...

The objective of the session is to give the participants a very quick overview of the Data Landscape and the Journey from Legacy System/Applications -> Data Integration -> ETL -> Data Warehouse -> Real Time Streaming, and how all of this culminates in Big Data Architecture.
  • 1 comment
  • Rejected
  • 27 Mar 2014
Section: Crisp talk Technical level: Intermediate

Manisha Sethi

Developing Real-Time Data Pipelines with Apache Kafka

The audience would benefit from understanding “A High-throughput distributed Messaging system” - Kafka, which was developed and is used at LinkedIn.
  • 1 comment
  • Rejected
  • 27 Mar 2014
Section: Full talk Technical level: Advanced

Siddharth Vijayvergiya

Big Data in Telecom - Case studies

Cover real world examples of Big Data Analytics in Telecom and how this is impacting current IT landscape. It would touch upon concepts of Digital Telco and Opportunities of Data Monetization by Telco companies with some real world examples.
  • 0 comments
  • Rejected
  • 27 Mar 2014
Section: Full talk Technical level: Intermediate

Siva Prakash Kollana

What chemistry can teach us about designing better NLP algorithms

The main idea behind this talk is how context is formed in language and how location, time, and order of words also have an effect on it.
  • 4 comments
  • Rejected
  • 27 Mar 2014
Section: Crisp talk Technical level: Beginner

Amit Kapoor

Crafting Visual Stories with Data

Data visualisation has enabled us to compress data and express them visually in many interesting new ways. It is often cited that we are trying to tell stories through them. But, the science of data-visual-stories is still very nascent and developing. On the other hand, the art of storytelling through spoken and written words, pictures, comics and movies is very well developed and understood. Let…
  • 0 comments
  • Confirmed & scheduled
  • 29 Mar 2014
Section: Full talk Technical level: Beginner

Rohit Yadav

Scaling with Queues

Share the experience of using a queue-based backend infrastructure architecture for scalability, failover and data accuracy.
  • 2 comments
  • Cancelled
  • 02 Apr 2014
Section: Full talk Technical level: Intermediate

Suman Karthik

Curating A Hundred Thousand Online Stores Using Storm, ElasticSearch and Etcd

Igor is a platform to curate hundreds of thousands of online stores comprising millions of products while processing billions of product updates. I’ll explore the challenges faced and the architectural decisions that addressed them. I’ll further reveal how Storm, Elasticsearch and Etcd were leveraged to overcome some weaknesses of traditional queue based architectures to deliver low latency event p…
  • 1 comment
  • Submitted
  • 09 Apr 2014
Section: Full talk Technical level: Intermediate

Anand

What would you recommend?

This workshop will provide the audience with a quick overview of recommendation systems & how to build one from scratch. We shall build user-user collaborative filter (CF) based recommendation engines as well as an item-item CF recsys. The audience will get a flavour of the range of statistical & mathematical computations that go into a recsys.
  • 1 comment
  • Rejected
  • 11 Apr 2014
Section: Workshops Technical level: Intermediate

Mukesh Gangadhar

BDAS, the Berkeley Data Analytics Stack

This talk is an introduction to the features of the next-generation, open source data analysis stack developed by UC Berkeley AMPLab.
  • 0 comments
  • Rejected
  • 15 Apr 2014
Section: Crisp talk Technical level: Beginner

prabhakar srinivasan

'Know Your Customer!' - Advanced Data Science for Audience Segmentation

Have you ever wondered how Cisco does Customer Segmentation? What is Cisco’s technology stack to deal with Big Data? What tools and technologies are adopted to bring best-of-breed algorithms from data science to inform on the problem of identifying segments in the audience? How do supervised and semi-supervised machine-learning along with Bayesian predictive analytics combine to produce a very …
  • 3 comments
  • Confirmed & scheduled
  • 21 Apr 2014
Section: Full talk Technical level: Advanced

Vivek Shrivastava

Why we built the most adopted Polyglot Object Mapper for NoSQL?

The talk would narrate the story of building the most adopted Polyglot Object Mapper for NoSQL, Kundera (https://github.com/impetus-opensource/Kundera). Kundera is a High Level Client / Object Mapper with a JPA interface for working with RDBMS and NoSQL Datastores. It can be considered a Hibernate equivalent for NoSQL Datastores.
  • 1 comment
  • Confirmed & scheduled
  • 25 Apr 2014
Section: Full talk Technical level: Intermediate

visuthemoon

Apache Pig Power tools

The objective of this workshop tutorial is to bring Apache Pig users from beginner/intermediate stage to advanced/expert stage.
  • 8 comments
  • Rejected
  • 28 Apr 2014
Section: Workshops Technical level: Intermediate

Saumitra Srivastav

Building distributed search applications using Apache SOLR

The objective of this workshop is to introduce attendees to the most common features of a search application and how to implement them using Apache Solr. The workshop will also cover how to scale the application by leveraging SolrCloud.
  • 6 comments
  • Confirmed & scheduled
  • 29 Apr 2014
Section: Workshops Technical level: Beginner

Ashok Banerjee

Spot the model hiding in the Big Data

This talk is intended to help businesses avoid expensive incorrect decisions based on poor understanding of the underlying models. In this talk I shall discuss ways to understand a phenomenon by triangulating across visualizations, underlying model understanding and experimentation.
  • 7 comments
  • Cancelled
  • 30 Apr 2014
Section: Full talk Technical level: Beginner

anupamme

Extending Vega - A visualisation grammar to create interactive visualisations

I want to present the work I am doing in extending the visualization grammar Vega (http://trifacta.github.io/vega/).
  • 4 comments
  • Rejected
  • 03 May 2014
Section: Crisp talk Technical level: Beginner

Dr. Vijay Srinivas A

Realizing Large-scale Distributed Deep Learning Networks over GraphLab

The main objective is to give an overview of our cutting edge work on realizing distributed deep learning networks over GraphLab. The objectives of the talk can be summarized as below:
  • 1 comment
  • Confirmed & scheduled
  • 07 May 2014
Section: Full talk Technical level: Intermediate

Inder Singh

Storing relationships in large data-sets using Graphs

Problem Statement - Fast Programmatic/self-serve analytics on linked data in an ad system by indexing it across all cuts, especially for traversals like -
  • 3 comments
  • Confirmed & scheduled
  • 11 May 2014
Section: Crisp talk Technical level: Advanced

Amareshwari Sriramadasu

Unified analytics platform for Bigdata

This talk is about a system developed at InMobi to support OLAP data cubes on top of Hive metastore. With this abstraction, users can reference a single schema for data stored across diverse storage engines, and can query data on the logical tables without knowing about schema details like relationships, rollup levels, data location and data types.
  • 3 comments
  • Confirmed & scheduled
  • 12 May 2014
Section: Full talk Technical level: Intermediate

Ekta Grover

Experimentation to Productization : developing a Dynamic Bidding system for a location aware Mobile landscape

This session is to help structure a Hypothesis based approach to Engineering problems and learning to quickly translate & implement algorithms on weblogs (mobile footprints) data.
  • 0 comments
  • Confirmed & scheduled
  • 12 May 2014
Section: Full talk Technical level: Intermediate

Satnam Singh, PhD

Extracting and Employing Domain-Specific Knowledge Graphs (DKGraphs)

Assume that you got an opportunity to work with a vast amount of unstructured and semi-structured text data in a specific domain e.g. automobiles, agriculture, medical, internet, etc. Your task is to derive business value out of this textual data by extracting a domain-specific knowledge graph (DKGraph) and employing it for various business use cases. In this problem, there are several key challeng…
  • 0 comments
  • Rejected
  • 13 May 2014
Section: Full talk Technical level: Beginner

Vaidik Kapoor

Using Elasticsearch for Analytics

At Wingify, we have built a system to process and store analytics data for our customers, which they can use to slice and dice the data to make more meaningful reports. This talk is about how we solved this problem and how we used Elasticsearch to solve this problem at our scale rather quickly. The audience will take away some of the data problems they can quickly solve with Elasticsearch.
  • 4 comments
  • Submitted
  • 18 May 2014
Section: Full talk Technical level: Intermediate

Anand S

Scaling real time visualisations for Elections 2014

How does one go about creating interactive real-time visualisations with rapidly changing data? This talk is about our experiences in designing the CNN-IBN and Bing election results page.
  • 0 comments
  • Confirmed & scheduled
  • 19 May 2014
Section: Full talk Technical level: Intermediate

Vikram Agrawal

Hive and Presto for Big Data Analytics in the Cloud

The objective of this talk is to conceptualize the use of Hive and Presto for big data analytics. We will contrast their architecture and use cases, and describe how to take advantage of both these technologies in the cloud.
  • 2 comments
  • Submitted
  • 20 May 2014
Section: Full talk Technical level: Intermediate

Vinayak Hegde

How to build a Data Stack from scratch

This talk will cover a framework for thinking about the analytics data stack. What are the things to consider when building a data stack from scratch? How to choose the right software for your stack, whether it is visualisation, analytics or storage? It will talk about the relations between different techniques for extracting insights out of raw data. I will draw upon examples from my experience…
  • 1 comment
  • Confirmed & scheduled
  • 22 May 2014
Section: Full talk Technical level: Intermediate

Hemanth Yamijala

De-dup @ Scale : Experiments with DynamoDB

What should you know if you want to integrate DynamoDB into your BigData application?
  • 3 comments
  • Cancelled
  • 22 May 2014
Section: Full talk Technical level: Intermediate

Nitin Supekar

Lambda Architecture

Educate on and discuss the principles and best practices for building large scale data processing architectures. Introduction to “Lambda Architecture” proposed by Nathan Marz (Storm guy).
  • 1 comment
  • Rejected
  • 23 May 2014
Section: Full talk Technical level: Intermediate

t3rmin4t0r

Apache Tez: Accelerating Hadoop Data Pipelines

Apache Tez is a DAG execution engine which exists as a super-set of traditional Map Reduce. Tez is designed as a replacement computational model for nearly everything that currently uses map-reduce.
  • 5 comments
  • Confirmed & scheduled
  • 23 May 2014
Section: Full talk Technical level: Beginner

Shashwat Agarwal

Live analytical dashboards at scale - SQL style

How do you build real-time analytical dashboards that enable a business to take decisions at scale? There are various technologies out there that fill one or the other use case - right from horizontally scalable queues such as Kafka, stream processing systems such as Storm, data stores such as OpenTSDB and Druid that can provide dimensional lookup on large amounts of data and visualisation libraries …
  • 7 comments
  • Confirmed & scheduled
  • 26 May 2014
Section: Full talk Technical level: Intermediate

Apoorva Gaurav

Tailor made stores at myntra or how to personalize your search results

This will showcase a unique way of personalization which is a combination of search and recommendations. Here we’ll not go into the details of the algorithms by which a product is deemed suitable for a user, but, given that the product has been shortlisted on certain criteria, how to showcase it. The talk will give some idea of Cassandra and Solr.
  • 2 comments
  • Rejected
  • 31 May 2014
Section: Crisp talk Technical level: Intermediate

madhukara phatak

Machine learning at scale with Spark

Take the audience through my journey of learning machine learning from scratch using various freely available resources and building applications on big data using Apache Spark and MLlib.
  • 0 comments
  • Confirmed
  • 31 May 2014
Section: Full talk Technical level: Beginner

Bargava Subramanian

Machine Learning using R : Crash course in Classification Methods

The aim is to provide the attendees with an overview (implementation-wise) of some of the major classification methods using R. The focus of the workshop will be on breadth rather than depth. A lot of methods will be introduced, but their mathematical properties won’t be discussed in detail.
  • 2 comments
  • Confirmed & scheduled
  • 01 Jun 2014
Section: Workshops Technical level: Beginner

Viraj Paripatyadar

Machine learning + Interactive visualization: A pragmatic approach to fixing knowledge bases

We wish to explore how the use of recommenders and visualization can help in fixing problems inherent to knowledge bases. We will tackle one such problem which is incorrect/missing assignment of tags to articles in a knowledge base. We will also demonstrate how off-the-shelf software in the Hadoop ecosystem could be used to improve the richness of this data through processing and visualization. W…
  • 0 comments
  • Rejected
  • 01 Jun 2014
Section: Full talk Technical level: Beginner

Swapnil Dubey

Advanced Big Data Analytics using Apache Mahout and Giraph

It is difficult to address Graph and machine learning problems using the MapReduce framework. Mostly these problems need multiple iterations of complex algorithms, which can be a little tricky and difficult to implement in MapReduce. However, there are two frameworks available to address such problems, i.e. graph and machine learning problems, in the Hadoop ecosystem. Apache Giraph is a graph-proces…
  • 6 comments
  • Rejected
  • 02 Jun 2014
Section: Workshops Technical level: Advanced

Sajjad Anwar

Scaling Spatial Data - OpenStreetMap as Infrastructure.

For the success of any location service, the length and breadth of geographic relationships have to be recorded with enough room for frequent verification and classification. This talk will introduce the infrastructure behind the largest open geographic data repository - OpenStreetMap - and how you can leverage the complete geospatial stack for independent data collection, verification, and build…
  • 0 comments
  • Confirmed & scheduled
  • 03 Jun 2014
Section: Full talk Technical level: Intermediate

Shailesh Kumar

The ART of Data Mining - Practical Learnings from Real-world Data Mining applications

Machine Learning and data mining is part SCIENCE (ML algorithms, optimization), part ENGINEERING (large scale modeling, real-time decisions), part PROCESS (data understanding, feature engineering, modelling, evaluation, and deployment), and part ART. In this talk we will focus more on the “ART of data mining” - the little things that make the big difference in the quality and sophistication of ma…
  • 8 comments
  • Confirmed & scheduled
  • 03 Jun 2014
Section: Full talk Technical level: Intermediate

GaganDeep Juneja

Run Predictive Machine Learning algorithms on Hadoop without even knowing Mapreduce.

In this talk I will try to bring some new concepts that will help data scientists to run their predictive algorithms on Hadoop with the help of PMML and Cascading.
  • 1 comment
  • Rejected
  • 03 Jun 2014
Section: Full talk Technical level: Intermediate

Ashok Banerjee

Fast Elephant - the Cheeliphant (Cheetah-Elephant)!

In this talk I shall share the spectrum of technologies and the evolution of the Big Data and Analytics space and its associated infrastructures. I shall also touch upon the tips and traps of using these infrastructures and useful rules of thumb for designing systems.
  • 0 comments
  • Cancelled
  • 04 Jun 2014
Section: Full talk Technical level: Beginner

Manish Shukla

Migrating traditional warehouse and its applications to a Big-data platform

Understanding the capabilities/limitations of Hadoop platform for efficient migration
  • 0 comments
  • Waitlisted
  • 04 Jun 2014
Section: Full talk Technical level: Intermediate

Akash Mishra

Real Time Secure API delivering data @ scale

At ThoughtWorks, we have used a Hybrid Approach for designing a Real Time secure API, which gives various ad hoc querying capabilities on large amounts of data.
  • 1 comment
  • Rejected
  • 04 Jun 2014
Section: Crisp talk Technical level: Beginner

Kushan Shah

Filtering the noise from an avalanche of Google Analytics Metrics : Anomaly Detection

At Tatvic, we have built an Anomaly Detection Engine that alerts the analyst about sporadic changes in Google Analytics metrics. Additionally, the analyst can also drill down into the possible root causes of the anomaly which enables him to take quicker business decisions.
  • 0 comments
  • Rejected
  • 07 Jun 2014
Section: Crisp talk Technical level: Intermediate

Harshad Saykhedkar

Using Cascalog and Clojure to make the elephant move!

The intent is to highlight benefits gained by using Clojure, a functional language which runs on the JVM, and Cascalog, a data processing library for Hadoop. The participants will be exposed to,
  • 1 comment
  • Confirmed & scheduled
  • 08 Jun 2014
Section: Crisp talk Technical level: Intermediate

Abinasha Karana

Ten things to consider for Interactive Analytics on high volume, write-once workloads

With the advance of No-SQL and big data, there has been an explosion of database technologies. Each of them is best suited to certain kinds of workloads. For applications such as log analysis, sensor data analytics and genome data analytics, what is the framework to evaluate the most suitable databases? This session explains core technologies which benefit write-once workload and mapping to vari…
  • 0 comments
  • Shortlisted
  • 09 Jun 2014
Section: Full talk Technical level: Advanced

Rajesh Muppalla

Analytics on Large Scale, Unstructured, Dynamic Data using Lambda Architecture

In this talk, I will focus on our experience in using Lambda Architecture at Indix, to build a large scale analytics system on unstructured, dynamically changing data sources using Hadoop, HBase, Scalding, Spark and Solr.
  • 4 comments
  • Confirmed & scheduled
  • 09 Jun 2014
Section: Full talk Technical level: Intermediate

rhebbar

Latest trends in Market Mix Modeling & a unique way of making measurement & optimization more effective

Learn a new way of doing MMX modeling and challenge the traditional way of doing it in your organizations. Apply the same principles to all your other analytics problems.
  • 1 comment
  • Rejected
  • 10 Jun 2014
Section: Crisp talk Technical level: Advanced

Divya Alok

Data sciences (is) in fashion @ Myntra

Ever dreamt that you can walk into a store which has been designed just for you? A store where the shelves have been stacked keeping in mind your fashion preferences only. A sales rep who understands what you wear and what’s missing in your wardrobe. Myntra is fast transforming itself into such a hyper-personalized (1:1) store and this transformation is being powered solely through analytics over…
  • 2 comments
  • Confirmed & scheduled
  • 10 Jun 2014
Section: Full talk Technical level: Intermediate

swaroopch

Lessons from Elasticsearch in production

This talk is for people who are planning to use Elasticsearch in their next project.
  • 1 comment
  • Confirmed & scheduled
  • 11 Jun 2014
Section: Full talk Technical level: Intermediate

Viral B. Shah

The state of Julia - a fast language for technical computing

Last year at the Fifth Elephant, I gave a talk introducing the Julia programming language (http://www.julialang.org/). This year, I propose to give a short talk on the current state of Julia.
  • 1 comment
  • Confirmed & scheduled
  • 11 Jun 2014
Section: Crisp talk Technical level: Intermediate

Harshad Saykhedkar

Real world machine learning

We will become familiar with real world machine learning in a hands on, intuitive way. Rather than taking the algorithm and its results as a black box provided by a library and learning in a cookbook style, we will try to understand the why of the problem. Participants will also appreciate the importance of each phase (data exploration, data cleaning and extraction, modeling, evaluation) of machi…
  • 0 comments
  • Confirmed & scheduled
  • 11 Jun 2014
Section: Workshops Technical level: Intermediate

Nischal HP

Twitter data collection framework for dummies.

This talk is about how I got 200 odd GB of tweets over a 45 day period to build a Trend Summarizer. I chose to build this as part of the dissertation for my MS Programme. The main objective here was to fetch tweets belonging to a trend in different locations. Additionally, I wanted this to be scalable out of the box i.e. if I increased the number of locations to look for, it shouldn’t run into pr…
  • 0 comments
  • Rejected
  • 11 Jun 2014
Section: Full talk Technical level: Beginner

Abishek Baskaran

Interactive analytics on event streams with complexly nested schemas

In this talk, I will share the lessons that we learnt while building an application for interactively analyzing data from event streams like the Twitter firehose, click streams, and application logs with complexly nested schemas. I will discuss the challenges faced while implementing the whole analytics stack that has Kafka for data collection, Elasticsearch for realtime search, and Apache Drill for …
  • 0 comments
  • Submitted
  • 12 Jun 2014
Section: Full talk Technical level: Intermediate

Swapnil Birla

big data analytics with machine learning

We crunch the numbers and turn your data into accessible and intuitive visuals that you can use for presentations, information sharing, and overall company transparency and most importantly for analysis. Predictive analytics has been closely linked to big data. Based on the patterns of how your variables have behaved over time, our machine learning algorithms can predict how the same variables ar…
  • 0 comments
  • Rejected
  • 12 Jun 2014
Section: Crisp talk Technical level: Beginner

Neeta Pande

De-dup on Hadoop

In this talk, I wish to share experiences we had at Intuit in building a Master Data Management (MDM) solution on the Hadoop platform. At its core, an MDM solution consists of fuzzy matching, entity resolution and de-duplication. Solving these patterns on a Big Data platform like Hadoop is the focus of this discussion.
  • 0 comments
  • Confirmed & scheduled
  • 12 Jun 2014
Section: Crisp talk Technical level: Beginner

Chandraprakash Bhagtani

Dr. Hadoop – Diagnose your Hadoop Jobs

Have you faced a problem where you run a job or query on Hadoop, which runs very slowly, and you have no clue why? You look at your job details on the JobTracker and get confused with hundreds of counters and configurations? You really don’t know how to make sense out of it. This is a very common challenge for Hadoop beginners, especially analysts or people coming from the RDBMS world. This talk is a…
  • 4 comments
  • Confirmed & scheduled
  • 13 Jun 2014
Section: Crisp talk Technical level: Intermediate

Chirag Anand

Big data in finance

The talk will cover a case study of solving a research problem in algorithmic trading using high frequency data from a stock exchange.
  • 1 comment
  • Confirmed & scheduled
  • 13 Jun 2014
Section: Full talk Technical level: Intermediate

Sumit Kumar

Supercharge Application I/O Performance with SSD caching

Storage I/O Performance plays a significant role in determining overall application end user response times and perceived user latency. How can you leverage solid state drives (SSD) to boost OLTP application I/O performance (e.g., MySQL, MongoDB) in a holistic, non-disruptive and cost effective manner, without throwing away your hard disk but utilizing it for capacity? Through this talk, I’ll…
  • 0 comments
  • Submitted
  • 15 Jun 2014
Section: Full talk Technical level: Intermediate

Sunil Sayyaparaju

Overcoming problems that you will face when trying to break speed limit

It is everyone’s continuous quest to improve the speed at which we do things. What was fast in the past is no longer fast. We need to continuously improve things. In those efforts, we face problems. We also get new opportunities because of the evolving technologies. This talk is to share our knowledge about the opportunities we used and how we overcame some of the problems.
  • 1 comment
  • Rejected
  • 15 Jun 2014
Section: Full talk Technical level: Intermediate

Shalin Mangar

Scaling SolrCloud to a large number of collections

The objective of this talk is to share the challenges and learnings from setting up a large SolrCloud installation running on hundreds of nodes with thousands of collections and millions of users. This talk will also help people understand the guts of SolrCloud’s architecture.
  • 0 comments
  • Confirmed & scheduled
  • 15 Jun 2014
Section: Full talk Technical level: Advanced

Shalin Mangar

How to deploy a 50 node SolrCloud cluster on AWS in 15 minutes

The objective of this short talk is to demonstrate the newly open sourced Solr Scale Toolkit which makes setting up and managing a SolrCloud cluster on AWS a snap.
  • 0 comments
  • Rejected
  • 15 Jun 2014
Section: Crisp talk Technical level: Beginner

Rasagy Sharma

Using Data for Art

Why listen to a talk on art in a conference focused on technologies that power analytics? Because beyond the world of functional, need based & user-centered applications is a much more diverse world of data art. A field that has fewer constraints, and more opportunity for creative expression. Not only would it be exciting for any data enthusiast to see how technology & data are being used by the…
  • 0 comments
  • Confirmed & scheduled
  • 16 Jun 2014
Section: Crisp talk Technical level: Beginner

Sunil Sayyaparaju

Getting your hands dirty with Aerospike

Aerospike is the new open-source NoSQL database. It is easily the fastest clustered NoSQL database solution. It is also well known for its operational ease of use. The objective of the workshop is to get your hands dirty with it.
  • 0 comments
  • Confirmed & scheduled
  • 01 Jul 2014
Section: Sponsored workshop Technical level: Intermediate

Ambuj Singh

Real Time User-Scoring for Bidding in Display Retargeting

Retargeting online customers to a retail website via Display Ads has become an incredible avenue to drive traffic back to the website. Especially with the advent of Real Time Bidding (RTB), advertisers now have access to an efficient and transparent mechanism to buy from this huge volume of available ad inventory. It allows an advertiser to optimize their ad spend down to the exact user they are …
  • 0 comments
  • Confirmed & scheduled
  • 03 Jul 2014
Section: Crisp talk Technical level: Beginner

subhajit sanyal

Large Scale Modelling and Analytics Challenges at a Payments Company

This talk first presents a broad overview of the Big Data challenges in a payments company. Then it discusses in detail an application around modelling spend behavior of credit card holders. Through the application the talk demonstrates how various machine learning and data mining techniques are utilized to glean insights from petabyte scale data, and how one builds practical models to solve real…
  • 0 comments
  • Confirmed & scheduled
  • 04 Jul 2014
Section: Full talk Technical level: Intermediate
