Jul 2014
23 Wed 09:30 AM – 05:00 PM IST
24 Thu 09:45 AM – 05:00 PM IST
25 Fri 08:30 AM – 07:15 PM IST
26 Sat 08:30 AM – 07:15 PM IST
Visualizing large data sets
I’m going to showcase how to visualize large data sets, i.e. data sets with thousands to millions of data points. This goes beyond standard techniques like bar plots, and requires tools like d3, Processing, ggplot2, Circos and more. I will demonstrate working samples that I have created using open source tools. Folks will gain an understanding of concepts, techniques and tools to create larg…
Section: Full talk
Technical level: Intermediate
|
Serving user intent : Facebook style notifications using HBase and Event streams
This talk is about building a low-latency, near-real-time notifications platform for serving user intent using an event-based architecture, Complex Event Processing and a data store like HBase. It will also cover how millisecond response times are achieved when accessing data from 100 million rows by interpreting change from immutable events and organizing data as LSM trees.
Section: Full talk
Technical level: Intermediate
|
Engineering custom visualisations with advanced d3.js
d3.js is a very complex library with a lot of functionality. That said, there are many ready-made examples available on the Internet, which in turn promotes a culture of copy-paste code. Hence, one keeps seeing the same charts (Sankey, Chord, Matrix, Force layout, etc.) over and over. The objective of this workshop is to help a d3 developer truly harness the power of d3.js to make cus…
Section: Workshops
Technical level: Advanced
|
ANALYTICS ON BIG FAST DATA USING REAL TIME STREAM DATA PROCESSING ARCHITECTURE
To understand the challenges of real-time big data processing, the maturity of real-time/near-real-time analytics technology, and modern big data architecture built with Hadoop.
Section: Full talk
Technical level: Intermediate
|
Circuitscape - A Case Study on Scientific Computing
In this talk I will discuss some of the challenges faced in typical scientific computing applications and how to address them, taking Circuitscape as a case study. It will briefly walk the audience through what is possible with modern scientific computing platforms built on Python and Julia.
Section: Full talk
Technical level: Intermediate
Session type: Lecture
|
How to Make Big Data Real and Valuable ...
The objective of the session is to give participants a very quick overview of the data landscape and the journey from legacy systems/applications -> data integration -> ETL -> data warehouse -> real-time streaming, and how all of this culminates in a Big Data architecture.
Section: Crisp talk
Technical level: Intermediate
|
Developing Real-Time Data Pipelines with Apache Kafka
The audience will benefit from understanding Kafka, “a high-throughput distributed messaging system” developed and used at LinkedIn.
Section: Full talk
Technical level: Advanced
|
Big Data in Telecom - Case studies
Covers real-world examples of Big Data analytics in telecom and how they are impacting the current IT landscape. It will touch upon the concept of the Digital Telco and opportunities for data monetization by telcos, with some real-world examples.
Section: Full talk
Technical level: Intermediate
|
What chemistry can teach us about designing better NLP algorithms
The main idea behind this talk is how context is formed in language, and how the location, time and order of words also affect it.
Section: Crisp talk
Technical level: Beginner
|
Crafting Visual Stories with Data
Data visualisation has enabled us to compress data and express it visually in many interesting new ways. It is often said that we are trying to tell stories through visualisations. But the science of data-visual stories is still nascent and developing. On the other hand, the art of storytelling through spoken and written words, pictures, comics and movies is well developed and understood. Let…
Section: Full talk
Technical level: Beginner
|
Scaling with Queues
Sharing our experience of using a queue-based backend infrastructure for scalability, failover and data accuracy.
Section: Full talk
Technical level: Intermediate
|
Curating A Hundred Thousand Online Stores Using Storm, Elasticsearch and Etcd
Igor is a platform that curates hundreds of thousands of online stores comprising millions of products while processing billions of product updates. I’ll explore the challenges faced and the architectural decisions that addressed them. I’ll further show how Storm, Elasticsearch and Etcd were leveraged to overcome some weaknesses of traditional queue-based architectures to deliver low latency event p…
Section: Full talk
Technical level: Intermediate
|
What would you recommend?
This workshop will give the audience a quick overview of recommendation systems and how to build one from scratch. We shall build user-user collaborative filtering (CF) recommendation engines as well as item-item CF recommenders. The audience will get a flavour of the range of statistical and mathematical computations that go into a recommender system.
Section: Workshops
Technical level: Intermediate
|
BDAS, the Berkeley Data Analytics Stack
This talk is an introduction to the features of the next-generation, open source data analysis stack developed by the UC Berkeley AMPLab.
Section: Crisp talk
Technical level: Beginner
|
'Know Your Customer!' - Advanced Data Science for Audience Segmentation
Have you ever wondered how Cisco does customer segmentation? What is Cisco’s technology stack for dealing with Big Data? What tools and technologies are adopted to bring best-of-breed data science algorithms to bear on the problem of identifying segments in the audience? How do supervised and semi-supervised machine learning, along with Bayesian predictive analytics, combine to produce a very …
Section: Full talk
Technical level: Advanced
|
Why we built the most adopted Polyglot Object Mapper for NoSQL?
The talk will narrate the story of building Kundera (https://github.com/impetus-opensource/Kundera), the most adopted polyglot object mapper for NoSQL. Kundera is a high-level client/object mapper with a JPA interface for working with RDBMS and NoSQL datastores. It can be considered a Hibernate equivalent for NoSQL datastores.
Section: Full talk
Technical level: Intermediate
|
Apache Pig Power tools
The objective of this workshop tutorial is to bring Apache Pig users from the beginner/intermediate stage to the advanced/expert stage.
Section: Workshops
Technical level: Intermediate
|
Building distributed search applications using Apache SOLR
The objective of this workshop is to introduce attendees to the most common features of a search application and how to implement them using Apache Solr. The workshop will also cover how to scale the application by leveraging SolrCloud.
Section: Workshops
Technical level: Beginner
|
Spot the model hiding in the Big Data
This talk is intended to help businesses avoid expensive, incorrect decisions based on a poor understanding of the underlying models. I shall discuss ways to understand a phenomenon by triangulating across visualizations, an understanding of the underlying model, and experimentation.
Section: Full talk
Technical level: Beginner
|
Extending Vega - A visualisation grammar to create interactive visualisations
I want to present the work I am doing on extending Vega (http://trifacta.github.io/vega/), a visualization grammar.
Section: Crisp talk
Technical level: Beginner
|
Realizing Large-scale Distributed Deep Learning Networks over GraphLab
The main objective is to give an overview of our cutting-edge work on realizing distributed deep learning networks over GraphLab. The objectives of the talk can be summarized as below:
Section: Full talk
Technical level: Intermediate
|
Storing relationships in large data-sets using Graphs
Problem Statement - Fast programmatic/self-serve analytics on linked data in an ad system by indexing it across all cuts, especially for traversals like -
Section: Crisp talk
Technical level: Advanced
|
Unified analytics platform for Bigdata
This talk is about a system developed at InMobi to support OLAP data cubes on top of the Hive metastore. With this abstraction, users can reference a single schema over data stored across diverse storage engines, and can query the logical tables without knowing schema details like relationships, rollup levels, data location and data types.
Section: Full talk
Technical level: Intermediate
|
Experimentation to Productization : developing a Dynamic Bidding system for a location aware Mobile landscape
This session will help structure a hypothesis-based approach to engineering problems, and show how to quickly translate and implement algorithms on weblog (mobile footprint) data.
Section: Full talk
Technical level: Intermediate
|
Extracting and Employing Domain-Specific Knowledge Graphs (DKGraphs)
Assume that you have an opportunity to work with a vast amount of unstructured and semi-structured text data in a specific domain, e.g. automobiles, agriculture, medicine, the internet, etc. Your task is to derive business value from this textual data by extracting a domain-specific knowledge graph (DKGraph) and employing it for various business use cases. In this problem, there are several key challeng…
Section: Full talk
Technical level: Beginner
|
Using Elasticsearch for Analytics
At Wingify, we have built a system to process and store analytics data for our customers, who can slice and dice the data to build more meaningful reports. This talk is about how we used Elasticsearch to solve this problem at our scale rather quickly. The audience will take away some of the data problems they can quickly solve with Elasticsearch.
Section: Full talk
Technical level: Intermediate
|
Scaling real time visualisations for Elections 2014
How does one go about creating interactive real-time visualisations with rapidly changing data? This talk is about our experiences in designing the CNN-IBN and Bing election results pages.
Section: Full talk
Technical level: Intermediate
|
Hive and Presto for Big Data Analytics in the Cloud
The objective of this talk is to explain how Hive and Presto can be used for big data analytics. We will contrast their architectures and use cases, and describe how to take advantage of both technologies in the cloud.
Section: Full talk
Technical level: Intermediate
|
How to build a Data Stack from scratch
This talk will cover a framework for thinking about the analytics data stack. What should you consider when building a data stack from scratch? How do you choose the right software for your stack, whether for visualisation, analytics or storage? It will discuss the relations between different techniques for extracting insights out of raw data. I will draw upon examples from my experience…
Section: Full talk
Technical level: Intermediate
|
De-dup @ Scale : Experiments with DynamoDB
What should you know if you want to integrate DynamoDB into your Big Data application?
Section: Full talk
Technical level: Intermediate
|
Lambda Architecture
Educate and discuss the principles and best practices for building large-scale data processing architectures. An introduction to the “Lambda Architecture” proposed by Nathan Marz (the creator of Storm).
Section: Full talk
Technical level: Intermediate
|
Apache Tez: Accelerating Hadoop Data Pipelines
Apache Tez is a DAG execution engine that exists as a superset of traditional MapReduce. Tez is designed as a replacement computational model for nearly everything that currently uses MapReduce.
Section: Full talk
Technical level: Beginner
|
Live analytical dashboards at scale - SQL style
How do you build real-time analytical dashboards that enable a business to take decisions at scale? There are various technologies out there that each fill one use case or another: horizontally scalable queues such as Kafka, stream processing systems such as Storm, data stores such as OpenTSDB and Druid that can provide dimensional lookups on large amounts of data, and visualisation libraries …
Section: Full talk
Technical level: Intermediate
|
Tailor made stores at myntra or how to personalize your search results
This will showcase a unique approach to personalization that combines search and recommendations. We won’t go into the details of the algorithms that decide whether a product is suitable for a user; rather, given that a product has been shortlisted on certain criteria, we’ll look at how to showcase it. The talk will give some idea of Cassandra and Solr.
Section: Crisp talk
Technical level: Intermediate
|
Machine learning at scale with Spark
Take the audience through my journey of learning machine learning from scratch using various freely available resources, and building applications on big data using Apache Spark and MLlib.
Section: Full talk
Technical level: Beginner
|
Machine Learning using R : Crash course in Classification Methods
The aim is to provide the attendees with an overview (implementation-wise) of some of the major classification methods using R. The focus of the workshop will be on breadth rather than depth. A lot of methods will be introduced, but their mathematical properties won’t be discussed in detail.
Section: Workshops
Technical level: Beginner
|
Machine learning + Interactive visualization: A pragmatic approach to fixing knowledge bases
We wish to explore how recommenders and visualization can help fix problems inherent to knowledge bases. We will tackle one such problem: incorrect or missing assignment of tags to articles in a knowledge base. We will also demonstrate how off-the-shelf software in the Hadoop ecosystem can be used to improve the richness of this data through processing and visualization. W…
Section: Full talk
Technical level: Beginner
|
Advanced Big Data Analytics using Apache Mahout and Giraph
It is difficult to address graph and machine learning problems using the MapReduce framework. These problems mostly need multiple iterations of complex algorithms, which can be tricky and difficult to implement in MapReduce. However, there are two frameworks in the Hadoop ecosystem for addressing such problems, i.e. graph and machine learning problems. Apache Giraph is a graph-proces…
Section: Workshops
Technical level: Advanced
|
Scaling Spatial Data - OpenStreetMap as Infrastructure.
For any location service to succeed, the length and breadth of geographic relationships have to be recorded with enough room for frequent verification and classification. This talk will introduce the infrastructure behind the largest open geographic data repository, OpenStreetMap, and how you can leverage the complete geospatial stack for independent data collection, verification, and build…
Section: Full talk
Technical level: Intermediate
|
The ART of Data Mining - Practical Learnings from Real-world Data Mining applications
Machine learning and data mining are part SCIENCE (ML algorithms, optimization), part ENGINEERING (large scale modeling, real-time decisions), part PROCESS (data understanding, feature engineering, modelling, evaluation, and deployment), and part ART. In this talk we will focus more on the “ART of data mining”: the little things that make the big difference in the quality and sophistication of ma…
Section: Full talk
Technical level: Intermediate
|
Run Predictive Machine Learning algorithms on Hadoop without even knowing Mapreduce.
In this talk I will introduce some new concepts that will help data scientists run their predictive algorithms on Hadoop with the help of PMML and Cascading.
Section: Full talk
Technical level: Intermediate
|
Fast Elephant - the Cheeliphant (Cheetah-Elephant)!
In this talk I shall share the spectrum of technologies and the evolution of the Big Data and Analytics space and its associated infrastructure. I shall also touch upon the tips and traps of using this infrastructure, and useful rules of thumb for designing systems.
Section: Full talk
Technical level: Beginner
|
Migrating traditional warehouse and its applications to a Big-data platform
Understanding the capabilities and limitations of the Hadoop platform for efficient migration.
Section: Full talk
Technical level: Intermediate
|
Real Time Secure API delivering data @ scale
At ThoughtWorks, we have used a hybrid approach to design a real-time secure API that provides various ad hoc querying capabilities over large amounts of data.
Section: Crisp talk
Technical level: Beginner
|
Filtering the noise from an avalanche of Google Analytics Metrics : Anomaly Detection
At Tatvic, we have built an Anomaly Detection Engine that alerts the analyst to sporadic changes in Google Analytics metrics. The analyst can also drill down into the possible root causes of an anomaly, which enables quicker business decisions.
Section: Crisp talk
Technical level: Intermediate
|
Using Cascalog and Clojure to make the elephant move!
The intent is to highlight the benefits gained by using Clojure, a functional language that runs on the JVM, and Cascalog, a data processing library for Hadoop. The participants will be exposed to,
Section: Crisp talk
Technical level: Intermediate
|
Analytics on Large Scale, Unstructured, Dynamic Data using Lambda Architecture
In this talk, I will focus on our experience in using Lambda Architecture at Indix, to build a large scale analytics system on unstructured, dynamically changing data sources using Hadoop, HBase, Scalding, Spark and Solr.
Section: Full talk
Technical level: Intermediate
|
Latest trends in Market Mix Modeling & a unique way of making measurement & optimization more effective
Learn a new way of doing MMX modeling and challenge the traditional way of doing it in your organizations. Apply the same principles to all your other analytics problems.
Section: Crisp talk
Technical level: Advanced
|
Data sciences (is) in fashion @ Myntra
Ever dreamt that you could walk into a store designed just for you? A store where the shelves have been stacked keeping only your fashion preferences in mind. A sales rep who understands what you wear and what’s missing in your wardrobe. Myntra is fast transforming itself into such a hyper-personalized (1:1) store and this transformation is being powered solely through analytics over…
Section: Full talk
Technical level: Intermediate
|
Lessons from Elasticsearch in production
This talk is for people who are planning to use Elasticsearch in their next project.
Section: Full talk
Technical level: Intermediate
|
The state of Julia - a fast language for technical computing
Last year at the Fifth Elephant, I gave a talk introducing the Julia programming language (http://www.julialang.org/). This year, I propose to give a short talk on the current state of Julia.
Section: Crisp talk
Technical level: Intermediate
|
Real world machine learning
We will become familiar with real-world machine learning in a hands-on, intuitive way. Rather than treating the algorithm and its results as a black box provided by a library and learning in a cookbook style, we will try to understand the why of the problem. Participants will also appreciate the importance of each phase (data exploration, data cleaning and extraction, modeling, evaluation) of machi…
Section: Workshops
Technical level: Intermediate
|
Twitter data collection framework for dummies.
This talk is about how I collected 200-odd GB of tweets over a 45-day period to build a Trend Summarizer, which I built as part of the dissertation for my MS programme. The main objective was to fetch tweets belonging to a trend in different locations. Additionally, I wanted this to be scalable out of the box, i.e. if I increased the number of locations to look for, it shouldn’t run into pr…
Section: Full talk
Technical level: Beginner
|
Interactive analytics on event streams with complexly nested schemas
In this talk, I will share the lessons we learnt while building an application for interactively analyzing data from event streams like the Twitter firehose, click streams, and application logs with complexly nested schemas. I will discuss the challenges faced while implementing the whole analytics stack, which has Kafka for data collection, Elasticsearch for real-time search, and Apache Drill for …
Section: Full talk
Technical level: Intermediate
|
big data analytics with machine learning
We crunch the numbers and turn your data into accessible and intuitive visuals that you can use for presentations, information sharing, overall company transparency and, most importantly, analysis. Predictive analytics has been closely linked to big data. Based on the patterns of how your variables have behaved over time, our machine learning algorithms can predict how the same variables ar…
Section: Crisp talk
Technical level: Beginner
|
De-dup on Hadoop
In this talk, I wish to share our experiences at Intuit in building a Master Data Management (MDM) solution on the Hadoop platform. At its core, the MDM solution consists of fuzzy matching, entity resolution and de-duplication. Solving these problems on a Big Data platform like Hadoop is the focus of this discussion.
Section: Crisp talk
Technical level: Beginner
|
Dr. Hadoop – Diagnose your Hadoop Jobs
Have you ever run a job or query on Hadoop that runs very slowly, with no clue why? Do you look at your job details on the JobTracker and get confused by hundreds of counters and configurations, with no idea how to make sense of them? This is a very common challenge for Hadoop beginners, especially analysts or people coming from the RDBMS world. This talk is a…
Section: Crisp talk
Technical level: Intermediate
|
Big data in finance
The talk will cover a case study of solving a research problem in algorithmic trading using high frequency data from a stock exchange.
Section: Full talk
Technical level: Intermediate
|
Supercharge Application I/O Performance with SSD caching
Storage I/O performance plays a significant role in determining overall application end-user response times and perceived user latency. How can you leverage solid state drives (SSDs) to boost OLTP application I/O performance (e.g. MySQL, MongoDB) in a holistic, non-disruptive and cost-effective manner, without throwing away your hard disk but utilizing it for capacity? Through this talk, I’ll…
Section: Full talk
Technical level: Intermediate
|
Overcoming problems that you will face when trying to break speed limit
We are all on a continuous quest to improve the speed at which we do things. What was fast in the past is no longer fast, so we need to keep improving. In those efforts we face problems, but evolving technologies also give us new opportunities. This talk shares our knowledge of the opportunities we used and how we overcame some of the problems.
Section: Full talk
Technical level: Intermediate
|
Scaling SolrCloud to a large number of collections
The objective of this talk is to share the challenges and learnings from setting up a large SolrCloud installation running on hundreds of nodes with thousands of collections and millions of users. This talk will also help people understand the guts of SolrCloud’s architecture.
Section: Full talk
Technical level: Advanced
|
How to deploy a 50 node SolrCloud cluster on AWS in 15 minutes
The objective of this short talk is to demonstrate the newly open sourced Solr Scale Toolkit which makes setting up and managing a SolrCloud cluster on AWS a snap.
Section: Crisp talk
Technical level: Beginner
|
Using Data for Art
Why listen to a talk on art in a conference focused on technologies that power analytics? Because beyond the world of functional, need-based & user-centered applications lies a much more diverse world of data art: a field with fewer constraints and more opportunity for creative expression. Not only would it be exciting for any data enthusiast to see how technology & data are being used by the…
Section: Crisp talk
Technical level: Beginner
|
Getting your hands dirty with Aerospike
Aerospike is a new open-source NoSQL database. It is easily the fastest clustered NoSQL database solution. It is also well known for its operational ease of use. The objective of the workshop is to get your hands dirty with it.
Section: Sponsored workshop
Technical level: Intermediate
|
Real Time User-Scoring for Bidding in Display Retargeting
Retargeting online customers to a retail website via display ads has become a powerful avenue for driving traffic back to the website. Especially with the advent of Real Time Bidding (RTB), advertisers now have an efficient and transparent mechanism to buy from this huge volume of available ad inventory. It allows an advertiser to optimize their ad spend down to the exact user they are …
Section: Crisp talk
Technical level: Beginner
|
Large Scale Modelling and Analytics Challenges at a Payments Company
This talk first presents a broad overview of the Big Data challenges in a payments company. Then it discusses in detail an application around modelling the spend behavior of credit card holders. Through this application, the talk demonstrates how various machine learning and data mining techniques are used to glean insights from petabyte-scale data, and how one builds practical models to solve real…
Section: Full talk
Technical level: Intermediate
|