Jul 2016
25 Mon
26 Tue
27 Wed
28 Thu 08:30 AM – 06:25 PM IST
29 Fri 08:30 AM – 06:15 PM IST
30 Sat 08:45 AM – 05:00 PM IST
31 Sun 08:15 AM – 06:00 PM IST
Let your Big Data Processing take flight with Apache Falcon
At InMobi, a mobile advertising company, we see more than 10 billion events arriving per day. Analysis, reporting and inference on these requests (and the responses served) are key to serving the right ad to the right person at the right time. We have nearly 200 complex big data pipelines that run against various data sources. Managing so many pipelines and the associated data was becoming a …
Section: Crisp talk
Technical level: Beginner
|
Real-time Ingestion of logs into Hive with low latency, to query and respond to events
The threat landscape is changing very rapidly, and we are seeing more and more targeted attacks. Detecting such attacks requires a data-driven approach that processes petabytes of telemetry data (AV detections, system access logs, network statistics, etc.) received from endpoints, firewalls, gateways, etc.
Section: Crisp talk
Technical level: Intermediate
|
Long Running Services on YARN: Future of Service Deployment & Management via Hadoop
YARN has long aspired to be an operating system for the data center. To bring that promise to fruition, it must be able to host services that transcend the usual provision-execute-teardown lifecycle of most Hadoop processing frameworks. In this talk, we will share what we’ve learned building long-running services, together with on-demand scaling and monitoring, on YARN. We will first disc…
Technical level: Advanced
|
Timely Dataflow
Many data processing tasks require low-latency interactive access to results, iterative sub-computations, and consistent intermediate outputs so that sub-computations can be nested and composed. Timely Dataflow is a computational model that addresses these challenges in a unified system, as opposed to bolting batch and stream processing systems together. It was first presented as part of Naiad (SO…
Section: Crisp talk
Technical level: Advanced
|
Increasing Trust and Efficiency of Data Science using dataset versioning
As data science grows and matures as a domain, decision makers are asking harder questions about the trustworthiness and efficiency of the data science process. Some of them include: …
Section: Crisp talk
Technical level: Intermediate
|
Emerging patterns of lifestyle impact on personal health & wellness
Lifestyle is changing at a very rapid pace as we enter the internet era. Technology, lifestyle and work environments are evolving faster than ever before, and this has changed both how we live and how healthy we are. To understand the new health and wellness patterns that are emerging, and to help a preventive-health-care start-up design improved solutions to help pe…
Section: Crisp talk
Technical level: Beginner
|
Model Visualisation
Though visualisation is used in data science to understand the shape of the data (data-vis), it is not widely used for the models we develop, which are largely evaluated on numerical summaries. Model visualisation (model-vis) can help us understand the shape of the model, the impact of parameters and different input data on the model, and the fit of the model and where it can be improved.
Section: Full talk
Technical level: Beginner
|
What do machine learning and high performance computing have to do with big cats in the wild?
Science has played a crucial role in our understanding of big cats in the wild and in their conservation. When we focus on the aspect of “gaining knowledge” or “learning”, few approaches have done better than the rigorous application of scientific methods. As we all know too well, the scientific method involves careful observation, construction of relevant theories and confronting these theorie…
Section: Full talk
Technical level: Intermediate
|
Statistical Models for Better Customer Engagement
We look at the various stages of a sales/marketing funnel and see how data science can be used to improve the effectiveness of the processes, understand what the customer wants, and discover new ways of engagement at each stage. We discuss the statistical models and the business metrics they drive, and share real-life examples from our experience.
Technical level: Intermediate
|
Big Data Structures
Analysing terabyte-scale data sets through heavy data processing is a common task these days. A data structure is a particular way of organizing data in a computer so that it can be used efficiently. For Big Data, the computer becomes a cluster, and the data is organized in a distributed way. Usage patterns are shifting from precise answers to probabilistic ones. False positive matche…
Section: Full talk
Technical level: Beginner
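The trade-off described above (accepting occasional false positives in exchange for a compact structure that can be built in pieces across a cluster) is the idea behind a Bloom filter. A minimal sketch, assuming SHA-256 with double hashing to derive the bit positions; the class, method and parameter names are illustrative and not taken from the talk.

```python
import hashlib
import math
from typing import Iterator, Tuple


class BloomFilter:
    """Probabilistic set membership: never a false negative, occasionally a false positive."""

    def __init__(self, m_bits: int, k_hashes: int) -> None:
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)  # compact bit array

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: derive k bit positions from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

    def union(self, other: "BloomFilter") -> "BloomFilter":
        """Merge filters built on different nodes (same m and k) with a bitwise OR."""
        if (self.m, self.k) != (other.m, other.k):
            raise ValueError("filters must share m and k")
        merged = BloomFilter(self.m, self.k)
        merged.bits = bytearray(a | b for a, b in zip(self.bits, other.bits))
        return merged


def optimal_parameters(n_items: int, fp_rate: float) -> Tuple[int, int]:
    """Standard sizing: m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions."""
    m = math.ceil(-n_items * math.log(fp_rate) / (math.log(2) ** 2))
    k = max(1, round(m / n_items * math.log(2)))
    return m, k
```

Because `union` is a plain OR, each node can build a filter over its own partition and the partial filters can be merged afterwards, which fits the distributed setting the abstract mentions.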
|
Unified & Distributed Test Infrastructure at Scale (Hortonworks Data Platform Testing)
Extensive software testing is required before a release to ensure software quality, and the software has to perform equally well on every platform and combination of configurations. For a data platform, testing is even more complicated due to the variety of clusters, storage layers, operating systems, JDK versions, database flavors, execution engines, security configs, com…
Section: Crisp talk
Technical level: Intermediate
|
Taking Fashion and Lifestyle Commerce Towards SKUs Using Deep Image and Text Parsing
In this talk, I will describe challenges, insights, innovations and experiences in building a large-scale deep learning system to prepare SKUs (Stock Keeping Units) for millions of fashion products.
Section: Full talk
Technical level: Intermediate
|
Dr. Elephant - Self-Serve Performance Tuning for Hadoop and Spark
Hadoop is a framework that facilitates the distributed storage and processing of large datasets and involves a number of components interacting with each other. Because the framework is large and complex, it is important to make sure every component performs optimally. While we can always optimize the underlying hardware resources, network infrastructure, OS, and other components of the …
Section: Crisp talk
Technical level: Intermediate
|
Designing Data Products
Coming up with a good model is very important for any machine learning system. But to build a good data product, there are a number of other things that go along with the model. The focus of this talk will be to discuss those things and share our learnings and recommendations based on our experience.
Technical level: Intermediate
|
(Workshop) Understanding neural networks by building a few from scratch
I firmly believe that there is an elegant and understandable theory behind neural networks.
Section: Workshop
Technical level: Intermediate
|
Visually reading the configuration of a Rubik’s cube using a Probabilistic Graphical Model
Identify the edges in the field of view, then correlate the sequence of frames to infer the configuration of the Rubik’s cube. The audience will take away how one can correlate information across video frames to infer the kinematics of the object in the field of view.
Technical level: Intermediate
|
Forecasting the degradation of Network KPIs
In this talk, we present a methodology to predict network degradation in the telecom sector. We will explain how to forecast the degradation of network key performance indicators (KPIs) and provide alerts 24 hours in advance, so the network operations team can take preemptive action before degradation affects network performance.
Section: Crisp talk
Technical level: Intermediate
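The abstract does not spell out its forecasting model, so the following is only a stand-in for the general idea: smooth recent KPI readings with Holt’s linear (double) exponential smoothing, project 24 steps ahead (24 hours, assuming hourly samples), and raise an alert at the first step where the forecast crosses a degradation threshold. The smoothing constants and threshold are assumptions, and the function names are hypothetical.

```python
from typing import List, Optional


def holt_forecast(series: List[float], alpha: float, beta: float, horizon: int) -> List[float]:
    """Holt's linear exponential smoothing; returns forecasts 1..horizon steps ahead."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level, trend = series[0], series[1] - series[0]  # simple initialisation
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]


def first_breach(forecast: List[float], threshold: float) -> Optional[int]:
    """Steps ahead (1-based) at which the KPI is forecast to fall below threshold, or None."""
    for step, value in enumerate(forecast, start=1):
        if value < threshold:
            return step
    return None
```

For a KPI such as a success rate that is slowly drifting down, `first_breach(holt_forecast(readings, 0.5, 0.3, 24), threshold)` gives how many hours of lead time the operations team would have; a production system would also need seasonality (daily traffic cycles), which this sketch ignores.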
|
Machine Learning - Democratized
Machine learning is no longer a science reserved for data scientists and data engineers: cloud-based machine learning services have democratized the entire machine learning process, from data science to data engineering to data visualization. You no longer need to be an expert in any of these to get a taste of machine learning or see how it works. The cloud-based ML options even allow you …
Section: Full talk
Technical level: Beginner
|
Purpose, Speed & Visibility: Facilitating product discovery & engagement on an e-commerce website
Each product on an e-commerce website has an opportunity to sell, and market dynamics determine what’s selling and at what speed. This has merchandising implications for stock refills, flash sales, promotions and special events, along with the actions a merchant’s platform team takes in anticipation of such events. By reverse engineering this quantitatively, and tuning the proprietary Search rank…
Section: Full talk
Technical level: Intermediate
|
Artificial Intelligence for Efficient Financial Markets
Artificial Intelligence (AI)! This is not just the name of the 2001 Spielberg movie! It is also the field of study concerned with creating machines capable of intelligent behavior.
Section: Crisp talk
Technical level: Intermediate
|
Discovering App Relationships in Smart Phones through Large Scale Mining of User Journey Data
User experience while navigating through the home screen and apps is a key differentiator for any smartphone. Building a user interface with ease of use and a personalized, contextualized home screen requires a deep understanding of how different users use their phones. Mobile OEMs periodically collect application usage data from millions of smartphone users. Analyzing this massive amount of…
Section: Full talk
Technical level: Intermediate
|
Interactive data transformations at scale
One set of ETL tools allows building ETL pipelines for large datasets; however, these tools do not provide data-level interactivity. Another set of data-prep tools allows interactive data transformations, but only for a single table (or for datasets that fit in the memory of a single machine). The challenge is to provide the best of both worlds: interactive data transformation…
Section: Sponsored
Technical level: Beginner
|
High performance computing using Spark
Spark has revolutionized the way big data computations are done. It provides an efficient way of performing distributed data processing. In this session, I will cover our experience of implementing a large-scale big data platform (> 100 TB) using Spark, and the challenges faced and lessons learned.
Technical level: Intermediate
|
Security Analytics at Web Scale
• What is Security Analytics
• How Symantec discovers risks and weaknesses in Enterprises
Section: Full talk
Technical level: Intermediate
|
Logging at scale using Graylog - Billion+ messages, 100K req/sec
With the advent of microservices and dozens of releases per day, logs are the bread and butter of a successful real-time technology platform like OlaCabs. In this talk, I will present our logging pipeline and the challenges we faced in running it at Ola scale.
Section: Crisp talk
Technical level: Intermediate
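The abstract does not describe the producers in the pipeline, so as an illustration of what a service sends to Graylog, here is a sketch that builds a GELF 1.1 message (the JSON format Graylog ingests) and ships it over UDP with zlib compression. The host name, field names and target address are placeholders; 12201 is Graylog’s conventional GELF port, but check the configuration of your own input.

```python
import json
import re
import socket
import time
import zlib
from typing import Any, Dict, Optional

# Allowed characters for additional-field names (word characters, dots and dashes).
_FIELD_NAME = re.compile(r"^[\w.\-]+$")


def build_gelf(host: str, short_message: str, level: int = 6,
               extra: Optional[Dict[str, Any]] = None,
               timestamp: Optional[float] = None) -> Dict[str, Any]:
    """Build a GELF 1.1 message; custom fields get the '_' prefix the format requires."""
    payload: Dict[str, Any] = {
        "version": "1.1",
        "host": host,
        "short_message": short_message,
        "timestamp": time.time() if timestamp is None else timestamp,  # seconds since epoch
        "level": level,  # syslog severity, e.g. 3 = error, 6 = informational
    }
    for key, value in (extra or {}).items():
        if key == "id" or not _FIELD_NAME.match(key):  # "_id" is reserved in GELF
            raise ValueError(f"invalid additional field name: {key!r}")
        payload["_" + key] = value
    return payload


def encode_udp(payload: Dict[str, Any]) -> bytes:
    """Serialise and zlib-compress one message for a GELF UDP input."""
    return zlib.compress(json.dumps(payload).encode("utf-8"))


def send_udp(payload: Dict[str, Any], server: str = "localhost", port: int = 12201) -> None:
    """Fire-and-forget send. Payloads too large for one datagram need GELF chunking,
    which this sketch does not implement."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(encode_udp(payload), (server, port))
```

UDP keeps the producer cheap at high request rates at the cost of possible message loss; where loss is unacceptable, a TCP or HTTP GELF input is the usual alternative.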
|
Machine Learning Application in MicroFinance
Artoo is a loan origination system (LOS); our aim is to improve financial inclusion in the world, starting with India. As a testament to our mission, we have helped disburse 1 lakh loans worth 1,000 crores over the last two years. We want to share our experience of using data, and eventually data science, to help our clients make the right call while disbursing loans.
Section: Crisp talk
Technical level: Beginner
|
Sensor Analytics for IoT and Embedded Systems
Analytics-driven embedded systems are here! We’ll show this in action by classifying human activity in real-time using sensor data from a smartphone accelerometer. The demo will show a complete workflow:
– pre-processing with digital filtering and frequency analysis,
– exploring different classification algorithms (such as decision trees, support vector machines, or neural networks), and
– automa…
Section: Crisp talk
Technical level: Intermediate
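The first step of that workflow (digital filtering and frequency analysis) can be sketched in plain Python. The demo’s own tooling is not named in the listing, so this is a stand-in: a causal moving-average low-pass filter and a naive DFT that reports the dominant frequency of one accelerometer axis, a common kind of feature for telling activities like walking and running apart. The 50 Hz sample rate and the window size are assumptions.

```python
import cmath
import math
from typing import List, Sequence


def moving_average(signal: Sequence[float], window: int) -> List[float]:
    """Causal FIR low-pass filter: mean of the most recent `window` samples."""
    out: List[float] = []
    acc = 0.0
    for i, x in enumerate(signal):
        acc += x
        if i >= window:
            acc -= signal[i - window]  # drop the sample that left the window
        out.append(acc / min(i + 1, window))
    return out


def dominant_frequency(signal: Sequence[float], sample_rate_hz: float) -> float:
    """Frequency (Hz) of the largest non-DC bin of a naive O(n^2) DFT."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the DC offset (e.g. gravity)
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centered))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k * sample_rate_hz / n
```

Features like these (dominant frequency, filtered magnitude statistics per window) would then feed the classifiers mentioned in the second step; on a real device an FFT would replace the quadratic DFT.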
|
Data-Driven Decision Making in Indian Agriculture: the Present and the Future
Data-driven decision making is critical in sectors like agriculture, health, and education, where well-planned initiatives have the power to literally change lives. The lack of a consolidated platform with access to relevant data, however, hinders objectivity and efficiency in the decisions that matter most. In this session, we reveal how we integrated relevant data - p…
Section: Crisp talk
Technical level: Intermediate
|
Knowledge Inference: Estimating how much the student knows
Very high student-teacher ratios, lack of infrastructure and other socio-economic issues have significantly affected the quality and accessibility of education. Moreover, education can also benefit from the potential and promise of technology (particularly AI), which has already transformed many aspects of our lives. An Intelligent Tutoring System (ITS) is a computer system which enables learning in…
Section: Full talk
Technical level: Intermediate
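The listing does not say which inference model the talk uses. One widely used way for a tutoring system to estimate how much a student knows is Bayesian Knowledge Tracing (BKT), sketched here as a swapped-in illustration: after each answer, the probability that the skill is known is updated with Bayes’ rule, then adjusted for the chance of learning during that practice step. The parameter values in any usage are made up.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class BKTParams:
    p_init: float     # P(L0): prior probability the skill is already known
    p_transit: float  # P(T): probability of learning the skill at a practice opportunity
    p_slip: float     # P(S): probability of a wrong answer despite knowing the skill
    p_guess: float    # P(G): probability of a right answer without knowing the skill


def update(p_known: float, correct: bool, params: BKTParams) -> float:
    """One BKT step: posterior given the observed answer, then the learning transition."""
    if correct:
        num = p_known * (1 - params.p_slip)
        den = num + (1 - p_known) * params.p_guess
    else:
        num = p_known * params.p_slip
        den = num + (1 - p_known) * (1 - params.p_guess)
    posterior = num / den
    return posterior + (1 - posterior) * params.p_transit


def trace(answers: Iterable[bool], params: BKTParams) -> float:
    """Estimated probability the student knows the skill after the given answers."""
    p = params.p_init
    for correct in answers:
        p = update(p, correct, params)
    return p
```

The slip and guess parameters are what keep a single lucky or careless answer from swinging the estimate too far.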
|
Reducing the world with JavaScript
The Earth is a staggering dataset. OpenStreetMap is the largest living open map of the world with a collection of over 1B mapped roads and ~2B mapped buildings. Processing this massive dataset can lead to a lot of interesting analyses about the world, but can also be really slow - enter the open source TileReduce module.
Section: Full talk
Technical level: Intermediate
|
Predicting Corporate Bankruptcy by mining financial reports and regular transactional trends combined with investor sentiment analysis
Bankruptcy is one of the major concerns in any type of market. When a company fails and loses money, it damages part of the economic environment. Predicting bankruptcy has become increasingly important, as it helps both the organization and the government of the day mitigate risk. This short talk will walk you through how machine learning is changing the world of finance, especial…
Section: Crisp talk
Technical level: Intermediate
|
Sentiment analysis to evaluate the performance of Fund Managers
Global Assets under Management (AUM) are estimated at 64 trillion USD. Investment managers are the key players in this business, making investment decisions on behalf of investors. What tools do financial services companies have to evaluate the performance of these managers? There is a tremendous amount of data available for the underlying financial instrument, be it m…
Section: Crisp talk
Technical level: Beginner
|
Apache Drill - Optimising Time to market
Data is more than doubling every year. With semi-structured data growing at a much faster pace than structured data, and data from different sources having different data types, much of one’s time is wasted defining schemas and transformations. Often the schemas are unknown upfront, as datasets evolve in highly dynamic ways. And current systems are unable to let us query dynam…
Section: Crisp talk
Technical level: Intermediate
|
ML in fin-tech - Transforming 60 crore Indian lives
I lead Finomena, which uses the power of big data, AI and ML in every imaginable way (information retrieval, NLP, deep learning, social network analysis, fraud detection and prevention, image recognition (even from videos), speech-to-text transcription and analysis, reinforcement learning) on a daily basis to provide access to credit to people in the long tail in India - over 60 crore people who …
Section: Full talk
Technical level: Beginner
|
Data pipelines - Cakewalk with Docker and Luigi
Modern data-driven products are powered by pipelines of data processing tasks. Building this infrastructure requires a lot of boilerplate code. Moreover, deploying these tasks consistently across development and production environments, and maintaining resource isolation, can lengthen development cycles. Maintaining different versions of datasets and tracking the improvement of your model on these ve…
Technical level: Advanced
|
Recommender Engines: A Peek into Predictive Analytics
The growth of data at exponential rates isn’t news today. Social media and e-commerce platforms are major contributing factors to this story. With billions of users online, the potential for marketing and reach is immense. Recommender engines are used across domains to help users make the right choices by understanding their behaviour and tastes.
Section: Full talk
Technical level: Beginner
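The abstract does not name a specific technique, so as one common example of learning tastes from behaviour, here is a sketch of item-based collaborative filtering: items are compared by the cosine similarity of their rating vectors across users, and each unseen item is scored by a similarity-weighted average of the user’s own ratings. The data layout and function names are assumptions for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def item_vectors(ratings: Ratings) -> Dict[str, Dict[str, float]]:
    """Invert user->item ratings into item->user rating vectors."""
    items: Dict[str, Dict[str, float]] = defaultdict(dict)
    for user, row in ratings.items():
        for item, r in row.items():
            items[item][user] = r
    return items


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors keyed by user."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def recommend(ratings: Ratings, user: str, top_n: int = 3) -> List[Tuple[str, float]]:
    """Rank unseen items by the similarity-weighted average of the user's ratings."""
    items = item_vectors(ratings)
    seen = ratings.get(user, {})
    scores: Dict[str, float] = {}
    for candidate, vec in items.items():
        if candidate in seen:
            continue
        sims = [(cosine(vec, items[i]), r) for i, r in seen.items()]
        weight = sum(abs(s) for s, _ in sims)
        if weight > 0:  # skip items with no overlap with anything the user rated
            scores[candidate] = sum(s * r for s, r in sims) / weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```

Item-to-item similarities change slowly compared with individual users’ histories, which is why this variant is often precomputed offline and served cheaply.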
|
Challenges in Data Warehouse Augmentation on Hadoop
Enterprises these days are finding value in moving their traditional data warehouses into augmented and historical data stores on Hadoop. This requires continuous data synchronisation between traditional data warehouses and data on Hadoop. It is also an added advantage to maintain slowly changing dimensions of data when it is ingested into Hadoop from traditional database systems. Once this data is av…
Section: Sponsored
Technical level: Intermediate
|
Four horsemen of the IoT
MQTT brokers have been around for quite a while. But never before has there been so much active development among IoT cloud providers. Silicon is cheaper than ever. IoT, especially industrial IoT, is now feasible even for small and medium-sized enterprises with lower margins.
Section: Full talk
Technical level: Intermediate
|
Exploit conceptual data models using ontology modeling
We will introduce the audience to a different way of modeling data and demonstrate creating an ontology model using structured and unstructured content.
Section: Crisp talk
Technical level: Beginner
|
Continuous online learning for classification tasks
At Airwoot (now acquired by Freshdesk), we build NLP-based, margin-based classifiers to separate spam from relevant customer tweets/posts on social media. We work with the language of social media, which introduces the challenge of continuously adapting our models to changes in social verbiage. The language of social media is dynamic, with new hashtags, acronyms and induced spelling mistakes forcing us to upd…
Section: Full talk
Technical level: Intermediate
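The listing does not name the exact classifier, so as a swapped-in illustration of continuous, margin-based online learning, here is a sketch of the Passive-Aggressive (PA-I) algorithm: each labelled post updates the weights immediately when it violates the margin, so new hashtags and spellings start influencing predictions without a full retrain. The tokenizer, the +1/-1 labels and the value of `c` are assumptions.

```python
import re
from collections import defaultdict
from typing import Dict

Vector = Dict[str, float]


def featurize(text: str) -> Vector:
    """Bag of lowercase tokens (hashtags and @handles kept whole) plus a bias feature."""
    feats: Dict[str, float] = defaultdict(float)
    for tok in re.findall(r"[#@]?\w+", text.lower()):
        feats[tok] += 1.0
    feats["__bias__"] = 1.0
    return dict(feats)


class PassiveAggressive:
    """PA-I online linear classifier; +1 = relevant post, -1 = spam."""

    def __init__(self, c: float = 1.0) -> None:
        self.c = c  # caps the step size (aggressiveness)
        self.w: Vector = {}

    def score(self, x: Vector) -> float:
        return sum(self.w.get(f, 0.0) * v for f, v in x.items())

    def predict(self, text: str) -> int:
        return 1 if self.score(featurize(text)) >= 0 else -1

    def learn(self, text: str, label: int) -> None:
        x = featurize(text)
        loss = max(0.0, 1.0 - label * self.score(x))  # hinge loss
        if loss > 0:  # passive when the margin is already satisfied
            sq_norm = sum(v * v for v in x.values())
            tau = min(self.c, loss / sq_norm)  # PA-I step size
            for f, v in x.items():
                self.w[f] = self.w.get(f, 0.0) + tau * label * v
```

Because each update only touches the features present in one post, the model can keep learning from a live stream of labelled feedback.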
|
Data Simulation as a means to intuitively grasp Statistics and its direct application to prediction problems
Whenever there is data, there is metadata about the data itself, characterised in the form of statistics.
Section: Full talk
Technical level: Beginner
|
Introduction to Statistics and Basics of Mathematics for Data Science - the hacker’s way
Many of us decided in high school that math was our forte and ended up studying highly quantitative fields like engineering and computer science; some of us even specialized further with a Master’s, including an MBA. And yet here we are, a few years into our careers, suddenly realizing that our math basics aren’t as strong as we thought they should be.
Section: Workshop
Technical level: Beginner
|
Leveraging Streaming Systems for Machine Learning
Larger datasets lead to better-quality prediction models. However, experimenting with larger datasets in a test environment to check the accuracy of a model is not always feasible, primarily due to limited resources such as main memory and CPU power. This talk will highlight how such experiments can be conducted on small nodes (like a modern laptop) by leveraging streaming syste…
Section: Crisp talk
Technical level: Intermediate
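Independent of any particular streaming system, the core idea can be sketched as training that consumes one record at a time, so memory stays proportional to the model size rather than the dataset size. Below, a generator stands in for a large dataset read from disk or a message topic, and a linear regression is fitted with one pass of stochastic gradient descent; the data, learning rate and names are all hypothetical.

```python
import random
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[List[float], float]


def synthetic_stream(n_rows: int, seed: int = 0) -> Iterator[Row]:
    """Stand-in for a large dataset read record by record (e.g. from a file or topic)."""
    rng = random.Random(seed)
    for _ in range(n_rows):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        y = 3.0 * x[0] - 2.0 * x[1] + 0.5  # hypothetical ground truth, noise-free
        yield x, y


def sgd_linear_regression(stream: Iterable[Row], n_features: int, lr: float = 0.05) -> List[float]:
    """One SGD pass on squared error; memory is O(n_features), not O(n_rows)."""
    w = [0.0] * (n_features + 1)  # last entry is the bias
    for x, y in stream:
        pred = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
        err = pred - y
        for i, xi in enumerate(x):
            w[i] -= lr * err * xi
        w[-1] -= lr * err
    return w
```

Nothing in the training loop holds more than one row, so the same code runs unchanged whether the stream yields twenty thousand rows or far more than fit in a laptop’s RAM.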
|
RNNs for multimodal information fusion
Data generated from real-world events is usually temporal and contains multimodal information (audio, visual, depth, sensor, etc.) that must be intelligently combined for classification tasks. I will discuss a novel generalized deep neural network architecture in which temporal streams from multiple modalities can be combined. The hybrid Recurrent Neural Network (RNN) exploits the c…
Section: Crisp talk
Technical level: Intermediate
|
Distributed Computing Abstractions for Big Data Science
The data science field has made significant advances in the last few years, with a renewed focus on getting data science to work at scale. The talk will outline the distributed computing abstractions required to realize data science at scale. The Resilient Distributed Dataset (RDD) abstraction provided by Spark is becoming a de facto approach for big data science. However, Apache Flink and recently,…
Section: Full talk
Technical level: Intermediate
|
Don’t just build a data lake, build a data powerhouse.
Companies are now trying to become data-oriented and make decisions based on data.
Section: Full talk
Technical level: Intermediate
|
Distributed change data capture platform
Today’s processing systems have moved from classical data-warehousing batch reporting to real-time processing and analytics. RDBMS (OLTP) data is one type of data required for analysis and for deriving business insights. The traditional way of ingesting RDBMS data into an analytical system (Hadoop, etc.) is via bulk import or query-based ingestion. This approach has the following issues: …
Section: Full talk
Technical level: Intermediate
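Log-based change data capture, a common alternative to bulk import and query-based polling, streams every insert, update and delete from the database’s change log. A minimal sketch of the consuming side, assuming a hypothetical event shape (log sequence number, operation, key, row image) rather than any particular CDC tool’s format: events are replayed in log order, and a checkpoint on the last applied position makes redelivery after a restart harmless.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    lsn: int                        # position in the source database's change log
    op: str                         # "insert", "update" or "delete"
    key: Any                        # primary key of the affected row
    row: Optional[Dict[str, Any]]   # row image after the change (None for deletes)


class ReplicaTable:
    """Target-side copy of one table, kept in sync by replaying change events."""

    def __init__(self) -> None:
        self.rows: Dict[Any, Dict[str, Any]] = {}
        self.applied_lsn = -1  # checkpoint: last log position applied

    def apply(self, events: Iterable[ChangeEvent]) -> None:
        for ev in events:
            if ev.lsn <= self.applied_lsn:
                continue  # already applied (e.g. redelivered): skipping keeps replay idempotent
            if ev.op in ("insert", "update"):
                self.rows[ev.key] = dict(ev.row or {})
            elif ev.op == "delete":
                self.rows.pop(ev.key, None)
            else:
                raise ValueError(f"unknown op {ev.op!r}")
            self.applied_lsn = ev.lsn
```

Unlike a periodic `SELECT` over changed rows, the change log also carries deletes and intermediate updates, which query-based ingestion can easily miss.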
|
Intuit’s Data journey to Public cloud
Cloud adoption has now entered the “early mainstream” stage as enterprises increasingly look to cloud deployment as a viable model for agile, cost-effective IT delivery. However, the prevailing binary paradigm of cloud infrastructure (public versus private) limits the extent to which enterprises can fully leverage the on-demand, self-service, elastic resource provisioning attributes of public clo…
Section: Crisp talk
Technical level: Intermediate
|
How Intuit solved the big scan problem in real time
Intuit provides business and financial management solutions for small and mid-sized businesses, financial institutions, consumers and accounting professionals. These products span several categories, including accounting, payroll, payments and tax. Since business transactions involve both Intuit and non-Intuit users of these products, we need a clear identity of the user/business across the offerings…
Section: Crisp talk
Technical level: Beginner
|
Building a scalable Data Science Platform (Luigi, Apache Spark, Pandas, Flask)
“In theory, there is no difference between theory and practice. But in practice, there is.” - Yogi Berra
Section: Workshop
Technical level: Intermediate
|
Building a Large-scale Augmented classifier ensemble to predict on noisy data
Different types of classifiers were investigated for classifying problem tickets in the enterprise domain. Building an accurate classifier remained challenging even after data cleaning and other accuracy-improving pre-processing techniques. An ensemble of classifiers gave better accuracy than the individual classifiers. The maximum accuracy was obtained by enhancing the en…
Section: Full talk
Technical level: Advanced
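The augmentation itself is cut off in the listing, so this only sketches the baseline it builds on: combining several classifiers’ predictions for a ticket by majority (hard) voting. The classifier interface is an assumption; any trained model wrapped as a function from ticket text to a category label would fit.

```python
from collections import Counter
from typing import Callable, Sequence

Classifier = Callable[[str], str]  # maps a ticket's text to a category label


def majority_vote(classifiers: Sequence[Classifier], ticket: str) -> str:
    """Hard voting: the most frequent label wins."""
    if not classifiers:
        raise ValueError("need at least one classifier")
    votes = [clf(ticket) for clf in classifiers]
    counts = Counter(votes)
    top = max(counts.values())
    # Ties go to the label of the earliest classifier in the list (deterministic).
    return next(label for label in votes if counts[label] == top)


def ensemble_accuracy(classifiers: Sequence[Classifier],
                      tickets: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of tickets whose voted label matches the reference label."""
    correct = sum(majority_vote(classifiers, t) == y for t, y in zip(tickets, labels))
    return correct / len(labels)
```

Voting helps most when the members make different mistakes; ordering the list by individual accuracy makes the tie-break favour the strongest model.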
|
RightFit - A Data Science Approach to Reduce Product Returns in Fashion e-Commerce
Fashion e-commerce companies experience a lot of product returns (or exchanges) from customers. Most of these are attributed to incorrect size (or fit). The talk will focus on this problem and present a solution to reduce such returns. Specifically, we present a data-science-driven approach to profile our customers based on their past purchases and returns and use that to recommend the right …
Section: Crisp talk
Technical level: Intermediate
|
Bootstrapping inspired by Hacking Human Cognition
Several applications of machine learning are hamstrung by a vicious cycle.
Section: Crisp talk
Technical level: Intermediate
|
Looking under the hood - demystifying data tools
The goal of this talk is to help build an understanding of the performance of the following packages: R Dataframe, R data.table, Pandas, NumPy, PySpark RDDs, PySpark Dataframes and Redshift. While these packages operate in different but intersecting realms of use cases, depending on the cardinality of the data and the operations to be performed on it, some are better suited than others for the…
Section: Crisp talk
Technical level: Intermediate
|
Deep Learning for Computer Vision
One of the fields that has benefited most from the rise of deep learning is computer vision. The goal of this workshop is to take participants from the basics to tackling a real-world problem.
Section: Workshop
Technical level: Intermediate
|
Scalable Realtime Analytics using Druid
Traditional SaaS solutions based on Hadoop datastores (Hive/HBase) or classical RDBMSs work well for storing data, but they are not optimized for ingesting data and making it immediately available for interactive, ad-hoc, low-latency queries at very high scale. Long query latencies make these solutions suboptimal choices for powering interactive applications. This talk will introduce Druid as a comp…
Section: Full talk
Technical level: Intermediate
|
Advanced Deep Learning Workshop – Hands-on
Deep learning is a hot topic but has a steep initial learning curve. This workshop aims to give participants ‘hands-on’ experience with a range of deep learning techniques.
Section: Workshop
Technical level: Advanced
|
Convolutional Neural Networks from the Other Side
Deep learning has made a lot of progress in the last four years: …
Section: Full talk
Technical level: Advanced
|
The Alternative Data revolution on Wall St
This talk will focus on the role that non-traditional data, known as alternative data, is beginning to play across the investment community. We will address how datasets such as point-of-sale transactions, website usage, municipal records, social media data and similar information are being used by traditional long-short funds, quantitative hedge funds and mutual funds.
Section: Full talk
Technical level: Intermediate
|
Taking Analytics Applications from Labs to the Real World: Transfer Learning in Practice
The performance of traditional supervised learning models degrades if the “nature” of the test samples differs from that of the training samples. For example, a classifier built to sort reviews of “books” into positive, negative and neutral categories performs worse when applied to sort reviews of “kitchen products” into the same categories. This relates to one of the fundamental probably approxi…
Section: Full talk
Technical level: Intermediate
|
Machine Learning the Walmart Way with a Deep Dive into Automated Forecasting System
Walmart, the largest retailer, also has one of the largest data footprints, with petabytes of data created every day. The world is moving towards a more data-driven decision-making ecosystem and building machines that can make those decisions. Hence, effective management and analysis of the data in a human-independent manner is the need of the hour.
Section: Crisp talk
Technical level: Intermediate
|
Lessons Learned: Real-life NLP
Building a practical Natural Language Processing system goes far beyond installing an open source toolkit. I will give an overview of some of the components required, and the obstacles that have to be overcome, for a system that extracts entities and relationships from full-text documents.
Section: Crisp talk
Technical level: Intermediate
|
Meet the needs of content marketing with the power of NLP
Content marketing is one of the recent buzzwords in digital marketing. It broadly refers to providing quality, useful content to customers in order to engage them and attract them towards a brand. With the proliferation of channels through which this content can be delivered, there is increasing demand on content writers to provide content that can be …
Section: Full talk
Technical level: Intermediate
|
Hadoop & Cloud Storage: Object Store Integration in Production
Today’s typical Apache Hadoop deployments use HDFS for persistent, fault-tolerant storage of big data files. However, emerging architectural patterns increasingly rely on cloud object storage such as S3, Azure Blob Store and GCS, which are designed for cost-efficiency, scalability and geographic distribution. Hadoop supports pluggable file system implementations to enable integration with the…
Section: Crisp talk
Technical level: Intermediate
|
Deciphering Driving Behaviour using Geospatial Temporal Data Collected from Smartphone Sensors
Our vision at Zendrive Technologies is ‘Safer Drivers, Safer Roads’. To that end, we collect data from a variety of sensors available on smartphones and, by combining techniques from signal processing, statistical modeling and geographical information systems (GIS), detect driving-related events and characterize one’s driving style.
Section: Full talk
Technical level: Intermediate
|
Hierarchical Bayes Approach and Implementation of MCMC in an Ecological Study
The Bayesian paradigm for analysing data gained unmatched popularity in most fields of statistical application in the late twentieth century. Bayesian methods permit one to construct statistical models that use both the current data and all the prior information at hand to make inferences about the unknown nature of the underlying process, in a marvellously simple way. But th…
Section: Full talk
Technical level: Advanced
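The study’s hierarchical model is not given in the listing, so this is only a sketch of the MCMC building block: a random-walk Metropolis sampler, checked here on a one-parameter normal model whose posterior mean is known in closed form. A hierarchical model would plug a joint log posterior over group-level and population-level parameters into a vector-valued version of the same accept/reject loop. The data, prior and step size are illustrative assumptions.

```python
import math
import random
from typing import Callable, List, Sequence


def metropolis(log_post: Callable[[float], float], init: float, step: float,
               n_samples: int, burn_in: int = 1000, seed: int = 0) -> List[float]:
    """Random-walk Metropolis: propose x' ~ N(x, step^2), accept with prob min(1, p(x')/p(x))."""
    rng = random.Random(seed)
    x, lp = init, log_post(init)
    samples: List[float] = []
    for i in range(burn_in + n_samples):
        proposal = x + rng.gauss(0.0, step)
        lp_new = log_post(proposal)
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            x, lp = proposal, lp_new  # accept; otherwise keep the current state
        if i >= burn_in:
            samples.append(x)
    return samples


def normal_mean_log_posterior(data: Sequence[float], sigma: float,
                              prior_mean: float, prior_sd: float) -> Callable[[float], float]:
    """Unnormalised log posterior of mu for y_i ~ N(mu, sigma^2), mu ~ N(prior_mean, prior_sd^2)."""
    def log_post(mu: float) -> float:
        log_lik = -sum((y - mu) ** 2 for y in data) / (2 * sigma ** 2)
        log_prior = -((mu - prior_mean) ** 2) / (2 * prior_sd ** 2)
        return log_lik + log_prior
    return log_post
```

Only ratios of posterior densities are needed, which is why the normalising constant (usually intractable in hierarchical models) never has to be computed.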
|
Real Time Fulfilment Planning at Flipkart Scale
Flipkart.com stores and sells millions of unique items through its Fulfillment Centers (FCs) and Sellers. These items need to be picked from FCs or shipped from tens of thousands of Sellers into the many Sortation Centres in the Flipkart network. We need different quantities of each of these items, need to pick them up from the FCs or Sellers at different times, and need to bring them into th…
Section: Full talk
Technical level: Intermediate
|
Allocation and Forecasting in Guaranteed Delivery of Advertisements
Guaranteed delivery (GD) of advertisements helps brands book advertisement views of niche audience segments well in advance. To enable this, we need to create an intelligent system which allows for targeting users, forecasting supply, optimally booking campaigns, allocating campaigns to users, and pricing the guarantees and penalties correctly.
Section: Full talk
Technical level: Intermediate
|
Scaling the Largest Functional DataSet @Flipkart aka Catalog
The catalog refers to product-pivoted information. This functional data can be non-trivial to manage and serve, especially when it is constantly evolving. Managing the flux of incoming updates, keeping timestamp-consistent data views of entities and their associations, and serving it to clients are the main challenges. This talk takes us through the journey of scaling the platform to serve…
Section: Full talk
Technical level: Intermediate
|
Reasoning: The Next Frontier in Data Science
The “Prediction Paradigm” in data science has come a long way. Today, we can build reasonably accurate models for complex prediction problems such as detecting objects in images, answering Jeopardy questions, translating documents from one language to another, or recognising people from face images.
Section: Full talk
Technical level: Intermediate
|
Using Data to Identify the Genomic Cause of Disease
A number of diseases, including cancer, are caused by genomic mutations. The task of identifying the causative mutation requires sequencing the genome and then analysing the large amount of data that results. What follows can often be confounding in various ways as this talk will illustrate with real examples -- infants who pass away mysteriously, siblings with misplaced organs, a little boy suff…
Section: Full talk
Technical level: Intermediate
|