Jul 2018
26 Thu 07:45 AM – 06:15 PM IST
27 Fri 07:45 AM – 05:35 PM IST
Topological Data Analysis Theory and Practice
We are already living in the age of big data, and it is too big to ignore. It is therefore important to find ways to explore, summarize, and answer questions with this data. However, the problem is not just that the data is big but that it is complicated, loaded with surprising patterns and unusual structures. Often it is even too complicated for standard methods to be useful …
Section: Full talk
Technical level: Intermediate
|
Machine Learning for Financial Data Extraction
Data plays a major role in making decisions about financial transactions (buying/selling stocks, bonds, mutual funds). I will briefly talk about how machine learning is applied at FactSet to extract financial information automatically.
Section: Crisp talk
Technical level: Beginner
|
Robot Quotient - The Machine versus Human Debate
This is a contemporary paper that introduces the idea of measuring intelligence in robots and machines. It offers a new perspective on how humans think about machines. With Saudi Arabia granting citizenship to a female robot, Sophia, in 2017, the debate continues: will robots replace humans, and if so, in what ways? Will humans and robots coexist? Will robots and humans do different work or compete for t…
Section: Crisp talk
Technical level: Advanced
|
Math for data science
By now it is evident that a solid math foundation is indispensable for anyone who wants to get into data science in an honest-to-goodness way. Unfortunately, for many of us math was just a means to better scores, never a means to understand the world around us. That systemic failure of the education system leaves many of us feeling a “gap” when doing or learning data science. It is high time that w…
Section: Workshop
Technical level: Beginner
|
Deep Learning for NLP from scratch
King - Man + Woman = Queen: the most famous example of word vectors paints an optimistic picture in which computers represent words as vectors that can be used to infer similarity. But can we extend this to sentences or documents? How did word vectors come into existence? What are their uses?
Section: Workshop
Technical level: Intermediate
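The King - Man + Woman = Queen example refers to vector arithmetic on word embeddings. As a minimal sketch, assuming hand-made 3-dimensional toy vectors rather than learned ones (the words, values, and the `analogy` helper are all hypothetical), the analogy can be answered by finding the vocabulary word whose vector has the highest cosine similarity to vec(king) - vec(man) + vec(woman), excluding the three query words:

```python
import math

# Hypothetical hand-made toy vectors; real word vectors are learned from a
# corpus (e.g. by word2vec) and have hundreds of dimensions.
VECTORS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def analogy(a, b, c, vectors=VECTORS):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b and c."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(vectors[w], target))


print(analogy("king", "man", "woman"))  # queen
```

Excluding the input words is a common convention when evaluating analogies, because the nearest neighbour of the target vector is often one of the inputs themselves.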
|
Complex network analysis using NetworkX - Graph Theory in Python
The workshop will focus on the basic usage of NetworkX for manipulating graphs and networks. After that, we will use NetworkX for visualization and real-world network analysis.
Section: Workshop
Technical level: Beginner
|
Atlas: GO-JEK’s real-time geospatial visualization platform
Billions of GPS points flow through our data pipelines daily in real time and drive decisions such as driver allocation, surge pricing, driver incentives, and more. This poses intriguing challenges in finding actionable insights from spatial data in real time. At GO-JEK we built Atlas to make it easy for teams within GO-JEK to visually explore this flood of geospatial data. Fo…
Section: Full talk
Technical level: Intermediate
|
Big Data Forensic Analytics
Big Data forensics is a new type of forensics, just as Big Data is a new way of solving the challenges presented by large, complex data. Thanks to the growth in data and the increased value of storing more data and analyzing it faster, Big Data solutions have become more common and more prominently positioned within organizations. As such, the value of Big Data systems has grown, often storing dat…
Section: Full talk
Technical level: Beginner
|
Scalability truths and serverless architectures: why it is harder with stateful, data-driven systems
Building scalable systems is not easy. It is not as simple as deploying on a cloud and expecting the system to scale along with the cloud’s elasticity. Many systems and solutions that claim elasticity of scale often implicitly limit their claims to stateless services.
Section: Full talk
Technical level: Intermediate
|
Beyond Data stores & processing engines - Learnings from handling eCommerce Data in motion
Data is gold. It may be at rest or in motion, transient or reasonably permanent, being written or read, expensive or cheap to store, and so on. While we mostly concern ourselves with data stores and processing engines, the impact of data in motion is usually ignored: on data centre infrastructure, across system interactions, and in analyzing user or system behavior.
Section: Full talk
Technical level: Intermediate
|
Banker to the unbanked - story of scale leveraging Data Science, AWS, Scala, Spark
PSPs are leveraging the online data trail that customers leave behind to understand their ability and intent to repay a loan. These customers may not even have a bank account and most likely come from tier 2/3 towns in developing countries such as India. This talk is about the ML and data engineering put together to provide instant short-term credit to millions of co…
Section: Full talk
Technical level: Beginner
|
Business analytics on the cloud - a scalable model with R
“R” is a great language for data analysis that analysts love, but it is inherently difficult to scale because of its single-threaded nature and lack of libraries/web frameworks. This talk is about how we overcame or worked around these limitations to plug R into a scalable cloud platform. It also covers other design considerations that make it practical to do analytics with larger datasets on a cloud…
Section: Crisp talk
Technical level: Intermediate
|
Hybrid Machine Learning with Azure IoT Edge
This workshop will provide an overview of various concepts of IoT, machine learning, and Azure IoT Edge. It will be hands-on, involving actual machine learning model deployment on edge devices or on an edge simulator on your system.
Section: Workshop
Technical level: Intermediate
|
Using Data to make data processing reliable again
Data-driven performance management of Big Data infrastructure is very different from performance management of standard applications like web servers. A single cluster receives multiple simultaneous discrete applications, each of which can comprise up to hundreds of thousands of tasks of varying complexity. If these jobs are not tuned properly, then it’s easy to both blow …
Section: Full talk
Technical level: Intermediate
|
Improve data quality using Apache Airflow and check operator
The Data Team at Qubole collects usage and telemetry data from a million machines a month. We run many complex ETL workflows to process this data and provide reports, insights, and recommendations to customers, analysts, and data scientists. We use the open source distribution of Apache Airflow to orchestrate our ETLs and process more than 1 terabyte of data daily.
Section: Crisp talk
Technical level: Intermediate
|
Building microservices using kafka
Microservices are the building blocks that power a post-cloud digital landscape. They help us build scalable services and ease the deployment and development process.
Section: Crisp talk
Technical level: Beginner
|
Incremental transform of transactional data models to analytical data models in near real time
Transactional systems are designed with data models that maximize write throughput across multiple parallel business flows. They evolve iteratively with the business and need to react quickly to the changing business landscape to minimize time to market. Analytical systems, on the other hand, require data models that maximize query throughput over broad, deep, and large data volumes. The need for a platfo…
Section: Full talk
Technical level: Intermediate
|
What I learnt by running Apache Airflow @Scale
In the world of data-driven applications, the role played by workflow management systems is unparalleled. At Qubole we use Apache Airflow to orchestrate our complex and time-critical big data ETL jobs. Though Airflow has helped us tremendously, there are certain areas where all the major workflow systems fall short in lights-out operations. Following are some open questions that keep showing up on Airf…
Section: Crisp talk
Technical level: Intermediate
|
Qubole Sparklens: understanding the scalability limits of Spark applications
One of the common requests we receive from customers at Qubole is to debug slow Spark applications. Usually this is done by trial and error, which takes time. Moreover, it doesn’t tell us where to look for further improvements. We at Qubole are looking into making this process more self-serve.
Section: Full talk
Technical level: Intermediate
|
Distributed Deep Learning
There are various open source frameworks, such as TensorFlow, CNTK, MXNet, and PyTorch, that allow data scientists to develop deep learning models. Traditionally, data scientists train models on a single machine; however, as datasets and models grow, training on a single node becomes inefficient. A couple of frameworks, such as TensorFlow, support model training on multiple machines u…
Section: Crisp talk
Technical level: Intermediate
|
Baking a cloud-native data warehouse from enterprise database leftovers
dataxu® deals with the collection, storage, processing, analysis, and projection of data at massive scale.
Section: Crisp talk
Technical level: Intermediate
|
Deep portfolio: using neural networks for portfolio construction
Deep learning is slowly transforming the face of data analysis, and the world of finance has not been impervious to its reach. Although finance has its own models that have been in place for decades (Black-Scholes, CAPM), new methodologies are emerging to leverage the power of AI.
Section: Full talk
Technical level: Intermediate
|
Market propensity modelling using XStream: unified self-service analytics ETL and ML platform
About the product: XStream is a unified self-service analytics ETL and ML platform built on top of Apache Spark that lets you create scalable, fault-tolerant pipelines. You can express your Big Data Spark computation logic in a simpler, more intuitive fashion and get complex pipelines ready in minutes. XStream can also run Big Data batch jobs as streaming computat…
Section: Sponsored talk
Technical level: Advanced
|
Building a next generation speech and NLU engine: in pursuit of multi-modal experience for Bixby
Bixby is an intelligent, personalized voice interface for your phone. It lets you seamlessly switch between voice and type/touch, and supports more than 75 domains (e.g. Camera, Gallery, Messages, WhatsApp, YouTube, Uber). It was launched in July 2017 for English and is now available in more than 200 countries with about 8 million registered users. The talk focuses on challenges in deep learnin…
Section: Crisp talk
Technical level: Intermediate
|
Building analytics application with streaming expressions in Apache Solr
Apache Solr, an open source search engine project, has come a long way since its inception, enabling applications to serve near-real-time data with rich relevance to users, with high availability, auto-scaling, and an effective failover strategy on cloud infrastructure.
Section: Full talk
Technical level: Intermediate
|
Bad Data is No Better Than No Data! - Impact of Automation in Data Stewardship Workflows in Plant Agriculture Industry
Data stewardship is the management and oversight of an organization’s data assets to provide high-quality data that is easily accessible in a consistent manner for business and research decisions. It includes data acquisition, data standardization, data integration, and data analytics. Data generated at different phases of the pipeline often ends up in different databases and uses colloquial vocabulary…
Section: Crisp talk
Technical level: Intermediate
|
Building Scalable Machine Learning pipelines with Apache PredictionIO
This talk will help developers and data scientists understand how to build ML pipelines using PredictionIO. We will cover how Apache PredictionIO (an open source machine learning server built on top of a state-of-the-art open source stack) helps reduce the time from writing a proof of concept for an ML model to a production-ready model-serving microservice with a persistent model. We will als…
Section: Full talk
Technical level: Intermediate
|
Smart Campaign Planning Through "Intelligent" Email Outreach Using NLG
In an age where data is the new oil, it has become critical for companies to gather customer data, analyze the relevant data points, and derive key insights. Companies must find ways to retain customers by proactively predicting churn, grow the existing customer base by providing relevant promotions and offers, and acquire new customers by efficiently processing leads.
Section: Full talk
Technical level: Intermediate
|
Scaling up our distributed query workloads using Kafka Streams + RocksDB
The Analytics platform powers the Business iQ product at AppDynamics (now part of Cisco). Business iQ provides real-time, actionable correlations between application performance, user experience, and business outcomes/performance. Business health baselines, anomaly detection, and alerts are all automated and immediately actionable through the use of business metrics and events. The platform …
Section: Full talk
Technical level: Intermediate
|
Driving Customer Service Optimization using supervised stack ensemble with natural language features
In the age of social media, companies are conscious of the reviews posted online. Any dissatisfaction can be aired as tart sentiments on these platforms. So enterprises strive hard to deliver a 100% positive experience, doing all they can to address customer grievances and queries. But as they say, there are slips between the cup and the lip: not all grieva…
Section: Full talk
Technical level: Intermediate
|
Machine Learning using Orange - It's Fruitful and Fun!
The IPython/Jupyter notebook is widely used for data analysis in the data science community. This notebook-style programming follows an imperative paradigm that is linear in nature. In the past decade, the visual programming paradigm has gained a lot of popularity because it is user-centric and driven by data streams.
Section: Workshop
Technical level: Beginner
|
Using Operations Research and Analytics to Propel the E-commerce Industry
With advancements in technology and a growing number of consumers, supply and demand in the e-commerce industry have risen, and they must be handled within specified time frames. At ORMAE we develop Operations Research (OR) and Machine Learning (ML) solutions that help our clients optimize their operations and make complex business decisions. Mathematical optimization has been a proven and…
Section: Crisp talk
Technical level: Intermediate
|
Applying Lambda Architecture in Machine Learning realm
In mature information retrieval systems, prediction and scoring happen in multiple layers in a cascaded fashion. In the batch processing layer, update intervals are long and infrequent. In the ingestion layer, scoring happens as and when updates arrive, in near real time. This layer is off the user path but still carries a reasonably wide feature set. Lastly, final scoring is done in the user path using a mu…
Section: Full talk
Technical level: Intermediate
|
Approximate Query Processing
Data analysts are constantly exploring various forms of data and searching for new insights to make better business decisions. The email marketing team at Walmart relies heavily on Customer Segmenter, an in-house tool that determines which customers are best suited for an email advertisement based on various attributes. Conducting these analyses was very costly, though, both …
Section: Crisp talk
Technical level: Beginner
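The abstract does not say how Customer Segmenter's cost was reduced. One common approximate-query-processing technique is to answer aggregate queries from a small precomputed random sample and scale the result up. The sketch below assumes uniform Bernoulli sampling over a synthetic customer table; the helper names, the 1% fraction, and the data are hypothetical and not taken from the talk:

```python
import random


def build_sample(rows, fraction, seed=0):
    """Keep each row independently with probability `fraction` (Bernoulli sample)."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < fraction]


def approx_count(sample, fraction, predicate):
    """Estimate COUNT(*) WHERE predicate on the full table by scaling up the sample count."""
    return sum(1 for row in sample if predicate(row)) / fraction


def is_target(customer):
    """Hypothetical segment definition: customers aged 40 or older."""
    return customer["age"] >= 40


# Synthetic customer table (hypothetical data, for illustration only).
customers = [{"id": i, "age": 18 + i % 60} for i in range(100_000)]
FRACTION = 0.01
sample = build_sample(customers, FRACTION)  # built once, reused for many queries

estimate = approx_count(sample, FRACTION, is_target)
exact = sum(1 for c in customers if is_target(c))
print(f"estimate={estimate:.0f} exact={exact} sample_rows={len(sample)}")
```

With a 1% sample each query scans roughly 1% of the rows, in exchange for a sampling error that shrinks as the sample grows; production systems usually also report an error bound, which this sketch omits.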
|
Segmenting 500 million users using Airflow + Hive
Walmart is the largest retail company in the US, with both an online and an offline presence. It reaches millions of users in every possible way: physical stores, an e-commerce website, the exclusive Sam's Club, and jet.com, to name a few.
Section: Crisp talk
Technical level: Intermediate
|
Scaling write-heavy OLTP systems with strong data guarantees: learning from Flipkart’s user facing order capture systems
Order capture and order management systems at Flipkart have had to scale to 10X volumes to cater to growth in e-commerce and the user base. In addition, these systems need to handle 1000x bursts of traffic under the flash-sale business model. These systems are write-heavy and need strong data guarantees (consistency, data availability, durability, etc.). With scale, the data stores for these systems have ou…
Section: Full talk
Technical level: Intermediate
|
Building big data pipelines on kafka and kubernetes
At AppDynamics, we have been pushing the limits of how far we can scale metric ingestion. Toward this goal, we have been taking logical pieces out of a monolithic application and re-architecting them to handle large scale.
Section: Full talk
Technical level: Intermediate
|
Expressing complex ETL pipelines using Cascading
At Flipkart, data is one of the differentiators and is used in innumerable ways for decision making. Specifically, for generating recommendations, our data pipelines perform various ETL operations over terabytes of user activity data.
Section: Crisp talk
Technical level: Beginner
|
Building Streaming platform using Kafka Streams
At Walmart, terabytes of data are generated per day through interactions and transactions by our users on walmart.com and other properties (in-store, jet.com, etc.). As part of our customer data strategy, we strive to increase Reach, Depth, and Freshness: to know about more customers, to know more about customers, and to know it as close to real time as possible. Toward this goal, we need to ingest data as and when it is generated and process i…
Section: Crisp talk
Technical level: Intermediate
|
Improving product discovery via relevance and ranking optimization
In e-commerce, recommendations play a key role not only in customer satisfaction, by improving discovery, but also in fulfilling business objectives. In this talk, I will focus on our iterative journey: starting from feature engineering, adding features incrementally and learning on them, and thus moving from a rule-based system to launching a machine-learned system in production.
Section: Full talk
Technical level: Intermediate
|
User response prediction at scale
Millions of users browse Walmart.com each day with varying levels of intent. Many of them end up making a purchase in the same session and most, well, do not. Display retargeting channels, with ads on the open web and your favourite social media sites, are then used to reach potential customers with relevant content. Ad serving comes at a cost, and optimizing these costs becomes espec…
Section: Full talk
Technical level: Intermediate
|
Personalized Recommendations for Computational Advertising
Building recommender systems for computational advertising at Walmart.com has been an extraordinary journey. Particularly fascinating is designing algorithms that cater to audiences who are at different stages of their purchase journey, or who might not have interacted with the site recently. This, coupled with the scalability challenges and the interplay of factors like…
Section: Full talk
Technical level: Intermediate
|
Serviceability under high demand
At Swiggy, our aim is to deliver orders to customers within a reasonable promised time, regardless of when and where the order is placed. We face considerable challenges under high (and sometimes unexpected) demand: think IPL weekends, rain, New Year’s Eve, or a competitor’s platform going down.
Section: Full talk
Technical level: Intermediate
|
A Time Series Analysis of District-wise Government Spending
About District Treasuries: District Treasuries are the nodal offices for all financial transactions of the Government within the district, managing both payments and receipts. They also monitor the activities of various sub-treasuries, which work as extensions of the Treasuries at the Tehsil/Taluka level. Each district has various Drawing & Disbursing Officers who are authorised to draw mon…
Section: Full talk
Technical level: Beginner
|
Display prospecting using explore-exploit strategy
In the display advertising domain, prospecting aims to build brand awareness and drive new users to the site. Because there is no prior user intent or user history, selecting products for a prospecting user from a huge item catalog is a great challenge. Traditionally, strategies like showcasing bestsellers, discounted products, or manually curated products have been used by market…
Section: Crisp talk
Technical level: Intermediate
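The abstract does not name a specific algorithm. As one minimal illustration of an explore-exploit strategy, the sketch below uses epsilon-greedy selection: with probability epsilon a random product is shown (explore), otherwise the product with the best observed click-through rate (exploit). The class, product names, "true" click-through rates, and simulation are hypothetical:

```python
import random


class EpsilonGreedySelector:
    """Choose one product per ad impression with an epsilon-greedy policy."""

    def __init__(self, product_ids, epsilon=0.1, seed=0):
        self.epsilon = epsilon  # share of traffic spent exploring
        self.rng = random.Random(seed)
        self.impressions = {p: 0 for p in product_ids}
        self.clicks = {p: 0 for p in product_ids}

    def ctr(self, product):
        """Observed click-through rate; 0.0 for a product never shown."""
        shown = self.impressions[product]
        return self.clicks[product] / shown if shown else 0.0

    def select(self):
        products = list(self.impressions)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(products)  # explore: any product, uniformly
        return max(products, key=self.ctr)  # exploit: best observed CTR

    def update(self, product, clicked):
        self.impressions[product] += 1
        if clicked:
            self.clicks[product] += 1


# Simulated catalogue with made-up "true" click-through rates.
TRUE_CTR = {"bestseller": 0.02, "discounted": 0.05, "curated": 0.10}
selector = EpsilonGreedySelector(TRUE_CTR, epsilon=0.1)
env = random.Random(1)
for _ in range(50_000):
    product = selector.select()
    selector.update(product, clicked=env.random() < TRUE_CTR[product])

print(max(selector.impressions, key=selector.impressions.get))
```

Epsilon-greedy is the simplest bandit policy; alternatives such as UCB or Thompson sampling spend exploration traffic more selectively.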
|
Our experiments with food recommendations @Swiggy
Food is a very personal choice. We at Swiggy are obsessed with customer experience and want to make food discovery on the platform seamless and delightful for the consumer. So when you fire up the Swiggy app, we take your implicit and explicit feedback to figure out your taste preferences, your price affinity, single vs. group orders, and breakfast or late-night cravings, and provide a convenient, simple but highly…
Section: Crisp talk
Technical level: Intermediate
|
DevOps for Data Science: Experiences from building a cloud-based data science platform
Productionizing data science applications is non-trivial. Non-optimal practices, the people-heavy nature of traditional approaches, and developers’ love of complex solutions for the sake of using cool technologies make the situation even worse.
Section: Full talk
Technical level: Intermediate
|
Managing Machine Learning Models in Production
Deploying machine learning models in production is not a trivial task.
Section: Crisp talk
Technical level: Intermediate
|
An Introduction to Interactive Data Visualization with Bokeh
Data visualization is an essential step in developing data-driven solutions. With proper visualization, we get direct insights that lead us toward further stages of model development. When visualizing data in Python, we have libraries like Matplotlib and seaborn to help us, but they come with certain limitations. Recently developed libraries with interactive plotting options are taking …
Section: Crisp talk
Technical level: Beginner
|
Needle in a haystack: entity search on text and graph
Web search today is moving towards displaying “answers” rather than making users browse through pages to find what they want. “Entity” search queries, where the expected answer is a list or set of objects, form more than 40% of today’s Web searches. Yet current approaches to answering such queries are quite brittle. We improve the state of the art by infusing the semantic information of e…
Section: Full talk
Technical level: Beginner
|
Using structural estimation methods from economics to model user behaviour in bike-sharing systems
The cities of Paris, London, Chicago, and New York (among many others) have set up large-scale bike-share systems to facilitate the use of bicycles for urban commuting. This talk estimates the impact on bike-share ridership of two facets of system performance: accessibility (how far the user must walk to reach stations) and bike availability (the likelihood of finding a bicycle). My analysis is ba…
Section: Full talk
Technical level: Intermediate
|
A study in classification
Let me ask you a question: is a watch a time-keeping device, an electrical gadget, a collectible item, or a piece of jewelry? (You can pick only one.) Such queries, mandated by governments across the world, cause sleepless nights for the global trade industry, not least because of the astronomical penalties for classification errors in import/export declarations.
Section: Crisp talk
Technical level: Intermediate
|
Operating data pipeline using Airflow @ Slack
Slack is a communication and collaboration platform for teams. Our millions of users spend 10+ hours connected to the service on a typical working day.
Section: Full talk
Technical level: Advanced
|
Deep learning based hybrid recommendation systems in TensorFlow
Traditional collaborative-filtering approaches have certain lacunae, such as their inability to handle sparse data and cold start, and a lack of scalability when there are millions of items and/or users. Content-based recommendation engines overcome cold start but have trouble taking user feedback into account. Hybrid recommendation engines try to get the best of both worlds. We outline t…
Section: Workshop
Technical level: Intermediate
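The workshop builds a deep-learning hybrid in TensorFlow; the sketch below is not that model. It swaps in the much simpler weighted hybrid, only to illustrate the trade-off described above: content-based scores carry cold-start users, and collaborative-filtering scores take over as interactions accumulate. The weighting rule alpha = n / (n + k), the constant k, and all item scores are hypothetical assumptions:

```python
def hybrid_score(cf_score, content_score, n_interactions, k=20.0):
    """Blend a collaborative-filtering score with a content-based score.

    alpha = n / (n + k) grows from 0 towards 1 as the user accumulates
    interactions, so cold-start users lean on content features and warm
    users lean on collaborative filtering. k is a hypothetical constant.
    """
    alpha = n_interactions / (n_interactions + k)
    return alpha * cf_score + (1.0 - alpha) * content_score


def rank_items(cf_scores, content_scores, n_interactions, k=20.0):
    """Rank item ids by hybrid score; items with no CF score get 0.0 for it."""
    scores = {
        item: hybrid_score(cf_scores.get(item, 0.0), content, n_interactions, k)
        for item, content in content_scores.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical per-item scores for one user.
content_scores = {"a": 0.9, "b": 0.5, "c": 0.1}
cf_scores = {"a": 0.1, "b": 0.2, "c": 0.95}

print(rank_items(cf_scores, content_scores, n_interactions=0))     # cold start
print(rank_items(cf_scores, content_scores, n_interactions=1000))  # warm user
```

A learned hybrid, such as the one the workshop describes, replaces the fixed alpha rule with weights fitted to user feedback.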
|
So you think you know about linear regression ...
Everyone has used linear regression. It’s boring, standard mathematics that we learned in Stats 101.
Section: Full talk
Technical level: Beginner
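As a reminder of the Stats 101 baseline the talk starts from, here is a minimal sketch of ordinary least squares with a single predictor in plain Python, using the closed-form estimates slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The function name and data are hypothetical:

```python
def fit_simple_ols(xs, ys):
    """Closed-form ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sxy: co-variation of x and y; Sxx: variation of x around its mean.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


print(fit_simple_ols([0, 1, 2, 3, 4], [1, 3, 5, 7, 9]))  # (2.0, 1.0)
```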
|
Michelangelo: Uber's machine learning platform
Uber Engineering is committed to developing technologies that create seamless, impactful experiences for our customers. We are increasingly investing in Machine Learning to fulfill this vision. At Uber, our contribution to this space is Michelangelo, an internal ML-as-a-service platform that democratizes machine learning and makes scaling AI to meet the needs of the business as easy as requesting…
Section: Full talk
Technical level: Intermediate
|
Seeing through the eyes of a self-driving car: visualizing autonomous vehicle data on the web
The ATG (Advanced Technologies Group) at Uber is shaping the future of driverless transportation. Over the last two years, the ATG Visualization team built a web visualization platform that enables engineers and operators across ATG to quickly inspect, debug, and explore information collected from offline and online testing. In this talk, we dive into the challenges of combining complex and diver…
Section: Full talk
Technical level: Intermediate
|
NLP on भारतीय भाषाओं (Indian languages)
With millions of Indian users coming online recently as internet penetration grows, it has become crucial to serve these users with support for Indian/local languages. Most of these users are not comfortable with English and are more comfortable in Hindi or one of the South Indian languages. With current technology, there are ways to address things like intent classification and entity extraction with e…
Section: Crisp talk
Technical level: Intermediate
|
Weaponizing data for politics
We’ve all heard of the prevalence of data analytics in the political realm and stories of how companies like Cambridge Analytica influenced elections with the use of data. It used to be information that was power in politics, but now data and its analysis let parties wield even more power. If done right and combined with on-the-ground intelligence, it allows for microtargeting and targeted ad…
Section: Full talk
Technical level: Beginner
|
The battle for privacy: right to be forgotten in India
Although the Internet is viewed as a global public resource, its functioning and access to information remain predominantly controlled by private actors. The so-called right to be forgotten, as created by the European Court of Justice’s interpretation, seeks to create obligations for intermediaries to remove links to content that is lawful and available in the public domain. This talk tracks the …
Section: Full talk
Technical level: Beginner
|
DIY - Data is Yours
A DIY platform for real-time data aggregation: create a job with a simple SQL-like query and a few clicks.
Technical level: Intermediate
|
Data science for business: adopting analytics without paralysis
A number of factors have made companies data-rich compared with companies of the past. But having data alone is not good enough. This talk will explore what companies need to do to cross the Rubicon and make the magic happen. Through case studies, we will explore how companies can get their management to think more analytically and how they can create a culture where data scientists …
Section: Full talk
Technical level: Beginner
|
The right to privacy versus the people's right to know: challenges and the way forward
Nearly a year ago, a nine-judge bench of the Supreme Court unanimously affirmed that the “Right to Privacy” is a fundamental right under the Indian Constitution. This was not the first time the SC upheld the right to privacy; it has done so in a number of decisions since Maneka Gandhi vs UoI (1978). Over the last four decades, the SC has repeatedly held that individuals have autonomy over personal choi…
Section: Full talk
Technical level: Beginner
|
GDPR - The wave of Data Privacy
GDPR is a wave of regulation that has hit Europe. It is a breakthrough regulation that sets the trend for unified data privacy norms across Europe, with far-reaching impacts. In this talk, Aina will take us through the history of data privacy regulation, the latest trends, and what they mean for companies and individuals globally. The talk will be of immense interest to anyone doing, or looking to…
Technical level: Intermediate
|
Compromising a $6B big data project through poor data quality: the Aadhaar case study
The Aadhaar project holds at least 3 PB of data and possibly more. Its promise of providing a unique, multi-modal, biometric-backed identity to everyone in India has hinged on the quality of the biometric templates obtained during enrollment and on the veracity and trustworthiness of the identity documents. The scale needed for the project can only be achieved through enrollment centers that are spr…
Section: Full talk
Technical level: Beginner
|
The power of intuition in data science, and why it will always have a role
Data science, fueled by big and growing datasets, has enabled the rapid discovery of new relationships and predictability in the world. If algorithms can find relationships backed by mountains of historical data, why does intuition have a role? This seems counter to the purpose and modus operandi of data science. This talk will explain why intuition remains vital to data science: 1) What it is;…
Section: Full talk
Technical level: Beginner
|
Design for Data
When evaluating the quality and likelihood of success of AI/ML projects, I have found it helpful to think in terms of three core components: Workflow, Data, and Algorithms. In media and public discussion, algorithms tend to receive the most attention, and for young data scientists they are often what seems most exciting. This talk will focus on the other two, underrated components: workflow and data…
Section: Full talk
Technical level: Beginner
|