Submissions
The Fifth Elephant 2018

The seventh edition of India's best data conference

##About the conference and topics for submitting talks:
The Fifth Elephant is rated as India’s best data conference. It is a conference for practitioners, by practitioners. In 2018, The Fifth Elephant will complete its seventh edition.

The Fifth Elephant is an evolving community of stakeholders invested in data in India. Our goal is to strengthen and grow this community by presenting talks, panels and Off The Record (OTR) sessions that present real insights about:

  1. Data engineering and architecture: tools, frameworks, infrastructure, architecture, case studies and scaling.
  2. Data science and machine learning: fundamentals, algorithms, streaming, tools, domain specific and data specific examples, case studies.
  3. The journey and challenges in building data driven products: design, data insights, visualisation, culture, security, governance and case studies.
  4. Talks around an emerging domain: such as IoT, finance, e-commerce, payments or data in government.

##Target audience:
You should attend and speak at The Fifth Elephant if your work involves:

  1. Engineering and architecting data pipelines.
  2. Building ML models, pipelines and architectures.
  3. ML engineering.
  4. Analyzing data to build features for existing products.
  5. Using data to predict outcomes.
  6. Using data to create / model visualizations.
  7. Building products with data – either as product managers or as decision scientists.
  8. Researching concepts and deciding on algorithms for analyzing datasets.
  9. Mining data with greater speed and efficiency.
  10. Developer evangelism for organizations that want developers to use their APIs and technologies for machine learning, full stack engineering, and data science.

##Perks for submitting proposals:
Submitting a proposal, especially with our process, is hard work. We appreciate your effort.
We offer one conference ticket at a discounted price to each proposer, and a t-shirt.
We only accept one speaker per talk. This is non-negotiable. Workshops may have more than one instructor.
In case of proposals where more than one person has been mentioned as a collaborator, we offer the discounted ticket and t-shirt only to the person with whom the editorial team corresponded directly during the evaluation process.

##Format:
The Fifth Elephant is a two-day conference with two tracks on each day. Track details will be announced with a draft schedule in February 2018.

We are accepting sessions with the following formats:

  1. Full talks of 40 minutes.
  2. Crisp talks of 20 minutes.
  3. Off the Record (OTR) sessions on focussed topics / questions. An OTR is 60-90 minutes long and typically has up to four facilitators and one moderator.
  4. Workshops and tutorials of 3-6 hours duration on Machine Learning concepts and tools, full stack data engineering, and data science concepts and tools.
  5. Pre-events: Birds of a Feather (BOF) sessions, talks, and workshops for open houses and pre-events in Bangalore and other cities between October 2017 and June 2018. Reach out to info@hasgeek.com should you be interested in speaking and/or hosting a community event between now and the conference in July 2018.

##Selection criteria:
The first filter for a proposal is whether the technology or solution you are referring to is open source or not. The following criteria apply for closed source talks:

  1. If the technology or solution is proprietary, and you want to speak about your proprietary solution to make a pitch to the audience, you should pick up a sponsored session. This involves paying for the speaking slot. Write to fifthelephant.editorial@hasgeek.com
  2. If the technology or solution is in the process of being open sourced, we will consider the talk only if the solution is open sourced at least three months before the conference.
  3. If your solution is closed source, you should consider proposing a talk explaining why you built it in the first place; which options you considered (business-wise and technology-wise) before deciding to develop the solution; or which specific use case left you without existing options and necessitated creating the in-house solution.

The criteria for selecting proposals, in the order of importance, are:

  1. Key insight or takeaway: what can you share with participants that will help them in their work and in thinking about the ML, big data and data science problem space?
  2. Structure of the talk and flow of content: a detailed outline – either as a mind map, draft slides or a textual description – will help us understand the focus of the talk, and the clarity of your thought process.
  3. Ability to communicate succinctly, and how you engage with the audience. You must submit a link to a two-minute preview video explaining what your talk is about, and what the key takeaway is for the audience.

No one submits the perfect proposal in the first instance. We therefore encourage you to:

  1. Submit your proposal early so that we have more time to iterate if the proposal has potential.
  2. Talk to us on our community Slack channel: https://friends.hasgeek.com if you want to discuss an idea for your proposal, and need help / advice on how to structure it. Head over to the link to request an invite and join #fifthel.

Our editorial team helps potential speakers in honing their speaking skills, fine tuning and rehearsing content at least twice - before the main conference - and sharpening the focus of talks.

##How to submit a proposal (and increase your chances of getting selected):
The following guidelines will help you in submitting a proposal:

  1. Focus on why, not how. Explain to participants why you made a business or engineering decision, or why you chose a particular approach to solving your problem.
  2. The journey is more important than the solution you may want to explain. We are interested in the journey, not the outcome alone. Share as much detail as possible about how you solved the problem. Glossing over details does not help participants grasp real insights.
  3. Focus on what participants from other domains can learn/abstract from your journey / solution. Refer to these talks from The Fifth Elephant 2017, which participants liked most: http://hsgk.in/2uvYKI9 and http://hsgk.in/2ufhbWb
  4. We do not accept how-to talks unless they demonstrate the latest technology. If you are demonstrating new tech, show enough to motivate participants to explore the technology later. Refer to talks such as this: http://hsgk.in/2vDpag4 and http://hsgk.in/2varOqt to structure your proposal.
  5. Similarly, we don’t accept talks on topics that have already been covered in previous editions. If you are unsure about whether your proposal falls in this category, drop an email to: fifthelephant.editorial@hasgeek.com
  6. Content that can be read off the internet does not interest us. Our participants are keen to listen to use cases and experience stories that will help them in their practice.

To summarize, we do not accept talks that gloss over details or try to deliver high-level knowledge without covering depth. Talks have to be backed with real insights and experiences for the content to be useful to participants.

##Passes and honorarium for speakers:
We pay an honorarium of Rs. 3,000 to each speaker and workshop instructor at the end of their talk/workshop. Confirmed speakers and instructors also get a pass to the conference and networking dinner. We do not provide free passes for speakers’ colleagues and spouses.

##Travel grants for outstation speakers:
Travel grants are available for international and domestic speakers. We evaluate each case on its merits, giving preference to women, people of non-binary gender, and Africans. If you require a grant, request it when you submit your proposal in the field where you add your location. The Fifth Elephant is funded through ticket purchases and sponsorships; travel grant budgets vary.

##Last date for submitting proposals is: 31 March 2018.
You must submit the following details along with your proposal, or within 10 days of submission:

  1. Draft slides, mind map or a textual description detailing the structure and content of your talk.
  2. Link to a self-recorded, two-minute preview video, where you explain what your talk is about, and the key takeaways for participants. This preview video helps conference editors understand the lucidity of your thoughts and how invested you are in presenting insights beyond the solution you have built, or your use case. Please note that the preview video should be submitted irrespective of whether you have spoken at past editions of The Fifth Elephant.
  3. If you submit a workshop proposal, you must specify the target audience for your workshop; duration; number of participants you can accommodate; prerequisites for the workshop; links to GitHub repositories; and a document showing the full workshop plan.

##Contact details:
For more information about the conference, sponsorships, or anything else, contact support@hasgeek.com or call 7676332020.

Hosted by

The Fifth Elephant - known as one of the best data science and Machine Learning conferences in Asia - has transitioned into a year-round forum for conversations about data and ML engineering; data science in production; data security and privacy practices.

Milan Joshi

Topological Data Analysis Theory and Practice

We are already living in the age of big data, and it is too big to ignore. Therefore, it is important that we find ways to explore, summarize, and answer questions with this data. However, the problem is not just that the data is big, but that it is complicated, loaded with surprising patterns and unusual structures. Often that means it is even too complicated for the standard methods to be useful …
  • 16 comments
  • Rejected
  • 17 Aug 2017
Section: Full talk Technical level: Intermediate

Karthik Gali

Machine Learning for Financial Data Extraction

Data plays a major role in decisions about financial transactions (buying and selling stocks, bonds, and mutual funds). I will briefly talk about how machine learning is applied at FactSet to extract financial information automatically.
  • 2 comments
  • Rejected
  • 13 Dec 2017
Section: Crisp talk Technical level: Beginner

Puneet Mathur

Robot Quotient - The Machine versus Human Debate

This is a contemporary paper that introduces the idea of measuring intelligence in robots and machines. It gives a new perspective on how humans think about machines. With Saudi Arabia granting citizenship to the female robot Sophia in 2017, the debate is whether robots will replace humans and, if so, in what ways. Will humans and robots coexist? Will robots and humans do different work or compete for t…
  • 19 comments
  • Rejected
  • 26 Jan 2018
Section: Crisp talk Technical level: Advanced

Vishal

Math for data science

By now it is evident that a solid math foundation is indispensable if one has to get into Data science in an honest-to-goodness way. Unfortunately, for many of us math was just a means to get better scores and never really a means to understand the world around us. That systemic failure (education system) causes many of us to feel a “gap” when doing / learning data science. It is high time that w…
  • 0 comments
  • Confirmed
  • 30 Jan 2018
Section: Workshop Technical level: Beginner

Nishant Nikhil

Deep Learning for NLP from scratch

King - Man + Woman = Queen. The most famous example of word vectors paints an optimistic picture in which computers can represent words as vectors that can be used to infer similarity. But can we extend it to sentences or to documents? How did word vectors come into existence? What are their uses?
  • 0 comments
  • Under evaluation
  • 01 Feb 2018
Section: Workshop Technical level: Intermediate

Himanshu Mishra

Complex network analysis using NetworkX - Graph Theory in Python

The workshop will focus on the basic usage of NetworkX for manipulating graphs and networks. After that, we will use NetworkX for visualization and real-world network analysis.
  • 1 comment
  • Under evaluation
  • 01 Feb 2018
Section: Workshop Technical level: Beginner

Ravi Suhag

Atlas: GO-JEK’s real-time geospatial visualization platform

Billions of GPS points flow through our data pipelines daily in real time and drive decisions like driver allocation, surge pricing, driver incentives and more. This poses intriguing challenges in finding actionable insights from spatial data in real time. At GO-JEK we built Atlas in an attempt to make it easy for teams within GO-JEK to visually explore this flood of geospatial data. Fo…
  • 3 comments
  • Confirmed & scheduled
  • 20 Feb 2018
Section: Full talk Technical level: Intermediate

Deepak Mane

Big Data Forensic Analytics

Big Data forensics is a new type of forensics, just as Big Data is a new way of solving the challenges presented by large, complex data. Thanks to the growth in data and the increased value of storing more data and analyzing it faster, Big Data solutions have become more common and more prominently positioned within organizations. As such, the value of Big Data systems has grown, often storing dat…
  • 3 comments
  • Rejected
  • 04 Mar 2018
Section: Full talk Technical level: Beginner

Regunath B

Scalability truths and serverless architectures: why it is harder with stateful, data-driven systems

Building scalable systems is not easy. It is not as simple as deploying on a cloud and expecting it to scale along with the cloud’s elasticity. Many systems and solutions that claim elasticity of scale often indirectly limit their claims to stateless services.
  • 0 comments
  • Confirmed & scheduled
  • 14 Mar 2018
Section: Full talk Technical level: Intermediate

Regunath B

Beyond Data stores & processing engines - Learnings from handling eCommerce Data in motion

Data is gold. It is at rest or in motion, is transient or reasonably permanent, is being written or read, is expensive or cheap to store and so on. While we are mostly concerned about Data stores and processing engines, the impact of Data in motion is usually ignored - on the data centre infrastructure, across System interactions, in analyzing User or System behavior.
  • 0 comments
  • Rejected
  • 15 Mar 2018
Section: Full talk Technical level: Intermediate

Arpit Gupta

Banker to the unbanked- story of scale leveraging Data Science, AWS, Scala, Spark

PSPs are leveraging the online data trail that customers leave behind to understand the ability and intent of these customers to repay a loan. These customers may not even have a bank account and most likely come from tier 2/3 towns in developing countries such as India. This talk is about the ML and data engineering that was put together to provide instant short term credit to millions of co…
  • 3 comments
  • Rejected
  • 16 Mar 2018
Section: Full talk Technical level: Beginner

Praveen Chandrasekharan

Business analytics on the cloud - a scalable model with R

“R” is a great language for data analysis which analysts love, but it is inherently difficult to scale because of its single threaded nature and lack of libraries/web frameworks. This talk is about how we overcame/worked around the limitations to plug R into a scalable cloud platform. It also talks about other design considerations which make it practical to do analytics with larger datasets on a cloud…
  • 4 comments
  • Rejected
  • 18 Mar 2018
Section: Crisp talk Technical level: Intermediate

SUNIL KUMAR

Hybrid Machine Learning with Azure IoT Edge

This workshop will provide an overview of various concepts of IoT, Machine Learning and Azure IoT Edge. It will be a hands-on workshop which will involve actual machine learning model deployment on edge devices or edge simulator of your system.
  • 2 comments
  • Rejected
  • 20 Mar 2018
Section: Workshop Technical level: Intermediate

devjyoti

Using Data to make data processing reliable again

Data Driven performance management of Big Data Infrastructure is very different from performance management of standard applications like web servers. Multiple discrete applications are submitted to a single cluster simultaneously, and each of these applications can comprise up to hundreds of thousands of tasks of varying complexities. If these jobs are not tuned properly, then it’s easy to both blow …
  • 2 comments
  • Rejected
  • 21 Mar 2018
Section: Full talk Technical level: Intermediate

Sakshi Bansal

Improve data quality using Apache Airflow and check operator

The Data Team at Qubole collects usage and telemetry data from a million machines a month. We run many complex ETL workflows to process this data and provide reports, insights and recommendations to customers, analysts and data scientists. We use the open source distribution of Apache Airflow to orchestrate our ETLs and process more than 1 terabyte of data daily.
  • 4 comments
  • Confirmed & scheduled
  • 21 Mar 2018
Section: Crisp talk Technical level: Intermediate

anugrah nayar

Building microservices using kafka

Microservices are the building blocks that power a post-cloud digital landscape. They help us build services that are scalable and ease the deployment and development process.
  • 2 comments
  • Rejected
  • 23 Mar 2018
Section: Crisp talk Technical level: Beginner

Govind Pandey

Incremental transform of transactional data models to analytical data models in near real time

Transactional systems are designed with data models to maximize write throughput across multiple parallel business flows. They evolve iteratively with business and need to react quickly to the changing business landscape to minimize time to market. Analytical systems, on the other hand, require data models to maximize query throughput over broad, deep and large data volumes. The need for a platfo…
  • 2 comments
  • Confirmed & scheduled
  • 26 Mar 2018
Section: Full talk Technical level: Intermediate

Sreenath S Kamath

What I learnt by running Apache Airflow @Scale

In the world of data-driven applications, the role played by a workflow management system is unparalleled. At Qubole we use Apache Airflow to orchestrate our complex and time-critical big data ETL jobs. Though Airflow has helped us tremendously, there are certain areas where all the major workflow systems fall short in lights-out operations. Following are some open questions that keep showing up on Airf…
  • 1 comment
  • Awaiting details
  • 26 Mar 2018
Section: Crisp talk Technical level: Intermediate

Rohit Karlupia

Qubole Sparklens: understanding the scalability limits of Spark applications

One of the common requests we receive from customers (at Qubole) is debugging slow Spark applications. Usually this process is done with trial and error, which takes time. Moreover, it doesn’t tell us where to look for further improvements. We at Qubole are looking into making this process more self-serve.
  • 5 comments
  • Confirmed & scheduled
  • 26 Mar 2018
Section: Full talk Technical level: Intermediate

Somya Kumar

Distributed Deep Learning

There are various open source frameworks like TensorFlow, CNTK, MXNet, PyTorch, etc. which allow data scientists to develop deep learning models. Traditionally, data scientists train models on a single machine; however, as datasets and models grow, model training on a single node becomes inefficient. There are a couple of frameworks like TensorFlow which support model training on multiple machines u…
  • 4 comments
  • Rejected
  • 26 Mar 2018
Section: Crisp talk Technical level: Intermediate

Vineeti Louis

Baking a cloud-native data warehouse from enterprise database leftovers

dataxu® deals with collection, storage, processing, analysis, and projection of data at massive scale.
  • 0 comments
  • Rejected
  • 27 Mar 2018
Section: Crisp talk Technical level: Intermediate

Anant Gupta

Deep portfolio: using neural networks for portfolio construction

Deep Learning is a good concept and it is slowly transforming the face of data analysis. The world of finance has not been impervious to its reach. Although finance has its own models, which have been in place for decades (Black-Scholes, CAPM), new methodologies are coming up to leverage the power of AI.
  • 9 comments
  • Confirmed & scheduled
  • 27 Mar 2018
Section: Full talk Technical level: Intermediate

Puneet

Market propensity modelling using XStream: unified self-service analytics ETL and ML platform

About the product: XStream is a unified self-service analytics ETL and ML platform built on top of Apache Spark, which allows you to create scalable and fault-tolerant pipelines. You can express your Big Data Spark computation logic in a much simpler and more intuitive fashion and get your complex pipelines ready in minutes. XStream is also capable of running Big Data batch jobs as streaming computat…
  • 0 comments
  • Confirmed & scheduled
  • 27 Mar 2018
Section: Sponsored talk Technical level: Advanced

Vikram Vij

Building a next generation speech and NLU engine: in pursuit of multi-modal experience for Bixby

Bixby is an intelligent, personalized voice interface for your phone. It lets you seamlessly switch between voice & type/touch, and supports more than 75 domains (e.g. Camera, Gallery, Messages, WhatsApp, YouTube, Uber, etc.). It was launched in July 2017 for English and is now available in more than 200 countries with about 8 million registered users. The talk focuses on challenges in deep learnin…
  • 8 comments
  • Confirmed & scheduled
  • 28 Mar 2018
Section: Crisp talk Technical level: Intermediate

Amrit Sarkar

Building analytics application with streaming expressions in Apache Solr

Apache Solr, an open source search engine project, has come a long way since its inception, driving applications to have near-real-time data mixed with rich relevance available to users with high availability, auto-scaling and effective failover strategy on cloud infrastructure.
  • 2 comments
  • Confirmed & scheduled
  • 29 Mar 2018
Section: Full talk Technical level: Intermediate

Karnam Vasudeva Rao

Bad Data is No Better Than No Data! - Impact of Automation in Data Stewardship Workflows in Plant Agriculture Industry

Data stewardship is the management and oversight of an organization’s data assets to provide high quality data that is easily accessible in a consistent manner for business and research decisions. It includes data acquisition, data standardization, data integration and data analytics. Data generated at different phases of the pipeline often end up in different databases and use colloquial vocabulary…
  • 3 comments
  • Rejected
  • 30 Mar 2018
Section: Crisp talk Technical level: Intermediate

Rajdeep Dua

Building Scalable Machine Learning pipelines with Apache Prediction IO

The talk will help developers and data scientists understand how to build ML Pipelines using PredictionIO. In this talk we will cover how Apache PredictionIO (an open source Machine Learning Server built on top of a state-of-the-art open source stack) helps reduce time from writing a Proof of Concept for a ML model to production ready Model serving micro service with persistent model. We will als…
  • 1 comment
  • Rejected
  • 30 Mar 2018
Section: Full talk Technical level: Intermediate

Tredence Inc Proposing

Smart Campaign Planning Through "Intelligent" Email Outreach Using NLG

In the current age where Data is the new Oil, it has become critical for Companies to gather customer data, analyze the relevant data points and derive key insights. It is crucial that companies figure out ways to retain customers by pro-actively predicting churn, grow the existing customer base by providing relevant promotions/offers, and acquire new customers by efficiently processing leads.
  • 5 comments
  • Rejected
  • 30 Mar 2018
Section: Full talk Technical level: Intermediate

Ronakkumar Kothari

Scaling up our distributed query workloads using Kafka Streams + Rocks DB

The Analytics platform powers the Business iQ product @ AppDynamics (now part of Cisco). Business iQ provides for real-time and actionable correlations between application performance, user experience and business outcomes/performance. Business health baselines, anomaly detection, and alerts are all automated and immediately actionable through the use of business metrics and events. The platform …
  • 6 comments
  • Rejected
  • 30 Mar 2018
Section: Full talk Technical level: Intermediate

Tredence Inc Proposing

Driving Customer Service Optimization using supervised stack ensemble with natural language features

In the age of social media, companies are conscious about the reviews that are posted online. Any act of dissatisfaction can be meted out by way of tart sentiments on these platforms. And so enterprises strive hard to give 100% positive experience, by doing all that they can to address customer grievances and queries. But like they say, there are slips between the cup and the lip – not all grieva…
  • 4 comments
  • Rejected
  • 30 Mar 2018
Section: Full talk Technical level: Intermediate

Ankit Mahato

Machine Learning using Orange - It's Fruitful and Fun!

IPython/Jupyter notebook is widely used for data analysis in the data science community. This notebook style programming belongs to an imperative paradigm which is linear in nature. In the past decade, Visual programming paradigm has gained a lot of popularity as it is user-centric in nature and driven by data streams.
  • 0 comments
  • Rejected
  • 30 Mar 2018
Section: Workshop Technical level: Beginner

Dr Amit Garg

Using Operations Research and Analytics to Propel the E-commerce Industry

With advancements in technology and a growing number of consumers, there is a hike in supply and demand in the e-commerce industry that needs to be taken care of within a specified time. At ORMAE we develop Operations Research (OR) and Machine Learning (ML) solutions for our clients, helping them optimize their operations and take complex business decisions. Mathematical Optimization has been a proven and…
  • 2 comments
  • Rejected
  • 31 Mar 2018
Section: Crisp talk Technical level: Intermediate

Akash Khandelwal

Applying Lambda Architecture in Machine Learning realm

In mature information retrieval systems, predictions and scoring happen in multiple layers in cascaded fashion. In the batch processing layer, update intervals are large and dispersed. In the ingestion layer, it is done as and when the updates arrive, in near real time. This layer is non user-path but still carries a reasonably wide feature set. Lastly, final scoring is done in user path using a mu…
  • 3 comments
  • Rejected
  • 31 Mar 2018
Section: Full talk Technical level: Intermediate

DEEPAK GOYAL

Approximate Query Processing

Data Analysts are constantly exploring various forms of data and searching for new insights to make better decisions for their businesses. The email marketing team at Walmart relies heavily on Customer Segmenter, an in-house tool, which figures out which customers are best suited for an email advertisement based on various attributes. Conducting these data analytics was very costly though, both …
  • 1 comment
  • Rejected
  • 31 Mar 2018
Section: Crisp talk Technical level: Beginner

Soumya Shukla

Segmenting 500 million users using Airflow + Hive

Walmart is the largest retail company in the US, with both online and offline presence. It reaches millions of users in all possible ways: physical stores, an e-commerce website, the exclusive Sam's Club and jet.com, to name a few.
  • 5 comments
  • Confirmed & scheduled
  • 31 Mar 2018
Section: Crisp talk Technical level: Intermediate

Gokulvanan V Velan (Customer Platform - VS)

Scaling write-heavy OLTP systems with strong data guarantees: learning from Flipkart’s user facing order capture systems

Order capture and Order management systems at Flipkart have had to scale by 10X volumes to cater to growth in eCommerce and user base. In addition, these systems need to scale for bursty traffic by 1000x for the flash sale business model. These systems are write heavy and need strong data guarantees (Consistency, Data-availability, Durability etc). With scale, the data stores for these systems have ou…
  • 6 comments
  • Confirmed & scheduled
  • 31 Mar 2018
Section: Full talk Technical level: Intermediate

Abhishek Agarwal

Building big data pipelines on kafka and kubernetes

At AppDynamics, we have been trying to push the limits to which we can scale metric ingestion. Toward this goal, we have been taking logical pieces out of the monolithic application and re-architecting these pieces to handle large scale.
  • 3 comments
  • Rejected
  • 31 Mar 2018
Section: Full talk Technical level: Intermediate

Neha Kumari

Expressing complex ETL pipelines using Cascading

At Flipkart, data is one of the differentiators and is used in innumerable ways for decision making. Specifically, for generating recommendations, our data pipelines perform various ETL operations over terabytes of user activity data.
  • 0 comments
  • Rejected
  • 31 Mar 2018
Section: Crisp talk Technical level: Beginner

ADDEPALLI GIRIDHAR

Building Streaming platform using Kafka Streams

At Walmart, terabytes of data are generated per day via interactions and transactions by our users on walmart.com and other properties (in-store, jet.com, etc.). As part of our Customer data strategy we strive to increase Reach, Depth, Freshness to know about more customers, more about customers, and in as real-time as possible. Towards this goal, we need to ingest data as and when it is generated and process i…
  • 1 comment
  • Rejected
  • 31 Mar 2018
Section: Crisp talk Technical level: Intermediate

Akash Khandelwal

Improving product discovery via relevance and ranking optimization

In e-commerce, recommendations play a key role not only in customer satisfaction, by improving discovery, but also in fulfilling business objectives. In this talk, I will focus on our iterative journey starting from feature engineering, adding features incrementally and learning on them, thus moving from a rule based system to launching a machine learnt system in production.
  • 2 comments
  • Confirmed & scheduled
  • 31 Mar 2018
Section: Full talk Technical level: Intermediate

Priyanka Bhatt

User response prediction at scale

Millions of users browse Walmart.com each day with varying levels of intent. Many of them end up making a purchase in the same session and most, well, do not. Display retargeting channels, with ads over open web and your favourite social media sites, are then used to reach out to the potential customers with relevant content. The ad serving comes at a cost and optimizing these costs becomes espec…
  • 3 comments
  • Confirmed & scheduled
  • 31 Mar 2018
Section: Full talk Technical level: Intermediate

Surabhi Punjabi

Personalized Recommendations for Computational Advertising

Building recommender systems for the task of computational advertising for Walmart.com has been an extraordinary journey. Particularly fascinating is the aspect of designing algorithms that cater to audiences who are at different stages of their purchase journey, or who might not have interacted with the site recently. This coupled with the scalability challenges and the interplay of factors like…
  • 0 comments
  • Rejected
  • 31 Mar 2018
Section: Full talk Technical level: Intermediate

Venkateshan K

Serviceability under high demand

At Swiggy, our aim is to deliver orders to customers within a reasonable promised time regardless of when and where the order is placed. We are confronted with considerable challenges when faced with high (and sometimes unexpected) demand: think of an IPL weekend, rains, New Year’s Eve, or a competitor’s platform being down. more
  • 1 comment
  • Confirmed & scheduled
  • 31 Mar 2018
Section: Full talk Technical level: Intermediate

Gaurav Godhwani

A Time Series Analysis of District-wise Government Spending

**About District Treasuries:** District Treasuries are the nodal offices for all financial transactions of the Government within the district, managing both payments and receipts. They also monitor the activities of various sub-treasuries which work as an extension of the Treasuries at the Tehsil/Taluka level. Each district has various Drawing & Disbursing Officers who are authorised to draw mon… more
  • 2 comments
  • Rejected
  • 31 Mar 2018
Section: Full talk Technical level: Beginner

Akshita Sukhlecha

Display prospecting using explore-exploit strategy

In the display advertising domain, prospecting aims to build brand awareness and drive new users to the site. Due to the absence of any prior user intent or user history, selecting products for a prospecting user from the huge item catalog becomes a great challenge. Traditionally, strategies like showcasing bestsellers, discounted products, or manually curated products have been used by market… more
  • 1 comment
  • Rejected
  • 31 Mar 2018
Section: Crisp talk Technical level: Intermediate

Nitin Hardeniya

Our experiments with food recommendations @Swiggy

Food is a very personal choice. We at Swiggy are obsessed with customer experience and want to make food discovery on the platform seamless and a delight for the consumer. So when you fire up the Swiggy app, we take your implicit/explicit feedback to figure out your taste preferences, your price affinity, single/group orders, and breakfast/late-night cravings, and provide a convenient, simple but highly… more
  • 4 comments
  • Confirmed & scheduled
  • 31 Mar 2018
Section: Crisp talk Technical level: Intermediate

Anand Chitipothu

DevOps for Data Science: Experiences from building a cloud-based data science platform

Productionizing data science applications is non-trivial. Non-optimal practices, the people-heavy nature of traditional approaches, and developers’ love for complex solutions for the sake of using cool technologies make the situation even worse. more
  • 3 comments
  • Rejected
  • 01 Apr 2018
Section: Full talk Technical level: Intermediate

Anand Chitipothu

Managing Machine Learning Models in Production

Deploying machine learning models in production is not a trivial task. more
  • 3 comments
  • Rejected
  • 01 Apr 2018
Section: Crisp talk Technical level: Intermediate

Uddipta Bhattacharjee

An Introduction to Interactive Data Visualization with Bokeh

Data visualization is an essential step in developing data-driven solutions. With proper visualization, we get direct insights that lead us towards further stages of model development. For visualization in Python, we have libraries like Matplotlib and seaborn to help us, but they come with certain limitations. Recently developed libraries with interactive plotting options are taking … more
  • 0 comments
  • Rejected
  • 01 Apr 2018
Section: Crisp talk Technical level: Beginner

Uma Sawant

Needle in a haystack: entity search on text and graph

Web search today is moving towards displaying “answers” rather than making the user browse through pages to find what they want. “Entity” search queries, where the expected answer is a list or a set of objects, form more than 40% of today’s Web search. Yet the current approaches for answering such queries are quite brittle. We improve the state-of-the-art by infusing the semantic information of e… more
  • 1 comment
  • Confirmed & scheduled
  • 01 May 2018
Section: Full talk Technical level: Beginner

Ashish Kabra

Using structural estimation methods from economics to model user behaviour in bike-sharing systems

The cities of Paris, London, Chicago, and New York (among many others) have set up large-scale bike-share systems to facilitate the use of bicycles for urban commuting. This talk estimates the impact on bike-share ridership of two facets of system performance: accessibility (how far the user must walk to reach stations) and bike-availability (the likelihood of finding a bicycle). My analysis is ba… more
  • 3 comments
  • Confirmed & scheduled
  • 11 Apr 2018
Section: Full talk Technical level: Intermediate

Ramanan Balakrishnan

A study in classification

Let me ask you a question: is a watch a time-keeping device, an electrical gadget, a collectible item or a piece of jewelry? (You can pick only one.) Such queries, mandated by governments across the world, cause sleepless nights for the global trade industry, the astronomical penalties for classification errors in such import/export declarations being one key reason for worry. more
  • 5 comments
  • Confirmed & scheduled
  • 29 May 2018
Section: Crisp talk Technical level: Intermediate

Ananth Packkildurai

Operating data pipeline using Airflow @ Slack

Slack is a communication and collaboration platform for teams. Our millions of users spend 10+ hrs connected to the service on a typical working day. more
  • 4 comments
  • Confirmed & scheduled
  • 09 May 2018
Section: Full talk Technical level: Advanced

Vijay Srinivas Agneeswaran, Ph.D

Deep learning based hybrid recommendation systems in TensorFlow

The traditional collaborative filtering based approaches have certain lacunae, like their inability to handle sparse data, cold start, and lack of scalability when there are millions of items and/or users. The content-based recommendation engines overcome cold start, but have issues in taking user feedback into account. Hybrid recommendation engines try to get the best of both worlds. We outline t… more
  • 10 comments
  • Confirmed & scheduled
  • 25 Apr 2018
Section: Workshop Technical level: Intermediate

Chris Stucchio

So you think you know about linear regression ...

Everyone has used linear regression. It’s boring, standard mathematics that we learned in Stats 101. more
  • 1 comment
  • Confirmed & scheduled
  • 11 Jun 2018
Section: Full talk Technical level: Beginner

Achal Shah

Michelangelo: Uber's machine learning platform

Uber Engineering is committed to developing technologies that create seamless, impactful experiences for our customers. We are increasingly investing in Machine Learning to fulfill this vision. At Uber, our contribution to this space is Michelangelo, an internal ML-as-a-service platform that democratizes machine learning and makes scaling AI to meet the needs of the business as easy as requesting… more
  • 6 comments
  • Confirmed & scheduled
  • 14 Jun 2018
Section: Full talk Technical level: Intermediate

Xiaoji Chen

Seeing through the eyes of a self-driving car: visualizing autonomous vehicle data on the web

The ATG (Advanced Technologies Group) at Uber is shaping the future of driverless transportation. Over the last two years, the ATG Visualization team built a web visualization platform that enables engineers and operators across ATG to quickly inspect, debug, and explore information collected from offline and online testing. In this talk, we dive into the challenges of combining complex and diver… more
  • 0 comments
  • Confirmed & scheduled
  • 14 Jun 2018
Section: Full talk Technical level: Intermediate

Hitesh Mantrala

NLP on भारतीय भाषाओं (Indian languages)

With millions of Indian users coming online recently with the penetration of the internet, it becomes crucial to address these users with Indian/local language support. Most of these users are not comfortable with English and are more comfortable in Hindi or some South Indian languages. With the current technology, there are ways to address things like intent classification and entity extraction with e… more
  • 0 comments
  • Rejected
  • 23 May 2018
Section: Crisp talk Technical level: Intermediate

Shivam Shankar Singh

Weaponizing data for politics

We’ve all heard of the prevalence of data analytics in the political realm and stories of how companies like Cambridge Analytica influenced elections with the use of data. It used to be information that was power in politics, but now data and the analysis of it let parties wield even more power. If done right and combined with on-ground intelligence, it allows for microtargeting and targeted ad… more
  • 2 comments
  • Confirmed & scheduled
  • 20 Jun 2018
Section: Full talk Technical level: Beginner

Jyoti Panday

The battle for privacy: right to be forgotten in India

Although the Internet is viewed as a global public resource, its functioning and access to information remain predominantly controlled by private actors. The so-called right to be forgotten, as created by the European Court of Justice’s interpretation, seeks to create obligations for intermediaries to remove links to content that is lawful and available in the public domain. This talk tracks the … more
  • 0 comments
  • Confirmed & scheduled
  • 26 Jun 2018
Section: Full talk Technical level: Beginner

Gaurav Singhania

DIY - Data is Yours

A DIY platform for real-time data aggregation: creating a job with a simple SQL-like query and a few clicks. more
  • 0 comments
  • Awaiting details
  • 25 Jun 2018
Technical level: Intermediate

Ajay Kelkar

Data science for business: adopting analytics without paralysis

A number of factors have made companies far more data rich than companies of the past. But having data alone is not good enough. This talk will explore what companies need to do to cross the Rubicon & make the magic happen. Through case studies we will explore how companies can work to get their management to think more analytically & how they can create a culture where data scientists … more
  • 3 comments
  • Confirmed & scheduled
  • 27 Jun 2018
Section: Full talk Technical level: Beginner

Sushant Sinha

The right to privacy versus the people's right to know: challenges and the way forward

Nearly a year back, a nine-judge bench of the Supreme Court unanimously affirmed that the “Right to Privacy” is a fundamental right under the Indian Constitution. This was not the first time the SC upheld the right to privacy, as it has been doing so in a number of decisions since Maneka Gandhi vs UoI (1978). The SC has repeatedly upheld over the last four decades that individuals have autonomy over personal choi… more
  • 3 comments
  • Confirmed & scheduled
  • 27 Jun 2018
Section: Full talk Technical level: Beginner

Aina Rao (nagu rao)

GDPR: The wave of Data Privacy

GDPR is a wave of regulation that has hit Europe. It is a breakthrough regulation that sets the trend for unified data privacy norms across Europe, with far-reaching impacts. In this talk, Aina will take us through the history of data privacy regulation, the latest trends, and what it means for companies and individuals globally. The talk will be of immense interest for anyone doing, or looking to… more
  • 4 comments
  • Confirmed
  • 01 Jul 2018
Technical level: Intermediate

Anand Venkatanarayanan

Compromising a $6B big data project through poor data quality: the Aadhaar case study

The Aadhaar project holds at least 3 PB of data and possibly more. Its promise of providing a unique, multi-modal, biometric-backed identity to everyone in India has hinged on the quality of biometric templates obtained during enrollment and also the veracity and trustworthiness of the identity documents. The scale needed for the project can only be achieved through enrollment centers that are spr… more
  • 3 comments
  • Confirmed & scheduled
  • 02 Jul 2018
Section: Full talk Technical level: Beginner

Avi Patchava

The power of intuition in data science, and why it will always have a role

Data science, fueled by big and growing datasets, has enabled the rapid discovery of new relationships and predictability in the world. If the algorithm can find the relationships backed by mountains of historical data, why is there a role for intuition? This seems counter to the purpose and modus operandi of data science. This talk will explain why intuition remains vital to data science: 1) What it is;… more
  • 18 comments
  • Confirmed & scheduled
  • 04 Jul 2018
Section: Full talk Technical level: Beginner

Paul Meinshausen

Design for Data

When evaluating the quality and likelihood of success of AI/ML projects, I have found it helpful to think in terms of three core components: Workflow, Data, and Algorithms. In media and public discussion algorithms tend to receive the most attention, and for young data scientists they are often what seem most exciting. This talk will focus on the two underrated other components: workflow and data… more
  • 2 comments
  • Confirmed & scheduled
  • 12 Jul 2018
Section: Full talk Technical level: Beginner

Hosted by

The Fifth Elephant, known as one of the best data science and machine learning conferences in Asia, has transitioned into a year-round forum for conversations about data and ML engineering, data science in production, and data security and privacy practices. more