Reducing Cost of Production AI: Feature Engineering Case Study

This submission has been added to the schedule


Submitted Oct 1, 2018

Section: Crisp talk
Technical level: Intermediate

The number and complexity of datasets, use cases, and models are growing rapidly. However, the number of ML/AI applications in production is growing much more slowly. AI in production faces multiple challenges that vary by domain. We focus on one common activity: machine learning feature engineering involving textual data. Feature engineering accounts for 40-80% of the time spent on ML applications that use business data, and it contributes significantly to their cost. We do not address other aspects of AI.

We identify the cost dimensions in feature engineering for business data and share ways to reduce the cost of this step.

The methods have been tested every day for more than a year. Using them, we enabled customer modeling in production across multiple deployments that covered 2M+ people, consumed 800GB of data, and computed up to 500 features per person.
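For a rough sense of scale, the deployment figures above imply the back-of-envelope numbers below. This is a sketch that assumes the quoted figures (2M people, 500 features, 800GB) are exact and that 1 GB means 10**9 bytes; real per-person volumes will vary across deployments.

```python
# Back-of-envelope scale estimate from the deployment figures quoted above.
# Assumptions: "2M+", "800GB" and "up to 500" are treated as exact values,
# and 1 GB is taken as 10**9 bytes (decimal units).
people = 2_000_000
features_per_person = 500
data_bytes = 800 * 10**9

# Cells in the person x feature matrix for one full computation.
feature_values = people * features_per_person
# Average raw input consumed per person.
bytes_per_person = data_bytes / people

print(f"feature values per full computation: {feature_values:,}")
print(f"average raw input per person: {bytes_per_person / 1000:.0f} KB")
```

Under these assumptions a full computation touches on the order of one billion feature values, derived from roughly 400 KB of raw input per person.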

This approach draws on our experience building a less flexible and more expensive solution in 2016 using Hive and Pandas. That solution serves as an imperfect baseline, and our current approach is, conservatively, a 3x improvement over it.

Outline

Feature Engineering Overview
Typical Feature Engineering Cycle
- Trends
Detailed Cost Drivers
- Examples: Reconciliation & auditing, change management
Indicative Quantitative Improvement
Detailed discussion of each driver
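The outline lists reconciliation and auditing among the cost drivers. The abstract does not describe how the speaker's system performs these checks, so the following is only a minimal, hypothetical sketch of what reconciling a per-person feature table against its raw input might look like, written with the Python standard library. The record layout, the feature names (`txn_count`, `txn_total`, `txn_mean`) and the `compute_features` / `reconcile` helpers are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical raw business records: (person_id, amount). Illustrative only.
transactions = [
    ("p1", 120.0), ("p1", 80.0), ("p2", 40.0), ("p3", 15.5), ("p3", 4.5),
]

def compute_features(rows):
    """Aggregate raw rows into hypothetical per-person features."""
    acc = defaultdict(lambda: {"txn_count": 0, "txn_total": 0.0})
    for person_id, amount in rows:
        acc[person_id]["txn_count"] += 1
        acc[person_id]["txn_total"] += amount
    # Derive the mean after aggregation so it is consistent with count/total.
    return {
        pid: {**f, "txn_mean": f["txn_total"] / f["txn_count"]}
        for pid, f in acc.items()
    }

def reconcile(rows, features, tol=1e-6):
    """Return a list of discrepancies between raw input and the feature table."""
    issues = []
    # Every person in the input should appear in the output, and vice versa.
    if set(features) != {pid for pid, _ in rows}:
        issues.append("person set mismatch")
    # Every input row should be accounted for exactly once.
    if sum(f["txn_count"] for f in features.values()) != len(rows):
        issues.append("row count mismatch")
    # Aggregated amounts should add back up to the input total.
    total_in = sum(amount for _, amount in rows)
    total_out = sum(f["txn_total"] for f in features.values())
    if abs(total_in - total_out) > tol:
        issues.append(f"amount mismatch: {total_in} vs {total_out}")
    return issues

features = compute_features(transactions)
print(features["p1"])
print(reconcile(transactions, features))
```

Running it prints the `p1` features (count 2, total 200.0, mean 100.0) and an empty issue list. Dropping a person from the feature table would instead report both a person-set and a row-count mismatch. Checks of this kind are cheap to run on every computation, which is one plausible way to keep auditing costs down.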

Requirements

Have an ML system in production or plan to have one.

Speaker bio

Dr. Venkata Pingali is Co-Founder and CEO of Scribble Data, an ML Engineering company based in Bangalore and Denver. Scribble’s flagship enterprise product, Enrich, accelerates ML product development in enterprises. Before starting Scribble Data, Dr. Pingali was VP of Analytics at a political data consulting firm. He has a BTech from IIT Mumbai and a PhD from USC in Computer Science.

Links

Slides

https://drive.google.com/open?id=1aKZMWMn_t9A5RSv8wkU-zOgcs2tpIsX4

The Fifth Elephant winter edition 2019
