The Fifth Elephant winter edition 2019

Winter edition of India's most renowned conference on big data and data science

Reducing Cost of Production AI: Feature Engineering Case Study

Submitted by Venkata Pingali (@venkatapingali) on Monday, 1 October 2018

Technical level

Intermediate

Section

Crisp talk

Status

Confirmed & Scheduled

Abstract

The number and complexity of datasets, use cases, and models are growing rapidly. However, the number of ML/AI applications in production is growing much more slowly. AI in production suffers from multiple challenges that vary by domain. We focus on one common activity: machine learning feature engineering involving textual data. It accounts for 40-80% of the time spent on ML applications that use business data and contributes significantly to their cost. We do not address other aspects of AI.

We identify the cost dimensions in feature engineering for business data and share ways to reduce the cost of this step.

The methods have been tested every day for more than a year. We enabled customer modeling in production across multiple deployments that covered 2M+ people, consumed 800GB of data, and computed up to 500 features for each person.

This approach draws on our experience building a less flexible and more expensive solution in 2016 using Hive and Pandas. That earlier solution serves as an imperfect baseline, and our current approach is conservatively a 3x improvement over it.

Outline

  1. Feature Engineering Overview
  2. Typical Feature Engineering Cycle
    • Trends
  3. Detailed Cost Drivers
    • Examples: Reconciliation & auditing, change management
  4. Indicative Quantitative Improvement
  5. Detailed discussion of each driver

Requirements

Have an ML system in production or plan to have one.

Speaker bio

Dr. Venkata Pingali is Co-Founder and CEO of Scribble Data, an ML Engineering company based in Bangalore and Denver. Scribble’s flagship enterprise product, Enrich, accelerates ML product development in enterprises. Before starting Scribble Data, Dr. Pingali was VP of Analytics at a political data consulting firm. He has a BTech from IIT Mumbai and a PhD from USC in Computer Science.

Links

Slides

https://drive.google.com/open?id=1aKZMWMn_t9A5RSv8wkU-zOgcs2tpIsX4
