Taming the Data Elephant (aka) Productionizing Data Science!

The ninth edition of The Fifth Elephant will be held in Bangalore on 16 and 17 July 2020.

The Fifth Elephant brings together over one thousand data scientists, ML engineers, data engineers and analysts to discuss:

Data governance
Data privacy and engineering for privacy, including engineering for the Personal Data Protection (PDP) bill.
Data cleaning, annotation, instrumentation and productionizing data science.
Identifying and handling fraud + data security at scale
Feature engineering and ML platforms.
What it takes to create data-driven cultures in organizations of different scales.

Event details:

Dates: 16-17 July 2020
Venue: NIMHANS Convention Centre, Dairy Circle, Bangalore

Why you should attend:

Network with peers and practitioners from the data ecosystem.
Share approaches to solving expensive problems such as cleanliness of training data, annotation, model management and versioning data.
Demo your ideas in the demo sessions.
Join Birds of Feather (BOF) sessions to have productive discussions on focussed topics. Or, start your own Birds of Feather (BOF) session.

Contact details:
For more information about The Fifth Elephant, call +91-7676332020 or email sales@hasgeek.com

Hosted by

The Fifth Elephant

The Fifth Elephant - known as one of the best data science and machine learning conferences in Asia - has transitioned into a year-round forum for conversations about data and ML engineering, data science in production, and data security and privacy practices.

Taming the Data Elephant (aka) Productionizing Data Science!

Submitted May 31, 2020

Productionizing data science and getting actionable insights from raw data requires organizations to efficiently build, operate, and manage complex, large-scale data platforms. To productionize ML models and achieve business value, it is important to develop models iteratively, and to test and deploy them on top of a robust platform infrastructure.

A fully automated platform lets us manage the life cycle of ML models through to production with greater reliability and predictability. The platform should ensure that every commit is deployable in an automated fashion, persist the results of versioned models in an auditable, readable way, and handle scheduling and any dependent workflows.
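As a rough illustration of what "persisting the results of versioned models in an auditable, readable way" could look like, here is a minimal sketch. It is not the platform described in the talk: the function names (`persist_run`, `load_run`, `fingerprint`), the directory layout, and the idea of keying a run by commit plus a hash of the data are all assumptions made for this example.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def fingerprint(payload: bytes) -> str:
    """Short content hash identifying a data snapshot (hypothetical scheme)."""
    return hashlib.sha256(payload).hexdigest()[:12]


def persist_run(registry: Path, commit: str, data: bytes,
                params: dict, metrics: dict) -> Path:
    """Write one human-readable record per (commit, data snapshot) pair."""
    version = f"{commit}-{fingerprint(data)}"
    run_dir = registry / version
    # exist_ok=False: refuse to silently overwrite a run that was already recorded.
    run_dir.mkdir(parents=True, exist_ok=False)
    record = {
        "version": version,
        "commit": commit,
        "data_fingerprint": fingerprint(data),
        "params": params,
        "metrics": metrics,
    }
    (run_dir / "record.json").write_text(
        json.dumps(record, indent=2, sort_keys=True)
    )
    return run_dir


def load_run(registry: Path, version: str) -> dict:
    """Read a recorded run back for auditing or comparison."""
    return json.loads((registry / version / "record.json").read_text())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        registry = Path(tmp)
        # Commit id, data bytes, params and metrics are made-up placeholder values.
        run_dir = persist_run(registry, "abc1234", b"week-01 rows",
                              {"alpha": 0.1}, {"rmse": 1.5})
        print(load_run(registry, run_dir.name))
```

Keying each record by both the code commit and a data fingerprint means the same commit run against different data produces a separate, traceable entry, which is one simple way to tie model versions to data versions.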

Training models in production on terabytes of data is costly in terms of training time, cost, chance of failure, and the cost of re-runs. Incremental training is a way to mitigate some of these costs while keeping an efficient model in production. In this talk, I will break down how we built a terabyte-scale, extensible and programmable data platform to enable continuous data-driven insights, and how we ‘tamed the beast’ to run data science at scale. I will also cover examples of incremental training, showing how we migrated from a model trained over a large time-series dataset to an incremental model trained on weekly data.
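The abstract does not say which model or framework was used, so the following is only a toy sketch of the general idea behind incremental training. It assumes a one-feature least-squares model, chosen because it can be updated exactly from running sums; the class name `IncrementalLinearModel` and the weekly batches are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IncrementalLinearModel:
    """One-feature least-squares fit kept as running sums (illustrative only)."""
    n: int = 0
    sum_x: float = 0.0
    sum_y: float = 0.0
    sum_xx: float = 0.0
    sum_xy: float = 0.0

    def update(self, batch):
        """Fold one batch of (x, y) pairs into the running sums."""
        for x, y in batch:
            self.n += 1
            self.sum_x += x
            self.sum_y += y
            self.sum_xx += x * x
            self.sum_xy += x * y
        return self

    def coefficients(self):
        """Return (slope, intercept) fitted to all data seen so far."""
        denom = self.n * self.sum_xx - self.sum_x ** 2
        if denom == 0:
            raise ValueError("need at least two distinct x values")
        slope = (self.n * self.sum_xy - self.sum_x * self.sum_y) / denom
        intercept = (self.sum_y - slope * self.sum_x) / self.n
        return slope, intercept


if __name__ == "__main__":
    # Hypothetical weekly batches drawn from y = 2x + 1.
    weeks = [
        [(0.0, 1.0), (1.0, 3.0)],
        [(2.0, 5.0), (3.0, 7.0)],
    ]
    model = IncrementalLinearModel()
    for week in weeks:
        model.update(week)  # only the new week's rows are read
        # coefficients() can be called after any week with enough data
    print(model.coefficients())
```

Each weekly update touches only the new rows, yet the fitted coefficients match a full retrain over the whole history. That equivalence is exact for this simple model; for most real models, incremental updates only approximate a full retrain, which is part of why monitoring accuracy over time matters.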

Outline

The talk will cover the following topics:

Lifecycle/Stages in productionizing an ML Model
Incrementally training the models to save on cost and time taken to run the models in production
Underlying platform infrastructure for deploying models to production.
- Model persistence - Versioning of model and data
- Data Lineage
- Orchestrating and Monitoring workflows
Impact of data volume/variety/veracity on the models
Continuous monitoring of the model outputs and their predicted business metrics for accuracy over time
Ease of business use of the model outputs - Reusability and Adaptiveness of generating insights and enabling business decisions
Data Engineering and Data Science collaboration
- Data Engineering to enable data scientists to deploy models in production
- Focus on business value, iterative development and automation

Speaker bio

Srimathi is a software engineer with over 13 years of experience in building products that deliver measurable customer value. At Sahaj, she is currently part of a team building a cost-effective, terabyte-scale, extensible and programmable data platform in the advertising space. She has previously worked with Thoughtworks, Oracle, and Dell.
