The Fifth Elephant 2020 edition

On data governance, engineering for data privacy and data science

The ninth edition of The Fifth Elephant will be held in Bangalore on 16 and 17 July 2020.

The Fifth Elephant brings together over one thousand data scientists, ML engineers, data engineers and analysts to discuss:

  1. Data governance.
  2. Data privacy and engineering for privacy, including engineering for the Personal Data Protection (PDP) bill.
  3. Data cleaning, annotation, instrumentation and productionizing data science.
  4. Identifying and handling fraud, and data security at scale.
  5. Feature engineering and ML platforms.
  6. What it takes to create data-driven cultures in organizations of different scales.

Event details:

Dates: 16-17 July 2020
Venue: NIMHANS Convention Centre, Dairy Circle, Bangalore

Why you should attend:

  1. Network with peers and practitioners from the data ecosystem.
  2. Share approaches to solving expensive problems such as cleanliness of training data, annotation, model management and versioning data.
  3. Demo your ideas in the demo sessions.
  4. Join Birds of a Feather (BOF) sessions for productive discussions on focussed topics, or start your own BOF session.

Contact details:
For more information about The Fifth Elephant, call +91-7676332020 or email sales@hasgeek.com


Hosted by

The Fifth Elephant - known as one of the best data science and Machine Learning conferences in Asia - has transitioned into a year-round forum for conversations about data and ML engineering, data science in production, and data security and privacy practices.

Srimathi H

@shrimats

Taming the Data Elephant (aka) Productionizing Data Science!

Submitted May 31, 2020

Productionizing data science and getting actionable insights from raw data requires organizations to efficiently build, operate, and manage complex, large-scale data platforms. To productionize ML models and achieve business value, it is important to develop models iteratively and to test and deploy them on top of a robust platform infrastructure.

A fully automated platform lets us manage the life cycle of ML models through to production with greater reliability and predictability. The platform should ensure that every commit is deployable in an automated fashion, persist the results of versioned models in an auditable, readable way, and handle scheduling and any dependent workflows.
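As a rough illustration of what persisting versioned model results in an auditable, readable way might look like, here is a minimal in-memory sketch. It is not the platform described in this talk: `ModelRegistry`, `make_record`, `fingerprint`, and the record fields are all hypothetical names chosen for the example, and a real platform would write to durable storage.

```python
import hashlib
import json


def fingerprint(payload: bytes) -> str:
    """Content hash used to identify a training data snapshot."""
    return hashlib.sha256(payload).hexdigest()


def make_record(commit_id: str, training_data: bytes, metrics: dict) -> dict:
    """Tie a model run to the code commit and the data it was trained on."""
    return {
        "commit_id": commit_id,
        "data_sha256": fingerprint(training_data),
        "metrics": metrics,
    }


class ModelRegistry:
    """Append-only list of model runs (hypothetical, in-memory only)."""

    def __init__(self):
        self._records = []

    def register(self, record: dict) -> int:
        """Store a record and return its version number, starting at 1."""
        version = len(self._records) + 1
        self._records.append({"version": version, **record})
        return version

    def audit_log(self) -> str:
        """One JSON document per line, readable by people and by tools."""
        return "\n".join(json.dumps(r, sort_keys=True) for r in self._records)


registry = ModelRegistry()
v1 = registry.register(make_record("abc123", b"week-1 rows", {"rmse": 0.42}))
v2 = registry.register(make_record("def456", b"week-2 rows", {"rmse": 0.40}))
print(registry.audit_log())
```

Keeping the commit id and the data hash in the same record is one way to answer later which code and which data produced a given model output.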

Training models in production on terabytes of data is expensive in training time, in money, in the risk of failed runs, and in the cost of re-running them. Incremental training is a way to mitigate some of these costs and keep an efficient model in production. In this talk, I will break down how we built a terabyte-scale, extensible and programmable data platform to enable continuous data-driven insights, and how we ‘tamed the beast’ to run data science at scale. I will also cover an example of incremental training: how we migrated from a model trained over a large time series dataset to an incremental model trained on weekly data.
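To make the idea of incremental training concrete, here is a toy sketch. It assumes a one-feature least-squares model, which is not necessarily the kind of model used in the talk. `IncrementalLinearModel` and `make_week` are hypothetical names, and the data is synthetic. The model keeps only running sums, so each new week of data is folded in without re-reading earlier weeks.

```python
from dataclasses import dataclass


@dataclass
class IncrementalLinearModel:
    """One-feature least-squares fit stored as running sums (toy example)."""

    n: int = 0
    sum_x: float = 0.0
    sum_y: float = 0.0
    sum_xx: float = 0.0
    sum_xy: float = 0.0

    def update(self, batch):
        """Fold one batch of (x, y) pairs into the running sums."""
        for x, y in batch:
            self.n += 1
            self.sum_x += x
            self.sum_y += y
            self.sum_xx += x * x
            self.sum_xy += x * y

    def coefficients(self):
        """Return (slope, intercept) for all data seen so far."""
        denom = self.n * self.sum_xx - self.sum_x ** 2
        if self.n < 2 or denom == 0:
            raise ValueError("need at least two distinct x values")
        slope = (self.n * self.sum_xy - self.sum_x * self.sum_y) / denom
        intercept = (self.sum_y - slope * self.sum_x) / self.n
        return slope, intercept


def make_week(start, length):
    """Synthetic data following y = 2x + 1, for illustration only."""
    return [(float(x), 2.0 * x + 1.0) for x in range(start, start + length)]


week_1 = make_week(0, 7)
week_2 = make_week(7, 7)

# Incremental path: one update per week, old weeks are never re-read.
incremental = IncrementalLinearModel()
incremental.update(week_1)
incremental.update(week_2)

# Full retrain path: everything in one pass, for comparison.
full = IncrementalLinearModel()
full.update(week_1 + week_2)

print(incremental.coefficients(), full.coefficients())
```

For this model family the weekly updates reproduce the full retrain exactly. For many other model types incremental training is only an approximation, and that trade-off has to be weighed against the savings in time and cost.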

Outline

The talk will cover the following topics:

  1. Lifecycle/Stages in productionizing an ML Model
  2. Incrementally training the models to save on cost and time taken to run the models in production
  3. Underlying platform infrastructure for deploying models to production.
    • Model persistence - Versioning of model and data
    • Data Lineage
    • Orchestrating and Monitoring workflows
  4. Impact of data volume/variety/veracity on the models
  5. Continuous monitoring of the model outputs and their predicted business metrics for accuracy over time
  6. Ease of business use of the model outputs - Reusability and Adaptiveness of generating insights and enabling business decisions
  7. Data Engineering and Data Science collaboration
    • Data Engineering to enable data scientists to deploy models in production.
    • Focus on business value, iterative development and automation

Speaker bio

Srimathi is a software engineer with over 13 years of experience in building products that deliver measurable customer value. At Sahaj, she is currently part of a team building a cost-effective, terabyte-scale, extensible and programmable data platform in the advertising space. She has previously worked with Thoughtworks, Oracle, and Dell.
