A brief talk summarising the journey of an ML feature from a Jupyter Notebook to production. At Freshworks, given the diverse pool of customers using our products, each feature has a dedicated model per account, churning out millions of predictions every hour. This talk covers the tools and measures we've used to scale our ML products. I'll also touch upon Apache Airflow, a workflow-management platform, and how we've used it to automate and parallelise various segments of our ML pipeline.
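To illustrate the fan-out the abstract describes (a dedicated model per account, scored in parallel), here is a minimal stdlib sketch of the pattern; in an Airflow deployment the same shape becomes one task per tenant. The function names below (`predict_for_account`, `run_batch`) are hypothetical stand-ins, not the production pipeline:

```python
from concurrent.futures import ThreadPoolExecutor

def predict_for_account(account_id):
    # Hypothetical stand-in: a real pipeline would load the account's
    # dedicated model here and score that account's data.
    return account_id, f"predictions-for-{account_id}"

def run_batch(account_ids, max_workers=4):
    # Fan out one prediction job per account; with Airflow, each of
    # these calls would instead be its own task in the DAG.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(predict_for_account, account_ids))
```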
- Introduction
- Why Scale
  - To serve predictions (real-time & batch-wise)
  - To improve availability & adherence to SLAs
  - To enable more customers to use ML models
  - To automate workflows
- Challenges of a multi-tenant ML pipeline
  - Gathering data from different sources (in different regions)
  - Customising feature engineering to handle a diverse customer base while maintaining a unified pipeline
  - Training accurate models that generalise well, and replicating results across diverse customers
  - Building data and engineering pipelines to onboard customers with ease
  - Maintaining, monitoring and orchestrating models and workflows
- Data Ingestion and Preprocessing
  - Streamlining Data from different sources
  - Scaling & Optimising Data Pipelines
- Model Training, Evaluation and Deployment
  - Scaling Offline ETL
  - Training and Evaluation: models and metrics
  - Data and Model Versioning
  - Building training workflows
  - Model Customisation
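One lightweight approach to the model-versioning problem listed above is content-addressing: hash the serialized artifact so that identical models deduplicate and every version string is reproducible. A minimal sketch, assuming a local directory as the registry (the function and layout are illustrative, not the setup described in the talk):

```python
import hashlib
import json
import pathlib

def register_model(model_bytes, metadata, registry_dir):
    # Content-address the serialized model: identical artifacts get the
    # same version string, so re-registering an unchanged model is a no-op.
    version = hashlib.sha256(model_bytes).hexdigest()[:12]
    root = pathlib.Path(registry_dir)
    root.mkdir(parents=True, exist_ok=True)
    (root / f"{version}.model").write_bytes(model_bytes)
    # Store training metadata (account, dataset hash, metrics, ...) alongside.
    (root / f"{version}.json").write_text(json.dumps(metadata))
    return version
```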
- Prediction, Back-filling and Interpretability
  - Scaling prediction pipelines
  - Logging and Monitoring
  - Product-specific insights
  - Testing
  - Drifts and Decay
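Drift here means the live feature distribution departing from what the model saw at training time; one common check is the Population Stability Index (PSI), which compares binned frequencies of a feature between the two periods. A minimal pure-Python sketch (the bin count and any alerting threshold are illustrative assumptions):

```python
import math

def psi(expected, actual, bins=10):
    # Population Stability Index between a training-time ("expected") and
    # live ("actual") sample of one numeric feature. Bin edges come from
    # the training-time range.
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def frequencies(values):
        counts = [0] * bins
        for v in values:
            counts[sum(v > e for e in edges)] += 1
        # Floor each frequency to avoid log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = frequencies(expected), frequencies(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))
```

Identical distributions score near 0; a commonly cited rule of thumb treats PSI above roughly 0.2 as significant drift worth investigating.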
- Building a multi-tenant ML pipeline to serve a diverse user base
- Tips, hacks and practices for scaling different parts of an ML pipeline
- Leveraging Airflow to accomplish the above
- Potential bottlenecks and issues one might face along the way