A brief talk summarising the journey of an ML feature from a Jupyter Notebook to production. At Freshworks, given the diverse pool of customers using our products, each feature has a dedicated model per account, churning out millions of predictions every hour. This talk covers the tools and measures we've used to scale our ML products. I'll also touch upon Apache Airflow, a workflow-management platform, and how we've used it to automate and parallelise various segments of our ML pipeline.
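To illustrate the fan-out the abstract describes (a dedicated model per account, scored in parallel), here is a minimal stdlib sketch of the pattern; in an Airflow deployment the same shape becomes one task per tenant. The function names below (`predict_for_account`, `run_batch`) are hypothetical stand-ins, not the production pipeline:

```python
from concurrent.futures import ThreadPoolExecutor

def predict_for_account(account_id):
    # Hypothetical stand-in: a real pipeline would load the account's
    # dedicated model here and score that account's data.
    return account_id, f"predictions-for-{account_id}"

def run_batch(account_ids, max_workers=4):
    # Fan out one prediction job per account; with Airflow, each of
    # these calls would instead be its own task in the DAG.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(predict_for_account, account_ids))
```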
- Introduction
- Why Scale
  - To serve predictions (real-time & batch-wise)
  - To improve availability & adherence to SLAs
  - To enable more customers to use ML models
  - To automate workflows
- Challenges of a multi-tenant ML pipeline
  - Gathering data from different sources (in different regions)
  - Customising feature engineering to handle a diverse customer base while maintaining a unified pipeline
  - Training accurate models that generalise well, and replicating results across diverse customers
  - Building data and engineering pipelines to onboard customers with ease
  - Maintaining, monitoring and orchestrating models and workflows
- Data Ingestion and Preprocessing
  - Streamlining Data from different sources
  - Scaling & Optimising Data Pipelines
- Model Training, Evaluation and Deployment
  - Scaling Offline ETL
  - Training and Evaluation: models and metrics
  - Data and Model Versioning
  - Building training workflows
  - Model Customisation
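One lightweight approach to the model-versioning problem listed above is content-addressing: hash the serialized artifact so that identical models deduplicate and every version string is reproducible. A minimal sketch, assuming a local directory as the registry (the function and layout are illustrative, not the setup described in the talk):

```python
import hashlib
import json
import pathlib

def register_model(model_bytes, metadata, registry_dir):
    # Content-address the serialized model: identical artifacts get the
    # same version string, so re-registering an unchanged model is a no-op.
    version = hashlib.sha256(model_bytes).hexdigest()[:12]
    root = pathlib.Path(registry_dir)
    root.mkdir(parents=True, exist_ok=True)
    (root / f"{version}.model").write_bytes(model_bytes)
    # Store training metadata (account, dataset hash, metrics, ...) alongside.
    (root / f"{version}.json").write_text(json.dumps(metadata))
    return version
```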
- Prediction, Back-filling and Interpretability
  - Scaling prediction pipelines
  - Logging and Monitoring
  - Product-specific insights
  - Testing
  - Drifts and Decay
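Drift here means the live feature distribution departing from what the model saw at training time; one common check is the Population Stability Index (PSI), which compares binned frequencies of a feature between the two periods. A minimal pure-Python sketch (the bin count and any alerting threshold are illustrative assumptions):

```python
import math

def psi(expected, actual, bins=10):
    # Population Stability Index between a training-time ("expected") and
    # live ("actual") sample of one numeric feature. Bin edges come from
    # the training-time range.
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def frequencies(values):
        counts = [0] * bins
        for v in values:
            counts[sum(v > e for e in edges)] += 1
        # Floor each frequency to avoid log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = frequencies(expected), frequencies(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))
```

Identical distributions score near 0; a commonly cited rule of thumb treats PSI above roughly 0.2 as significant drift worth investigating.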
- Building a multi-tenant ML pipeline to serve a diverse user base
- Tips, hacks and practices for scaling different parts of an ML pipeline
- Leveraging Airflow to accomplish the above
- Potential bottlenecks and issues one might face along the way