MLOps Conference

On DataOps, productionizing ML models, and running experiments at scale.

Vishal Gupta

@vizgupta

Jupyter to Jupiter : Scaling multi-tenant ML Pipelines

Submitted May 22, 2021

A brief talk summarising the journey of an ML feature from a Jupyter Notebook to production. At Freshworks, given the diverse pool of customers using our products, each feature has dedicated models for each account, churning out millions of predictions every hour. This talk covers the tools and measures we’ve used to scale our ML products. I’ll also touch upon Apache Airflow, a workflow management platform, and how we’ve used it to automate and parallelise various segments of our ML pipeline.
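
The per-account fan-out described above can be sketched without any scheduler. The following is a minimal illustration using only the Python standard library; it is not the Freshworks pipeline and does not use Airflow. The function `train_account_model`, the account IDs, and the simulated failure are all hypothetical. It shows one task per account running in parallel, with a failure in one account recorded instead of stopping the rest.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def train_account_model(account_id: str) -> str:
    """Hypothetical stand-in for training one account's dedicated model."""
    if account_id == "acct-bad":
        # Simulated failure for a single tenant.
        raise ValueError(f"no training data for {account_id}")
    return f"model-for-{account_id}"


def run_all_accounts(account_ids, max_workers=4):
    """Run one task per account; collect successes and failures separately."""
    models, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(train_account_model, a): a for a in account_ids}
        for future in as_completed(futures):
            account_id = futures[future]
            try:
                models[account_id] = future.result()
            except Exception as exc:
                # Isolate the failure to this account; others keep running.
                errors[account_id] = str(exc)
    return models, errors


models, errors = run_all_accounts(["acct-1", "acct-2", "acct-bad"])
```

In a workflow manager, the same idea would typically map to one task per account, with retries and alerting handled by the scheduler instead of a `try`/`except` block.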

Outline

  1. Introduction [3 minutes]
    a. About myself
    b. Challenges of a multi-tenant ML pipeline
    c. Role of a Data Scientist vs an ML Engineer
  2. Incentives to scale your pipeline [5 minutes]
    a. Reducing turnaround time (real-time vs batch-wise)
    b. Increasing availability & adhering to SLA
    c. Enabling diverse customers to use your ML features
    d. Automating workflows
  3. Brief Intro to Airflow [5 minutes]
    a. Why not cron?
    b. DAGs, Tasks and Operators
    c. Executors: LocalExecutor, CeleryExecutor, KubernetesExecutor
    d. Controls: Task Pools, Queues and Scheduling rules, Parallelism, etc.
    e. Reasons to avoid Airflow
  4. Scaling different parts of an ML pipeline [15 minutes]
    1. Data Ingestion and preprocessing
      a. Data pipelines at Freshworks
      b. Aggregating different types of data from different sources (be it streams, databases or S3)
      c. One datastore may not work for all types of data
      d. Fetching at different intervals, without adding too much load
      e. Cleaning data before insertion to optimise storage
      f. Optimising preprocessing layers to adapt to the rate of incoming data
    2. Model Training, Evaluation and Deployment
      a. Offline ML platform & workflows at Freshworks
      b. Periodically retraining models to adapt to recent data
      c. Including customer-specific rules and features
      d. Hyper-parameter tuning
      e. Leveraging Spark clusters to train faster
      f. Evaluating models and monitoring metrics over time
      g. Maintaining model versioning to revert to older versions as a fallback
    3. Prediction, Back-filling and Interpretability
      a. Online ML platform & workflows at Freshworks
      b. Scaling to handle more customers
      c. Avoiding a single point of failure with distributed execution
      d. Establishing back-filling pipelines if historic predictions are of importance
      e. Capturing and handling errors without disrupting the entire workflow
      f. Setting up alerts to identify engineering and data science anomalies
      g. Providing interpretable insights to justify predictions to stakeholders
    4. Misc. engineering practices
      a. Planning before execution: be it building a new module or picking a tool
      b. Functional testing: ensuring offline and online pipelines are on par
      c. Application security: building data pipelines with regulations in mind
      d. Documentation: adding docstrings, setup and deployment instructions, and an elaborate README
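
One way to picture the model-versioning fallback from the outline is a small per-account registry. This is a hedged sketch and not the platform described in the talk: `ModelRegistry`, the `min_score` threshold of 0.7, and the scores are hypothetical, and a real setup would persist versions in a model store. A new version becomes active only if its evaluation score clears the threshold, and `rollback` reverts to the most recent older version that also cleared it.

```python
from dataclasses import dataclass, field


@dataclass
class ModelRegistry:
    """Hypothetical in-memory registry of model versions per account."""
    min_score: float = 0.7
    history: dict = field(default_factory=dict)  # account_id -> [(version, score)]
    active: dict = field(default_factory=dict)   # account_id -> active version

    def register(self, account_id: str, score: float) -> int:
        """Record a new version; promote it only if it meets the threshold."""
        versions = self.history.setdefault(account_id, [])
        version = len(versions) + 1
        versions.append((version, score))
        if score >= self.min_score:
            self.active[account_id] = version
        return version

    def rollback(self, account_id: str) -> int:
        """Revert to the latest older version that met the threshold."""
        current = self.active[account_id]
        older = [
            v for v, s in self.history[account_id]
            if v < current and s >= self.min_score
        ]
        if not older:
            raise LookupError(f"no fallback version for {account_id}")
        self.active[account_id] = older[-1]
        return older[-1]


registry = ModelRegistry()
registry.register("acct-1", 0.82)  # version 1, promoted
registry.register("acct-1", 0.64)  # version 2, below threshold, not promoted
registry.register("acct-1", 0.90)  # version 3, promoted
active_before = registry.active["acct-1"]
active_after = registry.rollback("acct-1")
```

Keeping rejected versions in the history, as done here, makes it possible to audit why a retrained model was never served.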

Key Takeaways:

  1. Building a multi-tenant ML pipeline to serve a diverse user base
  2. Tips, hacks and practices for scaling different parts of an ML pipeline
  3. Leveraging Airflow to accomplish the above
  4. Potential bottlenecks and issues one might face while solving the above
