Submissions for MLOps November edition

On ML workflows, tools, automation and running ML in production

This project is accepting submissions for the November edition of the MLOps conference.

The first edition of the MLOps conference was held on 23, 24 and 27 July. Details about the conference, including videos and blog posts, are published at https://hasgeek.com/fifthelephant/mlops-conference/

Contact information: For inquiries, contact The Fifth Elephant at fifthelephant.editorial@hasgeek.com or call 7676332020.

Hosted by

The Fifth Elephant, known as one of the best data science and Machine Learning conferences in Asia, has transitioned into a year-round forum for conversations about data and ML engineering; data science in production; data security and privacy practices.

Sudeep Gupta

@sudeepgupta90

Empowering Data Scientists at Farfetch to GoFar with PaaS

Submitted Jun 29, 2021

Machine Learning is a strategic goal at Farfetch, and making Data Scientists more productive is key to achieving it. Cloud providers such as AWS, Google and Azure offer many services and products for creating value in the ML and data space. But when teams adopt these services, or build such a product from scratch, the key stakeholders, the Data Scientists, are often ignored. These cloud offerings can host and deliver the promised models and analytics, yet they overlook how a Data Scientist's workflow integrates with the existing enterprise infrastructure (approvals, security, access, setup). The onus then falls on the user who is solving the problem, who has to navigate the enterprise tree just to figure out how to get set up.

As an Azure strategic partner, our objective at Farfetch is to leverage Azure's enterprise services along with cutting-edge open source technologies, so that our Data Scientists can hit the ground running with every problem.
Every process and requirement is captured with the spotlight on Data Scientists. The question is always: what is the use case, and how do we enable our users to execute it efficiently? With that goal in mind, we are building a Platform as a Service with multi-tenancy at its core, to give control back to our users in an enterprise context so that they can use and extend the platform as they please.

We identified the requirements and blockers, and have built an ML Platform around them, leveraging the components outlined below:

  1. Workflow Orchestration Layer - Airflow on Kubernetes [7 mins]

    • Architecture
    • Provisioning the Infrastructure through custom Terraform modules
    • Airflow Deployment Pipelines
    • Monitoring Airflow deployment
    • Airflow DAG development - Tests, Integrations
  2. Processing Layer - Databricks [7 mins]

    • User and Access Management
    • Secrets Management
    • Governance
  3. Storage Layer - ADLS [7 mins]

    • Architecture of Data Lake
    • Fine grained Access Control and Governance
    • Use case: Accessing Data from Databricks
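
As an illustration of the "Tests" item under Airflow DAG development, here is a minimal sketch of one common kind of check: verifying that a pipeline's task dependencies contain no cycle before it is deployed. This is not Farfetch's pipeline or test suite. The task names in `PIPELINE` and the helper `execution_order` are hypothetical, and the sketch uses a plain dictionary with Python's standard `graphlib` module (Python 3.9+) instead of Airflow itself, so that it runs without an Airflow installation.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
PIPELINE = {
    "extract": set(),
    "transform": {"extract"},
    "train": {"transform"},
    "publish": {"train"},
}


def execution_order(dependencies):
    """Return a valid run order, or raise ValueError if the graph has a cycle."""
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        # exc.args[1] holds the list of nodes that form the cycle.
        raise ValueError(f"dependency cycle: {exc.args[1]}") from exc


def test_pipeline_is_acyclic():
    """A CI-style check: upstream tasks must run before downstream ones."""
    order = execution_order(PIPELINE)
    assert order.index("extract") < order.index("transform")
    assert order.index("train") < order.index("publish")


if __name__ == "__main__":
    test_pipeline_is_acyclic()
    print(execution_order(PIPELINE))
```

In a real Airflow project, a comparable check would typically load the DAG files and assert that they import without errors; the standalone version above only shows the idea of failing fast on a broken dependency graph.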

Key takeaways

  • ML Platforms as a Service
  • Airflow Setup at Enterprise Scale, best practices
  • Decoupling Processing and Storage Layer, and integration with Enterprise

