Setting up Machine Learning (ML) Projects for Success

Making Data Science Work

Despite growing teams and budgets, very few Machine Learning (ML) models reach the production stage. What is productionization and what makes it difficult? How does it relate to MLOps? How should one plan for production ML? What kinds of systems and processes should be in place to ensure successful delivery? What do the team structure and skill sets look like?

In this forum, hosts Venkata Pingali and Indrayudh Ghoshal of Scribble Data converse with data science practitioners. The hosts delve into their viewpoints and war stories for an interesting, insightful take on making data science work for better business impact. As there is no rule book for doing data science, these experiential conversations help practitioners evaluate choices and approaches and avoid expensive mistakes.

Summary of the session:

This discussion tackled how to set up Machine Learning (ML) projects for success. Speakers included Goda Ramkumar of Swiggy and Srujana Merugu, a data science consultant and former Anthill Inside speaker. A quick summary of this session is as follows:

  1. The speakers delved into the lifecycle of an ML model, starting from framing the problem to deploying the solution, and how different stakeholders come together.
  2. In doing so, the speakers contrasted a central data science team, which is core for a Series B-stage startup, with a hub-and-spoke data science team in a mature organisation with multiple business units.
  3. The speakers also touched on the importance of iteration and automation for improving data quality and problem framing over time. Finally, since ML models do not behave deterministically once deployed, pre-deployment testing and monitoring of how the data distribution itself changes over time are important.

Key actionables for organizations are:

  1. Link to business objectives and engage stakeholders. Aligning data science to both is important. Active communication is necessary to take the org along on the journey (and it will be a journey).
  2. Iterate fast. The ability to experiment and learn fast using good-enough models is critical. Align teams, infrastructure and processes. Actively drive tradeoffs.
  3. Build for automation, standardization and asset reuse from the get-go. This speeds up the process and reduces errors; otherwise, a lot of effort goes to waste.

This summary was compiled by participant Manu Raveendran and Venkata Pingali, with copy-editing support from The Fifth Elephant team.

Participants’ questions from this session, and the ongoing discussion, are on https://hasgeek.com/fifthelephant/making-data-science-work/comments

Reading references for this session:

  1. The challenges in productionizing data science
  2. Data Science Journey for startups: The 5 tweaks that turn Typical to Ideal
  3. Metric Tree Thinking, a talk by Jeremy Howard at Strata 2011
  4. Art of Formulation, by Shailesh Kumar, a former The Fifth Elephant speaker

About the curators: Venkata Pingali and Indrayudh Ghoshal of Scribble Data have curated this session. Scribble Data is a Bangalore/Toronto startup, active in the data community.

About the series producer: The Fifth Elephant is a platform for practitioners working with data (from data engineering to applying data science across different use cases) to showcase their work and to collaborate.

For further inquiries, contact 7676332020 or write to fifthelephant.editorial@hasgeek.com

Venue

Online