Making Data Science Work session 1

Setting up Machine Learning (ML) Projects for Success

About

As data science steps out of the lab and into real-world applications, teams face a number of challenges from planning to production. The context and solutions vary depending on where data science is being applied, from highly regulated industries to fast-moving consumer technology.

In this forum, hosts Venkata Pingali and Indrayudh Ghoshal of Scribble Data converse with data science practitioners. The hosts delve into their viewpoints and war stories for an interesting, insightful take on making data science work for better business impact. As there are no rule books for how to do data science, these experiential conversations help in evaluating choices and approaches, and in avoiding expensive mistakes.

The series is held fortnightly. The second session is on 3 June on Model Productionization.

The first meetup - on 20 May - tackled how to set up Machine Learning (ML) projects for success. Speakers included Goda Ramkumar of Swiggy and Srujana Merugu, a data science consultant and Anthill Inside speaker alumnus. A quick summary of this session is as follows:

  1. The speakers delved into the lifecycle of an ML model, starting from framing the problem to deploying the solution, and how different stakeholders come together.
  2. In doing so, the speakers contrasted a central data science team, which is core for a Series B level startup, with the hub-and-spoke model of data science team found in a mature organisation with multiple business units.
  3. Speakers also touched on the importance of iteration and automation for better data quality and better problem framing over time. Finally, since deployment of ML models is not deterministic, pre-deployment testing and monitoring of how the data distribution itself changes over time are important.
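
To illustrate the last point, monitoring a shift in data distribution can be sketched as a comparison between a training-time sample of a feature and a recent production sample. The session did not prescribe a method; the Population Stability Index (PSI) used here, the function name `population_stability_index`, the bin count, and the synthetic Gaussian data are all assumptions made for illustration.

```python
import math
import random


def population_stability_index(reference, current, bins=10):
    """PSI between two numeric samples, using equal-width bins from the reference."""
    lo, hi = min(reference), max(reference)
    # Fall back to a width of 1.0 if the reference sample is constant.
    width = (hi - lo) / bins or 1.0

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp out-of-range values into the first or last bin.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


# Hypothetical data: one feature at training time, then two later samples.
rng = random.Random(0)
training = [rng.gauss(0.0, 1.0) for _ in range(5000)]
same = [rng.gauss(0.0, 1.0) for _ in range(5000)]
shifted = [rng.gauss(1.0, 1.0) for _ in range(5000)]

psi_same = population_stability_index(training, same)
psi_shifted = population_stability_index(training, shifted)
print(f"no shift: {psi_same:.3f}, mean shifted by 1: {psi_shifted:.3f}")
```

A larger PSI indicates a bigger change between the two samples. In practice a check like this would run on a schedule for each important feature, with an alert threshold chosen by the team.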

Key actionables for organizations are:

  1. Link to business objectives and engage stakeholders. Aligning data science to both is important. Active communication is necessary to take the org along on the journey (and it will be a journey).
  2. Iterate fast. The ability to experiment and learn fast using good-enough models is critical. Align teams, infrastructure and processes. Actively drive tradeoffs.
  3. Build for automation, standardization, and asset reuse from the get-go. This will speed up the process and reduce errors. Without it, a lot of effort goes to waste.

This summary is compiled by participant Manu Raveendran and Venkata Pingali with copy-editing support from The Fifth Elephant community organizing team.

Participants’ questions from this session and the ongoing discussion are at https://hasgeek.com/fifthelephant/making-data-science-work/comments

Reading references for this session:

  1. The challenges in productionizing data science.
  2. Data Science Journey for startups: The 5 tweaks that turn Typical to Ideal
  3. Metric Tree Thinking, a talk by Jeremy Howard at the Strata 2011 edition.
  4. Art of Forumation by The Fifth Elephant speaker alumnus Shailesh Kumar

About the hosts: Scribble Data is a Bangalore/Toronto startup, active in the data community. Scribble implements MLOps for Data using their feature engineering platform, Enrich. This enables data science teams to train models faster, and with confidence in the underlying data.

About the organizers: The Fifth Elephant is a platform for practitioners working with data (from engineering to the application of data science for different use cases) to showcase their work and to collaborate.

For further inquiries, contact 7676332020 or write to fifthelephant.editorial@hasgeek.com

Host

Scribble Data

Venue

Online