Teamwork on a Machine Learning project that scales
A Machine Learning project is composed of a variety of distinct artifacts. As a project evolves and grows in complexity, keeping these artifacts under control becomes a significant challenge for the team's workflow, one with multiple aspects, such as:
- Managing and synchronizing the project's components.
- Dividing the tasks between team members.
- Onboarding new data scientists to the project.
- Reproducing the results of previous experiments.
Unlike Software Development, where Git-Flow has become a widely adopted convention, Data Science has not yet established such a workflow. As a result, many teams create ad-hoc solutions to overcome the challenges above. In most cases, however, those solutions are neither scalable nor generic enough, and they therefore remain confined to the team that created them.
In this session, we will analyze the challenges of working as a team on a data science project. We will explore the elements of a desirable solution and use them to define a scalable workflow. Finally, we will learn how to use powerful open-source tools (e.g., Git, DVC, MLflow) to implement that workflow.