Anthill Inside 2019

A conference on AI and Deep Learning

Accepting submissions till 01 Nov 2019, 04:20 PM

Taj M G Road, Bangalore, Bangalore

About the 2019 edition:

The schedule for the 2019 edition is published here: https://hasgeek.com/anthillinside/2019/schedule

The conference has three tracks:

  1. Talks in the main conference hall track
  2. Poster sessions featuring novel ideas and projects in the poster session track
  3. Birds of a Feather (BOF) sessions for practitioners who want to use the Anthill Inside forum to discuss:
    - Myths and realities of labelling datasets for Deep Learning.
    - Practical experience with using Knowledge Graphs for different use cases.
    - Interpretability and its application in different contexts; challenges with GDPR and interpreting datasets.
    - Pros and cons of using custom and open source tooling for AI/DL/ML.

Who should attend Anthill Inside:

Anthill Inside is a platform for:

  1. Data scientists
  2. AI, DL and ML engineers
  3. Cloud providers
  4. Companies which make tooling for AI, ML and Deep Learning
  5. Companies working with NLP and Computer Vision who want to share their work and learnings with the community

For inquiries about tickets and sponsorships, call Anthill Inside on 7676332020 or write to sales@hasgeek.com


Sponsors:

Sponsorship slots for Anthill Inside 2019 are open.


Anthill Inside 2019 sponsors:


Bronze Sponsor

iMerit Impetus

Community Sponsor

GO-JEK iPropal
LightSpeed Semantics3
Google Tact.AI
Amex

Hosted by

Anthill Inside is a forum for conversations about Artificial Intelligence and Deep Learning, including: tools, techniques, and approaches for integrating AI and Deep Learning in products and businesses; and engineering for AI.

Kurian Benoy

@kurianbenoy

Machine Learning Model and Dataset Versioning

Submitted Jul 18, 2019

In this talk we will discuss current practices for organizing ML projects with traditional open source tools such as Git and Git-LFS, along with the limitations of this tool set.
These limitations motivate the development of new, ML-specific version control systems.

Currently, the life cycle of a machine learning model goes through the following process:

  • An ML practitioner tries out a new image classification algorithm on an input dataset.
  • She tweaks the algorithm, tries other ideas, and fixes bugs, all on her local system.
  • Some of her training runs take a long time, and she may change the code while the weights remain the same.
  • She keeps the model weights and evaluation scores for all her runs, and picks which weights to release as the final model once she is out of time to run more experiments.
  • She publishes her results, with the code and the trained weights.

Git can’t handle data that runs to gigabytes in size, while Git-LFS comes with the built-in difficulty of supporting at most 2 GB of data (a GitHub limitation).
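As a rough illustration of the size problem, the following sketch lists the files in a project that exceed a given limit and so would need to live outside Git or Git-LFS. The function name `find_oversized_files` and the default threshold are assumptions made for this example; the 2 GB figure is simply the one quoted above, and the real limit depends on your hosting provider.

```python
import os

# Assumed threshold, taken from the 2 GB figure quoted in the text.
SIZE_LIMIT_BYTES = 2 * 1024 ** 3


def find_oversized_files(root, limit=SIZE_LIMIT_BYTES):
    """Return sorted (relative_path, size_in_bytes) pairs for files above `limit`."""
    oversized = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip Git's own metadata directory while walking the project.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            if size > limit:
                oversized.append((os.path.relpath(path, root), size))
    return sorted(oversized)
```

Calling `find_oversized_files(".")` from a project root returns the candidates for a dedicated data versioning tool; pass a smaller `limit` to experiment on a small project.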

Data Version Control (DVC, dvc.org) is an open source command-line tool written in Python. We will show how to version ML models and datasets that run to dozens of gigabytes, how to use your favorite cloud storage (S3, GCS, or a bare-metal SSH server) as a data file backend, and how to embrace good engineering practices in your ML projects.
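The general idea behind versioning large files alongside Git can be sketched as follows: the data itself goes into a content-addressed cache, and only a small pointer file holding its hash is committed to Git. This is a conceptual sketch under stated assumptions, not DVC's implementation or API; the names `track` and `restore`, the `.pointer.json` suffix, and the use of SHA-256 are all hypothetical choices made for illustration.

```python
import hashlib
import json
import os
import shutil


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def track(data_path, cache_dir):
    """Copy the file into the cache under its hash and write a small pointer file.

    The pointer file (hypothetical format) is what would be committed to Git.
    """
    checksum = file_sha256(data_path)
    os.makedirs(cache_dir, exist_ok=True)
    cached = os.path.join(cache_dir, checksum)
    if not os.path.exists(cached):
        shutil.copyfile(data_path, cached)
    pointer_path = data_path + ".pointer.json"
    with open(pointer_path, "w") as f:
        json.dump({"sha256": checksum, "path": os.path.basename(data_path)}, f)
    return pointer_path


def restore(pointer_path, cache_dir):
    """Recreate the data file that a pointer file refers to, using the cache."""
    with open(pointer_path) as f:
        pointer = json.load(f)
    target = os.path.join(os.path.dirname(pointer_path), pointer["path"])
    shutil.copyfile(os.path.join(cache_dir, pointer["sha256"]), target)
    return target
```

Checking out an older commit brings back an older pointer file, and `restore` then retrieves the matching data from the cache. In this sketch the cache is a local directory; a remote backend such as the storage options mentioned above would play the same role.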

Talk Outline

  • Why is ML different?
  • Problems with using Git
  • About using MLFlow
  • Problems with Git-LFS
  • Solving the ML reproducibility crisis using DVC
  • Creating robust pipelines and rapid experimentation using DVC
  • How DVC can be added to your workflow
  • Conclusion

Speaker bio

Kurian Benoy is an open source contributor at CloudCV and DVC. He is the lead organiser of School of AI, Kochi, and is an AI enthusiast working on Deep Learning and Computer Vision. Kurian is a FOSSASIA Open TechNights winner and gave a talk at the FOSSASIA Open Tech Summit about the keralarescue.in team (https://www.youtube.com/watch?v=2RzImb5JwMA).

Kurian has been contributing to DVC for the past few months, is among the top 10 contributors to DVC.org, and has written an introductory Kaggle kernel about DVC.

Links

Slides

https://docs.google.com/presentation/d/16mbu71NqNH6ULPJWSMDheYwolRrIn1sSLD8JYy9s4ks/edit?usp=sharing
