
The Fifth Elephant Open Source AI Hackathon 2024

GenAI makers and creators contest and showcase

Make a submission

Accepting submissions till 15 Feb 2024, 11:00 PM

Hasura, Bangalore

Overview

The Fifth Elephant Open Source AI Hackathon started on 5 January 2024 and reached its finale with a Demo Day event on 12 April 2024, when the winners of the two-month-long contest were chosen.

The aim of this hackathon was to encourage individuals/teams to apply and incubate innovative AI ideas/use cases and publish them as open source projects.

  • The hackathon contest participants worked with mentors for over two months to refine their ideas and advance them to a stage where they were viable projects that could be pursued beyond the hackathon.
  • The project teams worked on AI’s applications in education, accessibility, creative expression, scientific research and languages, under the overall theme of AI for India.
  • Competing projects were judged on impact and relevance, innovation and creativity, technical soundness and code quality, scope of expansion, and reusability and ease of adoption.

As a campaign to raise awareness and drive up developer adoption of AI and open source technologies, the hackathon was a great success. It helped shine light on the agility that open source technology enables for creative and innovative developers.

Open Source AI Hackathon Winners

Testimonials

“...each one of the contestants put in tremendous effort. And we saw the passion in every person, trying to do things not for winning, but about really building your projects. After a long time, I am attending such a hackathon where young folks are so passionate about building. Kudos to all of you.”
- Rukma Talwadker, Jury Member, Senior Principal Scientist at Games 24x7

“I really enjoyed judging all the projects - lot of interesting work. The Fifth Elephant has done a great job with mentoring and curating this hackathon.”
- Tanuja Ganu, Jury Member, Principal RSDE Manager, Microsoft India

“The hallmark of this hackathon was getting younger people to code for a longer period of time as opposed to a typical hackathon which turns out to be about — how do you build the coolest thing in the shortest period of time”.
- Sumod Mohan, mentor.

“What is impressive about this particular hackathon is, it is not just about cool ideas and fancy demos. It is actually about building a product or a software or a model that can live beyond the demo (and contest).”
- Soma Dhavala, team member at Project Seshu

“It was only through putting my ideas to code that I learnt what the specificity of implementing these (LLMs) were. I began my journey with a sense of hope and commitment towards FOSS principles, and the Hackathon only reinforced my belief that collaboration maketh a better product.”
- Sankalp Srivastava, Creator of Project Schematise

Key highlights from the hackathon

Over the course of 12 weeks, the hackathon:

  1. Started on 5 January 2024 with an open call for open source ideas and projects.
  2. Ran mentorship sessions in February for all project teams. Mentors included Abhishek H Mishra aka Tokenbender, Arvind Saraf, Bharat Shetty, Ramesh Hariharan, Sidharth Ramachandran, Simrat Hanspal, Sumod Mohan and Vinayak Hegde.
  3. Shortlisted the 10 best of the 40 applications for the Demo Showcase.
  4. Ran an involved peer-review process that helped further refine the projects between March 1st and 15th, followed by extensive rehearsals from April 8th to 10th, 2024.
  5. Held the Demo Showcase Day on 12 April 2024, with project demos from the 10 qualifying teams; 5 project winners were chosen.

The Prizes

🏆 Five prizes of ₹1,00,000 (one lakh rupees), one per theme, were awarded to the winning projects.
The prizes for this hackathon have been sponsored by Meta.

Note: Apart from the contest prizes, Microsoft has offered internships to the contestants.

Jury

  1. Ashok Hariharan heads data and business intelligence at United Nations Volunteers.
  2. Rukma Talwadker is a Senior Principal Scientist at Games24x7.
  3. Shubha Shedthikere is a Senior Manager in the Data Science team at Swiggy.
  4. Sunil Abraham is the Public Policy Director for Data Economy and Emerging Tech at Meta, India.
  5. Tanuja Ganu is a Principal RSDE Manager at Microsoft Research India.

Mentors

  1. Abhishek Mishra is the creator of the CodeCherryPop LLM series.
  2. Arvind Saraf is a computer scientist, engineering leader and entrepreneur trained at IIT, MIT and Google.
  3. Simrat Hanspal is currently spearheading AI product strategy at Hasura.
  4. Sumod Mohan is the co-founder and CEO of AutoInfer.

About The Fifth Elephant

The Fifth Elephant is a community of practitioners who share feedback on data, AI and ML practices in the industry. If you like the work that The Fifth Elephant does and want to support its activities - reviews of papers and books, and building the innovation ecosystem in India through hackathons and conferences - contribute by picking up a membership.

Contact

💬 Post a comment with your questions here, or join The Fifth Elephant Telegram group and the WhatsApp group.

Follow @fifthel on Twitter.

📞 For any inquiries, call The Fifth Elephant at +91-7676332020.

Hosted by

The Fifth Elephant hackathons

Supported by

Host

All about data science and machine learning

Venue host

Welcome to the events page for events hosted at The Terrace @ Hasura.

Partner

Providing all founders, at any stage, with free resources to build a successful startup.

Soma Dhavala

@dhavala

Yashwardhan Chaudhuri

@chaudhuri contributor

Sai Nikhilesh Reddy

@SaiNikhileshReddy contributor

Project Seshu

Submitted Jan 28, 2024

Fedem

A decentralized framework to train foundational models

FedEm is an open-source library empowering community members to actively participate in the training and fine-tuning of foundational models, fostering transparency and equity in AI development. It aims to democratize the process, ensuring inclusivity and collective ownership in model training.

See this presentation


🎥 Demo

Fedem Package

Link to the video

Installation

$ pip install fedem

Introduction

The emergence of ChatGPT captured widespread attention, marking the first instance where individuals outside of technical circles could engage with Generative AI. This watershed moment sparked a surge of interest in cultivating secure applications of foundational models, alongside the exploration of domain-specific or community-driven alternatives to ChatGPT. Notably, the unveiling of LLaMA 2, an LLM generously open-sourced by Meta, catalyzed a plethora of advancements. This release fostered the creation of diverse tasks, tools, and resources, spanning from datasets to novel models and applications. Additionally, the introduction of Phi 2, an SLM by Microsoft, demonstrated that modestly-sized models could rival their larger counterparts, offering a compelling alternative that significantly reduces both training and operational costs.

Yet, amid these strides, challenges persist. The training of foundational models within current paradigms demands substantial GPU resources, presenting a barrier to entry for many eager contributors from the broader community. In light of these obstacles, we advocate for FedEm.

FedEm (Federated Emergence) stands as an open-source library dedicated to decentralizing the training process of foundational models, with a commitment to transparency, responsibility, and equity. By empowering every member of the community to participate in the training and fine-tuning of foundational models, FedEm mitigates the overall computational burden per individual, fostering a more democratic approach to model development. In essence, FedEm epitomizes a paradigm shift, where foundational models are crafted not just for the people, but by the people, ensuring inclusivity and collective ownership throughout the training journey.

FedEm Framework

FedEm proposes a methodology to train a foundational model continuously using adapters. It has two main components: decentralized adapter training using continuous relay finetuning (CRF), and large-scale model updates using continuous pretraining (CPT) checkpoints.

Continuous Relay Finetuning (CRF)

We introduce the concept of continuous relay finetuning (CRF), which employs parameter-efficient LoRA adapters in a relay-like fashion for training foundational models. In this method, a client conducts local training of an adapter on a specified dataset, followed by its transmission to a cloud server for subsequent download by another client for further finetuning. FedEm ensures the continuous training of adapters, which are subsequently merged with a foundational model to create an updated model. CRF facilitates community engagement throughout the training process and offers a transparent framework for tracking and continuously updating adapters as new data becomes available, thereby enhancing the adaptability and inclusivity of AI development efforts.
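
To make the relay step concrete, here is a minimal sketch of a single CRF iteration built on the HuggingFace transformers, peft and datasets libraries. The repository names, local data file and training settings are illustrative assumptions, not fedem's actual API.

```python
# Minimal sketch of one CRF relay step. Repo names, the data file and
# hyperparameters are illustrative placeholders, not fedem's actual interface.
from datasets import load_dataset
from peft import PeftModel
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

BASE = "mlsquare/seshu-base"           # assumed name of the latest pretrained checkpoint
ADAPTER = "mlsquare/seshu-adapter-07"  # assumed name of the adapter checked out from the hub

tokenizer = AutoTokenizer.from_pretrained(BASE)
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token
base_model = AutoModelForCausalLM.from_pretrained(BASE)

# Resume training the adapter that the previous client pushed (the relay step).
model = PeftModel.from_pretrained(base_model, ADAPTER, is_trainable=True)

# Any local text corpus works here; "local_corpus.txt" is a placeholder.
data = load_dataset("text", data_files="local_corpus.txt", split="train")
tokenized = data.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="crf-step", num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()

# Push only the adapter weights back so the next client can continue the relay.
model.push_to_hub(ADAPTER)
```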

Continuous Pretraining (CPT)

The server-side cloud hub exhibits the capability for perpetual training and deployment of refreshed foundational models at specified intervals, such as monthly or daily cycles. Simultaneously, the CRF adapters engage in iterative refinement against these newly updated models, fostering continual adaptation in response to evolving datasets.

Selective locking of adapters

For continuous relay finetuning, it is important to schedule adapter training so that no two clients hold the same adapter at the same time. To enforce this access control, we use time-dependent adapter scheduling. A client downloads an adapter at time T; the adapter is then locked for every other client, i.e. it cannot be finetuned by anyone else until the holding client finishes finetuning it. The hub checks adapter locks every 5 minutes (a sketch of this check follows the list below). An adapter is unlocked when either of the following conditions is met:

  • the time elapsed since adapter A was checked out for finetuning exceeds 3 hours, or
  • the client pushes the finetuned adapter back before the 3-hour window ends.
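
A minimal sketch of this time-based lock check follows, with an in-memory lock table standing in for whatever store the hub actually uses; the field names are illustrative, while the timeouts mirror the description above.

```python
# Sketch of the hub's periodic lock check; the in-memory dict stands in for
# whatever persistent store the hub actually uses.
import time

LOCK_TIMEOUT = 3 * 60 * 60    # an adapter lease lasts at most 3 hours
CHECK_INTERVAL = 5 * 60       # the hub re-checks locks every 5 minutes

# locks: adapter_id -> {"client": str, "locked_at": float, "pushed": bool}
def release_expired_locks(locks: dict) -> None:
    now = time.time()
    for adapter_id, lock in list(locks.items()):
        timed_out = now - lock["locked_at"] > LOCK_TIMEOUT
        if timed_out or lock["pushed"]:   # either unlock condition is met
            del locks[adapter_id]         # adapter becomes available to other clients

def hub_loop(locks: dict) -> None:
    while True:
        release_expired_locks(locks)
        time.sleep(CHECK_INTERVAL)
```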

Seshu

The majority of the LLMs we see today, if not all of them, are based on proven Transformer-based architectures. Transformers have quadratic (in input tokens) complexity, and are therefore slow to train and infer with. As a result, new memory- and compute-efficient attention mechanisms have sprung up, along with engineering hacks. But at the end of the day, they are still Transformer-based architectures.

Further, the majority of LLMs, with the exception of some Chinese models, are English-centric, and other languages have only a token representation (no pun intended). LLMs often come with a particular tokenizer, which makes extending them to other languages and domains hard, and vocabulary size and Transformer computational efficiency have an uneasy relationship. Developing SLMs or LLMs is still a compute-heavy problem, so only large consortia with deep pockets, massive talent concentration and GPU farms can afford to build such models.

Client side:

Pre-reqs

  • has a GPU, registers on HuggingFace/mlsquare for write access
  • is familiar with the HuggingFace ecosystem (transformers, peft, datasets, hub)
  • [optional] can donate time or data or both

Actions:

Runs the client-side script, which:

  • downloads data and pretrains the model
  • SFTs via LoRA
  • pushes the adapter to the HuggingFace model hub

Server side (HF Admin Only):

Pre-reqs

  • has (big) GPU(s)
  • is familiar with the HuggingFace ecosystem (transformers, peft, datasets, hub), databases, and ML engineering in general
  • [optional] can donate time or data or both

Actions:

  • Pretrains a multilingual Mamba model and publishes a checkpoint
  • Evaluates the community-contributed adapters in a single-blind fashion and merges them into the pretrained model (see the merge sketch below)
  • Does continuous pretraining and releases checkpoints periodically
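
As a rough illustration of the merge-and-release step, the sketch below folds community adapters into the current base checkpoint with peft's merge_and_unload and republishes the result. The repository names are placeholders, and the single-blind evaluation is only indicated by a comment.

```python
# Sketch of the server-side merge-and-release step; repo names are placeholders.
from peft import PeftModel
from transformers import AutoModelForCausalLM

BASE = "mlsquare/seshu-base"        # assumed current pretrained checkpoint
ADAPTER_REPOS = [                   # community-contributed adapters (assumed names)
    "mlsquare/seshu-adapter-01",
    "mlsquare/seshu-adapter-02",
]

model = AutoModelForCausalLM.from_pretrained(BASE)

for repo in ADAPTER_REPOS:
    # Single-blind evaluation on a held-out set would go here; adapters that
    # do not improve the metric would simply be skipped.
    model = PeftModel.from_pretrained(model, repo)
    model = model.merge_and_unload()  # fold the LoRA weights into the base model

# Publish the refreshed checkpoint; continuous pretraining resumes from it and
# clients pick it up for the next round of relay finetuning.
model.push_to_hub("mlsquare/seshu-base-v2")  # placeholder name for the new release
```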

Academic Interests

  • experiment with and identify good federated learning policies
  • figure out effective training configurations for PT, CPT, SFT and FedT of SLMs and LLMs
  • develop new task specific adapters
  • contribute your local, vernacular data
  • curate datasets

🫶 Contributions:

Fedem is an open-source project, and contributions are welcome. If you want to contribute, you can create new features, fix bugs, or improve the infrastructure. Please refer to the CONTRIBUTING.md file in the repository for more information on how to contribute.

The views expressed and the approach taken are those of the individuals, and do not represent any organization, explicitly or implicitly.
Likewise, anyone who wants to contribute their time, compute or data must understand that this is a community experiment to develop LLMs by the community, and it may not result in any significant outcome; it may even end up in total failure. Contributors must take this risk on their own.

To see how to contribute, visit Contribution guidelines

Initial Contributors: @dhavala, @yashwardhanchaudhuri, & @SaiNikhileshReddy

Roadmap

Week 0

  • Make Mamba compatible with the Transformers class
  • Test LoRA adapters (adding, training, merging)
  • Pretrain an SLM, SFT on LoRA, Merge, Push

Outcome: A working end-to-end pretraining and SFT-ing pipeline [DONE]

Week 1

  • Develop client-side code
  • Pretrain a model on a multilingual Indic dataset such as samantar

Outcome: Release a checkpoint [DONE]

Week 2

  • Drive SFT via community (at least two users)
  • Run Federated SFT-ing

Week 4 and onwards

  • Benchmark and eval on a test set (against other OSS LLMs)
  • Perplexity vs Epochs (and how Seshu is maturing)

References:

Architectures and Tokenizers

  • Mamba: Linear-Time Sequence Modeling with Selective State Spaces, paper, 1st, Dec, 2023
  • MambaByte: Token-free Selective State Space Model paper, 24th, Jan, 2024
  • BlackMamba - Mixture of Experts of State Space Models paper, code
  • ByT5: Towards a token-free future with pre-trained byte-to-byte models paper, 28th, May, 2023

Indic LLMs

  • RomanSetu: Efficiently unlocking multilingual capabilities of Large Language Models via Romanization paper, 25th, Jan, 2024
  • Open Hathi - blog from sarvam.ai, 12th Dec, 2023
  • MaLA-500: Massive Language Adaptation of Large Language Models paper

Model Merging

  • Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time paper
  • Editing Models with Task Arithmetic paper
  • TIES-Merging: Resolving Interference When Merging Models paper
  • Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch paper
  • Rethinking Alignment via In-Context Learning (implements token distribution shift) blog
  • PhatGoose: The Challenge of Recycling PEFT Modules for Zero-Shot Generalization blog
  • LoRA Hub: Efficient Cross-Task Generalization via Dynamic LoRA Composition (https://arxiv.org/abs/2307.13269)

Datasets

Code & Tools for Models, and Distributed Training

  • Mamba-HF: Mamba model compatible with HuggingFace transformers here
  • Mamba pretrained model collection here
  • Mamba-minimal: a minimal implementation of the Mamba architecture here
  • Mamba: original implementation by the Mamba authors here
  • OLMo: A truly open LLM blog
  • Petals: decentralized inference and finetuning of large language models blog, paper, git repo
  • Position blog on Petals: a shift in training LLMs with the Petals network, TechCrunch blog
  • Shepherd: A Platform Supporting Federated Instruction Tuning here
  • FATE-LM is a framework to support federated learning for large language models (LLMs) here
  • FEDML Open Source: A Unified and Scalable Machine Learning Library for Running Training and Deployment Anywhere at Any Scale here
  • mergekit for model merging, implementing multiple model merging techniques here
