The Fifth Elephant Open Source AI Hackathon 2024

GenAI makers and creators contest and showcase

Make a submission

Accepting submissions till 15 Feb 2024, 11:00 PM

Hasura, Bangalore

Overview

The Fifth Elephant Open Source AI Hackathon started on 5 January 2024 and reached its finale with a Demo Day event on 12 April 2024, when the winners of the two-month-long contest were chosen.

The aim of this hackathon was to encourage individuals/teams to apply and incubate innovative AI ideas/use cases and publish them as open source projects.

  • The hackathon contest participants worked with mentors for over two months to refine their ideas and advance them to a stage where they were viable projects that could be pursued beyond the hackathon.
  • The project teams worked on applications of AI in education, accessibility, creative expression, scientific research and languages, under the overall theme of AI for India.
  • Competing projects were judged on impact and relevance, innovation and creativity, technical soundness and code quality, scope for expansion, and reusability and ease of adoption.

As a campaign to raise awareness and drive up developer adoption of AI and open source technologies, the hackathon was a great success. It helped shine light on the agility that open source technology enables for creative and innovative developers.

Open Source AI Hackathon Winners

Testimonials

“...each one of the contestants put in tremendous effort. And we saw the passion in every person, trying to do things not for winning, but about really building your projects. After a long time, I am attending such a hackathon where young folks are so passionate about building. Kudos to all of you”.
- Rukma Talwadker, Jury Member, Senior Principal Scientist at Games 24x7

“I really enjoyed judging all the projects - lot of interesting work. The Fifth Elephant has done a great job with mentoring and curating this hackathon”.
- Tanuja Ganu, Jury Member, Principal RSDE Manager, Microsoft India

“The hallmark of this hackathon was getting younger people to code for a longer period of time as opposed to a typical hackathon which turns out to be about — how do you build the coolest thing in the shortest period of time”.
- Sumod Mohan, mentor.

“What is impressive about this particular hackathon is, it is not just about cool ideas and fancy demos. It is actually about building a product or a software or a model that can live beyond the demo (and contest).”
- Soma Dhavala, team member at Project Seshu

“It was only through putting my ideas to code that I learnt what the specificity of implementing these (LLMs) were. I began my journey with a sense of hope and commitment towards FOSS principles, and the Hackathon only reinforced my belief that collaboration maketh a better product.”
- Sankalp Srivastava, Creator of Project Schematise

Key highlights from the hackathon

Over the course of 12 weeks, the hackathon involved:

  1. A launch on 5 January 2024, with an open call for open source ideas and projects.
  2. Mentorship sessions in February for all project teams. Mentors included Abhishek H Mishra aka Tokenbender, Arvind Saraf, Bharat Shetty, Ramesh Hariharan, Sidharth Ramachandran, Simrat Hanspal, Sumod Mohan and Vinayak Hegde.
  3. Selection of the 10 best of 40 applications for the Demo Showcase.
  4. An involved peer-review process that helped further refine projects between 1 and 15 March, followed by extensive rehearsals from 8 to 10 April 2024.
  5. A Demo Showcase Day on 12 April 2024, with project demos from the 10 qualifying teams, at which 5 winning projects were chosen.

The Prizes

🏆 Five prizes of ₹1,00,000 (one lakh rupees) each, one per theme, were awarded to the winning projects.
The prizes for this hackathon have been sponsored by Meta.

Note: Apart from the contest prizes, Microsoft has offered internships to the contestants.

Jury

  1. Ashok Hariharan heads data and business intelligence at United Nations Volunteers.
  2. Rukma Talwadker is a Senior Principal Scientist at Games24x7.
  3. Shubha Shedthikere is a Senior Manager in the Data Science team at Swiggy.
  4. Sunil Abraham is the Public Policy Director for Data Economy and Emerging Tech at Meta, India.
  5. Tanuja Ganu is a Principal RSDE Manager at Microsoft Research India.

Mentors

  1. Abhishek Mishra is the creator of the CodeCherryPop LLM series.
  2. Arvind Saraf is a computer scientist, engineering leader and entrepreneur, trained at IIT, MIT and Google.
  3. Simrat Hanspal is currently spearheading AI product strategy at Hasura.
  4. Sumod Mohan is the co-founder and CEO of AutoInfer.

Editors

About The Fifth Elephant

The Fifth Elephant is a community of practitioners who share feedback on data, AI and ML practices in the industry. If you like the work that The Fifth Elephant does and want to support its activities (reviews of papers and books, and building the innovation ecosystem in India through hackathons and conferences), contribute by picking up a membership.

Contact

💬 Post a comment with your questions here, or join The Fifth Elephant Telegram group and the WhatsApp group.

Follow @fifthel on Twitter.

📞 For any inquiries, call The Fifth Elephant at +91-7676332020.

Hosted by

The Fifth Elephant hackathons

Supported by

Host

All about data science and machine learning

Venue host

Welcome to the events page for events hosted at The Terrace @ Hasura.

Partner

Providing all founders, at any stage, with free resources to build a successful startup.

Sankalp Srivastava

Schematise (formerly, "Complianalyse")

Submitted Jan 27, 2024

An LLM-enabled XML generator for Indian statutes and laws in the Akoma Ntoso format.

While originally meant as a compliance mapper, this project’s author, guided by the Unix philosophy of “do one thing and do it well”, has decided to build something more modular rather than focus on a single use case. Accordingly, the repository details have been amended, with text that applied only to the previous approach struck out. Most of the progress made so far still applies to the project’s new goal.

https://github.com/sankalpsrv/Schematise/blob/main/README.md

Go to the Dev branch (click here) to view more frequent updates.

Click here to view progress made so far

Problem

Making further use of machine-readable laws requires converting entire statutes into the set of formats known as “Akoma Ntoso”. Currently, separate solutions can be found on the web, GitHub and GitLab that help perform schema generation in different local contexts.

Hence, there is a need for a comprehensive, automated solution for Indian laws that is open source, up to date with AI capabilities, and covers both LegalDocML and LegalRuleML.

Proposed solution

Since LegalRuleML and LegalDocML already exist as a means of encoding legal statutes as machine-readable text, the app will generate XML output. I will do so using an LLM-based approach to generate the XML for an entire statute.
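To make the target output concrete, here is a minimal sketch of the kind of XML the generator would aim for, built by hand with Python's standard library rather than by an LLM. The function name `build_act_skeleton`, the `eId` pattern and the sample section are illustrative assumptions, not taken from the Schematise repository. The mandatory `<meta>` block is left out for brevity, so this skeleton is not claimed to be schema-valid Akoma Ntoso.

```python
import xml.etree.ElementTree as ET

# Akoma Ntoso 3.0 namespace as published by OASIS LegalDocML.
AKN_NS = "http://docs.oasis-open.org/legaldocml/ns/akn/3.0"


def build_act_skeleton(act_name, sections):
    """Build a simplified Akoma Ntoso-style <act> from (num, heading, text) tuples.

    Hypothetical helper for illustration only; the <meta> block is omitted.
    """
    ET.register_namespace("", AKN_NS)  # serialise with a default namespace
    root = ET.Element(f"{{{AKN_NS}}}akomaNtoso")
    act = ET.SubElement(root, f"{{{AKN_NS}}}act", {"name": act_name})
    body = ET.SubElement(act, f"{{{AKN_NS}}}body")
    for index, (num, heading, text) in enumerate(sections, start=1):
        # The eId naming scheme here is an assumption made for this sketch.
        section = ET.SubElement(
            body, f"{{{AKN_NS}}}section", {"eId": f"sec_{index}"}
        )
        ET.SubElement(section, f"{{{AKN_NS}}}num").text = num
        ET.SubElement(section, f"{{{AKN_NS}}}heading").text = heading
        content = ET.SubElement(section, f"{{{AKN_NS}}}content")
        ET.SubElement(content, f"{{{AKN_NS}}}p").text = text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = [("1.", "Short title", "This Act may be called the Example Act.")]
    print(build_act_skeleton("exampleAct", sample))
```

In the proposed app, an LLM would be asked to produce markup of this shape directly from the statute text; the hand-built version is only meant to show the structure being targeted.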

Features

  • Users can upload text files or PDFs and download Akoma Ntoso-compliant output in XML format.
  • Users will be able to choose between OpenAI, prompt engineering, RAG, or a fine-tuned model, since each can generate different outputs and has different inference costs and resource requirements.
  • Users will be able to validate the generated XML, as well as browse and query it in a web interface, using open-source integrations (discussed in the section below).
  • Further modularity is likely to be added as the project progresses.
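As a rough illustration of the validation feature, the sketch below runs some basic checks on generated XML using only the Python standard library. It tests well-formedness, the root element and the presence of at least one section. The function name `check_generated_xml` and the choice of checks are assumptions made for this example. Full validation against the Akoma Ntoso schema would need an XSD-capable library and the official schema files, which are not covered here.

```python
import xml.etree.ElementTree as ET

# Akoma Ntoso 3.0 namespace as published by OASIS LegalDocML.
AKN_NS = "http://docs.oasis-open.org/legaldocml/ns/akn/3.0"


def check_generated_xml(xml_text):
    """Return a list of problems found; an empty list means the basic checks passed.

    Hypothetical helper: checks well-formedness and structure, not schema validity.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # LLM output is often truncated or has unbalanced tags.
        return [f"not well-formed XML: {exc}"]

    problems = []
    if root.tag != f"{{{AKN_NS}}}akomaNtoso":
        problems.append(f"unexpected root element: {root.tag}")
    if not list(root.iter(f"{{{AKN_NS}}}section")):
        problems.append("no <section> elements found")
    return problems
```

A check like this could run before the output is offered for download, so that malformed model output is caught early and the statute can be sent for regeneration.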

Progress

Ethical considerations

  • The app will display a disclaimer before execution and alongside the generated results, in each case stating that the results do not constitute legal advice.
  • No user data will be sought or stored anywhere. The database integration will store the inference results for each statute.

Resource constraints

I am working on a cloud GPU when testing “local inferencing” via Llama 2, and I am attempting to work with quantised models at this stage. However, OpenAI seems to offer a much more feasible deployment scenario. I will expand capacity later if required, including for fine-tuning, for example by using Azure.

For the purposes of the app showcase, I intend to deploy via a cloud GPU to the extent possible. Alternatively, the app will run on OpenAI, as it is already integrated in the LangChain workflow.

LLMs being compared

HuggingFace’s Inference API will be used, in addition to Azure or any other comparable compute provider.

  • Llama (useful because of its Grammars implementation)
    I have been able to generate similar output from Llama’s 13b and 7b models via few-shot prompting.
  • GPT
    I am using GPT-3.5 to test some ideas, and it has delivered results consistently so far. I have shared these in my notebook in the GitHub repository.
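The few-shot prompting mentioned above can be sketched as plain string assembly, independent of any particular model or client library. The function name `build_few_shot_prompt`, the instruction wording and the `Section:` / `XML:` labels are assumptions made for illustration; the prompts actually used are the ones in the project's notebook.

```python
def build_few_shot_prompt(examples, statute_text):
    """Assemble a few-shot prompt from (plain_text, xml) example pairs.

    Hypothetical helper: the layout below is one possible prompt format.
    """
    parts = ["Convert each statute section into Akoma Ntoso XML."]
    for plain_text, xml in examples:
        # Each worked example shows the model an input and its expected markup.
        parts.append(f"Section:\n{plain_text}\nXML:\n{xml}")
    # The final block is left open so that the model completes the XML.
    parts.append(f"Section:\n{statute_text}\nXML:")
    return "\n\n".join(parts)
```

The resulting string can be sent to whichever backend is selected (a local Llama model or GPT), which keeps the comparison between models on the same footing.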

The following models will be considered later, if required:

  • Mixtral 7b instruct fine tuned
  • BERT models
    • LegalBert (However, this paper suggests that auto-encoding models perform worse than autoregressive ones on this task)
    • InLegalBert (shown to perform better on Indian laws)

Open-source integrations

Other relevant details

  • In-context learning prompts, training datasets, and other relevant details will be shared as they are generated.
  • Integrations will be considered as development of the app proceeds, including a PostgreSQL database integration for storing the generated schemas.
  • Limitations of automated legal-text-to-XML markup have been observed here
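As a sketch of how generated schemas might be stored per statute, the example below uses Python's built-in `sqlite3` module as a stand-in for the planned PostgreSQL integration, so that it runs without a database server. The table name, column names and helper functions are all assumptions made for illustration; a PostgreSQL version would use a PostgreSQL driver and its own upsert syntax.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a store with one row per statute and model.

    SQLite is used here only as a stand-in for PostgreSQL.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS generated_schemas ("
        "statute_name TEXT NOT NULL, "
        "model_name TEXT NOT NULL, "
        "xml_output TEXT NOT NULL, "
        "PRIMARY KEY (statute_name, model_name))"
    )
    return conn


def save_schema(conn, statute_name, model_name, xml_output):
    """Insert a generated schema, replacing any earlier result for the same key."""
    conn.execute(
        "INSERT OR REPLACE INTO generated_schemas VALUES (?, ?, ?)",
        (statute_name, model_name, xml_output),
    )
    conn.commit()


def load_schema(conn, statute_name, model_name):
    """Return the stored XML for a statute and model, or None if absent."""
    row = conn.execute(
        "SELECT xml_output FROM generated_schemas "
        "WHERE statute_name = ? AND model_name = ?",
        (statute_name, model_name),
    ).fetchone()
    return row[0] if row else None
```

Keying rows on both statute and model would let outputs from different LLMs be compared for the same statute, and storing only inference results is in line with the commitment not to store user data.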
