Open Source AI Hackathon

The Fifth Elephant Winter Edition Hackathon

Accepting submissions till 15 Feb 2024, 11:00 PM

Microsoft Reactor Bengaluru, Bengaluru

About the hackathon

The aim of this hackathon is to encourage individuals/teams to apply and develop innovative AI ideas/use cases and publish them as open source projects.

Who can participate

  1. Working professionals
  2. Students
  3. Independent consultants
  4. AI researchers
  5. ML engineers
  6. Lawyers, doctors, agronomists, artists, and others who are keen to collaborate with technologists, and showcase ideas and working demos.

Criteria for submitting projects

  1. Ideas should be open source.
  2. Code should be open source with a permissive open-source LICENSE file added.
  3. Orchestrate your code in such a way that it works with open-source models (pre-trained and fine-tuned), open-source products, platforms, systems, and tools.

How to participate

  1. Submit your project idea and outline here.
  2. Join The Fifth Elephant WhatsApp group to discuss your submission with the mentors.
    Or, if you want to validate your idea/project before submitting it, you can discuss it with the mentors, either in the WhatsApp group or on DM.
  3. Participants should work on their projects and start building soon after submitting ideas. Participants have the entire month of February to work on their projects. The last date for submitting projects is 28 February.
  4. Mentors will be assigned to shortlisted projects. Inactive projects, or projects that are not on the consideration list, will not be assigned mentors.
  5. Mentors will comment on the submissions throughout February. The reward of the hackathon is the feedback, not just the cash prize.
  6. Demo day for all shortlisted hackathon projects — in person and remote — will be on 10 March. The jury will review the submissions and announce prize winners.

Mentors

  • Bharat Shetty is an AI/ML Consultant. He has worked for Airtel Labs and other organizations on AI/ML/NLP platforms and products, across diverse verticals such as conversational AI, EdTech, IoT, and healthcare. Bharat is the editor of The Fifth Elephant Winter edition and of the papers discussion community.

  • Abhishek Mishra is the creator of the CodeCherryPop LLM series.

  • Aniket Maurya is spearheading the creation of intelligent software using AI, serving as a Developer Advocate at Lightning AI ⚡️, and is the creator of GradsFlow.

  • Simrat Hanspal has a career spanning over a decade in the AI/ML space, specializing in Natural Language Processing. Simrat currently spearheads AI product strategy at Hasura and has previously led AI teams at renowned organizations such as VMware, FI Money, and Nirvana Insurance.

  • Sumod Mohan is the co-founder and CEO of stealth startup AutoInfer Private Limited. He is also a technical advisor to, and previously CTO of, Niqo Robotics, where he helped build robots to remove weeds from agricultural farms. This work won the Ministry of Electronics and Information Technology (MeitY) and NITI Aayog’s RAISE 2020 Challenge in the Agriculture sector. He was an advisor to WebCardio, an AI-based Holter (wearable ECG) manufacturer, and led the Computer Vision Division at Soliton Technologies. He was also CTO of Digital Aristotle, which was acquired by Byjus. He has over 15 years of research experience in Computer Vision and over 10 in productizing these technologies in the US and India. Prior to this, he worked for HighlightCam Inc, a startup in California, where he led Computer Vision Algorithm Development. He holds an M.S. degree from Clemson University, USA, with a specialization in Intelligent Systems and Robotics.

Editors

  • Bharat Shetty is an AI/ML Consultant. He has worked for Airtel Labs and other organizations on AI/ML/NLP platforms and products, across diverse verticals such as conversational AI, EdTech, IoT, and healthcare. Bharat is the editor of The Fifth Elephant Winter edition and of the papers discussion community.
  • Akshobhya Jamadagni is Editorial Assistant for The Fifth Elephant Open Source AI Hackathon. He is passionate about contributing value across various levels of abstraction, from high-level technical strategy to detailed implementation.

Team composition

  1. You can submit your project as an individual.
  2. Team size is restricted to a maximum of 3 members.
  3. Add your teammates as collaborators after submitting your idea.

Ideas for the hackathon

Participants can propose projects around some of the following ideas:

  1. AI for scientific research: e.g., protein folding models, climate models, drug discovery, image recognition for scientific research, simulations for material science, epidemiology, and more.
  2. AI for inclusivity and accessibility: e.g. STT/TTS, automated audio descriptions (for non-voice content), automated color blindness correction, AI-powered sign language generation, real-time AI-powered captioning display for events, educational resources, and content translation across languages by leveraging multi-lingual models, adaptive content for differences in learning ability and/or neurodivergence, etc.
  3. AI and creative expression: e.g., generative audio, video, text, and visuals and ways to combine these in a production-oriented direction, including AR/VR/Gaming and OTT implementations.
  4. AI in education: e.g., personalized learning plans, adaptive learning plans, content creation, translation with context, AI tutors, productivity tools, well-being improvement tools, etc.
  5. AI for India: e.g., India-specific law, models that focus on Indic languages, renewable energy optimization, disaster response and relief, and education accessibility.
  6. Additionally, participants can also pick and work on ideas from the list of ideas submitted in this spreadsheet.

Jury - to be announced

Project Evaluation Criteria

Project Evaluation Criteria Presentation

Prizes

Five prizes of ₹1,00,000 (one lakh rupees) per theme will be awarded to winners at the hackathon.

About The Fifth Elephant

The Fifth Elephant is a community-funded organization. If you like the work that The Fifth Elephant does and want to support meet-ups and activities, online and in person, contribute by picking up a membership.

Contact information

If you have questions about the hackathon, post a comment here, or join The Fifth Elephant Telegram group and the WhatsApp group.

Follow @fifthel on Twitter.

For any inquiries, call The Fifth Elephant at +91-7676332020.

Sponsored by Meta

Hosted by

The Fifth Elephant, known as one of the best data science and machine learning conferences in Asia, has transitioned into a year-round forum for conversations about data and ML engineering, data science in production, and data security and privacy practices.

Supported by

Partner

Microsoft for Startups Founders Hub is a digital ecosystem removing barriers to building a company with free access to technology, coaching, and support for founders at any stage of development. Let us accelerate your startup journey from idea-to-exit. Find out more here: https://startups.microsoft…

Sankalp Srivastava

Schematise (formerly, "Complianalyse")

Submitted Jan 27, 2024

An LLM enabled XML generator for Indian statutes and laws in the Akoma Ntoso format.

While this project was originally meant as a compliance mapper, its author, guided by the Unix philosophy of “doing one thing and doing it well”, has decided to create something more modular rather than focus on a single use case. Accordingly, repository details have been amended with struck-out text where it was only applicable to the previous approach. However, most of the progress made is applicable to the new goal of the project.

https://github.com/sankalpsrv/Schematise/blob/main/README.md

Click here to view progress made so far

Problem

Further utilisation of machine readable laws necessitates the conversion of entire statutes into a set of formats known as “Akoma Ntoso”. Currently, there are the following separate solutions locatable on the web/Github/Gitlab that help perform schema generation in different local contexts.

Hence, there is a need for a comprehensive automated solution for Indian laws that is open source, up to date with AI capabilities, and covers both LegalDocML and LegalRuleML.

Proposed solution

Considering that LegalRuleML and LegalDocML already exist as ways to encode legal statutes, the app will generate XML output. I will use an LLM-based approach to generate an entire statute’s XML.

Features

  • Users can upload text files or PDFs and download Akoma Ntoso-compliant output in XML format.
  • Users will be able to choose between OpenAI, prompt engineering, RAG, or a fine-tuned model, since each can generate different outputs and has different inference costs and resource requirements.
  • Users will be able to validate the generated XML, as well as browse and query it in a web interface, using open-source integrations (discussed in the section below).
  • Further modularity is likely to be added as the project progresses.
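
As an illustration of the validation feature listed above, a first check could confirm that generated output is well-formed XML with an Akoma Ntoso root element before any full schema validation. The following is a minimal sketch using only Python's standard library; the function name `check_akn_root` and the sample section are hypothetical, and the namespace is assumed to be the Akoma Ntoso 3.0 (OASIS LegalDocML) one. It does not validate against the Akoma Ntoso XSD and is not taken from the project's repository.

```python
import xml.etree.ElementTree as ET

# Assumption: output targets the Akoma Ntoso 3.0 (OASIS LegalDocML) namespace.
AKN_NS = "http://docs.oasis-open.org/legaldocml/ns/akn/3.0"


def check_akn_root(xml_text: str) -> tuple[bool, str]:
    """Return (ok, message) for a basic well-formedness and root-element check.

    This is not schema validation; it only catches malformed XML and a
    wrong or missing root element.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return False, f"not well-formed: {exc}"
    if root.tag != f"{{{AKN_NS}}}akomaNtoso":
        return False, f"unexpected root element: {root.tag}"
    return True, "ok"


# Illustrative sample only: a real Akoma Ntoso act also needs a <meta> block
# to be schema-valid, which this sketch deliberately omits.
sample = f"""<akomaNtoso xmlns="{AKN_NS}">
  <act>
    <body>
      <section eId="sec_1">
        <num>1.</num>
        <heading>Short title</heading>
        <content><p>This Act may be called the Example Act.</p></content>
      </section>
    </body>
  </act>
</akomaNtoso>"""

print(check_akn_root(sample))               # well-formed, correct root
print(check_akn_root("<act><body></act>"))  # mismatched tags are rejected
```

A check like this is cheap enough to run on every LLM response, so malformed generations can be retried before they reach the web interface.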

Progress

Flowchart with progress

Updated as of 21st February

Ethical considerations

  • The app shall display a disclaimer, both before executing and alongside the generated results in each case, stating that the results do not constitute legal advice.
  • No user data will be sought or stored in any place. The database integration will store the inference results for each statute.

Resource constraints

I am working on a cloud CPU notebook and am attempting to work with quantised models at this stage. I will expand capacity later if required, including for fine-tuning.

For the app showcase, I intend to deploy on a cloud GPU to the extent possible.

LLMs being compared

HuggingFace’s Inference API will be used, in addition to Azure or any other comparable compute provider.

  • Llama (useful because of its Grammars implementation)
    I have been able to generate similar output from Llama’s 13B and 7B models via few-shot prompting.
  • GPT
    I am using GPT-3.5 for some idea testing and it has been delivering results consistently so far. I have shared these in my notebook on the GitHub repository.
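
The few-shot prompting mentioned above can be pictured as concatenating a handful of worked text-to-XML pairs ahead of the new section to convert. This is only a sketch under stated assumptions: the function name `build_few_shot_prompt`, the instruction wording, and the example pair are all hypothetical and are not the prompts used in the project's notebook.

```python
def build_few_shot_prompt(examples, section_text):
    """Assemble a few-shot prompt from (plain_text, xml) example pairs.

    The prompt ends with an open "XML:" slot for the model to complete.
    """
    parts = ["Convert each statute section to Akoma Ntoso XML.", ""]
    for plain, xml in examples:
        parts += [f"Section:\n{plain}", f"XML:\n{xml}", ""]
    parts += [f"Section:\n{section_text}", "XML:"]
    return "\n".join(parts)


# Hypothetical worked example; real prompts would use actual statute text.
EXAMPLES = [
    (
        "1. Short title. This Act may be called the Example Act.",
        "<section eId=\"sec_1\"><num>1.</num><heading>Short title</heading>"
        "<content><p>This Act may be called the Example Act.</p></content>"
        "</section>",
    ),
]

prompt = build_few_shot_prompt(EXAMPLES, "2. Definitions. In this Act, ...")
print(prompt)
```

The resulting string can be sent to whichever model is being compared, which keeps the prompt construction independent of the inference backend.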

The following models will be considered later, if required:

  • Mixtral 7b instruct fine tuned
  • BERT models
    • LegalBert (however, this paper suggests that auto-encoding models perform worse than autoregressive ones on this task)
    • InLegalBert (shown to perform better on Indian laws)

Open-source integrations

Other relevant details

  • In-context learning prompts, datasets for training, and other relevant details will be shared as they are generated.
  • Integrations will be considered as development of the app proceeds, including PostgreSQL database integration for storing the generated schemas.
  • Limitations for automated Legal to XML markup have been observed here
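
To make the planned storage of generated schemas concrete, here is a rough sketch of a table keyed by statute and model. It uses Python's built-in `sqlite3` as a stand-in for the PostgreSQL integration mentioned above, purely so the example runs without a database server; the table name, columns, and function names are hypothetical. Only inference results are stored, in line with the ethical considerations stated earlier.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the store and create the (hypothetical) schema table if missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS generated_schema (
               id INTEGER PRIMARY KEY,
               statute TEXT NOT NULL,
               model TEXT NOT NULL,
               xml TEXT NOT NULL,
               UNIQUE (statute, model)
           )"""
    )
    return conn


def save_schema(conn, statute, model, xml):
    """Store one inference result, replacing any earlier result for the pair."""
    conn.execute(
        "INSERT OR REPLACE INTO generated_schema (statute, model, xml) "
        "VALUES (?, ?, ?)",
        (statute, model, xml),
    )
    conn.commit()


def load_schema(conn, statute, model):
    """Return the stored XML for a statute/model pair, or None if absent."""
    row = conn.execute(
        "SELECT xml FROM generated_schema WHERE statute = ? AND model = ?",
        (statute, model),
    ).fetchone()
    return row[0] if row else None


conn = open_store()
save_schema(conn, "Example Act", "model-a", "<akomaNtoso/>")
print(load_schema(conn, "Example Act", "model-a"))
```

Keying on both statute and model lets outputs from the different LLMs under comparison sit side by side for the same statute.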

