Open Source AI Hackathon 2024

GenAI makers and creators contest and showcase


Adithya S K (@Adithya_S_K)

Srinidhi Somayaji P (@Srinidhi9113)

Achala Nayak (@achalanayak)

A versatile open-source framework for adapting pre-trained large language models (LLMs), such as Llama, Mistral, and Mixtral, to a wide array of domains and languages.

Submitted Feb 15, 2024

Problem Statement:
Training Large Language Models (LLMs) for Indic languages from scratch is costly and impractical. In response, we present a streamlined framework for adapting pre-trained LLMs such as Llama and Mixtral 8x7B to new languages, using a compact dataset for cross-lingual tasks. Our solution includes fine-tuning and evaluation processes tailored to practical production use cases.
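To make the adaptation step concrete, below is a minimal LoRA fine-tuning sketch, assuming the Hugging Face transformers, peft, and datasets libraries; the base model, the dataset file, and the hyperparameters are illustrative placeholders, not the project's released code or data.

```python
# Minimal sketch: adapt a pre-trained LLM to a new language with LoRA on a
# compact dataset (assumes `transformers`, `peft`, `datasets`; the model id,
# data file, and hyperparameters are hypothetical placeholders).
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)

base_id = "meta-llama/Llama-2-7b-hf"  # any supported base model
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

# Low-rank adapters keep the trainable parameter count small, so a compact
# corpus is enough for language adaptation.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

# Hypothetical instruction/translation corpus in JSON Lines format.
ds = load_dataset("json", data_files="kannada_instructions.jsonl")["train"]
ds = ds.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
            remove_columns=ds.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=4,
                           num_train_epochs=1, learning_rate=2e-4,
                           logging_steps=10),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("adapters/kannada")  # export only the LoRA weights
```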

Unique Selling Points (USPs):

  • Mixture of Languages Architecture: A novel architecture inspired by the “Mixture of Experts” framework in Mixtral 8x7B. Our model consists of five 7B-parameter models, each serving as an expert in a specific language (Kannada, Telugu, Tamil, Hindi, and English).

  • High-Quality Synthetic Data: The model is trained on high-quality synthetic data, ensuring efficiency and reducing additional training costs.

  • Adaptive LoRA Adapter Swapping: A method to dynamically switch LoRA adapters during inference, enabling a single model to excel at tasks such as RAG-based answering, translation, and instruction following (see the sketch after this list).

  • Multilingual Support: The model is designed to be multilingual, proficient in five languages, catering to diverse linguistic requirements.

  • Indic LLM Evaluation Framework: An evaluation framework tailored to assessing the performance of Indic large language models.
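To illustrate the adapter-swapping idea referenced above, here is a minimal sketch assuming the Hugging Face transformers and peft libraries; the adapter paths (adapters/translation, adapters/rag, adapters/instruct) and task names are hypothetical placeholders, not the project's published adapters.

```python
# Minimal sketch of swapping LoRA adapters on a single base model at inference
# time (assumes `transformers` + `peft`; adapter paths are hypothetical).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-7b-hf"  # any supported base model
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

# Each load_adapter call registers a named adapter without duplicating the
# 7B base weights, so several tasks share one model in memory.
model = PeftModel.from_pretrained(base, "adapters/translation",
                                  adapter_name="translation")
model.load_adapter("adapters/rag", adapter_name="rag")
model.load_adapter("adapters/instruct", adapter_name="instruct")

def generate(prompt: str, task: str) -> str:
    """Activate the adapter for `task`, then generate a completion."""
    model.set_adapter(task)  # dynamic swap: only the active LoRA weights change
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(out[0], skip_special_tokens=True)

print(generate("Translate to Kannada: How are you?", task="translation"))
```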

Model Architecture:
The proposed architecture draws inspiration from the Mixture of Experts framework, with each expert trained bilingually for a specific language. This approach significantly reduces inference time, making it well suited to production environments. LoRA adapters are switched dynamically at inference time according to the use case, ensuring adaptability for tasks such as retail support conversations and translation. Note that training of the remaining expert models is currently in progress.
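The write-up does not spell out how a request is routed to the matching expert, so the following is a purely hypothetical routing sketch: the script-range language detector, the EXPERTS registry, and the task names are assumptions made for illustration, not the project's implementation.

```python
# Hypothetical routing sketch: detect the input language, then dispatch the
# prompt to the corresponding per-language expert. Names are illustrative only.
from typing import Callable, Dict

# Registry of per-language experts keyed by ISO language code. In a real
# system each entry would wrap a 7B expert plus its task LoRA adapters;
# here simple lambdas stand in for the models.
EXPERTS: Dict[str, Callable[[str, str], str]] = {
    "kn": lambda prompt, task: f"[kannada-expert:{task}] {prompt}",
    "te": lambda prompt, task: f"[telugu-expert:{task}] {prompt}",
    "ta": lambda prompt, task: f"[tamil-expert:{task}] {prompt}",
    "hi": lambda prompt, task: f"[hindi-expert:{task}] {prompt}",
    "en": lambda prompt, task: f"[english-expert:{task}] {prompt}",
}

# Unicode script blocks used as a stand-in language detector; a production
# router might instead use a trained classifier such as fastText.
SCRIPT_RANGES = {
    "kn": ("\u0c80", "\u0cff"),  # Kannada
    "te": ("\u0c00", "\u0c7f"),  # Telugu
    "ta": ("\u0b80", "\u0bff"),  # Tamil
    "hi": ("\u0900", "\u097f"),  # Devanagari (Hindi)
}

def detect_language(text: str) -> str:
    """Return the first language whose script appears in the text, else English."""
    for lang, (low, high) in SCRIPT_RANGES.items():
        if any(low <= ch <= high for ch in text):
            return lang
    return "en"

def route(prompt: str, task: str = "instruct") -> str:
    """Send the prompt to the expert for its detected language."""
    return EXPERTS[detect_language(prompt)](prompt, task)

print(route("ನೀವು ಹೇಗಿದ್ದೀರಾ?", task="translation"))
```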

Project Goals:

  1. User-Friendly Interface: Develop a straightforward interface that empowers users to adapt models to different domains and languages; a graphical user interface (GUI) keeps the adaptation process accessible and user-friendly.

  2. Cutting-Edge Support: Incorporate the latest advances in distributed training, dataset generation, and translation, along with all components needed for seamless adaptation, fine-tuning, evaluation, and deployment. This keeps the framework at the forefront of the field and gives users state-of-the-art tools for adapting language models.

Comments


  • Akshobhya (@akshobhya_j), Editor & Promoter

    Hello @Adithya_S_K, @Srinidhi9113, @achalanayak. Thanks for submitting your project to The Fifth Elephant Open Source AI Hackathon. Can you update your submission based on the following considerations?

    Mixture of Languages Architecture

    1. The lack of a detailed explanation of how the novel architecture is inspired by the "Mixture of Experts" framework in Mixtral 8x7B may raise questions about the uniqueness and feasibility of the approach.
    2. The feasibility and effectiveness of using five 7B-parameter models as language experts need to be demonstrated and validated through empirical evidence.

    High-Quality Synthetic Data

    The efficacy and reliability of training LLMs solely on synthetic data, especially for languages with complex linguistic nuances, may be met with skepticism and require thorough validation and justification.

    Adaptive LoRA Adapter Swapping

    1. The technical implementation and real-world performance of dynamic LoRA adapter swapping during inference need to be thoroughly tested and verified for seamless integration with tasks such as RAG-based answering, translation, and instruction following.
    2. Ensuring that the adaptability achieved through adapter swapping does not compromise the model's performance or introduce unexpected errors is a critical technical concern.

    Indic LLM Evaluation Framework

    The specificity of the evaluation framework tailored for Indic Large Language Models raises questions about its adaptability and generalizability to other language models or cross-lingual tasks.

    Model Architecture

    1. The details regarding the bilingual training of each expert in a specific language within the proposed architecture need to be elaborated to provide a clear understanding of the technical implementation and its effectiveness.
    2. The ongoing training of other models introduces uncertainty regarding the completeness and robustness of the proposed architecture in its current state.

    Project Goals

    1. The technical challenges associated with developing a user-friendly interface, especially concerning the management of complex adaptation processes involving advanced LLM models, need to be addressed to ensure practical implementation.
    2. Incorporating the latest advancements in distributed training code, dataset generation, and translation code raises concerns about the integration, compatibility, and potential trade-offs of different cutting-edge components within the framework.

    Can you also add your Git repository link and a detailed roadmap of the project?

    Posted 1 year ago