Multilingual Mixture of Experts for Domain Adaptive Pre-training of Large Language Models

Hasura, Bangalore

Akshobhya

@akshobhya_j Editor & Promoter
Hello Akash, can you reply with the demo video link and the project presentation link?

Posted 11 months ago
- AK
  
  Akash Kamalesh
  
  @asphytheghoul Submitter
  Hello Akshobhya, apologies for the late reply!
  Here is the link to our presentation: https://docs.google.com/presentation/d/1in4MhQkY6N5SnO-PJ9OVhVIe9K6jOXLRrU45GJbNPF0/edit?usp=sharing
  
  This is the link to our project video demo:
  https://drive.google.com/file/d/19YY1dBt0t29NtIZGjQsZivuKwfy9IOkC/view?usp=sharing
  
  Thank you, and apologies once again for the delayed response.
  
  Posted 11 months ago
  

Akshobhya

@akshobhya_j Editor & Promoter
Can you reply to this comment with the base model that you are using in this project?

Posted 1 year ago

Arvind Saraf

@arvinds
Quick question (apologies, I haven't used Switch Transformers yet): MoE usually struggles with context across individual experts. If the text has a mix of, say, Hindi and Kannada, how will the routing be handled, since different parts of the output may get tokens from different LLMs? How are they combined?

Posted 1 year ago
- AK
  
  Akash Kamalesh
  
  @asphytheghoul Submitter
  Hello Arvind! This is a very interesting case, and quite a likely one when a user enters an input. There are two possible cases here (we are still exploring alternatives, but this is what we have in mind).
  
  If a user types a mix of Kannada and English, say, the query is converted to an embedding, from which a probability distribution across the experts is produced. This will involve rigorous training, since the model has to understand the task the user wants to perform and route it to the appropriate expert. If the input mixes English, romanized Kannada and Kannada, the model will be able to handle it because the adapters are trained on such data, and initial testing on our end shows plausible results when translating among these three languages in any combination.
  
  The issue arises when two primary languages, such as Hindi and Kannada, are combined in a single prompt. We have developed a language identification model that identifies the language of each sentence and uses this information to route the tokens to the adapter for that language. So an input whose sentences are in a mix of languages should be handled appropriately, although we can only comment on this after we finish training and have our results!
  
  The problem that might still arise is an input sentence that itself contains tokens from different languages. We have yet to decide on a method to handle that case, and it is currently under research.
  
  Posted 1 year ago
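
The sentence-level routing described in the reply above can be sketched roughly as follows. This is only an illustration, not the team's implementation: their trained language identification model is replaced here by a crude Unicode-script count, and the names `SCRIPT_RANGES`, `identify_language`, `split_sentences` and `route_by_sentence` are hypothetical. Latin-script text (including romanized Kannada) simply falls back to an "english" label in this sketch.

```python
import re

# ASSUMPTION: Unicode script blocks stand in for a trained
# language-identification model; this is not the project's actual model.
SCRIPT_RANGES = {
    "hindi": (0x0900, 0x097F),    # Devanagari block
    "kannada": (0x0C80, 0x0CFF),  # Kannada block
}


def identify_language(sentence):
    """Return the language whose script covers the most characters."""
    counts = {lang: 0 for lang in SCRIPT_RANGES}
    for ch in sentence:
        for lang, (lo, hi) in SCRIPT_RANGES.items():
            if lo <= ord(ch) <= hi:
                counts[lang] += 1
    best = max(counts, key=counts.get)
    # No Devanagari or Kannada characters at all: fall back to English.
    return best if counts[best] > 0 else "english"


def split_sentences(text):
    """Split after '.', '!', '?' or the danda, when followed by whitespace."""
    parts = re.split(r"(?<=[.!?।])\s+", text.strip())
    return [p for p in parts if p]


def route_by_sentence(text):
    """Pair each sentence with the (hypothetical) adapter name for its language."""
    return [(identify_language(s), s) for s in split_sentences(text)]


if __name__ == "__main__":
    prompt = "यह एक वाक्य है। ಇದು ಒಂದು ವಾಕ್ಯ. This is a sentence."
    for adapter, sentence in route_by_sentence(prompt):
        print(adapter, "<-", sentence)
```

A sentence that mixes scripts is assigned to whichever script has more characters, which is exactly the intra-sentence case the reply describes as still open.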
  

Akshobhya

@akshobhya_j Editor & Promoter
Congratulations, @asphytheghoul, @tanisthahota, Anirudh. This project has been shortlisted as part of the Fifth Elephant Open Source AI Hackathon. You can apply to Microsoft Founders Hub for credits. Application details are explained at http://has.gy/o4RD

Do connect with your mentor @cerebraltangent to refine the idea and continue building the project.

All the best!

Posted 1 year ago

Akshobhya

@akshobhya_j Editor & Promoter
@asphytheghoul, @Anirudh, thank you for your proposal submission to The Fifth Elephant Open Source AI Hackathon. The proposal addresses the need for a more efficient and versatile modeling strategy for Large Language Models (LLMs) to adapt to different languages and domains. The use of Mixture of Experts (MoE) to create domain-adaptive pretrained models for specific languages is an innovative approach to enhancing model performance. This submission needs to be updated based on the following considerations.

Technical Suggestions
1. Base Model and Tokenization
- Utilizing the Llama 2 7B model as the base model for pre-training and customizing BPE SentencePiece tokenizers for Hindi and Kannada is a strategic approach.
- However, an extensive evaluation of the effectiveness of this tokenizer extension and vocabulary modification is necessary.
2. Pre-training Tasks
- The selection of machine translation, context learning, question answering, reasoning, and text classification as pre-training tasks offers a diverse set of challenges for the model.
- Ensuring a balanced distribution of resources and attention across the tasks will be crucial for comprehensive model learning.
3. Mixture of Experts Framework
- The incorporation of the Switch Transformer's routing algorithm for the MoE setup is commendable, offering a robust mechanism for assigning tokens to different experts based on language and domain.
- However, the detailed methodology for aggregating outputs and ensuring coherence across languages and domains should be thoroughly outlined.
- Verifying the efficiency and effectiveness of the MoE architecture specifically for LLMs and multilingual applications is crucial. Robust experimentation and comparative analysis with traditional ensemble techniques could provide valuable insights.
4. Ethical Considerations and Deployment
- Integrating measures to prevent hateful speech generation showcases a responsible and ethical stance.
- Communicating the specifics of these measures and ensuring comprehensive adherence to ethical guidelines will be crucial, especially in multilingual and cross-domain scenarios.
- Detailed plans for deployment onto cloud service providers, considering scalability, accessibility, and security, should be articulated to ensure a seamless transition from research to practical implementation.
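
As a point of reference for the routing and aggregation question raised under the Mixture of Experts item, the sketch below shows top-1 routing in the style of the Switch Transformer: a router scores each expert, the token goes to the single highest-probability expert, and that expert's output is scaled by its router probability. It is a toy illustration under stated assumptions, not the submission's code: `switch_route`, the router weights and the two lambda "experts" are hypothetical, and the capacity limits and load-balancing loss used in real training are omitted.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def switch_route(token, router_weights, experts):
    """Top-1 routing for one token vector.

    router_weights holds one weight row per expert; experts is a list of
    callables mapping a vector to a vector. Both are illustrative stand-ins.
    """
    # One router logit per expert: dot product of its weight row with the token.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    probs = softmax(logits)
    # Select the single expert with the highest router probability.
    idx = max(range(len(probs)), key=probs.__getitem__)
    expert_out = experts[idx](token)
    # Scale the chosen expert's output by its gate probability.
    output = [probs[idx] * v for v in expert_out]
    return idx, probs, output


if __name__ == "__main__":
    # Hypothetical experts: one doubles the vector, one negates it.
    experts = [lambda x: [2.0 * v for v in x], lambda x: [-v for v in x]]
    router_weights = [[1.0, 0.0], [0.0, 1.0]]
    idx, probs, out = switch_route([3.0, 1.0], router_weights, experts)
    print("chosen expert:", idx)
    print("router probabilities:", probs)
    print("gated output:", out)
```

Because only one expert runs per token, the "aggregation" step reduces to the gate scaling; how outputs stay coherent across tokens routed to different language experts is the part the proposal would still need to spell out.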
Closing Thoughts

The proposal presents an ambitious and innovative approach to addressing the adaptability and performance of LLMs across diverse languages and domains. Enhancing the transparency and depth of technical methodologies, thorough validation of extensions and frameworks, and a holistic approach to ethical considerations and deployment will be pivotal in realizing the potential of this groundbreaking initiative. We look forward to witnessing the outcome of this promising endeavor.

→ Utilize the available platforms such as The Fifth Elephant WhatsApp group to engage with mentors and seek guidance on technical and implementation aspects of your project.
Posted 1 year ago

Open Source AI Hackathon 2024
