Dec 2023

18 Mon

19 Tue 05:30 PM – 06:30 PM IST

20 Wed

21 Thu

22 Fri

23 Sat

24 Sun

Jan 2024

1 Mon

2 Tue

3 Wed

4 Thu

5 Fri 05:30 PM – 07:20 PM IST

6 Sat

7 Sun

Jan 2024

8 Mon 06:00 PM – 06:55 PM IST

9 Tue

10 Wed 06:00 PM – 07:00 PM IST

11 Thu

12 Fri 06:00 PM – 07:30 PM IST

13 Sat 03:00 PM – 06:00 PM IST

14 Sun

Jan 2024

22 Mon

23 Tue

24 Wed

25 Thu

26 Fri

27 Sat 05:00 PM – 05:45 PM IST

28 Sun

Feb 2024

29 Mon

30 Tue

31 Wed

1 Thu

2 Fri

3 Sat 10:00 AM – 06:25 PM IST

4 Sun

Feb 2024

5 Mon

6 Tue

7 Wed 08:15 PM – 09:00 PM IST

8 Thu

9 Fri

10 Sat

11 Sun

Feb 2024

12 Mon 08:15 PM – 09:00 PM IST

13 Tue 08:15 PM – 09:00 PM IST

14 Wed 08:15 PM – 09:00 PM IST

15 Thu 08:15 PM – 09:00 PM IST

16 Fri 07:30 PM – 08:30 PM IST

17 Sat 08:15 PM – 09:00 PM IST

18 Sun

Feb 2024

19 Mon

20 Tue

21 Wed 08:30 PM – 09:15 PM IST

22 Thu

23 Fri

24 Sat

25 Sun

Mar 2024

4 Mon

5 Tue

6 Wed

7 Thu

8 Fri

9 Sat 07:00 PM – 09:00 PM IST

10 Sun 04:00 PM – 06:00 PM IST

Apr 2024

8 Mon

9 Tue

10 Wed

11 Thu

12 Fri 12:00 PM – 06:25 PM IST

13 Sat

14 Sun

Hasura, Bangalore

This submission has been added to the schedule

Synapse.AI - Bridging the Gap between the Deaf and Non-Deaf Communities based on Indian Sign Language (ISL)

Submitted Feb 2, 2024

Category: AI for accessibility

Shruti-Drishti: Bridging the Communication Gap for the Deaf Community in India 🌉🇮🇳

Introduction 🙌

Shruti-Drishti is an innovative project aimed at addressing the communication gap between the deaf and non-deaf communities in South Asia, particularly in India. By leveraging deep learning models and state-of-the-art techniques, we strive to facilitate seamless communication and promote inclusivity for individuals with hearing impairments. 🌟

Our web app aims to bridge the communication gap between the Deaf and non-Deaf communities, using LSTM and Transformer models trained on keypoints extracted from sign language videos.

Our aim is to improve the quality of communication by providing accurate and reliable translations.

We provide two services:
(1) Real-time sign language to text
(2) Text to sign language translation

This is the repo: https://github.com/pranjalkar99/shruti-drishti
(Note: The repo is being updated with the latest changes and work done so far.)

DEMO VIDEO

Demo for ISL based Sign Language Detection

Key Features ✨

Sign Language to Text Conversion 🖐️➡️📝: Our custom Transformer-based multi-headed attention encoder, powered by keypoints from Google’s MediaPipe, converts sign language videos into text and addresses the difficulty of telling apart dynamic signs that look similar.
Text to Sign Language Generation 📝➡️🖐️: Using an agentic LLM framework, Shruti-Drishti converts text into sign language videos rendered from masked keypoints, tailored specifically to Indian Sign Language.

Multilingual Support 🌐: Our app uses IndicTrans2 to support all 22 scheduled Indian languages. Accessibility is our top priority, so that speakers of any of these languages can use the app.
Content Accessibility 📰🎥: Shruti-Drishti enables news channels and content creators to expand their user base by making their content accessible and inclusive through embedded sign language video layouts.

Dataset Details 📊

Link to the Dataset: INCLUDE Dataset

The INCLUDE dataset, sourced from AI4Bharat, forms the foundation of our project. It consists of 4,292 videos, with 3,475 videos used for training and 817 videos for testing. Each video captures a single Indian Sign Language (ISL) sign performed by deaf students from St. Louis School for the Deaf, Adyar, Chennai.
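
As a quick sanity check on the figures above, the split works out to roughly 81% training and 19% testing. The snippet below only re-derives that ratio from the stated counts; the helper name `split_fractions` is ours, not part of the project.

```python
# Video counts as stated for the INCLUDE dataset above.
TOTAL_VIDEOS = 4292
TRAIN_VIDEOS = 3475
TEST_VIDEOS = 817


def split_fractions(train: int, test: int) -> tuple[float, float]:
    """Return the (train, test) share of the combined video count."""
    total = train + test
    return train / total, test / total


# The two subsets should account for every video in the dataset.
assert TRAIN_VIDEOS + TEST_VIDEOS == TOTAL_VIDEOS

train_frac, test_frac = split_fractions(TRAIN_VIDEOS, TEST_VIDEOS)
print(f"train: {train_frac:.1%}, test: {test_frac:.1%}")  # train: 81.0%, test: 19.0%
```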

Model Architecture 🧠

Shruti-Drishti employs two distinct models for real-time Sign Language Detection:

LSTM-based Model 📈: Using pose keypoints extracted with MediaPipe, this model applies a recurrent neural network (RNN) built from Long Short-Term Memory (LSTM) cells to classify signs.
- Time-distributed layers: Extract features from each frame based on the MediaPipe keypoints. These features capture spatial relationships between joints or movement patterns.
- Sequential layers: Let the model exploit the temporal order of the pose data, which improves sign recognition across a video sequence.
Transformer-based Model 🔄: Trained through extensive experimentation and hyperparameter tuning, this model offers enhanced performance and adaptability.
- Training Strategies:
  1. Warmup: Gradually increases the learning rate from a very low value to the main training rate, helping the model converge on a good starting point in the parameter space before fine-tuning with higher learning rates.
  2. AdamW: A variant of the Adam optimizer that decouples weight decay from the gradient update, which often leads to faster convergence and better generalization.
  3. ReduceLROnPlateau: Monitors a chosen metric during training and reduces the learning rate when the metric stops improving for a set number of epochs, which helps the model refine its parameters once progress stalls.
  4. Finetuned VideoMAE: Utilizes the pre-trained weights from VideoMAE as a strong starting point and allows the model to specialize in recognizing human poses within videos.
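
To make the warmup and reduce-on-plateau strategies above concrete, here is a minimal plain-Python sketch of both schedules. It does not reproduce the project's training code: the write-up does not state the framework settings or hyperparameters, so `warmup_lr`, `ReduceOnPlateau`, and every numeric value below are illustrative assumptions. In a real training loop a framework scheduler would normally be used instead.

```python
def warmup_lr(step: int, warmup_steps: int, base_lr: float) -> float:
    """Linear warmup: ramp the learning rate up to base_lr over warmup_steps."""
    if step >= warmup_steps:
        return base_lr
    return base_lr * (step + 1) / warmup_steps


class ReduceOnPlateau:
    """Scale the learning rate by `factor` once the monitored loss has failed
    to improve for more than `patience` consecutive epochs."""

    def __init__(self, lr: float, factor: float = 0.5,
                 patience: int = 2, min_lr: float = 1e-6):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_lr = min_lr
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> float:
        if val_loss < self.best:
            # Improvement: remember it and reset the stall counter.
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_epochs = 0
        return self.lr


# Illustrative values only: 4 warmup steps, then a loss curve that stalls.
warmup = [warmup_lr(s, warmup_steps=4, base_lr=1e-3) for s in range(6)]
scheduler = ReduceOnPlateau(lr=1e-3, factor=0.5, patience=1)
plateau = [scheduler.step(loss) for loss in [0.9, 0.8, 0.8, 0.8, 0.7]]
print(warmup)
print(plateau)
```

In this toy run the rate climbs linearly during warmup, stays at the base rate, and is halved only after the validation loss has stalled for two epochs in a row.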

We have also implemented the VideoMAE model, proposed in the paper “VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training.” We explored several fine-tuning techniques: QLoRA, PEFT, joint head-and-backbone fine-tuning, and head-only fine-tuning. Head-only fine-tuning proved the most successful.

Solution Approach 🎯

Shruti-Drishti tackles the communication gap through a two-fold approach:

Sign Language to Text: Using a custom Transformer-based multi-headed attention encoder on keypoints from Google’s MediaPipe, we convert sign language videos into text while addressing the difficulty of telling apart dynamic signs that look similar.
Text to Sign Language: Using an agentic LLM framework, Shruti-Drishti converts text into sign language videos rendered from masked keypoints, tailored specifically to Indian Sign Language.
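
For readers unfamiliar with multi-headed attention, the sketch below shows the core operation applied to a sequence of per-frame keypoint vectors. It is a simplified illustration, not the project's encoder: it uses identity projections instead of learned query/key/value weights, omits positional encoding and the feed-forward layers, and the function names and toy input are our own assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    `frames` is a list of equal-length vectors, one per video frame.
    """
    dim = len(frames[0])
    output = []
    for query in frames:
        # Similarity of this frame to every frame, scaled by sqrt(dim).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in frames]
        weights = softmax(scores)
        # Each output frame is a weighted average of all frames.
        output.append([sum(w * value[j] for w, value in zip(weights, frames))
                       for j in range(dim)])
    return output


def multi_head_self_attention(frames, num_heads):
    """Split each frame vector into `num_heads` slices, attend per slice,
    and concatenate the per-head results."""
    dim = len(frames[0])
    if dim % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    head_dim = dim // num_heads
    heads = []
    for h in range(num_heads):
        part = slice(h * head_dim, (h + 1) * head_dim)
        heads.append(self_attention([frame[part] for frame in frames]))
    merged = []
    for t in range(len(frames)):
        row = []
        for head in heads:
            row.extend(head[t])
        merged.append(row)
    return merged


# Toy input: 3 frames, 4 keypoint features each (values are made up).
toy_frames = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.1, 0.0, 0.9], [0.3, 0.3, 0.3, 0.3]]
attended = multi_head_self_attention(toy_frames, num_heads=2)
print(len(attended), len(attended[0]))
```

Because every frame attends to every other frame, the encoder can relate an early hand shape to a later movement, which is what helps when dynamic signs differ only in how they unfold over time.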

Action Plans 📋

Pose-to-Text Implementation: Develop and implement a Pose-to-Text model for the Indian Sign Language dataset based on the referenced paper, using an agentic LangChain-based state flow as the decoder stage for text-to-gloss conversion and for merging masked keypoint videos.
Custom Transformer Model Evaluation: Assess the effectiveness of our custom Transformer/LSTM model on the Sign Language Dataset, focusing on accuracy and adaptability to dynamic signs.
Multilingual App Development: Create a user-friendly multilingual app serving as an interface for our Sign Language Translation services, ensuring easy interaction and adoption by both deaf and non-deaf users.
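
The text-to-sign direction described above can be pictured as two stages: convert text to a gloss sequence, then look up and merge a keypoint clip for each gloss. The sketch below illustrates only that data flow. The rule-based `text_to_gloss` is a crude stand-in for the LLM agent and does not model ISL grammar, and `CLIP_LIBRARY`, its contents, and all function names are hypothetical.

```python
# Stand-in stopword list; a real system would rely on the LLM agent instead.
STOPWORDS = {"a", "an", "the", "is", "are", "am", "to"}

# Hypothetical clip library: gloss -> list of frames,
# where each frame is a list of keypoint values.
CLIP_LIBRARY = {
    "HELLO": [[0.0, 0.1], [0.1, 0.2]],
    "HOW": [[0.2, 0.2]],
    "YOU": [[0.3, 0.1], [0.3, 0.0]],
}


def text_to_gloss(text):
    """Very rough gloss conversion: strip punctuation, drop stopwords, uppercase."""
    words = [word.strip(".,?!").lower() for word in text.split()]
    return [word.upper() for word in words if word and word not in STOPWORDS]


def glosses_to_keypoints(glosses, library):
    """Concatenate the keypoint clips for each gloss, in order.

    Returns the merged frames plus any glosses that have no clip.
    """
    frames, missing = [], []
    for gloss in glosses:
        clip = library.get(gloss)
        if clip is None:
            missing.append(gloss)
        else:
            frames.extend(clip)
    return frames, missing


glosses = text_to_gloss("Hello, how are you?")
frames, missing = glosses_to_keypoints(glosses, CLIP_LIBRARY)
print(glosses, len(frames), missing)
```

Tracking the glosses that have no clip makes vocabulary gaps visible, which matters when the clip library covers only the signs present in the training data.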

Use Cases

Workplace and Educational Inclusion:
- Deploy the Sign Language Generation system in offices and educational institutions to facilitate seamless communication with the deaf and mute community.
- Empower individuals with hearing impairments by providing them with equal opportunities for education and employment.
Content Accessibility:
- Enable news channels and content creators to expand their user base by making their content accessible and inclusive.
- Offer services to embed sign language video layouts for content, fostering a more inclusive society and promoting equal participation.

Progress So Far ✅

Basic Deep Learning-based LSTM model for sign language recognition (Done)
Custom multi-headed attention-based encoder for sign language recognition for dynamic signs (Done)
Testing on the whole Indian dataset for our attention model (Done)
Implementing pose-to-text using an agentic framework (LangGraph) (Done)
Build multilingual app (Done)
Build Demo and update repo (In Progress)

Results 📈

Transformers

Results Image

For detailed results and insights, please refer to our presentation slides.

LSTM

(TODO)

Other Links 🔗

Project Contributors 👥

Hybrid access (members only)

Hosted by

Hack Five

The Fifth Elephant hackathons

Supported by

Host

The Fifth Elephant

Jumpstart better data engineering and AI futures

Meta

Venue host

Hasura

Welcome to the events page for events hosted at The Terrace @ Hasura.

Partner

Microsoft for Startups

Providing all founders, at any stage, with free resources to build a successful startup.