Harini Anand
@inirah02
Submitted Mar 22, 2025
This is an enhanced version of a talk I previously delivered at a workshop organized by Google Developer Student Clubs. The presentation has been refined to highlight the open-source and multimodal capabilities of transformer-based models, making it suitable for submission to the Open Source AI community. The talk explores how BERT-like architectures have evolved into multimodal foundation models such as CLIP, Flamingo, and Gemini. It delves into how BERT laid the groundwork for multimodal transformers by inspiring models that handle images, audio, and video alongside text.
The session covers key topics including the evolution of BERT-like architectures, cross-modal transfer learning, and real-time multimodal processing. It examines these models in technical depth, covering their novelty, feasibility, and implementation details, and shares real-world effectiveness through demonstrable success metrics and lessons learned from practical applications. The presentation slides are available here: https://docs.google.com/presentation/d/1coD93qVDXJJAMfaYzi7UYTgAe3plNnmh/edit?usp=sharing&ouid=114040809473801015218&rtpof=true&sd=true
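To make the cross-modal idea concrete, here is a minimal toy sketch of CLIP-style image–text matching. The vectors are random stand-ins for real encoder outputs (the actual models use learned vision and text encoders plus a learned temperature), so this only illustrates the shared-embedding-space mechanism, not a working model:

```python
# Toy sketch of CLIP-style cross-modal matching (illustrative only):
# the "image" and "text" features below are random placeholders for the
# outputs of real vision and text encoders.
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x, axis=-1):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

# Pretend encoder outputs: 3 images and 3 captions, each projected
# into a shared 8-dimensional embedding space.
image_embeds = l2_normalize(rng.normal(size=(3, 8)))
text_embeds = l2_normalize(rng.normal(size=(3, 8)))

# Cosine similarity between every image and every caption; CLIP scales
# these logits by a learned temperature before applying a softmax.
logits = image_embeds @ text_embeds.T

# For each image, pick the most similar caption (zero-shot matching).
best_caption = logits.argmax(axis=1)
print(logits.shape)  # one similarity score per (image, caption) pair
print(best_caption)
```

During training, CLIP-style models push the diagonal of this similarity matrix (matched pairs) up and the off-diagonal entries down with a contrastive loss, which is what lets a text encoder label images it has never seen.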
This session is beneficial for AI enthusiasts, researchers, and developers interested in multimodal processing and transformer-based architectures. It is particularly suited for those looking to leverage open-source models for innovative applications across various domains, including but not limited to computer vision, natural language processing, and audio analysis. The content will be accessible to both beginners seeking foundational knowledge and advanced practitioners looking to enhance their understanding of the latest advancements in multimodal AI.
This talk aims to provide a comprehensive overview of the evolution and applications of multimodal transformers, emphasizing their open-source and real-world potential. By sharing lessons learned and future directions, it seeks to inspire further innovation in the field of AI and multimodal processing.
I am Harini Anand, a final-year Computer Science and Engineering undergraduate and a Gen AI Teaching Assistant.
I currently work as a Software Developer Intern in IBM's Data & AI Division, on IBM watsonx™, IBM's portfolio of AI products that accelerates the impact of generative AI in core workflows to drive productivity.
I previously worked at Niramai Health Analytix, a deep-tech startup focused on breast cancer screening, and applied ML frameworks to predict gene regulatory networks as a Research Intern at the Indian Institute of Technology Hyderabad. As a student entrepreneur, I have built cognitive tools for delaying the onset of dementia. I led the largest technical community on campus, and have held workshops and given talks on Machine Learning, NLP, and Data Science.
I have been admitted to top summer schools at the University of Oxford and the Massachusetts Institute of Technology, both specializing in applications of AI in healthcare. I am a Google KaggleX Mentee and an AWS Scholar, and have been awarded merit scholarships. A strong advocate for representation in STEM, I have been recognized as a High Impact APAC Ambassador for the Women in Data Science (WiDS) community, an initiative by Stanford University. I am also a Harvard WE Tech Fellow.