At the open-source AI community kickoff on 22 November, two concerns were raised and discussed:
- What is the incentive to develop open source AI models?
- Is there too much monopoly by Meta, or corporations like Meta, on open source AI?
📅 Friday, 22nd Nov, from 4:00 PM to 5:30PM 📅
📍**@ Rootconf, BIC, Indiranagar, Bengaluru** 📍
- Limited availability of fully open-source AI models: The lack of fully open-source AI models is a major hurdle for independent developers. For a model to be considered truly open-source, it must meet several criteria, including open data (access to the original sources), open code, and open weights/parameters. However, very few AI models meet these standards, hindering developers’ ability to replicate, modify, and build upon them.
- Data issues in Indic LLMs due to common use of multiple languages: Speaker Akash Paul raised the issue of the gap between the ideal and the reality in supporting regional languages for AI applications in India. Many datasets are “too pure,” focusing on standardized forms of languages like Hindi, Tamil, and Bengali, while real-world data is often messy and noisy because people tend to mix multiple languages in most interactions. Akash observed that “reality is always mixed. It is never pure.” Product development needs this kind of mixed dataset, and not necessarily pure Hindi/Indic language datasets. The speakers suggested a more systematic approach to collecting real-world data, potentially through volunteer or paid contributions, to better support multilingual language models in a variety of contexts and applications.
- Need for capacity in the community to enable more community-driven participation in creating truly open source AI projects: Venkata Pingali, speaking at the discussion, pointed out that capacity has to be built for community development and adoption of open source AI projects. Using Sarvam’s example, Pingali pointed out that most open source AI projects need neither very high compute nor very sophisticated hardware to build applications on top of LLMs and AI models. Quoting Pingali,
“So that the hardware is not the limitation today, the cost is not the limitation. The willingness, the ability of the community to come together and put together applications, is the limitation, and that requires vision and a concerted effort.”
This point was also made by Chaitanya Chokkareddy when explaining Swecha’s work in building an open source Telugu language model. (Watch the talk by Kiranchandray on Swecha’s efforts in collecting data for building a Telugu LLM - https://hasgeek.com/fifthelephant/2024/sub/ai-by-the-people-for-the-people-BScezALTnRdopfbczjfbD3 and the subsequent discussion led by Chaitanya Chokkareddy on the need for a new licensing framework for open source LLMs - https://hasgeek.com/fifthelephant/2024/sub/need-for-new-licenses-in-this-age-of-generative-ai-MJkJFvbCjd4dzsB9KhnBfQ)
- What is the incentive to develop open source AI models? One clear benefit is that open source means more eyes watching for glitches and challenges. As Unnati, a speaker at the discussion, mentioned,
“... mistakes have a higher chance of being caught when done with open source”. Closed source means fewer people watching and maintaining.
Pingali’s response was also useful: unless there is more investment and initiative for upskilling, there is no incentive to build in the open.
Then you have situations where companies like OpenAI will, in future, charge a tax on every transaction, for every usage.
Pingali urged companies such as Flipkart and PhonePe to invest in upskilling the community, because AI will become all-pervasive in domains such as fintech. Building capacity is very important. In the absence of capacity, an AI tax is the most likely scenario, he opined.
- Is there currently too much of a monopoly by Meta and other corporations on open source AI? Monopoly over data sources is a bigger concern to many engineers than Meta’s monopoly over LLMs themselves. Unnati pointed out that even with data protection laws, companies like Meta may be getting away with things that such laws do not permit.
- Need for a leaderboard to rate LLMs: Akash Paul suggested that we need to better understand the differences between LLMs from different companies, so that engineers and organizations can make better choices. These LLMs should be evaluated for their accuracy, use cases, etc. Currently, we don’t have any such mechanisms for audits and rating, he pointed out.
This meet-up highlighted several critical challenges and opportunities for the development of AI in India, particularly in terms of accessibility, inclusivity, and the alignment of incentives and funding for open source AI projects. Key issues include the scarcity of fully open-source models and the need for more diverse, representative, and truly open data. Additionally, community-driven efforts, open hardware, and building technical expertise were emphasized as essential to scaling AI solutions effectively.
Anwesha Sen (Assistant Programme Manager at The Takshashila Institution) moderated the kick-off discussion. Speakers included Akash Paul (Open Source AI enthusiast; former senior ML Engineer at Airtel), Bharat Shetty (AI Consultant), Unnati (AI/ML engineer) and Dr. Venkata Pingali (Scribble Data).
- Bharat Shetty on ideas/projects he is building on accessibility.
- Dr. Venkata Pingali on Indic approach to agents.
We’re a community of practitioners -- from startups and enterprises to builders and tinkerers -- who are using Open Source AI in day-to-day practice and building.
Our tribe includes:
- Individual builders - developers who are pursuing hobby projects and tech/product ideas - who are experimenting with open source AI to create products, and who are navigating an ecosystem marked by regulation, uncertainty, and not knowing what they have control over, or how much.
- Individuals - such as Bharat Shetty and Gopi Kumar Sasi - who have ideas on how to use Open Source and AI technologies to solve for accessibility.
- Ecosystem builders who have ideas/projects that are already running and who are looking for contributors and volunteers for open source AI projects.
The group is currently active at https://chat.whatsapp.com/BGf813RGrGM3t2c9yiZ8Z6