Kiran Chandra Yarlagadda

AI: By the People, For the People

Submitted May 23, 2024


Just as the steam engine, electricity and the Internet became integral to the first, second and third industrial revolutions, AI is going to be adopted sooner rather than later in all production and business processes. It will rapidly change the way people conduct their businesses and how production processes are executed.
In the third industrial revolution, the contest between free/open and proprietary/closed source code was resolved with the free/open model taking centre stage, which fostered the faster penetration and adoption of IT.
In the world of AI, source code is only one of four components: the algorithm, the source code, the data, and the model. Non-exclusive access to the algorithm, openness of the algorithmic implementation (i.e. the ability to verify that the model conforms to the algorithm), and diversity of datasets all become crucial.
Data, apparently, is the new oil. And just as with oil, we are seeing many techniques being employed to collect it. Are these datasets being collected with permission from the original creators? Does the voice of the creator matter? As the current fiasco over the Scarlett Johansson voice in the OpenAI demo shows, rights matter. Training models on this data needs enormous amounts of compute; by some estimates, the power needed to train these models is as much as that needed to run a city.
This talk will focus on the experience of pursuing an alternative path: creating datasets collaboratively, building models responsibly, and ensuring that dataset licenses protect the datasets in the interest of the commons.

Target Audience
The primary audience for this talk includes anyone interested in AI who wants to solve problems in collaboration: AI researchers, developers, engineers, and product and business leaders.


• Data, the new oil, held by digital monopolies or governments
◦ Data Harvesting
◦ Privacy issues
• The alternative
◦ Datathons
▪ Swecha Chandamama Kathalu
▪ Voice samples for ASR
▪ 100,000 internships
◦ The Compute
◦ The Model
• Prime issue of licensing
◦ Source Code
◦ Model
◦ Datasets
• Licensing of Data Sets
◦ The issue with the current licenses
◦ The alternative license

Key Takeaways

A working example of collecting data responsibly and building models in the open
A way to think about licenses applicable to AI datasets and AI models
How to build an AI for the people, by the people


