The Fifth Elephant 2024 Annual Conference (12th & 13th July)

Maximising the Potential of Data — Discussions around data science, machine learning & AI

Kiran Chandra Yarlagadda

AI: By the People, For the People

Submitted May 23, 2024

Introduction

Just as the steam engine, electricity and the Internet became integral to the first, second and third Industrial Revolutions, AI is going to be adopted sooner rather than later in all production and business processes. It will rapidly change the way people conduct their businesses and how production processes are executed.
In the third Industrial Revolution, the contest between Free/Open and proprietary/closed source code was resolved with the Free/Open model taking centre stage, which fostered the faster penetration and adoption of IT.
In the world of AI, source code is only one of four components: the algorithm, the source code (or code), the data, and the model. Non-exclusive access to algorithms, openness of the algorithmic implementation (i.e. the ability to verify that the model conforms to the algorithm), and diversity of datasets become crucial.
Data is the new oil, apparently. And just as with oil, we are seeing lots of techniques being employed to collect this data. Are these datasets being collected with permission from the original creators? Does the voice of the creator matter? As we see in the current fiasco over the Scarlett Johansson voice in the OpenAI demo, rights matter. Training on this data needs humongous amounts of compute; by some estimates, the power needed to train these models is as much as that needed to run a city.
This talk will focus on a tryst with an alternative path: making datasets collaboratively, building models responsibly, and ensuring that dataset licenses protect the datasets in the interest of the commons.

Target Audience
The primary audience for this talk includes anyone interested in AI who wants to solve problems in collaboration: AI researchers, developers, engineers, and product and business leaders.

Outline

• Data is the new oil, and it is with the digital monopolies or governments
◦ Data Harvesting
◦ Privacy issues
• The alternative
◦ Datathons
▪ Swecha Chandamama Kathalu
▪ Voice samples for ASR
▪ 100,000 internships
◦ The Compute
◦ The Model
• Prime issue of licensing
◦ Source Code
◦ Model
◦ Datasets
• Licensing of Data Sets
◦ The issue with the current licenses
◦ The alternative license

Key Takeaways

A working example of collecting data responsibly and building models in the open
A direction to think in terms of licenses applicable for AI datasets and AI models
How to build an AI for the people, by the people.
