The Fifth Elephant 2024 Annual Conference (12th & 13th July)
Maximising the Potential of Data — Discussions around data science, machine learning & AI
Jul 2024
8 Mon
9 Tue
10 Wed
11 Thu
12 Fri
13 Sat 09:00 AM – 06:05 PM IST
14 Sun
Kiran Chandra Yarlagadda
Introduction
Just as the steam engine, electricity, and the Internet became integral to the first, second, and third Industrial Revolutions, AI is going to be adopted sooner rather than later in all production and business processes. It will very rapidly change the way people conduct their businesses and how production processes are executed.
In the third Industrial Revolution, the contest between the Free/Open and the proprietary/closed nature of source code was resolved with the Free/Open model taking centre stage, which fostered the faster penetration and adoption of IT.
In the world of AI, however, source code is only one of four components: the algorithm, the source code (or code), the data, and the model. The non-exclusive nature of the algorithm, the openness of the algorithmic implementation (that is, the ability to verify that the model conforms to the algorithm), and the diversity of datasets become crucial.
Data is apparently the new oil. And just as with oil, we are seeing many techniques being employed to collect this data. Are these datasets being collected with permission from the original creators? Does the voice of the creator matter? As we see in the current fiasco over the Scarlett Johansson voice in the OpenAI demo, rights matter. Training models on this data needs humongous amounts of compute; by some estimates, the power needed to train these models is as much as that needed to run a city.
This talk will focus on the tryst with an alternative path: making datasets collaboratively, building models responsibly, and ensuring that the licenses of the datasets protect them in the interest of the commons.
Target Audience
The primary audience for this talk includes anyone interested in AI who wants to solve problems in collaboration: AI researchers, developers, engineers, and product and business leaders.
• Data, the new oil, with the digital monopolies or governments
◦ Data Harvesting
◦ Privacy issues
• The alternative
◦ Datathons
▪ Swecha Chandamama Kathalu
▪ Voice samples for ASR
▪ 100,000 internships
◦ The Compute
◦ The Model
• Prime issue of licensing
◦ Source Code
◦ Model
◦ Datasets
• Licensing of Data Sets
◦ The issue with the current licenses
◦ The alternative license
A working example of collecting data responsibly and building models in the open
A direction to think in terms of licenses applicable for AI datasets and AI models
How to build an AI for the people, by the people.