The Need for New Licenses in the Age of Generative AI

Submitted Jun 1, 2024


In this rapidly evolving digital era, data is the fuel powering the growth of artificial intelligence. As AI reshapes technology, it is crucial to understand not only how data drives AI but also how the ethical and legal frameworks around that data must evolve alongside it. Licensing is one tool that could help level the playing field: today, because incumbents control access to data and compute, they reap most of the benefits of AI.

In this talk, we will walk through our search for the right license for the crowdsourced data collected for the Telugu automatic speech recognition (ASR) system, and for the model trained on that data.

Target Audience

The primary audience for this talk includes data scientists and AI researchers, legal professionals focused on technology, open-source and community contributors, and policymakers and regulators.
Academics and students in technology and law, as well as tech entrepreneurs and start-up founders, can also benefit from this talk.

Outline

  • Introduction to the Project

    • Overview of the Telugu ASR system
    • Importance of crowdsourced data in ASR technologies
  • Challenges with Licensing Crowdsourced Data

    • Legal complexities of using crowdsourced data
    • Ethical considerations in data collection and usage
  • Requirements for an Effective License

    • Compliance with data protection regulations (e.g., GDPR, CCPA)
    • Flexibility to accommodate contributions from a diverse crowd
    • Clarity on data usage rights and restrictions
  • Journey to Finding the Right License

    • Evaluation of existing licenses (e.g., Creative Commons, MIT, proprietary licenses)
    • Customizing license elements to the specific needs of AI datasets, covering new concepts such as fine-tuning and model weights
    • Engagement with legal experts and the community
  • Next Steps and Future Directions

    • Open house for consultations
    • Finalizing the license and release
  • Q&A

    • Open floor for questions and further discussion


Introducing a specialized license for crowdsourced data, one that does for data what the GPL did for open-source software, could fundamentally change how data is used in technological innovation. It would promote a collaborative environment in which data can be freely shared and improved, while ensuring compliance with ethical standards and data protection laws. Such a license would encourage broader participation and innovation, reduce legal barriers, and help keep shared data resources sustainable. It could also help level the playing field by ensuring that the benefits of AI do not accrue only to a few mega-corporations. By clarifying usage rights and responsibilities, this licensing framework could set industry standards for data handling and lead to more responsible and impactful technological advances across sectors.


