CLARIN Café: 'Do Chatbots Dream of Copyright?' Copyright in AI-generated Language Data


General Information

This CLARIN Café is organised by Paweł Kamocki (CLIC) in collaboration with CLARIN. The CLARIN host will be Antal van den Bosch.

Date: 11 April 2023
Time: 14:00-16:00 CEST
Location: CLARIN Virtual Zoom meeting

A full overview of the scheduled Café sessions can be found on the CLARIN Café page.


The release of ChatGPT, Bing Chat, and, more recently, GPT-4, has brought large language models and generative chatbots into the public eye. While these developments may not be surprising to most CLARIN researchers, there is an increased interest in AI-generated language data and human-machine interaction corpora. The exploration of generative chatbots provides endless opportunities for research, with new and surprising use cases emerging nearly every day. Many are left wondering about the legal implications of these developments.

This CLARIN Café will focus specifically on the question of whether AI-generated texts can, and should, be protected by copyright. We invite you to join us and our distinguished guests for a productive discussion!

This text has not been generated by a chatbot.

How to join

Please register for free using this link in order to receive the meeting room details.


A computational linguist by training, Antal van den Bosch has worked in text mining, digital humanities, and on applications of computational language modelling in (psycho-, socio- and neuro-) linguistics. He has led efforts to create sustainable and open source software packages for machine learning and (Dutch) natural language processing, much of this in the context of CLARIN-NL and CLARIAH-NL. He is Professor of Language, Communication and Computation at the Faculty of Humanities of Utrecht University. He is also a guest professor at the Computational Linguistics and Psycholinguistics Research Center at the University of Antwerp, Belgium, a member of the Royal Netherlands Academy of Arts and Sciences, and a fellow of the European Association for Artificial Intelligence.


Paweł Kamocki is a researcher at the Leibniz-Institut für Deutsche Sprache and Chair of the CLARIN Legal and Ethical Issues Committee. He holds a Doctor of Law (Dr. iur.) degree (Münster, Paris), as well as a Master’s degree in linguistics (Warsaw), and graduated from the Paris Barrister Training School. His scientific interests centre on legal issues affecting data-intensive science, especially in the fields of linguistics and Digital Humanities; he has published a number of peer-reviewed articles and book chapters on these questions. He also co-chaired the Working group on Data Access and Re-Use Policies and helped develop Legal Tech tools such as the Public License Selector and the DARIAH ELDAH Consent Form Wizard.


Thomas Margoni is Research Professor of intellectual property law at the Faculty of Law, KU Leuven and a member of the Board of Directors of the Centre for IT & IP Law (CiTiP). His research concentrates on the relationship between law and new technologies, with particular attention to the role of the Internet and, more recently, of AI as new ways to create, transform and disseminate knowledge and information. Current research projects include reCreating Europe, the EU H2020-funded project developing an integrated policy approach to copyright in the EU digital single market, where Thomas leads the task on AI and data ownership; OpenAIRE, the H2020 project developing an Open Science e-infrastructure for Europe, where Thomas is joint coordinator of the legal and policy task force; and OpenMinTeD, the now-completed EU H2020 project for the development of an e-infrastructure for Text and Data Mining (TDM) in Europe, where Thomas coordinated the legal working group. Other areas of interest where Thomas has developed institutional as well as funded research include the processes of EU copyright and design law harmonisation; data ownership and AI; copyright, design rights and additive manufacturing; the digitisation of cultural heritage and the digital public domain; open access and open science; online intermediaries, fundamental rights and the platform economy; and the role of property rights in sports.

Toby Bond is a partner in Bird & Bird’s Intellectual Property Group, based in London. Much of his work focuses on helping clients navigate issues relating to the protection and commercialisation of data as they take advantage of the power of big data analytics and artificial intelligence. He has a particular interest in the wider intellectual property issues arising from the development and deployment of AI systems and has been recognised by The Legal 500 as providing cutting-edge advice on copyright and the protection of AI-generated works. In 2021 he was named one of Global Data Review’s worldwide ‘40 under 40’ upcoming data lawyers.



14:00 - 14:05 Opening and CLARIN 1-0-1 - Antal van den Bosch, Member of the CLARIN Board of Directors

14:05 - 14:35 Of course. But… maybe? - Paweł Kamocki, IDS Mannheim

14:35 - 14:55 The model doesn’t fall far from the data - Thomas Margoni, KU Leuven

14:55 - 15:05 The UK did it before it was cool - Toby Bond, Bird & Bird

15:05 - 16:00 Questions and discussion

Moderator: Antal van den Bosch

Discussants: Fabian Ferrari (Utrecht University), Francesca Frontini (CLARIN)

Recordings, Slides and Impact Story