CLARIN Café on Text and Data Mining Exceptions in the Directive on Copyright in the Digital Single Market


General Information

This edition of the CLARIN Café is organised by Paweł Kamocki and Vanessa Hannesschläger, chair and vice-chair of the CLARIN Legal and Ethical Issues Committee (CLIC). 

Date: Thursday 28 October 2021

Time: 14:00 - 16:00 (CEST)

Venue: CLARIN virtual Zoom meeting 

A full overview of the Café sessions scheduled can be found on the CLARIN Café page.


Language researchers have to deal with a number of legal topics in order to make sure their research does not violate the rights of other stakeholders. One of the most important topics in this context is copyright legislation. The Directive on Copyright in the Digital Single Market (the DSM Directive) was therefore awaited with great interest by the CLARIN community. It contains several provisions that can impact language resources and research infrastructures, including, probably most importantly, statutory exceptions for Text and Data Mining (TDM). The ‘right to mine’, i.e. the possibility to analyse copyright-protected material using digital methods without having to ask right holders for permission (and, usually, pay for it), is something that the scientific community in general, and the CLARIN community in particular, spent more than a decade fighting for. The relevant articles of the DSM Directive are a product of many years of legislative developments, including national exceptions adopted in the UK, France and Germany (in 2014, 2016 and 2017, respectively) with limited success. The proponents of broad TDM exceptions often highlighted that, compared to their US counterparts, who are free to mine copyright-protected content under the flexible ‘fair use’ doctrine, European researchers were put at a serious disadvantage.

Today, many EU countries have already transposed the DSM Directive into their national legal systems; in other countries the transposition is still ongoing, which is understandable in the socio-economic context shaped by the COVID-19 pandemic. It is therefore timely to discuss whether the TDM exceptions in the DSM Directive meet the expectations of the language research community.

In this edition of the CLARIN Café, organised by the CLARIN Committee on Legal and Ethical Issues, legal experts and seasoned practitioners from the CLARIN community will explain the content of the TDM exceptions, try to provide insights into their rationale, and analyse their impact on language resources and research infrastructures. 

Together with our audience, we will try to answer the question of whether we are really witnessing ‘the end of history’ when it comes to intellectual property issues in language research.

How to Join

Please register for free using this link in order to receive the meeting room details.


14:00 - 14:05 Opening and CLARIN 101 - Francesca Frontini (ILC-CNR and CLARIN)

14:05 - 14:35 TDM exceptions in the DSM Directive and how we got there

Dr. iur. Paweł Kamocki, IDS Mannheim, chair of the CLARIN Legal and Ethical Issues Committee

14:35 - 15:05 Challenges in transposing the TDM exceptions: national perspectives from selected Member States

Prof. Dr. Aleksei Kelli, University of Tartu, member of the CLARIN Legal and Ethical Issues Committee

15:05 - 15:35 TDM exceptions and language resources: some pressing issues from Czechia

Prof. Dr. Jan Hajič, Charles University Prague, member of the Standing Committee for CLARIN Technical Centres

15:35 - 16:00 Questions and discussion

Moderator: Dr. Vanessa Hannesschläger, DLA Marbach, vice-chair of the CLARIN Legal and Ethical Issues Committee



Paweł Kamocki is a legal expert at the Leibniz-Institut für Deutsche Sprache, Mannheim. He studied linguistics and law, and in 2017 obtained his doctorate in law from the universities of Paris and Münster for a thesis on legal aspects of data-intensive university research, with a focus on Knowledge Commons. He worked as a researcher at the Faculty of Law of Paris Descartes University (now: Université de Paris), and then also in the private sector. He is certified to work as an attorney in France. An active member of the CLARIN community since 2012, he currently chairs the CLARIN Legal and Ethical Issues Committee. He has also worked with other projects and initiatives in the field of research policy and co-created LegalTech tools for researchers. One of his main research interests is legal issues in Machine Translation.

Aleksei Kelli is a Professor of Intellectual Property Law at the University of Tartu, Estonia. He is a member of the court of honour of the Estonian Bar Association and of the CLARIN Legal and Ethical Issues Committee. Dr Kelli is involved in managing intellectual property rights and personal data at the Center of Estonian Language Resources. He holds a doctorate (PhD in Law) from the University of Tartu (2009). He has acted as the Head of an Expert Group on the Codification of the Intellectual Property Law (2012-2014, the Ministry of Justice of Estonia). He was the principal investigator in the Programme for Addressing Socio-economic Challenges of Sectoral R&D in the field of industrial property (2017-2018) and open science (2016-2017). Dr Kelli managed a project to improve industry-academia cooperation and knowledge transfer in Ukraine (2015-2016) and was the leading intellectual property expert in the research and innovation policy monitoring programme (2011-2015). Dr Kelli was also a Member of the Team of Specialists on Intellectual Property (2010-2013, the United Nations Economic Commission for Europe). He has taken part in several EU and Estonian R&D projects as a leading IP, innovation, and data protection expert. Dr Kelli has published numerous works on intellectual property, innovation, personal data protection, knowledge transfer, cultural heritage and related issues.

Jan Hajič is a full professor of Computational Linguistics at the Institute of Formal and Applied Linguistics at the School of Computer Science, Charles University in Prague, where he also received his Ph.D. in 1995. He served as the head and deputy head of the Institute between 2001 and 2020. His interests cover morphology and part-of-speech tagging of inflective languages, machine translation, deep language understanding, and the application of statistical methods in natural language processing in general. He also has extensive experience in building language resources for multiple languages with rich linguistic annotation, and is currently the director of a large, multi-institutional research infrastructure on language resources in the Czech Republic, LINDAT/CLARIAH-CZ, which aims at making datasets and corpora openly available for linguistic and Digital Humanities research. His work experience includes both industrial research (IBM Research Yorktown Heights, NY, USA, in 1991-1993) and academia (Charles University in Prague, Czech Republic and Johns Hopkins University, Baltimore, MD, USA, 1999-2000, adjunct position at University of Colorado, USA, 2017-2022). He has published more than 200 conference and journal papers, a book on computational morphology, and several other book chapters, encyclopedia and handbook entries. He regularly teaches basic and advanced courses on Statistical and has extensive experience giving tutorials and lectures at various international training schools. He has been the PI or Co-PI of numerous international as well as large national grants and projects (including EU Framework Programme projects, such as H2020, and the NSF ITR program in the U.S.). He is the chair of the Executive Board of META-NET, a European research network in Language Technology.

Recordings, Slides and Blog