The CLARIN Café on Text and Data Mining (TDM) Exceptions in the Directive on Copyright in the Digital Single Market (DSM Directive) took place via Zoom on 28th October 2021 and was organized by the CLARIN Legal and Ethical Issues Committee (CLIC). It was attended by 25 participants, including language researchers and legal experts from both CLARIN institutions and the private sector. The aim of the event was to discuss the recent reform of EU copyright law and its impact on language resources and language technology in selected EU Member States. It featured presentations by Paweł Kamocki, chair of CLIC and legal expert from IDS Mannheim, and Jan Hajič, professor of computational linguistics and experienced language data manager from Charles University Prague. Three members of CLIC, Walter Scholger (Austria), Giuseppe Versaci (Italy) and Mateja Jemec Tomazin (Slovenia), also gave brief contributions about the situation in their countries. The event was moderated by Vanessa Hannesschläger, vice chair of CLIC.
The DSM Directive and its Exceptions
The DSM Directive was adopted in 2019, and its transposition deadline was 7 June 2021; many EU Member States are still in the process of transposing it. The text contains two exceptions to copyright and related rights for TDM. The first, in Article 3, covers TDM for the purposes of scientific research (for research organisations and cultural heritage institutions only); the second, in Article 4, is of a general nature (also covering, e.g., TDM for commercial purposes). In both cases, TDM is defined broadly enough to cover most, if not all, applications.
In the first presentation, Paweł Kamocki described the events that led to the adoption of the TDM exceptions and provided a detailed analysis of their content. The DSM Directive was presented as part of the EU acquis: it is not meant to replace the 2001 InfoSoc Directive (and the general research exception it contains), but to complement it.
The second presentation (originally to be given by Aleksei Kelli, who was unfortunately unable to attend the Café and was replaced by Paweł Kamocki) discussed the transposition of the TDM exceptions and the specific challenges that national legislators face. These mostly concern the conditions under which copies made in the course of TDM (for example, language corpora) can be stored and shared. Three members of CLIC (Walter Scholger, Giuseppe Versaci and Mateja Jemec Tomazin) then presented the ongoing debates in their countries, after which Paweł Kamocki took the floor again to present the German transposition.
The final presentation by Jan Hajič discussed the ongoing transposition of the TDM exceptions in Czechia, as well as its potential practical impact on language data repositories. A special focus was on public-private partnerships and on sharing results (e.g. language models) with the private sector, which remains largely a grey area.
Questions and Follow-Up
Questions started appearing in the chat as early as the first presentation, and it quickly became evident that the two hours available for the Café were not enough to address all of them fully. For this reason, as suggested by a participant, the discussion will continue in a focus group via email, and a follow-up event has already been scheduled for next year. By then, more Member States will have finalized their implementation of the DSM Directive, so it will be interesting to see how things develop in the meantime. Undoubtedly, the implementation and practical implications of the TDM exceptions will shape the debate on legal issues in language research and technology in the years to come. CLIC warmly invites all researchers and legal experts interested in the topic to join the discussion by expressing their interest to the CLARIN legal list: firstname.lastname@example.org