CLARIN and Libraries 2023: Large Language Models and Libraries

CLARIN and Libraries
The workshop builds upon the first CLARIN and Libraries workshop, held in The Hague in May 2022.
This year's workshop will investigate further areas of collaboration between CLARIN-related initiatives and libraries, with a special emphasis on building (large) language models in libraries and in cooperation with them. For the second time, the workshop will bring together people associated with both CLARIN (or other research infrastructures) and libraries. Whereas the first CLARIN and Libraries workshop was particularly concerned with digital content delivery for researchers, the main theme of the second workshop will be large language models and library collections, including the technical challenges of building such models and the legal implications of training and using them.
Participation in the workshop is by invitation. If you are interested in attending, please contact your national coordinator or
Since 2005, the host, the National Library of Norway (NLN), has digitised its entire text collections, which at present amount to a corpus of 160 billion words for Norwegian. It has built large language models for text (BERT, GPT-2, T5) and speech (wav2vec, Whisper) on these collections. There will be keynotes from the National Libraries of Norway and Germany on the technical aspects of building such models in a library setting, as well as a keynote from the Swedish National Library on the legal aspects of building large language models.
Existing participants in CLARIN national activities who are associated with libraries are invited to the workshop, and national coordinators will have the opportunity to nominate further participants in order to achieve the widest possible coverage across Europe.

Draft Programme

Tuesday 5 December 2023

12:00 - 13:00  Lunch
13:30 - 15:00
  • Introduction to CLARIN and Libraries, wrap-up from last year’s workshop (15 mins)
  • Tour de table: introduction and points for discussion (45 mins)
  • Library collections as data (Sally Chambers)
15:00 - 15:30  Break
15:30 - 17:00
  • Large language models at the National Library of Norway (Svein Arne Brygfjeld)
  • Large language models at the German National Library (Peter Leinen)
  • Discussion: technical aspects
19:00  Evening social dinner

Wednesday 6 December 2023

09:30 - 10:30
  • Legal aspects of large language models in libraries (Jerker Rydén)
  • Discussion: legal aspects
10:30 - 11:00  Break
11:00 - 12:00  Lightning talks: participants will have the opportunity to introduce their own projects and resources
12:00 - 13:00  Lunch

National Library of Norway
Henrik Ibsens gate 110
0255 Oslo