Full-Text Resource Processing Training Workshop


General Information

Date: 15 June 2022
Time: 14:00 - 16:00 CEST
Location: Online via Zoom

This training workshop is organised by Twan Goosen (CLARIN), Michał Gawor (CLARIN ERIC), Iulianna van der Lek-Ciudin (CLARIN ERIC) and Alba Irollo (Europeana) in the context of the Europeana DSI-4 project.


Jupyter notebooks are an excellent introduction to the essential principles of large-scale data processing, whether in a local environment or in computational environments available in today's cloud-based ecosystems for open science. In this workshop organised by CLARIN ERIC and Europeana Research, participants will discover how Jupyter notebooks can be used to explore and process textual resources with publicly available natural language processing (NLP) tools. We will use resources from the Europeana Newspapers Collection, comprising full-text content from more than 60,000 historical newspaper issues from eight countries covering 19 different languages. CLARIN centres offer a variety of NLP tasks as a service that can be applied to text resources, such as named entity recognition, topic modelling and part-of-speech tagging.

In this training workshop, we demonstrate the use of Jupyter notebooks to education professionals working in an academic context and give them initial hands-on experience in adapting and extending pipelines for NLP processing of text resources. Participants are guided through Jupyter notebooks that select and pre-process resources using metadata, run an NLP task on the selected resources, and further process and present the results. In the interactive part of the tutorial, participants learn how to adapt existing notebooks and how to tweak and extend a notebook to suit a specific study or research question. There is no need to install anything on your own computer, and taking part requires no special hardware or software.
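To give a feel for the three steps named above (select by metadata, run an NLP task, present the results), here is a minimal sketch in Python. It is not taken from the workshop notebooks: the sample records, their field names and all function names are hypothetical, and a simple token frequency count stands in for the NLP services offered by CLARIN centres.

```python
import re
from collections import Counter

# Hypothetical in-memory records; the workshop notebooks draw on the
# Europeana Newspapers Collection, whose schema is not reproduced here.
ISSUES = [
    {"title": "Example Gazette", "language": "en", "year": 1895,
     "text": "The harbour opened today. The harbour was crowded."},
    {"title": "Beispiel Zeitung", "language": "de", "year": 1901,
     "text": "Der Hafen wurde heute eröffnet."},
    {"title": "Example Courier", "language": "en", "year": 1923,
     "text": "A new railway line reached the harbour town."},
]


def select_issues(issues, language, first_year, last_year):
    """Step 1: select resources using metadata only."""
    return [
        issue for issue in issues
        if issue["language"] == language
        and first_year <= issue["year"] <= last_year
    ]


def tokenize(text):
    """Step 2a: lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def count_tokens(issues):
    """Step 2b: toy stand-in for an NLP task (token frequencies)."""
    counts = Counter()
    for issue in issues:
        counts.update(tokenize(issue["text"]))
    return counts


def present(counts, top_n=3):
    """Step 3: format the most frequent tokens as printable lines."""
    return [f"{token}: {n}" for token, n in counts.most_common(top_n)]


selected = select_issues(ISSUES, language="en", first_year=1890, last_year=1930)
frequencies = count_tokens(selected)
for line in present(frequencies):
    print(line)
```

Adapting such a pipeline to a different research question would typically mean changing the selection criteria in the first step or swapping the counting step for another task, such as named entity recognition.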

Europeana and CLARIN benefit from a long-standing partnership that, over the years, has led to the harvesting of over 200,000 Europeana items into the CLARIN Virtual Language Observatory. The next goal is to make text resources readily available for linguistic analysis and processing in research and higher education contexts. To this end, resources from the Europeana Newspapers Collection will be catalogued in the SSHOC Open Marketplace along with processing examples and other training material.


Registration for the event is limited to 16 participants. A waiting list will be available for those who register after all the places have been booked. Registered participants will be asked to confirm their attendance before the event. You can register at this link.