CLARIN-PLUS workshop: "Working with Digital Collections of Newspapers"


Goals | Programme | Practical details | Abstracts

Goals of the workshop

Collections of newspapers in digital form are a rich source of information for researchers in a number of disciplines in the Humanities and Social Sciences. Numerous archives, datasets and corpora are available, under a variety of access conditions, and via a number of different interfaces. This workshop will aim to examine ways in which online language technology services can help to search, connect, analyse and visualize the language data in newspaper collections.

One of the objectives is to gain a better understanding of the scenarios in which scholarly communities use newspaper data, and to identify opportunities to optimize the way in which the CLARIN infrastructure supports researchers in using newspaper collections as cultural and social data. The envisaged outcome includes an action plan geared towards enhancing the support of research agendas for newspaper data with a typical CLARIN ‘touch’, such as attention to multilingual issues, the perspective of linking text to data types in other modalities, and research designs involving comparison across Europe.

This workshop is the second in a series of four, organised as part of the CLARIN-PLUS project in order to demonstrate how the application of language and speech technology tools and services on digital language material can advance humanities and social sciences research in fields other than linguistics. The next editions will focus on the added value of language technologies and the CLARIN infrastructure for (i) the exploration of parliamentary records and (ii) social media data.

Researchers who are currently working outside CLARIN projects and networks, but who have a background relevant to the topic of the workshop, are also welcome, since one of the main aims of these workshops is to reach new users and start new collaborations. Places are limited and mostly allocated via the national CLARIN consortia, but please get in touch if you are interested in participating.


Videos of the invited talk, presentations and interviews are published on the CLARIN Videolectures channel.



Monday 19 September

12:30 Lunch
14:00 Welcome and introductions Ineke Schuurman (KU Leuven), Franciska de Jong (CLARIN) slides
14:30 Keynote presentation

Joris van Eijnatten (Utrecht University), Tracing conceptual change in messy data (2): Self-reliance as boon and bane abstract slides

15:15 Presentation

Menzo Windhouwer (CLARIN ERIC), CLARIN data, services and tools slides

15:45 Coffee break
16:15 Discussion Introduction of participants; summarizing interests and expertise, and identifying gaps; discussion
16:45 Discussion Overview of existing online resources available for research
17:30 End

Tuesday 20 September

09:30 Presentations

Historical newspaper collections

Nuno Freire (Europeana), Research use of the Europeana newspapers corpus: Past, present and future slides

Susanne Haaf (BBAW), Historical newspaper corpora for the Deutsches Textarchiv. Ways of their curation, harmonization, and provision to the community slides

Steven Claeyssens (National Library of the Netherlands), Towards a Dutch historical newspaper corpus (cancelled)

Leon Wessels (CLARIN ERIC) and Betto van Waarden (KU Leuven), Comparison and demonstration of Delpher and BelgicaPress

11:00 Break
11:30 Demonstrations

Using linguistic methods and tools with newspaper collections

Martin Wynne (CLARIN ERIC), Introduction and overview

Andrew Hardie (Lancaster University), Nineteenth-century newspapers in CQPweb

12:30 Lunch
13:30 Demonstrations

Using linguistic methods and tools with newspaper collections (continued)

Menzo Windhouwer (CLARIN ERIC), Nederlab slides

Gregor Leban (Jozef Stefan Institute), Event registry slides

15:00 Coffee break
15:30 Presentations

Short pitches from participants

  • Paul Buitelaar (The Insight Centre for Data Analytics), Irish Times slides

  • Jani Marjanen (University of Helsinki), Computational history and the transformation of public discourse in Finland, 1640-1910

  • Risto Turunen (University of Tampere), The language of socialism in Finland, 1895-1910

  • Betto van Waarden (KU Leuven), Masters of the mass press: The rise of the mediatised and media-savvy political leader in the age of new imperialism slides

  • Sinai Rusinek (JPRESS), Vision for JPRESS

  • Joris van Eijnatten (Utrecht University), Texcavator

16:30 Presentations

Research questions, approaches and methods

Sidsel Eriksen (University of Copenhagen), Danish newspapers as a main source for studies in Danish community and social politics

Theonie Stathopolou (Greek National Centre for Social Research), PROMAP: Developing a research tool for protest mapping slides

17:30 End

Wednesday 21 September

09:30 Presentations

Analysis tools for newspaper data

Lene Offersgard (University of Copenhagen), Analysis tools for Danish newspapers slides

Rachele Sprugnoli (Fondazione Bruno Kessler), Computational linguistics + data visualization: towards the interactive exploration of newspaper data

Marieke van Erp (Vrije Universiteit Amsterdam), NewsReader: Automatically extracting events, entities and perspectives from newspapers abstract slides

11:00 Break
11:30 Discussion Actions, outcomes, next steps
13:00 Lunch

The workshop will take place in room PI 00.38. Lunches and coffee will be served in room PI 00.10.

Practical details

Arrival at the workshop is possible from 12:00 on Monday 19th September, with the sessions starting at 14:00. Lunch will be served on arrival at the venue. There will be time to check in at hotels at the end of Monday's sessions.

The workshop will end at lunchtime on Wednesday 21st September, by 13:00 at the latest.

Invited participants will be funded to attend the workshop. Participants should pay for their own travel and accommodation, and complete an expenses form to claim back the money from CLARIN ERIC. Support is offered for up to €300 for travel, and €140 per night for accommodation for two nights. Please note that these are maximum limits, and we would be grateful if participants would find the cheapest solution possible. If your travel plans make it difficult for you to stay only two nights, please get in touch.

There will be joint dinners in a local restaurant on Monday and Tuesday evenings, and lunch will be provided on each day.

There are plenty of hotels within walking distance of the workshop venue, and participants should make their own bookings.

The local organizers recommend that you avoid hotels directly on the Martelarenplein (just in front of the railway station) on this occasion, as there will be a fair taking place there during the week of the workshop, and it is likely to be crowded and noisy. This is, however, not to be confused with the Martelarenlaan, which would be fine, and is home to several hotels (e.g. Park Inn and Ibis Budget).

If you are having difficulty finding a room in central Leuven, please also consider the district of Heverlee, which is served by De Lijn buses. Please note that Oud-Heverlee is a different district, and not so convenient!

Travel Advice

By air

There are direct trains from Brussels Airport-Zaventem to Leuven at 21 and 38 minutes past the hour; the journey takes 13 minutes. For the return journey, there are direct trains from Leuven to Brussels Airport-Zaventem at 09 and 25 minutes past the hour; the journey takes 14 minutes.

Participants travelling by air should take care to fly to Brussels Airport-Zaventem (also called Brussels National Airport or Brussels Airport) and **NOT** to Brussels South Airport or Brussels South Charleroi Airport (50 km south of Brussels, with no good train connection to Leuven, meaning more than 2 hours' travel time!).

By train

People coming by train via Brussels should take care to take the train to Leuven (not Louvain / Louvain-la-Neuve).

It is recommended to buy train tickets from Brussels or the airport in advance, because there are often queues at both the ticket office and the ticket machines!


Joris van Eijnatten (Utrecht University), Tracing conceptual change in messy data (2): self-reliance as boon and bane

As a cultural historian with an interest in demonstrating the usefulness of an assortment of digital humanities tools and techniques to researchers, both students and colleagues, I have previously discussed a number of robust, readily available and easily accessible text-mining applications. Although excellent use can be made of these tools, they have three evident shortcomings: most lack a historical dimension, most cannot cope with multilinguality, and their 'black-box' effect is in some cases large, perhaps too much so. In this lecture I will embroider on this theme by exploring the pros and cons of becoming a little less reliant on currently available, ready-made tools.

Marieke van Erp (Vrije Universiteit Amsterdam), NewsReader: Automatically extracting Events, Entities and Perspectives from Newspapers

The NewsReader toolsuite is a state-of-the-art natural language processing pipeline for four languages developed in the NewsReader project [1]. It was developed to extract information on events, entities and perspectives in current newswire from the financial-economic domain. Vrije Universiteit Amsterdam is currently adapting the toolsuite to historical newspapers in the context of CLARIAH. In this talk, I will detail the different types of analyses the tools are capable of and show how they are already being applied to different domains. I also hope to discuss with the CLARIN community how we can optimise these tools for the humanities domain.



KU Leuven
PI 00.38, Pedagogisch Instituut
Andreas Vesaliusstraat 2