Collections of newspapers in digital form are a rich source of information for researchers in a number of disciplines in the Humanities and Social Sciences. Numerous archives, datasets and corpora are available, under a variety of access conditions, and via a number of different interfaces. This workshop will aim to examine ways in which online language technology services can help to search, connect, analyse and visualize the language data in newspaper collections.
One of the objectives is to gain a better understanding of the scenarios in which scholarly communities use newspaper data, and to identify opportunities to optimize the way in which the CLARIN infrastructure supports researchers in using newspaper collections as cultural and social data. The envisaged outcome includes an action plan geared towards enhancing the support of research agendas for newspaper data with a typical CLARIN ‘touch’, such as attention to multilingual issues, the perspective of linking text to data types in other modalities, and a research design involving comparison across Europe.
This workshop is the second in a series of four, organised as part of the CLARIN-PLUS project in order to demonstrate how the application of language and speech technology tools and services on digital language material can advance humanities and social sciences research in fields other than linguistics. The next editions will focus on the added value of language technologies and the CLARIN infrastructure for (i) the exploration of parliamentary records and (ii) social media data.
Researchers who currently work outside CLARIN projects and networks but have a background relevant to the topic of the workshop are also welcome, since one of the main aims of these workshops is to reach new users and start new collaborations. Places are limited and mostly allocated via the national CLARIN consortia, but please get in touch with email@example.com if you are interested in participating.
Videos of the invited talk, presentations and interviews are published on the CLARIN Videolectures channel.
Monday 19 September
|14:00||Welcome and introductions||Ineke Schuurman (KU Leuven), Franciska de Jong (CLARIN ERIC) slides|
Menzo Windhouwer (CLARIN ERIC), CLARIN data, services and tools slides
|16:15||Discussion||Introduction of participants; summarizing interests and expertise, and identifying gaps; discussion|
|16:45||Discussion||Overview of existing online resources available for research|
Tuesday 20 September
Historical newspaper collections
Nuno Freire (Europeana), Research use of the Europeana newspapers corpus: Past, present and future slides
Susanne Haaf (BBAW), Historical newspaper corpora for the Deutsches Textarchiv. Ways of their curation, harmonization, and provision to the community slides
Steven Claeyssens (National Library of the Netherlands), Towards a Dutch historical newspaper corpus - cancelled
Leon Wessels (CLARIN ERIC) and Betto van Waarden (KU Leuven), Comparison and demonstration of Delpher and BelgicaPress
Using linguistic methods and tools with newspaper collections
Martin Wynne (CLARIN ERIC), Introduction and overview
Andrew Hardie (Lancaster University), Nineteenth century newspapers in CQPWeb
Using linguistic methods and tools with newspaper collections (continued)
Menzo Windhouwer (CLARIN ERIC), Nederlab slides
Gregor Leban (Jozef Stefan Institute), Event registry slides
Short pitches from participants
Research questions, approaches and methods
Sidsel Eriksen (University of Copenhagen), Danish newspapers as a main source for studies in Danish community and social politics
Theonie Stathopolou (Greek National Centre for Social Research), PROMAP: Developing a research tool for protest mapping slides
Wednesday 21 September
Analysis tools for newspaper data
Lene Offersgard (University of Copenhagen), Analysis tools for Danish newspapers slides
Rachele Sprugnoli (Fondazione Bruno Kessler), Computational linguistics + data
|11:30||Discussion||Actions, outcomes, next steps|
The workshop will take place in room PI 00.38. Lunches and coffee will be served in room PI 00.10.
Arrival at the workshop is possible from 12:00 on Monday 19th September, with the sessions starting at 14:00. Lunch will be served on arrival at the venue. There will be time to check in at hotels at the end of Monday's sessions.
The workshop will end at lunchtime on Wednesday 21st September, by 13:00 at the latest.
Invited participants will be funded to attend the workshop. Participants should pay for their own travel and accommodation, and complete an expenses form to claim back the money from CLARIN ERIC. Support is offered for up to €300 for travel, and €140 per night for accommodation for two nights. Please note that these are maximum limits, and we would be grateful if participants would find the cheapest solution possible. If your travel plans make it difficult for you to stay only two nights, please get in touch.
There will be joint dinners in a local restaurant on Monday and Tuesday evenings, and lunch will be provided on each day.
There are plenty of hotels within walking distance of the workshop venue, and participants should make their own bookings.
The local organizers recommend that you avoid hotels directly on the Martelarenplein (just in front of the railway station) on this occasion, as there will be a fair taking place there during the week of the workshop, and it is likely to be crowded and noisy. This is, however, not to be confused with the Martelarenlaan, which would be fine, and is home to several hotels (e.g. Park Inn and Ibis Budget).
If you are having difficulty finding a room in central Leuven, please also consider the district of Heverlee, which is served by De Lijn buses (https://www.delijn.be/nl/zoekresultaten/?searchtext=Leuven#tab=3&page=1). Please note that Oud-Heverlee is a different district, and not so convenient!
There are direct trains from Brussels Airport-Zaventem to Leuven at 21 and 38 minutes past the hour, and travel time is 13 minutes. For the return journey, there are direct trains from Leuven to Brussels Airport-Zaventem at 09 and 25 minutes past the hour, and travel time is 14 minutes.
Participants travelling by air should take care to fly to Brussels Airport-Zaventem (also called Brussels National Airport or Brussels Airport) and **NOT** to Brussels South Airport or Brussels South Charleroi Airport (50 km south of Brussels, with no good train connection to Leuven, meaning more than 2 hours' travel time!).
People coming by train via Brussels should take care to take the train to Leuven (not Louvain / Louvain-la-Neuve).
It is recommended to buy train tickets from Brussels or the airport in advance (http://www.belgianrail.be/en/Default.aspx), because there are often queues at both the ticket office and the ticket machines!
Joris van Eijnatten (Utrecht University), Tracing conceptual change in messy data (2): self-reliance as boon and bane
As a cultural historian with an interest in demonstrating the usefulness of an assortment of digital humanities tools and techniques to researchers, both students and colleagues, I have previously discussed a number of robust, readily available and easily accessible text-mining applications. Although excellent use can be made of these tools, they have three evident shortcomings: most lack a historical dimension, most cannot cope with multilinguality, and their 'black-box' effect is in some cases large, perhaps too much so. In this lecture I will embroider on this theme by exploring the pros and cons of becoming a little less reliant on currently available, ready-made tools.
Marieke van Erp (Vrije Universiteit Amsterdam), NewsReader: Automatically extracting Events, Entities and Perspectives from Newspapers
The NewsReader toolsuite is a state-of-the-art natural language processing pipeline for four languages, developed in the NewsReader project. It was developed to extract information on events, entities and perspectives in current newswire from the financial and economic domain. Vrije Universiteit Amsterdam is currently adapting the toolsuite to historical newspapers in the context of CLARIAH. In this talk, I will detail the different types of analyses the tools are capable of, show how they are already applied to different domains, and discuss with the CLARIN community how we can optimise these tools for the humanities domain.