Blog post written by Griet Depoorter and Katrien Depuydt
In 2016, the Institute for Dutch Language (INT) sent out an information sheet and a questionnaire to Flemish research groups in the field of humanities. The goal was to promote CLARIN and the INT (as a CLARIN Centre), to get an overview of the usage of the CLARIN infrastructure and services, and to receive input on what the user community expects from CLARIN and the INT. After receiving the feedback, the director of the INT, Frieda Steurs, held talks with a number of those groups (for instance, the Language Group Flemish Sign Language at the University of Leuven, the Ghent Centre for Digital Humanities at Ghent University and the Department of Linguistics at the University of Antwerp) to gather more in-depth information. The centre learnt from these talks that the linguistic community in Flanders would like the CLARIN consortium to expand existing datasets (e.g. the Dutch Parallel Corpus) or create new ones, such as a corpus of spoken (or written) Dutch with videos of the corresponding Flemish sign language. In addition, the INT CLARIN Centre should serve as a knowledge centre for standards and annotation protocols, and should offer expertise and support to researchers who have little or no experience with digital research.
As a result of this knowledge-sharing and information-gathering initiative, the INT organized its first training workshop in October 2017 in Antwerp, in cooperation with Digital Humanities Flanders (DHuF) and the Faculty of Arts of the University of Antwerp. An invitation was sent out to all DHuF members, and the event announcement was also published on the websites of DHuF and the INT. In the end, 32 researchers, primarily historians, from Antwerp, Leuven, Ghent and the Netherlands attended the workshop.
The workshop showcased what the INT, which had only recently been established, can offer digital humanities researchers, especially those who use historical language material. The event started with a presentation of CLARIN in general and of the INT as a CLARIN Centre, and continued with an introduction to corpus building, focusing on historical corpus building in particular. Best practices for building historical resources were discussed and illustrated with a concrete use case, the Nederlab text collection. Nederlab is a web environment for researchers and students who study the evolution of the Dutch language, literature and culture. The website offers millions of pages of (historical) Dutch texts that can be researched and analyzed with user-friendly text analysis software.
Another topic of the workshop was the enrichment of corpus material. The specific challenges of annotating historical texts were discussed; for instance, the fact that standard (modern) tag sets cannot be applied to historical texts, and that tokenizers cannot handle clitics well. Furthermore, the web service INL Labs, which linguistically annotates (historical) texts in various input formats, was demonstrated. INL Labs uses two annotation tools: the Stanford NE tagger and the INT-developed tagger-lemmatizer for historical Dutch.
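To make the clitics problem concrete, here is a minimal sketch in Python. It is not how INL Labs or the INT tagger-lemmatizer works; the example phrase is made up for illustration, and the lookup table `CLITIC_SPLITS` is a hypothetical stand-in for the historical lexicon a real tool would need. It only shows why splitting on whitespace leaves fused forms such as "tkint" as a single token.

```python
# Hypothetical lookup of fused (clitic) forms. A real tool would rely on a
# historical lexicon and context, not on a hand-written table like this one.
CLITIC_SPLITS = {
    "tkint": ["t", "kint"],        # reduced article attached to the noun
    "sconincs": ["s", "conincs"],  # reduced genitive article attached to the noun
}


def naive_tokenize(text):
    """Split on whitespace only, the way a simple modern tokenizer might."""
    return text.lower().split()


def split_clitics(tokens, splits=CLITIC_SPLITS):
    """Replace each known fused form with its parts; keep other tokens as-is."""
    expanded = []
    for token in tokens:
        expanded.extend(splits.get(token, [token]))
    return expanded


# Invented example phrase, used only to illustrate the behaviour.
tokens = naive_tokenize("tkint quam in sconincs hof")
expanded = split_clitics(tokens)
```

With whitespace tokenization alone, "tkint" and "sconincs" each count as one word, so a tagger built for modern tag sets has nothing sensible to assign to them. The lookup step is the simplest possible repair and does not scale to the spelling variation found in real historical texts.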
The workshop also demonstrated how historical data can be searched with BlackLab, a corpus retrieval engine that is available both as a web service and as a Java library. The tool allows fast, complex searches with accurate hit highlighting on large bodies of tagged and annotated text, and it can be used to search historical corpora such as the Corpus Gysseling and Letters as Loot. Users were also shown how to search their own data by means of Autosearch, which is powered by the BlackLab engine.
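For readers who want a feel for what a search against the web service looks like, the sketch below builds a request URL for a hits query. It assumes a BlackLab Server instance and uses the `hits` endpoint with the `patt`, `number` and `outputformat` parameters; the base URL, the corpus name `my-corpus` and the annotation names in the pattern are placeholders, and the details should be checked against the documentation of the version you are running.

```python
from urllib.parse import urlencode


def build_hits_url(base_url, corpus, cql_pattern, max_hits=20):
    """Build a hits-query URL for a BlackLab Server instance.

    base_url and corpus are placeholders for your own installation;
    cql_pattern is a Corpus Query Language pattern.
    """
    params = {
        "patt": cql_pattern,     # the search pattern
        "number": max_hits,      # maximum number of hits to return
        "outputformat": "json",  # ask for JSON rather than XML
    }
    return f"{base_url.rstrip('/')}/{corpus}/hits?{urlencode(params)}"


# Hypothetical server, corpus and annotation names, for illustration only.
url = build_hits_url(
    "http://localhost:8080/blacklab-server",
    "my-corpus",
    '[lemma="koning"] [pos="VRB.*"]',
)
print(url)
```

The pattern asks for a token with the lemma "koning" followed by a token whose part-of-speech tag starts with "VRB". Which annotations (such as `lemma` or `pos`) and which tag values are available depends entirely on how the corpus was indexed.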
Finally, there was a presentation on the INT's diachronic computational lexica, GiGaNT and DiaMaNT, and on the benefits of making the data available as Linked Open Data.
This was the first of a number of workshops that the INT CLARIN Centre plans to give in Flanders. On March 20, the consortium will organize an information session for humanities researchers from the University of Leuven. During the session, the consortium will gather information on the specific needs of the attending researchers in relation to the services provided by the INT and CLARIN in general. On the basis of their feedback, the DLU then plans to organize follow-up workshops tailored to the needs of focused research groups with shared interests.