linguistic annotation

Helsinki Digital Humanities Hackathon 2021: ‘Parliamentary Debates in COVID Times’

7 December 2021
The Project: Organised by the University of Helsinki, the online hackathon ‘Parliamentary Debates in COVID Times’ was a short, intense project that took place from
‘Compiling a corpus is already a big project, so being able to skip this step was a huge privilege. Also, knowing that the corpus was granted permissi
Methodology: As their main data source, the team used the ParlaMint 2.1 dataset, a multilingual set of uniformly annotated corpora of parliamentary proceedings.
In order to identify which keywords occurred across all four countries, and which were country-specific, the hackathon team then manually selected the
‘It was really nice to have such a well-structured dataset of this size. It’s great that the dataset spans several years and that it’s well-annotated,
Outcome: The results showed that the majority of the top fifty keywords for all countries were related to the pandemic. In addition, there was a strong overlap
The collocation networks offered useful insight into the relationship between key terms in the parliamentary discussions, especially when viewed again
CLARIN Tools and Resources: The project was based on the recently published ParlaMint 2.1 dataset. The sessions in the corpora are marked as either belonging to the COVID-19 peri
Access ParlaMint 2.1
Views on CLARIN: ‘I really like the comparative perspective that the ParlaMint dataset offers, making it possible to compare different national parliaments. It would b
For a comprehensive discussion of the hackathon, see the blog post
To watch a video about the hackathon, go to CLARIN Café
A linguistically marked-up version of the corpus is available here. For a recently published multimedia tutorial on how to conduct high-quali
Isabella Calabretta, Digital Product Manager at Cambridge University Press & Assessment; Courtney Dalton, MLIS student at Simmons University, Boston

ABaC:us – Austrian Baroque Corpus

Austrian Baroque Corpus collage of page images

The Austrian Baroque Corpus is a digital collection of printed German-language texts dating from the Baroque era, now freely available through the Austrian Centre for Digital Humanities.

At present, the digital collection holds several texts from the memento mori genre written by, or ascribed to, Abraham a Sancta Clara (1644–1710), a renowned Augustinian monk and an author widely read throughout Europe in his time. All of the texts (sermons, devotional books and works related to the dance-of-death theme) have been enriched with several layers of structural information and tagged using automated tools adapted to the specific needs of the language of the period. One important achievement of the project is that every historical word form occurring in the texts has been electronically mapped to its corresponding lemma in High German, and each mapping has been corrected or verified by domain experts. Throughout all phases of the workflow, the interdisciplinary team (literary, linguistic, and text technology specialists) insisted on high-quality linguistic and semantic annotation, creating a sound basis for sophisticated research questions.
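To illustrate why mapping each historical word form to a lemma is useful, the following is a minimal Python sketch of a lemma-based index. It does not reflect the actual ABaC:us data format or tooling: the `Token` class, the `forms_by_lemma` function, and the sample tokens are all hypothetical and invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    form: str       # historical spelling as printed
    lemma: str      # corresponding modern High German lemma
    verified: bool  # True once a domain expert has checked the lemma


# Hypothetical sample tokens, invented for illustration (not taken from the corpus).
TOKENS = [
    Token("Todt", "Tod", True),
    Token("Tod", "Tod", True),
    Token("todt", "tot", True),
    Token("Seel", "Seele", False),
    Token("Seelen", "Seele", True),
]


def forms_by_lemma(tokens, verified_only=True):
    """Group the distinct historical forms found in `tokens` under their lemma."""
    index = defaultdict(set)
    for tok in tokens:
        # Optionally skip mappings that no expert has confirmed yet.
        if verified_only and not tok.verified:
            continue
        index[tok.lemma].add(tok.form)
    return {lemma: sorted(forms) for lemma, forms in index.items()}


if __name__ == "__main__":
    for lemma, forms in sorted(forms_by_lemma(TOKENS).items()):
        print(lemma, forms)
```

With an index of this kind, a single query for a modern lemma retrieves all of its attested spelling variants, and the `verified_only` flag restricts results to mappings that have been checked by an expert.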

Austrian Centre for the Digital Humanities
Project leader: Claudia Resch

The present corpus was compiled between 2010 and 2015 at the Institute for Corpus Linguistics and Text Technology (ICLTT) and at the Austrian Centre for Digital Humanities (ACDH) of the Austrian Academy of Sciences, alongside two associated research projects: “Text-Technological Methods for the Analysis of Austrian Baroque Literature” (March 2012 – September 2014, supported by funds of the Österreichische Nationalbank, Anniversary Fund) and “Mortuary Cult in 17th Century Vienna: Confraternity Studies in the Digital Age” (June 2014 – May 2015, supported by funds of the City of Vienna).