comparing corpora

Helsinki Digital Humanities Hackathon 2021: ‘Parliamentary Debates in COVID Times’

7 December 2021
The Project
Organised by the University of Helsinki, the online hackathon ‘Parliamentary Debates in COVID Times’ was a short, intense project that took place from
‘Compiling a corpus is already a big project, so being able to skip this step was a huge privilege. Also, knowing that the corpus was granted permissi
Methodology
As their main data source, the team used the ParlaMint 2.1 dataset, a multilingual set of uniformly annotated corpora of parliamentary proceedings.
In order to identify which keywords occurred across all four countries, and which were country-specific, the hackathon team then manually selected the
‘It was really nice to have such a well-structured dataset of this size. It’s great that the dataset spans several years and that it’s well-annotated,
Outcome
The results showed that the majority of the top fifty keywords for all countries were related to the pandemic. In addition, there was a strong overlap
The collocation networks offered useful insight into the relationship between key terms in the parliamentary discussions, especially when viewed again
CLARIN Tools and Resources
The project was based on the recently published ParlaMint 2.1 dataset. The sessions in the corpora are marked as either belonging to the COVID-19 peri
Access ParlaMint 2.1
Views on CLARIN
‘I really like the comparative perspective that the ParlaMint dataset offers, making it possible to compare different national parliaments. It would b
For a comprehensive discussion of the hackathon, see the blog post
To watch a video about the hackathon, go to CLARIN Café
A linguistically marked-up version of the corpus is available here. For a recently published, multimedia tutorial on how to conduct high-quali
Isabella Calabretta, Digital Product Manager at Cambridge University Press & Assessment
Courtney Dalton, MLIS student at Simmons University, Boston

Word level based comparative text analysis

Many research questions in the humanities that concern specific text resources can be reduced to an analysis of vocabulary. Comparing such vocabularies is of particular interest: this may mean comparing two of one's own text resources, or comparing one text resource against a reference corpus. CLARIN makes it easy to perform such comparative analyses using the resources and web tools it provides. The following showcase demonstrates this with a simple example, covering the discovery and selection of resources, their processing, and finally their analysis. The aim is to show scholars how to answer their own research questions with the help of comparative text analysis within CLARIN.
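To make the idea of comparing a text against a reference corpus concrete, here is a minimal Python sketch. It is not the showcase's web application, whose internals are not described here. It assumes a simple regex tokenizer and uses the log-likelihood keyness measure, a common choice in corpus linguistics, to rank words that are relatively more frequent in the target text than in the reference. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplifying assumption)."""
    return re.findall(r"\w+", text.lower())


def log_likelihood(a, b, total_a, total_b):
    """Log-likelihood score for a word seen a times in the target and b times in the reference."""
    expected_a = total_a * (a + b) / (total_a + total_b)
    expected_b = total_b * (a + b) / (total_a + total_b)
    score = 0.0
    if a > 0:
        score += a * math.log(a / expected_a)
    if b > 0:
        score += b * math.log(b / expected_b)
    return 2 * score


def keywords(target_text, reference_text, top_n=10):
    """Rank words that are relatively more frequent in the target than in the reference."""
    target = Counter(tokenize(target_text))
    reference = Counter(tokenize(reference_text))
    total_t = sum(target.values())
    total_r = sum(reference.values())
    if total_t == 0 or total_r == 0:
        raise ValueError("both texts must contain at least one token")
    scored = []
    for word, a in target.items():
        b = reference.get(word, 0)
        # Keep only words overrepresented in the target text.
        if a / total_t > b / total_r:
            scored.append((word, log_likelihood(a, b, total_t, total_r)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

Calling `keywords(my_text, reference_text)` returns a list of `(word, score)` pairs, highest score first. In practice the reference would be a large general corpus, and the tokenizer would be replaced by language-specific processing.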




ASV Leipzig
Project leader
Dirk Goldhahn

ASV Leipzig would like to thank Thomas Gloning for his close collaboration while developing the web application.