A Recap on CLARIN Café: ParlaMint Unleashed

Submitted by e.gorgaini@uu.nl on 12 July 2021



On 28 June 2021, the last CLARIN Café before the summer break took place. It was organised by Tomaž Erjavec (Jožef Stefan Institute), Darja Fišer (University of Ljubljana, Institute of Contemporary History and Jožef Stefan Institute), Maciej Ogrodniczuk (Institute of Computer Science, Polish Academy of Sciences) and Petya Osenova (Institute of Information and Communication Technologies, Bulgaria). This Café presented the results of the CLARIN-funded ParlaMint project, which has harmonised parliamentary corpora for 17 countries and provided data and tools for focused observation of trends, opinions and decisions on lockdowns and other restrictive measures related to the COVID-19 pandemic.

The Café was moderated by Darja Fišer (Chair of the CLARIN UIC), who introduced CLARIN and the CLARIN Resource Families initiative. This initiative is the umbrella activity for the ParlaMint project and strives to improve the findability and interoperability of resources of the same type.

Watch the recording of the CLARIN 101, CLARIN Resource Families on the CLARIN YouTube channel.


ParlaMint: What is it all about?

The Café opened with a presentation of the ParlaMint project by its coordinators, Maciej Ogrodniczuk and Petya Osenova. They focused on Phase 2 of the project, which enabled the addition of 13 corpora thanks to the mini grants offered by CLARIN. 

In particular, Maciej and Petya underlined the challenges posed by conceptual differences among the parliamentary systems in Europe (e.g. unicameral and bicameral parliaments), as well as by differences in the formats in which such data is published. They also presented the different approaches to data curation and processing taken by the individual national teams. Finally, they highlighted the novel approaches the ParlaMint project took to produce comparable parliamentary corpora and proposed new avenues for future extensions of the ParlaMint corpora.

Watch the recording of ParlaMint: what was it all about? on the CLARIN YouTube channel.


Creating comparable multilingual corpora of parliamentary debates

After this first presentation of the project, Tomaž Erjavec went into the details of the core of the ParlaMint initiative, namely the common formatting and encoding guidelines. Based on , the ParlaMint format can be validated against a RELAX NG schema. The corpora were also required to include automatic morphosyntactic annotation following Universal Dependencies (UD).

Despite the available documentation and schema, validation still took substantial effort. More than 70 GitHub issues, 500 commits and around 200 email exchanges were needed before ParlaMint 2.1 could finally be published. The team effort is also reflected in the number of authors listed for the resources published in the CLARIN.SI repository (http://hdl.handle.net/11356/1432 and http://hdl.handle.net/11356/1431).

Watch the recording of Creating comparable multilingual corpora of parliamentary debates on the CLARIN YouTube channel.


Use cases

The remarkable amount of work put in by the ParlaMint team has paid off in the form of comparable resources. The Café presented two of the first comparative explorations of the corpora.

The first one was "Parliamentary debates in COVID times" by Marta Kołczyńska, who reported on the teamwork carried out at the 2021 DHH Hackathon, the detailed results of which are presented in a dedicated blog post. The work compared debates from the COVID period with those from a reference period before the pandemic. Using data from Italy, Slovenia, the UK and Poland, the team applied corpus linguistics techniques such as keyword extraction and collocation analysis. Thanks to the rich set of metadata, they could also relate how the COVID debates evolved in each country to epidemiological data on infection trends in that country.
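Keyword extraction of this kind usually compares word frequencies in a study corpus (here, COVID-period debates) against a reference corpus. The sketch below uses Dunning's log-likelihood (G2), one common keyness measure; it is an illustration only, and we do not know which measure or tooling the hackathon team actually used. The function names and the toy token lists are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Dunning's log-likelihood (G2) for a word seen a times in a study
    corpus of c tokens and b times in a reference corpus of d tokens."""
    # Expected frequencies if the word were equally common in both corpora
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    g2 = 0.0
    # By convention 0 * log(0) = 0, so zero counts contribute nothing
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def keywords(study_tokens, reference_tokens, top_n=10):
    """Rank words over-represented in the study corpus by G2 (assumes
    both token lists are non-empty)."""
    study = Counter(study_tokens)
    ref = Counter(reference_tokens)
    c = sum(study.values())
    d = sum(ref.values())
    scored = []
    for word, a in study.items():
        b = ref.get(word, 0)
        # Keep only words relatively more frequent in the study corpus
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Invented toy data, for illustration only
covid_period = "the lockdown measures and the vaccine rollout".split()
reference = "the budget and the pension reform".split()
print(keywords(covid_period, reference, top_n=3))
```

In practice the tokens would be lemmas taken from the UD-annotated corpora, and a significance cut-off (or an effect-size measure) would typically be applied alongside the G2 ranking.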


Watch the recording of Parliamentary debates in COVID times on the CLARIN YouTube channel.


The second use case was presented by Miguel Pieters from the University of Amsterdam and concentrated on the role of women in politics and society. He compared the number of words uttered by male and female MPs. In most of the countries observed, women MPs still speak significantly fewer words than their male counterparts.


In the second part of his analysis, Miguel examined, country by country, how debates on some of the most prominent issues in contemporary society, such as climate change, evolved over time.
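The word-count comparison from the first part of Miguel's analysis can be sketched with Python's standard XML library. This is a minimal sketch, not the presenter's code: it assumes a simplified, single-file document loosely modeled on TEI, where each speech is a `<u who="#ID">` element containing `<seg>` text and each speaker is a `<person xml:id="ID">` with a `<sex value="..."/>` child. The real ParlaMint corpora spread speaker metadata and transcripts across separate files, and the function name and sample below are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def words_by_sex(xml_text):
    """Count whitespace-separated words per speaker sex in a simplified
    TEI-like document (hypothetical layout, see lead-in)."""
    root = ET.fromstring(xml_text)
    # Map each person ID to the value of its <sex> element
    sex_of = {}
    for person in root.iter(TEI + "person"):
        sex = person.find(TEI + "sex")
        if sex is not None:
            sex_of[person.get(XML_ID)] = sex.get("value")
    counts = Counter()
    for u in root.iter(TEI + "u"):
        # who="#MP1" points to the person with xml:id="MP1"
        speaker = u.get("who", "").lstrip("#")
        # Only the direct text of each <seg> is used here
        text = " ".join(seg.text or "" for seg in u.iter(TEI + "seg"))
        counts[sex_of.get(speaker, "unknown")] += len(text.split())
    return counts


# Invented toy document, for illustration only
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><profileDesc><particDesc><listPerson>
    <person xml:id="MP1"><sex value="F"/></person>
    <person xml:id="MP2"><sex value="M"/></person>
  </listPerson></particDesc></profileDesc></teiHeader>
  <text><body><div>
    <u who="#MP1"><seg>Thank you, chair.</seg></u>
    <u who="#MP2"><seg>The measures must be extended now.</seg></u>
  </div></body></text>
</TEI>"""

print(words_by_sex(SAMPLE))
```

With the linguistically annotated versions of the corpora, counting `<w>` (word) elements would be more robust than splitting on whitespace, since punctuation is then tokenised separately.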


Watch the recording of A comparative analysis on the ParlaMint project on the CLARIN YouTube channel.


A third showcase, carried out by Ruben Ros within the ParlaMint project, explores attitudes towards science and expertise in COVID-19 parliamentary debates.

Watch the recording of the showcase on the CLARIN YouTube channel or read the full report.


Finally, an important aspect of parliamentary data is its relevance and impact for citizens. A notable use case in this respect was the Parlameter platform, presented by Filip Dobranić, which is dedicated to the exploration of parliamentary data by the general public, e.g. https://parlameter.org/sl/.

Watch the recording of ParlaMint and Parlameter: How standardized data formats empower end users on the CLARIN YouTube channel.


Panel and discussion

The Café concluded with a panel discussion in which the Czech, Icelandic, Italian and UK project partners shared the lessons they had learnt. They touched upon the issues they had experienced with data collection, cleaning and structuring, as well as with annotation and encoding. They highlighted the invaluable effort made by the mini grant recipients in metadata collection and encoding, which they described as time-consuming but also as a great learning opportunity.

During the discussion, several questions were posed. One was whether there were plans to engage with researchers from other disciplines, such as historians and political scientists, who might be interested in studying this type of data. The conversation also touched upon the need to update the dataset by extending both the number of languages covered and the time period considered, so as to include older data as well. Further developments of the annotation format were also discussed.

Watch the recordings of the panel discussion on the CLARIN YouTube channel.

The presentation slides of this CLARIN Café can be found on the event page.

Next CLARIN Cafés

To stay up to date with newly scheduled Cafés, you can consult the CLARIN news section, subscribe to the CLARIN Newsflash and follow CLARIN on Twitter (#CLARINCafé). The CLARIN Café page will also always provide the latest details.

If you want to receive an individual email for each virtual event organised by CLARIN, you can subscribe to the Virtual Events Announcements mailing list.