
ParlaMint I and ParlaMint II: Project Information

The ParlaMint project focused on the creation of comparable and uniformly annotated corpora of parliamentary debates in Europe. The first stage of the project resulted in the compilation of 17 corpora. The second stage extended the time span of the corpora, added corpora for new countries and autonomous regions, provided a machine-translated English version of the corpora, enriched them with additional metadata, and improved their usability.


ParlaMint I Work Plan

WP 1: Testing the approach for four languages (Lead: Maciej Ogrodniczuk (IPI-PAN), Petya Osenova (IICT-BAS))

  • T1.1: Preparation of the reference parliamentary corpora
  • T1.2: Creation of COVID-19 parliamentary corpora
  • T1.3: Mounting of the corpora on the NoSketch Engine and KonText concordancers
  • T1.4: Preparation of guidelines and mini-grant procedure

WP 2: Extending the corpora and showcasing (Lead: Tomaž Erjavec (IJS))

  • T2.1: Adding additional corpora to the infrastructure
  • T2.2: Preparation of showcases
  • T2.3: Preparation of the documentation for usage by interested parties

More detailed information can be found below in the section "ParlaMint I (July 2020 - May 2021)".
 

ParlaMint II Work Plan

The ParlaMint II project consists of 5 work packages:
 
WP1: Documentation, interoperability, metadata  (Lead: Tomaž Erjavec (IJS), Matyáš Kopp (UFAL))
  • T1.1: Harmonization of encoding
  • T1.2: Git management
  • T1.3: Adding metadata to existing corpora
 
WP2: Corpus expansion (Lead: Tomaž Erjavec (IJS))
  • T2.1: Adding new corpora
  • T2.2: Extending existing corpora
  • T2.3: Data distribution
 
WP3: Corpus enrichment (Lead: Nikola Ljubešić (IJS))
  • T3.1: Machine translation and semantic tagging
  • T3.2: Multimodality
 
WP4: Engagement activities (Lead: Darja Fišer (INZ), Cagri Coltekin (TUB))
  • T4.1: Tutorial
  • T4.2: Hackathon
  • T4.3: Shared task
  • T4.4: Showcases
 
WP5: Coordination (Lead: Maciej Ogrodniczuk (IPI-PAN), Petya Osenova (IICT-BAS))
  • T5.1: Management
  • T5.2: Dissemination
  • T5.3: External monitoring

 

 


Project Partners

In-Kind Contributing Partners
Belgium: Jesse de Does
Bulgaria: Petya Osenova, Kiril Simov
Croatia: Nikola Ljubešić, Michal Mochtak
Czechia: Matyáš Kopp
Denmark: Costanza Navarretta, Dorte Haltrup Hansen, Bart Jongejan
France: Giancarlo Luxardo, Sascha Diwersy
Hungary: Miklós Sebők (ParlaMint I), Noémi Ligeti-Nagy (ParlaMint II)
Iceland: Starkaður Barkarson
Italy: Tommaso Agnoloni, Giulia Venturi
Latvia: Roberts Darģis
Lithuania: Tomas Krilavičius, Andrius Utka, Vaidas Morkevičius, Petkevičius Mindaugas, Monika Briedienė
Netherlands: Maarten Marx, Ruben van Heusden
Poland: Maciej Ogrodniczuk, Michał Rudolf, Danijel Korzinek
Slovenia: Tomaž Erjavec, Andrej Pančur, Darja Fišer
Spain: María Calzada Pérez, Ruben de Libano, Monica Albini
Turkey: Çağrı Çöltekin
UK: Paul Rayson, Matt Coole

New Partners

Austria: Hannes Pirker, Tanja Wissik
Basque Country: Mikel Iruskieta
Bosnia and Herzegovina: Michal Mochtak, Nikola Ljubešić
Catalonia: Nuria Bel
Estonia: Kadri Vider, Neeme Kahusk, Martin Mölder
Finland: Eero Hyvönen, Jouni Tuominen
Galicia: Adina Ioana Vladu, Carmen Magariños, Daniel Bardanca, Mario Barcala, Marcos Garcia, María Pérez Lago, Pedro García Louzao, Ainhoa Vivel Couso, Marta Vázquez Abuín, Noelia García Díaz, Adrián Vidal Miguéns, Elisa Fernández Rei
Greece: Maria Gavriilidou
Norway: Magnus Breder Birkenes, Jon Arild Olsen, Koenraad De Smedt
Portugal: Amália Mendes
Romania: Petru Rebeja, Madalina Chitez, Cornelia Ilie
Serbia: Michal Mochtak, Nikola Ljubešić
Sweden: Fredrik Norén
Ukraine: Anna Kryvenko, Matyáš Kopp

Financial Support

ParlaMint I Financial Support
  • CLARIN ERIC: Budget: 135,000 EUR
  • ARRS (Slovenian Research Agency) P2-103 "Knowledge Technologies"
  • ARRS (Slovenian Research Agency) P6-0411 "Language Resources and Technologies for Slovene"
  • CLARIN-LV, European Regional Development Fund project 1.1.1.5/18/I/016 "University of Latvia and institutes in the European Research Area - Excellency, activity, mobility, capacity"
  • LINDAT/CLARIAH-CZ LM2018101 "Digital Research Infrastructure for Language Technologies, Arts and Humanities"
  • Ministry of Education and Science Republic of Bulgaria DO01-272/16.12.2019 "Bulgarian National Interdisciplinary Research e-Infrastructure for Resources and Technologies CLaDA-BG"
  • Spanish Ministry of Science and Innovation PID2019-108866RB-I0 / AEI / 10.13039/501100011033 "Original, translated and interpreted representations of the refugee cris(e)s: methodological triangulation within corpus-based discourse studies"
  • The Research Council of Lithuania P-MIP-20-373 "Policy Agenda of the Lithuanian Seimas and its Framing: The Analysis of the Seimas Debates in 1990–2020"


ParlaMint II Financial Support
  • CLARIN ERIC: Budget: 163,000 EUR
  • In-kind contributing partners: please see the section Project Partners
  • ARRS (Slovenian Research Agency) P6-0411 "Language Resources and Technologies for Slovene"
  • Nederlandse Organisatie voor Wetenschappelijk Onderzoek CISC.CC.016 "Access to City Councils using Exploratory Search Systems"
  • ARRS (Slovenian Research Agency) J7-4642 "MEZZANINE"
  • ARRS (Slovenian Research Agency) N6-0099 "Flemish-Slovenian bilateral basic research project ‘Linguistic landscape of hate speech online’ (2019-2023)"
  • ARRS (Slovenian Research Agency) N6-0288 "the MSCA Seal of Excellence postdoctoral project 'The Changing Discursive Semantics of EU Representations' (2022-2024)"
  • Austrian Academy of Sciences - "ÖAW"
  • Bulgarian Ministry of Education and Science DO1-301/17.12.21 "Bulgarian National Interdisciplinary Research e-Infrastructure for Resources and Technologies in favor of the Bulgarian Language and Cultural Heritage, part of the EU infrastructures CLARIN and DARIAH"
  • Department of Nordic Studies and Linguistics (NorS), University of Copenhagen CLARIN-DK "CLARIN-DK"
  • Dutch Language Institute
  • European Commission POIR.04.02.00-00C002/19 "European Regional Development Fund as a part of the 2014-2020 Smart Growth Operational Programme, CLARIN – Common Language Resources and Technology Infrastructure"
  • Fundação para a Ciência e a Tecnologia UIDP/00214/2020
  • Galician Language Institute, University of Santiago de Compostela
  • Hungarian Research Centre for Linguistics
  • Institute for Language and Speech Processing / ATHENA RC
  • Institute of Computer Science, Polish Academy of Sciences - "statutory research"
  • Jožef Stefan Institute CLARIN "CLARIN.SI"
  • Ministry of Education, Youth and Sports of the Czech Republic LM2023062 "LINDAT/CLARIAH-CZ: Digital Research Infrastructure for Language Technologies, Arts and Humanities"
  • National Library of Norway
  • ParlaMint-ES: Spanish Ministry of Science and Innovation PID2019-108866RB-10 / AEI / 10.13039/501100011033 "Original, translated and interpreted representations of the refugee cris(e)s: methodological triangulation within corpus-based discourse studies"
  • Polish Ministry of Education and Science 2022/WK/09 "National contribution to CLARIN ERIC – European Research Infrastructure Consortium: Common Language Resources and Technology Infrastructure 2022–2023 (CLARIN Q)"
  • Slovenian Research Agency (ARRS) P6-0436 "Basic national research program 'Digital Humanities' (2022-2027)"
  • The Árni Magnússon Institute for Icelandic Studies
  • Xunta de Galicia - University of Santiago de Compostela 2021-CP080 "Nós: Galician in the society and economy of artificial intelligence (2021-CP080), agreement between Xunta de Galicia and the University of Santiago de Compostela"

Contact Persons

Maciej Ogrodniczuk: maciej.ogrodniczuk [at] gmail.com

Petya Osenova: petya [at] bultreebank.org
 

ParlaMint I (July 2020 - May 2021)

  1. Creating a multilingual set of uniformly annotated corpora of parliamentary proceedings dating from November 2019 to July 2020 (thus covering the current COVID-19 pandemic situation).
  2. Creating a set of comparable multilingual reference corpora of parliamentary data from 2015 to October 2019.
  3. Processing the corpora linguistically to add Universal Dependencies syntactic annotation as well as named entity annotation.
  4. Making the corpora available through concordancers and Parlameter.
  5. Building use cases in Political Sciences and Digital Humanities based on the corpus data.

Phase 1 (July 1, 2020 - September 30, 2020)

In Phase 1 the approach described in tasks 1, 2, 3 and 4 above will be tested for 4 pilot languages: Bulgarian, Croatian, Slovene and Polish.

  1. Creation of the COVID-19 parliamentary corpora (Nov. 2019 - July 2020) as well as the reference parliamentary corpora (2015 - Oct. 2019).
    1. The data will be processed to adhere to the Parla-CLARIN TEI annotation scheme. [Responsible persons for gathering and conversion of data: Bulgarian (Petya Osenova and Kiril Simov); Croatian (Nikola Ljubešić); Slovene (Andrej Pančur); Polish (Maciej Ogrodniczuk and Michał Rudolf)]. 
    2. The data will be processed linguistically with Universal Dependencies and Named Entities. (Responsible person: Nikola Ljubešić)
  2. Mounting of the corpora on the NoSketch Engine and KonText concordancers. (Responsible person: Tomaž Erjavec)
  3. Mounting of the corpora to Parlameter. (Responsible persons: Filip Muki Dobranić and Tomaž Kunst)
  4. Preparation of guidelines. (Responsible persons: All) 
 
Phase 2 (October 1, 2020 - May 30, 2021)
In Phase 2 the parliamentary corpora were extended to more languages and parliaments. For this phase a special Call for interest in participation was published in October 2020. 

Also, three showcases are envisaged:

  • Availability in Parlameter: facilitating the corpus graphical exploration suited for politological investigations and investigations by journalists and active citizens. (Responsible person: Filip Muki Dobranić)
  • The linguistic showcase will extend the CLARIN tutorial "Voices of the Parliament", showing how corpora can be used to investigate language use and communication practices in the specialised socio-cultural context of political discourse. (Responsible persons: Darja Fišer and Kristina Pahor de Maiti) http://doi.org/10.3828/mlo.v0i0.295
  • The DH-related showcase will be prepared by a digital historian. (Responsible person: Ruben Ros)

The ParlaMint project started by creating recent corpora of parliamentary sessions for 4 parliaments: Bulgarian, Croatian, Polish and Slovene. In 2021 the project was extended with data for 13 additional parliaments, those of Belgium, the Czech Republic, Denmark, France, Hungary, Iceland, Italy, Latvia, Lithuania, Romania, the Netherlands, Turkey and the UK. Release 2.1 of the dataset is now available.

 
Parliamentary Corpora for 17 languages (release: Q2 2021)
Corpora as Data 
Corpora in Concordancers 
 
Parliamentary Corpora for 4 pilot languages (release: Q4 2020)
Corpora as Data 


The proposals of the following applicants were assessed and approved by the ParlaMint team in consultation with representatives of CLARIN Board of Directors:

Name | Affiliation | Language
Paul Rayson | Lancaster University | English
Ruben van Heusden | University of Amsterdam – ILPS research group | Dutch
Steinþór Steingrímsson | The Árni Magnússon Institute for Icelandic Studies | Icelandic
Tomas Krilavičius | Applied Informatics dept., Vytautas Magnus University (Vytauto Didžiojo University) | Lithuanian
Barbora Hladká | Charles University | Czech
Giulia Venturi | Institute for Computational Linguistics "A. Zampolli" (ILC-CNR) | Italian
Çağrı Çöltekin | University of Tübingen | Turkish
Costanza Navarretta | University of Copenhagen | Danish
Miklós Sebők | Centre for Social Sciences, Budapest, Hungary | Hungarian
Giancarlo Luxardo | Praxiling UMR 5267 | French
Roberts Darģis | Institute of Mathematics and Computer Science, University of Latvia | Latvian
Petru Rebeja | Alexandru Ioan Cuza University of Iași | Romanian
Jesse de Does | Instituut voor de Nederlandse Taal | Belgian Dutch/French
María Calzada Pérez | Translation Studies Unit, Universitat Jaume I, Castellón de la Plana |


  • CLARIN Café: ParlaMint Unleashed on 28 June 2021 from 14:00 to 16:00, organised by Tomaž Erjavec, Darja Fišer, Maciej Ogrodniczuk, Petya Osenova. Eight months after the introductory CLARIN Café on ParlaMint we present the results, lessons learnt and showcases of the project. 
  • CLARIN Café - Join Our Parliamentary-flavoured Coffee: ParlaMINT. The ParlaMint project team presented current results and provided information about the opportunities to join as a contributor, as a user, or both. Organized by Petya Osenova (Sofia University and IICT-BAS) and Maciej Ogrodniczuk (Institute of Computer Science, Polish Academy of Sciences).
    • The dedicated call can be found here.
    • Watch the video recordings of the Café here.