
ParlaCLARIN II Workshop on Creating, Using and Linking Parliamentary Corpora with Other Types of Political Discourse (Virtual Event)


ParlaCLARIN II goes virtual

The 2020 ParlaCLARIN workshop was supposed to be held in Marseille (France), as part of the 12th edition of the Language Resources and Evaluation Conference (LREC 2020). However, due to the COVID-19 pandemic, the main conference has been postponed; the ParlaCLARIN II workshop will therefore take place in virtual form.


Proceedings can be found here.


Please register using this link in order to receive the meeting room details.

Call for Papers

The call for papers is closed.

Workshop Description

Parliamentary data is a major source of socially relevant content. It is available in ever larger quantities, is multilingual, accompanied by rich metadata, and has the distinguishing characteristic that it is spoken language produced in controlled circumstances which has traditionally been transcribed but is now increasingly released also in audio and video formats. All these factors require solutions related to structuring, synchronization, visualization, querying and analysis of parliamentary corpora. Furthermore, approaches to the exploitation of parliamentary corpora to their full extent also have to take into account the needs of researchers from vastly different Humanities and Social Sciences fields, such as political sciences, sociology, history, and psychology.

An inspiring and highly successful first edition of the ParlaCLARIN scientific workshop held at LREC 2018 and a follow-up developmental ParlaFormat workshop held at CLARIN in 2019 resulted in a comprehensive overview of a multitude of the existing parliamentary resources worldwide as well as tangible first steps towards better harmonization, interoperability and comparability of the resources and tools relevant for the study of parliamentary discussions and decisions.

The second ParlaCLARIN workshop therefore aims to bring together developers, curators and researchers of regional, national and international parliamentary debates that are suitable for research in disciplines in the Humanities and Social Sciences. We invite unpublished original work focusing on the compilation, annotation, visualisation and utilisation of parliamentary records as well as linking or comparing parliamentary records with other datasets of political discourse such as party manifestos, political speeches, political campaign debates, social media posts, etc. Apart from dissemination of the results, the workshop also aims to address the identified obstacles, discuss open issues and coordinate future efforts in this increasingly trans-national and cross-disciplinary community.


Thanks to Freedom of Information Acts, which are supported by the United Nations and in place in over 100 countries worldwide, parliamentary debates are becoming increasingly easy to obtain. They have always been of interest to researchers from a wide range of fields in the Humanities and Social Sciences, both for the potential influence of their content and for the specificities of the formalized, often persuasive and emotional language use in this context. As a consequence, there are many initiatives, on the national and international levels, that aim at compiling and analysing parliamentary data. The CLARIN-PLUS survey on parliamentary data has identified over 20 corpora of parliamentary records, with over half of them being available within the CLARIN infrastructure.

Given the maturity, variety, and potential of this type of language data, as well as the rich metadata that complements it, it is urgent to bring together three groups of researchers: those producing parliamentary corpora and making them available; those using them for linguistic, historical, political, sociological and other research; and those linking or comparing them with other datasets of political discourse such as party manifestos, political speeches, political campaign debates and social media posts. The goal is to share methods and approaches for compiling, annotating and exploring parliamentary and other political language data, in order to harmonize the compiled resources, ensure current and future comparability of research on national datasets, and promote transnational analyses.

Topics of interest

Topics include but are not limited to:

  • Creation and annotation of parliamentary data in textual and/or spoken format
  • Annotation standards and best practices for parliamentary corpora
  • Accessibility, querying and visualisation of parliamentary data
  • Text analytics, semantic processing and linking of parliamentary and other datasets of political language data
  • Parliamentary corpora and multilinguality
  • Studies based on parliamentary corpora
  • Studies comparing parliamentary corpora with other types of political discourse

Workshop programme

The format will be slightly different from the originally planned full-day workshop with presentations and panels.

Instead, the programme will include a keynote talk and 3 sessions where moderators and authors of accepted papers will bring attention to the most interesting aspects of their work, and discuss the current challenges and solutions and plans for future work. 

  • 14.00 - 14.05: Opening [Slides]
  • 14.05 - 14.25: The keynote talk entitled Different arenas, different texts, one message? What we can learn from a combined analysis of manifestos and parliamentary debates will be presented by Pola Lehmann & Bernhard Weßels, and will be devoted to the Manifesto Project.  [Slides]
  • 14.25 - 14.30: Q & A
  • 14.30 - 15.05: Session 1. Creation of parliamentary corpora [Slides]
    • 14.30 - 14.35: Opening by Session Chair Jan Odijk
    • 14.35 - 15.00: Authors' responses to the Session Chair's questions (5 minutes each)
      • New Developments in the Polish Parliamentary Corpus. Maciej Ogrodniczuk and Bartłomiej Nitoń
      • Anföranden: Annotated and Augmented Parliamentary Debates from Sweden. Stian Rødven Eide (not present at the virtual event)
      • IGC-Parl: Icelandic Corpus of Parliamentary Proceedings. Steinþór Steingrímsson, Starkaður Barkarson and Gunnar Thor Örnólfsson
      • Compiling Czech Parliamentary Stenographic Protocols into a Corpus. Barbora Hladka, Matyáš Kopp and Pavel Straňák
      • Unfinished Business: Construction and Maintenance of a Semantically Tagged Historical Parliamentary Corpus, UK Hansard from 1803 to the present day. Matthew Coole, Paul Rayson and John Mariani
      • The siParl Corpus of Slovene Parliamentary Proceedings. Andrej Pancur and Tomaž Erjavec
    • 15.00 - 15.05: Q & A
  • 15.05 - 15.30: Session 2. Tools for parliamentary corpora [Slides]
    • 15.05 - 15.10: Opening by Session Chair Francesca Frontini
    • 15.10 - 15.25: Authors' responses to the Session Chair's questions (5 minutes each)
      • Who mentions whom? Recognizing political actors in proceedings. Lennart Kerkvliet, Jaap Kamps and Maarten Marx
      • Challenges of Applying Automatic Speech Recognition for Transcribing EU Parliament Committee Meetings: A Pilot Study. Hugo de Vos and Suzan Verberne
      • Parsing Icelandic Alþingi Transcripts: Parliamentary Speeches as a Genre. Kristján Rúnarsson and Einar Freyr Sigurðsson 
    • 15.25 - 15.30: Q & A
  • 15.30 - 16.00: Session 3. Investigations of parliamentary corpora [Slides]
    • 15.30 - 15.35: Opening by Session Chair Laura Morales
    • 15.35 - 15.55: Authors' responses to the Session Chair's questions (5 minutes each)
      • Identifying Parties in Manifestos and Parliament Speeches. Costanza Navarretta and Dorte Haltrup Hansen 
      • Comparing Lexical Usage in Political Discourse across Diachronic Corpora. Klaus Hofmann, Anna Marakasova, Andreas Baumann, Julia Neidhardt and Tanja Wissik
      • The Europeanization of Parliamentary Debates on Migration in Austria, France, Germany, and the Netherlands. Andreas Blaette, Simon Gehlhar and Christoph Leonhardt
      • Querying a large annotated corpus of parliamentary debates. Sascha Diwersy and Giancarlo Luxardo
    • 15.55 - 16.00: Q & A
  • 16.00 - 16.05: Closing remarks [Slides]

Submissions & Publication

We accept submissions of long papers (up to 8 pages), short papers (up to 4 pages) and demo papers (up to 4 pages) to be presented as a long or short oral presentation at the workshop. Papers should not be anonymous and should be formatted according to the stylesheet available on the LREC 2020 website. The papers of the workshop will be published in online proceedings.

When submitting a paper from the START page, authors will be asked to provide essential information about resources (in a broad sense, i.e. also technologies, standards, evaluation kits, etc.) that have been used for the work described in the paper or are a new result of their research. Moreover, ELRA encourages all LREC authors to share the described LRs (data, tools, services, etc.) to enable their reuse and the replicability of experiments (including evaluation ones).

Submission page:

Important Dates

  • Extended paper submission deadline: 21 February 2020
  • Notification of acceptance: 13 March 2020
  • Camera-ready paper: 2 April 2020
  • Workshop date: Monday 11 May 2020

Organizing Committee

  • Darja Fišer, University of Ljubljana and Jožef Stefan Institute, Slovenia 
  • Franciska de Jong, CLARIN ERIC, The Netherlands
  • Maria Eskevich, CLARIN ERIC, The Netherlands

The workshop is supported by the CLARIN research infrastructure. To contact the organizers, please mail clarin[at]clarin[dot]eu (Subject: [ParlaCLARIN@LREC2020]).

Programme Committee 

in alphabetical order:

  • Kaspar Beelen, The Alan Turing Institute, UK
  • Andreas Blätte, The University of Duisburg-Essen, Germany
  • Francesca Frontini, Université Paul Valéry - Montpellier, France
  • Maria Gavriilidou, ILSP/Athena RC, Greece
  • Henk van den Heuvel, Radboud University, The Netherlands
  • Klaus Illmayer, Austrian Academy of Sciences, Austria
  • Bente Maegaard, CLARIN ERIC, The Netherlands
  • Monica Monachini, National Research Council of Italy, Italy
  • Laura Morales, Sciences Po, France
  • Jan Odijk, Utrecht University, The Netherlands
  • Maciej Ogrodniczuk, Institute of Computer Science, Polish Academy of Sciences, Poland
  • Petya Osenova, IICT-BAS and Sofia University "St. Kl. Ohridski", Bulgaria
  • Maria Pontiki, ILSP/Athena RC, Greece
  • Sara Tonelli, Fondazione Bruno Kessler, Italy
  • Simone Paolo Ponzetto, University of Mannheim, Germany
  • Stelios Piperidis, ILSP/Athena RC, Greece
  • Tamás Váradi, Hungarian Academy of Sciences, Hungary
  • Tanja Wissik, Austrian Academy of Sciences, Austria
  • Tomaž Erjavec, Jožef Stefan Institute, Slovenia

Identify, Describe and Share your LRs!

Describing your LRs in the LRE Map is now standard practice in the submission procedure of LREC (introduced in 2010 and adopted by other conferences). To continue the efforts initiated at LREC 2014 about “Sharing LRs” (data, tools, web-services, etc.), authors will have the possibility,  when submitting a paper, to upload LRs in a special LREC repository.  This effort of sharing LRs, linked to the LRE Map for their description, may become a new “regular” feature for conferences in our field, thus contributing to creating a common repository where everyone can deposit and share data.

As scientific work requires accurate citations of referenced work so as to allow the community to understand the whole context and also replicate the experiments conducted by other researchers, LREC 2020 endorses the need to uniquely identify LRs through the use of the International Standard Language Resource Number (ISLRN), a Persistent Unique Identifier to be assigned to each Language Resource. The assignment of ISLRNs to LRs cited in LREC papers will be offered at submission time.



Palais Pharo