The 2018 ParlaCLARIN workshop is held in Miyazaki (Japan), as part of the 11th edition of the Language Resources and Evaluation Conference (LREC2018).

Workshop Description

Parliamentary data is a major source of socially relevant content. It is available in ever larger quantities, is multilingual, has rich metadata, and has the distinguishing characteristic that it is essentially a transcription of spoken language produced in controlled circumstances. It is also increasingly released in audio and video formats. In combination, these factors call for solutions for its archiving, structuring, synchronization, visualization, querying and analysis. Furthermore, adequate approaches to its exploitation also have to take into account the needs of researchers from vastly different Humanities and Social Sciences fields, such as political science, sociology, history and psychology.

An inspiring CLARIN-PLUS cross-disciplinary workshop, “Working with parliamentary records” [1], held in Sofia, Bulgaria, in spring 2017, and a comprehensive overview of the many existing parliamentary resources within the CLARIN infrastructure [2] clearly indicated a need for better harmonization, interoperability and comparability of the resources and tools relevant for the study of parliamentary discussions and decisions, not only in Europe but worldwide.

This workshop aims to bring together researchers interested in compiling, annotating, structuring, linking and visualising parliamentary records that are suitable for research in a wide range of disciplines in the Humanities and Social Sciences. We invite unpublished original work focusing on the collection, analysis and processing of parliamentary records.


Thanks to Freedom of Information Acts, which are supported by the United Nations and have been put in place in over 100 countries worldwide, parliamentary debates are becoming increasingly easy to obtain. They have always been of interest to researchers from a wide range of fields in the Humanities and Social Sciences, both for the potential influence of their content and for the specific features of the formalized, often persuasive and emotional language used in this context. As a consequence, there are many initiatives at the national and international levels that aim at compiling and analysing parliamentary data. A recent CLARIN-PLUS survey of parliamentary data identified over 20 corpora of parliamentary records, over half of which are available within the CLARIN infrastructure [3].

Given the maturity, variety and potential of this type of language data, as well as the rich metadata that accompanies it, there is an urgent need to bring together the researchers who produce parliamentary corpora and make them available with those who use them for linguistic, historical, political, sociological and other research. Sharing methods and approaches for compiling, annotating and exploring these corpora will help harmonize the compiled resources, ensure the current and future comparability of research on national datasets, and promote transnational analyses.

Topics of interest

Topics include but are not limited to:

  • Creation and annotation of parliamentary data in textual and/or spoken format
  • Annotation standards and best practices for parliamentary corpora
  • Accessibility, querying and visualisation of parliamentary data
  • Text analytics, semantic processing and linking of parliamentary data
  • Parliamentary corpora and multilinguality
  • Studies based on parliamentary corpora

Proceedings, photos and video recordings

  • The workshop proceedings can be found on the ELRA website.
    To be cited as: Fišer, D., Eskevich, M. and de Jong, F. (eds). Proceedings of LREC2018 Workshop ParlaCLARIN: Creating and Using Parliamentary Corpora. ELRA, 2018. (ISBN: 978-0-306-40615-7, EAN: 4 003994 155486).
  • Recordings of all the presentations can be found on the CLARIN VideoLectures channel.
  • Photos from the event can be found on the official CLARIN Flickr stream.


9:00 – 9:15

Welcome and introduction [pdf]

CLARIN resources for parliamentary discourse research. Darja Fišer, Jakob Lenardič [pdf]

9:15 – 10:30 Session 1: Creating parliamentary corpora
1.1. SlovParl 2.0: The Collection of Slovene Parliamentary Debates from the Period of Secession. Andrej Pančur, Mojca Šorn, Tomaž Erjavec [pdf]
1.2. Polish Parliamentary Corpus. Maciej Ogrodniczuk [pdf]
1.3. ParlAT beta Corpus of Austrian Parliamentary Record. Tanja Wissik, Hannes Pirker [pdf]
1.4. A Corpus of Grand National Assembly of Turkish Parliament’s Transcripts. Onur Güngör, Mert Tiftikci, Çağıl Sönmez [pdf]
10:30 – 11:00 Coffee break

11:00 – 12:00

Keynote talk 

Applying Multi-Perspective Approaches to the Analysis of Parliamentary Data by Cornelia Ilie [pdf]

12:00 – 13:00 Session 2: Enriching parliamentary corpora
2.1. UKParl: A Semantified and Topically Organized Corpus of Political Speeches. Federico Nanni, Mahmoud Osman, Yi-Ru Cheng, Simone Paolo Ponzetto, Laura Dietz [pdf]
2.2. EuroParl-UdS: Preserving and Extending Metadata in Parliamentary Debates. Mihaela Vela, Elke Teich and Alina Karakanta [pdf]
2.3. Annotation of the Corpus of the Saeima with Multilingual Standards. Roberts Darģis, Ilze Auziņa, Uldis Bojārs, Pēteris Paikens, Artūrs Znotiņš [pdf]
2.4. A Sentiment-labelled Corpus of Hansard Parliamentary Debate Speeches. Gavin Abercrombie and Riza Batista-Navarro [pdf]
13:00 – 14:00 Lunch break
14:00 – 15:00 Session 3: Parliamentary data in computational social sciences 1
3.1. Automatically Labeled Data Generation for Classification of Reputation Defence Strategies. Nona Naderi and Graeme Hirst [pdf]
3.2. Exploring the Political Agenda of the Greek Parliament Plenary Sessions. Dimitris Gkoumas, Maria Pontiki, Konstantina Papanikolaou, Haris Papageorgiou [pdf]
3.3. Findings from the Hackathon on Understanding Euroscepticism Through the Lens of Textual Data. Federico Nanni, Goran Glavaš, Simone Paolo Ponzetto, Sara Tonelli, Nicolò Conti, Ahmet Aker, Alessio Palmero Aprosio, Arnim Bleier, Benedetta Carlotti, Theresa Gessler, Tim Henrichsen, Dirk Hovy, Christian Kahmann, Mladen Karan, Akitaka Matsuo, Stefano Menini, Dong Nguyen, Andreas Niekler, Lisa Posch, Federico Vegetti, Zeerak Waseem, Tanya Whyte, Nikoleta Yordanova [pdf]
15:00 – 16:00 Panel: Infrastructural Support for Research on Parliamentary Data

Panelists: Jan Odijk [pdf], Andreas Blaette [pdf], Federico Nanni [pdf], Cornelia Ilie

16:00 – 16:30 Coffee break
16:30 – 17:30 Session 4: Parliamentary data in computational social sciences 2
4.1. A Pilot Gender Study of the Danish Parliament Corpus. Dorte Haltrup Hansen, Costanza Navarretta, Lene Offersgaard [pdf]
4.2. The Parliamentary Debates as a Resource for the Textometric Study of the French Political Discourse. Sascha Diwersy, Francesca Frontini, Giancarlo Luxardo [pdf]
4.3. Using Data Packages to Ship Annotated Corpora of Parliamentary Protocols: The GermaParl R Package. Andreas Blaette [pdf]
17:30 – 18:00 Closing remarks
20:00 – 22:00 Workshop dinner

Organizing Committee

  • Darja Fišer, The Faculty of Arts, University of Ljubljana, Slovenia 
  • Franciska de Jong, CLARIN ERIC, The Netherlands
  • Maria Eskevich, CLARIN ERIC, The Netherlands

The workshop is supported by the CLARIN research infrastructure. To contact the organizers, please send them an email (subject: [ParlaCLARIN@LREC2018]).

Programme Committee 

in alphabetical order:

  • Darius Amilevičius, Vytautas Magnus University, Lithuania
  • Ilze Auziņa, University of Latvia, Latvia
  • Kaspar Beelen, University of Amsterdam, The Netherlands
  • Andreas Blätte, University of Duisburg-Essen, Germany
  • Anastasia Deligiaouri, Western Macedonia University of Applied Sciences, Greece
  • Griet Depoorter, Dutch Language Institute, Belgium
  • Katerina T. Frantzi, University of the Aegean, Greece
  • Francesca Frontini, Université Paul Valéry - Montpellier, France
  • Maria Gavriilidou, ILSP/Athena RC, Greece
  • Goran Glavaš, University of Mannheim, Germany
  • Barbora Hladka, Charles University, Czech Republic
  • Laura Hollink, Centrum Wiskunde & Informatica, The Netherlands
  • Caspar Jordan, Swedish National Data Service, Sweden
  • Martijn Kleppe, National Library of the Netherlands, The Netherlands
  • Krister Lindén, University of Helsinki, Finland
  • Bente Maegaard, University of Copenhagen, Denmark
  • Maarten Marx, University of Amsterdam, The Netherlands
  • Karlheinz Moerth, Austrian Academy of Sciences, Austria
  • Monica Monachini, National Research Council of Italy, Italy
  • Federico Nanni, University of Mannheim, Germany
  • Jan Odijk, Utrecht University, The Netherlands
  • Petya Osenova, IICT-BAS and Sofia University "St. Kl. Ohridski", Bulgaria
  • Wim Peters, University of Strathclyde, UK
  • Stelios Piperidis, Athena RC/ILSP, Greece
  • Simone Paolo Ponzetto, University of Mannheim, Germany
  • Valeria Quochi, National Research Council of Italy, Italy
  • Ineke Schuurman, KU Leuven, Belgium
  • Inguna Skadiņa, University of Latvia, Latvia
  • Sara Tonelli, Fondazione Bruno Kessler, Italy
  • Jurgita Vaičenonienė, Vytautas Magnus University, Lithuania
  • Tamás Váradi, Hungarian Academy of Sciences, Hungary
  • Tanja Wissik, Austrian Academy of Sciences, Austria
  • Martin Wynne, Bodleian Libraries, University of Oxford, UK

Phoenix Seagaia Resort
Room "Tenju"
Miyazaki, 45