ParlaCLARIN III at LREC2022 - Workshop on Creating, Enriching and Using Parliamentary Corpora

Monday, 20 June 2022, All day

General Information

Title: ParlaCLARIN III at LREC2022 - Workshop on Creating, Enriching and Using Parliamentary Corpora

Date: To be held as part of the 13th edition of the Language Resources and Evaluation Conference (LREC), at the Palais du Pharo, Marseille, France (20–25 June 2022).

Time: Full day on Monday 20 June

Location: Palais du Pharo, Old Palace Level 1, Room: Grand Large (floor map)

Twitter Hashtag: #ParlaCLARINIII

Workshop Description

Parliamentary data is an important source of scholarly and socially relevant content, serving as a verified communication channel between elected political representatives and members of society. The development of accessible, comprehensive and well-annotated parliamentary corpora is therefore crucial for the information society, as such corpora help scientists and investigative journalists to ascertain the accuracy of socio-politically relevant information and to inform citizens about the trends and insights that emerge from exploring such data. Research-wise, parliamentary corpora are a quintessential resource for a number of disciplines in digital humanities and social sciences, such as political science, sociology, history, and (socio)linguistics.

The distinguishing characteristic of parliamentary data is that it is spoken language produced in controlled circumstances. Such data has traditionally been transcribed in a formal way but is now also increasingly released in the original audio and video formats, which encourages resource and software development and provides research opportunities related to structuring, synchronisation, visualisation, querying and analysis of parliamentary corpora. Therefore, a harmonised approach to data curation practices for this type of data can support the advancement of the field significantly. One of the ways in which the research community is supported in this line of work is through the conversion of existing corpora and further development of new cross-national parliamentary corpora into a highly comparable, harmonised set of multilingual resources. These allow researchers to share comparative perspectives and to perform multidisciplinary research on parliamentary data. We envision that the ParlaCLARIN III workshop, as a venue for knowledge and experience exchange on the topic, will contribute to the development and growth of the field of digital parliamentary science.

This third ParlaCLARIN workshop is a continuation of the 2018 and 2020 editions held at the respective LREC conferences. On the one hand, it continues to bring together developers, curators and researchers of regional, national and international parliamentary debates from across diverse disciplines in the Humanities and Social Sciences. On the other hand, we envisage the appearance of new discussion threads, tasks, and challenges that are partially inspired by or related to the new data releases such as ParlaMint and data formats such as Parla-CLARIN.

We have invited unpublished original work focusing on (but not limited to):

Compilation, annotation, visualisation and utilisation of parliamentary records
Harmonisation of existing multilingual parliamentary resources, containing either synchronic or diachronic data or both
Linking or comparing of parliamentary records with other datasets of political discourse such as party manifestos, political speeches, political campaign debates, and social media posts, and to other sources of structured knowledge, such as formal ontologies and LOD datasets (in particular for the description of speakers, political parties, etc.)

Special themes for this year’s workshop are:

Machine translation of parliamentary proceedings and research using machine translated parliamentary data
Semantic tagging of parliamentary proceedings and research using semantically tagged parliamentary data
Compilation, alignment and annotation of multimodal parliamentary resources and research using multimodal parliamentary data

Apart from the dissemination of the results, the workshop also aims to address the identified obstacles, discuss open issues and coordinate future efforts in this increasingly trans-national and cross-disciplinary community.

Topics of Interest

Topics include but are not limited to:

Creation and annotation of parliamentary data in textual and spoken format
Enrichment of parliamentary data with semantic and named entity tagging
Querying and visualisation of parliamentary data
Text mining over parliamentary and other political language data
Harmonisation of multilingual parliamentary resources
Adoptions or extensions of the Parla-CLARIN and ParlaMint schema to other parliamentary resources
Comparative studies of parliamentary corpora
Parliamentary corpora as a source of political language
Diachronic studies based on parliamentary corpora
Studies of parliamentary corpora with particular focus on the debates dedicated to global crises, such as the COVID-19 pandemic and the climate crisis
The potential of parliamentary resources beyond academia

Proceedings

ParlaCLARIN III proceedings can be browsed online at this link or the PDF can be downloaded at this link.

Keynote

Luke Blaxill, University of Oxford

Parliamentary Corpora and Research in Political Science and Political History

This keynote reflects on some of the barriers to digitised parliamentary resources achieving greater impact as research tools in political history and political science. As well as providing a view on researchers’ priorities for resource enhancement, I argue that one of the main challenges for historians and political scientists is simply establishing how to make best use of these datasets through asking new research questions and through understanding and embracing unfamiliar and controversial methods that enable their analysis. I suggest parliamentary resources should be designed and presented to support pioneers trying to publish in often sceptical and traditional fields.

Programme

9.15 - 9.30	Welcome and introduction
9.30 - 10.30	Session 1: Corpus Creation 1 - Chair: Maria Eskevich
	ParlaMint II: The show must go on. Maciej Ogrodniczuk, Petya Osenova, Tomaž Erjavec, Darja Fišer, Nikola Ljubešić, Çağrı Çöltekin, Matyáš Kopp and Katja Meden
	How GermaParl Evolves: Improving Data Quality by Reproducible Corpus Preparation and User Involvement. Andreas Blaette, Julia Rakers and Christoph Leonhardt
	Between History and Natural Language Processing: Study, Enrichment and Online Publication of French Parliamentary Debates of the Early Third Republic (1881-1899). Marie Anna Puren, Nicolas Bourgeois, Aurélien Pellet and Pierre Vernus
	A French Corpus of Québec's Provincial Parliamentary Debates. Pierre André Ménard and Desislava Aleksandrova
10.30 - 11.00	Coffee break
11.00 - 12.00	Keynote by Luke Blaxill, University of Oxford: Parliamentary Corpora and Research in Political Science and Political History. Chair: Franciska de Jong
12.00 - 13.00	Session 2: Corpus Enhancement - Chair: Çağrı Çöltekin
	Error Correction Environment for the Polish Parliamentary Corpus. Maciej Ogrodniczuk, Michał Rudolf, Beata Wójtowicz and Sonia Janicka
	Clustering Similar Amendments at the Italian Senate. Tommaso Agnoloni, Carlo Marchetti, Roberto Battistoni and Giuseppe Briotti
	Entity Linking in the ParlaMint corpus. Ruben van Heusden, Maarten Marx and Jaap Kamps
	Visualizing Parliamentary Speeches as Networks: the DYLEN Tool. Seung-bin Yim, Katharina Wünsche, Asil Cetin, Julia Neidhardt, Andreas Baumann and Tanja Wissik
13.00 - 14.00	Lunch
14.00 - 15.15	Session 3: Corpus Analysis - Chair: Petya Osenova
	Emotions running high? A synopsis of the state of Turkish politics through the ParlaMint corpus. Gül M. Kurtoğlu Eskişar and Çağrı Çöltekin
	Immigration in the Manifestos and Parliament Speeches of Danish Left and Right Wing Parties between 2009 and 2020. Costanza Navarretta, Dorte Haltrup Hansen and Bart Jongejan
	Parliamentary Discourse Research in Sociology: Literature Review. Jure Skubic and Darja Fišer
	A Framework for Second-level Agenda Setting in Parliamentary Debates through the Lense of Comparative Agenda Topics. Christopher Klamm, Ines Rehbein and Simone Paolo Ponzetto
	Comparing Formulaic Language in Human and Machine Translation: Insight from a Parliamentary Corpus. Yves Bestgen
15.15 - 16.00	Panel - Chair: Darja Fišer
	Panellists:
	Luke Blaxill, ParlaCLARIN III keynote
	Jure Skubic, on behalf of participants of the Helsinki Digital Humanities Hackathon 2022
	Nikola Ljubešić, on behalf of the ParlaMint team
16.00 - 16.30	Coffee break
16.30 - 17.45	Session 4: Corpus Creation 2 - Chair: Maciej Ogrodniczuk
	Adding the Basque Parliament corpus to ParlaMint project. Jon Alkorta and Mikel Iruskieta Quintian
	ParlaSpeech-HR - a freely available ASR dataset for Croatian bootstrapped from the ParlaMint corpus. Nikola Ljubešić, Danijel Koržinek, Peter Rupnik and Ivo-Pavao Jazbec
	Making Italian Parliamentary Records Machine-Actionable: the Construction of the ParlaMint-IT corpus. Tommaso Agnoloni, Roberto Bartolini, Francesca Frontini, Simonetta Montemagni, Carlo Marchetti, Valeria Quochi, Manuela Ruisi and Giulia Venturi
	ParlamentParla: A Speech Corpus of Catalan Parliamentary Sessions. Baybars Kulebi, Carme Armentano-Oller and Carlos Rodriguez-Penagos
	ParlaMint-RO: Chamber of the Eternal Future. Petru Rebeja, Mădălina Chitez, Roxana Rogobete, Andreea Dincă and Loredana Bercuci
17.45 - 18.00	Pitches of relevant initiatives in the field - Chair: Maciej Ogrodniczuk
	Workshop Dinner

Submission & Publication

We accept submission of long papers (up to 8 pages), short papers (up to 4 pages) and demo papers (up to 4 pages) to be presented as a long or short oral presentation at the workshop. Papers should not be anonymous and should be formatted according to the stylesheet available on the LREC2022 website. The papers of the workshop will be published in online proceedings.

When submitting a paper from the START page, authors will be asked to provide essential information about resources (in a broad sense, i.e. also technologies, standards, evaluation kits, etc.) that have been used for the work described in the paper or are a new result of their research. Moreover, ELRA encourages all LREC authors to share the described LRs (data, tools, services, etc.) to enable their reuse and the replicability of experiments (including evaluation ones).

Submission page

Important Dates

Paper submission deadline: 25 March 2022 (deadline extended!)
Notification of acceptance: 25 April 2022
Early Bird registration deadline: 6 May 2022 (deadline extended!)
Camera-ready paper: 20 May 2022
Workshop date: 20 June 2022

Organising Committee

Darja Fišer, Institute of Contemporary History and University of Ljubljana, Slovenia
Maria Eskevich, CLARIN ERIC, The Netherlands
Jakob Lenardič, University of Ljubljana, Slovenia
Franciska de Jong, CLARIN ERIC, The Netherlands

The workshop is supported by the CLARIN research infrastructure.
To contact the organisers, please mail clarin [at] clarin.eu (Subject: [ParlaCLARIN@LREC2022]).

Programme Committee (in alphabetical order)

Ahlame Bedgouri, Faculty of Sciences and Technology of Fez, University of Sidi Mohamed Ben Abdellah, Morocco
Çağrı Çöltekin, University of Tübingen, Germany
Jesse de Does, Dutch Language Institute, The Netherlands
Tomaž Erjavec, Jožef Stefan Institute, Slovenia
Francesca Frontini, Istituto di Linguistica Computazionale "A. Zampolli", CNR Pisa, Italy
Maria Gavriilidou, ILSP/Athena RC, Greece
Barbora Hladká, Charles University, Czechia
Haidee Kotze, Utrecht University, The Netherlands
Nikola Ljubešić, Jožef Stefan Institute, Slovenia
Bente Maegaard, CST, Department of Nordic Languages and Linguistics, University of Copenhagen, Denmark
Maarten Marx, University of Amsterdam, The Netherlands
Stefano Menini, Fondazione Bruno Kessler, Trento, Italy
Robert Muthuri, Anjarwalla & Khanna LLP, Kenya
Maciej Ogrodniczuk, Institute of Computer Science, Polish Academy of Sciences, Poland
Petya Osenova, IICT-BAS and Sofia University "St. Kl. Ohridski", Bulgaria
Stelios Piperidis, ILSP/Athena RC, Greece
Simone Paolo Ponzetto, Mannheim University, Germany
Paul Rayson, Lancaster University, United Kingdom
Sara Tonelli, Fondazione Bruno Kessler, Italy
Daniela Trotta, University of Salerno, Italy

Address

Marseille
France