General Information

- 21-22-23 June 2022: Main Conference
- 20-24-25 June 2022: Workshops & Tutorials
- Palais du Pharo, Marseille, France
About
LREC is the major event on Language Resources (LRs) and Evaluation for Human Language Technologies (HLT). The conference provides an overview of the state of the art in LRs and their applications. Participants can exchange information and discuss methodologies, industrial use cases and requirements coming from e-science and e-society, with respect to scientific and technological issues as well as policy and organisational ones.
CLARIN-related activities at LREC 2022
Contributions to the Main Conference
Workshops
ParlaCLARIN III Workshop – organised by CLARIN ERIC
Monday 20 June, from 9:00 to 13:00 and from 14:00 to 18:00
The ParlaCLARIN III workshop at LREC2022 will focus on the topic of ‘Creating, Enriching and Using Parliamentary Corpora’. Parliamentary (language) data serves as a communication channel between elected political representatives and members of society, thus reflecting socio-politically relevant information. The development of accessible, comprehensive and well-annotated parliamentary corpora is crucial for a number of disciplines, such as political science, sociology, history, and (socio)linguistics. The workshop will bring together developers, curators and researchers of regional, national and international parliamentary debates from across diverse disciplines in the humanities and social sciences.
LEGAL 2022: Legal and Ethical Workshop – co-organised by Ingo Siegert, Khalid Choukri, Mickaël Rigault, Paweł Kamocki, Andreas Witt, Krister Lindén
Friday 24 June, from 9:00 to 13:00 and from 14:00 to 18:00
Deep learning technologies for language resources and the demand for high-quality data interactions have increased the need for data collections, which are largely subject to legal constraints. Legal frameworks continuously need to adapt to the advancements in technology, while also taking into consideration the interests of stakeholders. This workshop invites technology and legal experts to discuss current legal and ethical issues concerning human language technology.
SIGUL 2022 Workshop – organised by CLARIN-IT
Friday 24 and Saturday 25 June, from 14:00 to 18:00
The first annual meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages (SIGUL 2022) will take place as part of the LREC2022 conference. The workshop will provide academic and industry researchers with a forum for networking, as well as for discussing and presenting cutting-edge research in natural language processing for under-resourced languages. In the tradition of the CCURL-SLTU Workshop Series, SIGUL 2022 spans the research interest areas of less-resourced, under-resourced, endangered, minority and minoritised languages.
The 4th Financial Narrative Processing Workshop (FNP 2022) – co-organised, among others, by CLARIN ambassador Paul Rayson
Friday 24 June, from 9:00 to 13:00 and from 14:00 to 18:00
Oral and Poster Presentations
11:40 - 13:00 (Poster Area 1)
Session P1: Language Resource Infrastructures and Policy issues. Chair: Labropoulou, Penny
|
Language Technologies for the Creation of Multilingual Terminologies. Lessons Learned from the SSHOC Project (Federica Gamba, Francesca Frontini, Daan Broeder and Monica Monachini) |
15:15 - 16:35 (Auditorium)
Session O5: Language Resource Policies and Management.
Chair: Di Persio, Denise, Co-Chair: Frontini, Francesca
|
Ethical Issues in Language Resources and Language Technology – A Tentative Categorisation (Paweł Kamocki and Andreas Witt) |
16:55 - 18:15 (Poster Area 1)
Session P12: Evaluation and Validation Methodologies (1)
Chair: Refaee, Eshrag Ali A.
|
The Subject Annotations of the Danish Parliament Corpus (2009-2017) - Evaluation with Automatic Multi-label Classification. (Costanza Navarretta and Dorte Haltrup Hansen) |
16:55 - 18:15 (Poster Area 1)
Session P10: Lexicons (1)
Chair: Olsen, Sussi |
Making a Semantic Event-type Ontology Multilingual (Zdenka Uresova, Karolina Zaczynska, Peter Bourgonje, Eva Fučíková, Georg Rehm and Jan Hajic; Charles University, German Research Center for Artificial Intelligence, Morningsun Technology, DFKI)
NomVallex: A Valency Lexicon of Czech Nouns and Adjectives (Veronika Kolářová and Anna Vernerová; Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics) |
9:30 - 10:50 (Poster Area 2)
Session P14: Corpora and Annotation (2)
Chair: Ogrodniczuk, Maciej |
|
11:10 - 12:30 (Poster Area 1)
Session P18: Corpora and Annotation (3)
Chair: Montemagni, Simonetta |
Evolving Large Text Corpora: Four Versions of the Icelandic Gigaword Corpus (Starkaður Barkarson, Steinþór Steingrímsson, Hildur Hafsteinsdóttir) |
15:15 - 16:35 (Poster Area 2)
Session P22: Lexicons (2)
Chair: Yildiz, Olcay Taner |
Constructing a Lexical Resource of Russian Derivational Morphology (Lukáš Kyjánek, Olga Lyashevskaya, Anna Nedoluzhko, Daniil Vodolazsky and Zdeněk Žabokrtský) |
15:15 - 16:35 (Poster Area 2)
Session P26: Dialogue and Conversational Systems (2)
Chair: Hartholt, Arno |
ELITR Minuting Corpus: A Novel Dataset for Automatic Minuting from Multi-Party Meetings in English and Czech (Anna Nedoluzhko, Muskaan Singh, Marie Hledíková, Tirthankar Ghosal and Ondřej Bojar) |
15:15 - 16:35 (Poster Area 2)
Session P24: Evaluation and Validation Methodologies (2)
Chair: Zeldes, Amir |
Quality and Efficiency of Manual Annotation: Pre-annotation Bias (Marie Mikulová, Milan Straka, Jan Štěpánek, Barbora Štěpánková and Jan Hajic; Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics) |
16:55 - 18:35 (Poster Area 1)
Session P27: Corpora and Annotation (4)
Chair: Pęzik, Piotr
|
The Bulgarian Event Corpus: Overview and Initial NER Experiments (Petya Osenova, Kiril Simov, Iva Marinova and Melania Berbatova) |
9:30 - 10:50 (Salle 120)
Session O31: Document Classification, Text Categorisation
Chair: Volk, Martin
Co-Chair: Zhang, Mike |
HeLI-OTS, Off-the-shelf Language Identifier for Text (Tommi Jauhiainen, Heidi Jauhiainen, Krister Lindén) |
9:50 - 10:10 (Salle 92)
Session O32: Lexicon and WordNet
Chair: Vossen, Piek Co-Chair: Frontini, Francesca |
Towards the Construction of a WordNet for Old English (Fahad Khan, Francisco J. Minaya Gómez, Rafael Cruz González, Harry Diakoff, Javier E. Diaz Vera, John P. McCrae, Ciara O'Loughlin, William Michael Short and Sander Stolk) |
15:15 - 16:35 (Poster Area 2)
Session P38: Less-Resourced Languages (2)
Chair: Soroa, Aitor |
Latvian National Corpora Collection – Korpuss.lv. (Baiba Saulite, Roberts Darģis, Normunds Gruzitis, Ilze Auzina, Kristīne Levāne-Petrova, Lauma Pretkalniņa, Laura Rituma, Peteris Paikens, Arturs Znotins, Laine Strankale, Kristīne Pokratniece, Ilmārs Poikāns, Guntis Barzdins, Inguna Skadiņa, Anda Baklāne and Valdis Saulespurēns) |
15:35 - 15:55 (Salle 120)
Session O37: Anaphora and Coreference
Chair: Magnini, Bernardo Co-Chair: De Bruyne, Luna |
CorefUD 1.0: Coreference Meets Universal Dependencies (Anna Nedoluzhko, Michal Novák, Martin Popel, Zdeněk Žabokrtský, Amir Zeldes and Daniel Zeman) |
Session R2: Corpora and Annotation |
Universal Grammatical Dependencies for Portuguese with CINTIL Data, LX Processing and CLARIN support. (António Branco, João Silva, Luís Gomes, João Rodrigues) |
Contributions to Co-located Events
Oral and Poster Presentations at Co-located Workshops
- Immigration in the Manifestos and Parliament Speeches of Danish Left and Right Wing Parties between 2009 and 2020 (Costanza Navarretta, Dorte Haltrup Hansen and Bart Jongejan; Accepted at ParlaCLARIN III)
- What if Ground Truth is Subjective? Personalized Deep Neural Hate Speech Detection (Kamil Kanclerz, Marcin Gruza, Konrad Karanowski, Julita Bielaniewicz, Piotr Milkowski, Jan Kocon and Przemyslaw Kazienko; Accepted at NLP Perspective workshop)
- StudEmo: A Non-aggregated Review Dataset for Personalized Emotion Recognition (Anh Ngo, Agri Candri, Teddy Ferdinan, Jan Kocon and Wojciech Korczynski; Accepted at NLP Perspective workshop)
- 9:30–9:50 Extending the SSJ Universal Dependencies Treebank for Slovenian: Was it Worth it? (Kaja Dobrovoljc and Nikola Ljubešić; Accepted at LAW XVI The 16th Linguistic Annotation Workshop)
- 11:40 - 12:40 Advantages of a complex multilayer annotation scheme: The case of the Prague Dependency Treebank (Eva Hajicova, Marie Mikulová, Barbora Štěpánková and Jiří Mírovský; Accepted at LAW XVI The 16th Linguistic Annotation Workshop)
CLARIN Booth at LREC2022
Tuesday 21 | Wednesday 22 | Thursday 23 | |
Morning coffee break |
11:20 - 11:40
Members of CLARIN ERIC Board of Directors
|
11:50 - 11:10
Kaja Dobrovoljc
Dedicated to the paper ‘Spoken Language Treebanks in Universal Dependencies: an Overview’ (Kaja Dobrovoljc)
Petya Osenova, Kiril Simov
Dedicated to the paper ‘The Bulgarian Event Corpus: Overview and Initial NER Experiments’ (Petya Osenova, Kiril Simov, Iva Marinova and Melania Berbatova) |
Fahad Khan Dedicated to the paper |
13:00 - 14:30 Lunch
|
Francesca Frontini, Monica Monachini
Dedicated to the paper ‘Language Technologies for the Creation of Multilingual Terminologies. Lessons Learned from the SSHOC Project’ (Federica Gamba, Francesca Frontini, Daan Broeder and Monica Monachini)
|
Starkaður Barkarson
Dedicated to the paper ‘Evolving Large Text Corpora: Four Versions of the Icelandic Gigaword Corpus’ (Starkaður Barkarson, Steinþór Steingrímsson, Hildur Hafsteinsdóttir) |
Tommi Jauhiainen
Dedicated to the paper ‘HeLI-OTS, Off-the-shelf Language Identifier for Text’ (Tommi Jauhiainen, Heidi Jauhiainen and Krister Lindén; University of Helsinki) |
Afternoon coffee break | Paweł Kamocki
Dedicated to the paper ‘Ethical Issues in Language Resources and Language Technology – A Tentative Categorisation’ (Paweł Kamocki and Andreas Witt) |
