CLARIN Newsflash: May 2023


CLARIN Annual Conference (CLARIN2023)

16 - 18 October 2023, Leuven, Belgium (hybrid event)

Call for Submissions: Using CLARIN in Training and Education

The CLARIN Annual Conference 2023 will dedicate a session to Training and Education to showcase initiatives and best practices in the SSH community for integrating CLARIN’s services, resources and tools into teaching and training. Via this call, teachers, trainers and curriculum designers are invited to submit a 500-word abstract describing their initiative, project, course or training material, clearly specifying how they used the CLARIN infrastructure to provide their target group with the skills required to interact with the infrastructure’s services, resources and/or tools. We also welcome submissions illustrating the adoption or adaptation of existing training materials published on our Learning Hub for a specific course or training, either in a university context or in a continuing professional education setting.

All selected relevant submissions will be presented at the dedicated session during the CLARIN Annual Conference and showcased on the CLARIN website. 

Deadline for submissions: 15 July 2023

Read more

Vacancy at CLARIN ERIC: Member of the Board of Directors

As of September 2023, CLARIN will have an opening for a member of the Board of Directors (20% FTE), who will work closely with the other directors. The appointment will be for a term of two years, with the possibility of prolongation for another term of two years.

Deadline for applications: 30 June 2023

Read more

New Impact Story: Gender in Poland’s Presidential Election Campaigns

At a time when Poland is grappling with a conservative push and growing polarisation among its population, political scientists Agata Włodkowska and Joanna Gajda reconstructed and analysed how the notions of sex and gender featured in the 2015 and 2020 Polish presidential election campaigns. Using the CLARIN-PL tools Korpusomat and ComCorp, the researchers analysed the language used by the presidential candidates, as well as the presence of gender-related topics in their election agendas, showing the impact that vocabulary and dominant themes can have during the election process itself, as well as their importance for gender equality more broadly.

Read the full impact story here 




Results of the Slovene DSDE Project Now Available

The recently concluded project ‘Development of Slovene in a Digital Environment’ aimed to significantly extend the range of computational tools, services, and resources in the field of language technologies for Slovene, to be used by research organisations, companies, and the general public. The 64 language resources produced in the scope of the project have been made openly (CC BY-SA) available in the CLARIN.SI repository, and include a large speech database, various specialised corpora, machine learning training datasets, computer models for Slovene text annotation, and more. The tools developed in the project are also available as open source, in the scope of the CLARIN.SI GitHub virtual organisation, which now has more than 100 software repositories.

Read more


Best Student Paper at LTC’23 Awarded to Bojan Evkoski

Bojan Evkoski (Institute of Contemporary History in Ljubljana, Slovenia) received the Best Student Paper award for his paper ‘XAI in Computational Linguistics: Understanding Political Orientations in the Slovenian Parliament’ at LTC’23 in Poznan, Poland. The paper is based on CLARIN’s ParlaMint corpus. Specifically, the work focuses on using classical machine learning and transformer language models to predict the left- or right-leaning orientation of parliamentarians based on their speeches on the topic of migrants.

Read the paper


Impressions from the First SSH Open Cluster Assembly

The SSH Research Infrastructures organised a Cluster Assembly on 24 April 2023, which marked an important milestone since the end of the SSHOC project in April 2022. The event attracted 70 participants from funding agencies, research institutions, project partners, and all landmarks and projects from the group Social and Cultural Innovation. The main message of the meeting was to reinforce the importance of collaboration and the need for continued efforts to ensure the sustainability of the SSH Open Marketplace and the SSH network. Participants were re-invited to join officially by signing the Memorandum of Understanding. 

The assembly was successful in bringing together a diverse group of participants to discuss the future of the SSHOC and its impact on the research community. The next SSH Open Cluster Assembly will be held in October 2023.

Read more


Training Resource of the Month: ‘Introduction to Digital Humanities’

This course, developed by Zuzana Nevěřilová, is an introduction to digital humanities and to various aspects of digital content processing. The practical aims consist of introducing current data sources, annotation, pre-processing methods, software tools for data analysis and visualisation, and evaluation methods. The course consists of 10 lessons with video and PowerPoint material. Every lesson contains a practical session – either a Jupyter Notebook to work with in Python, or a text file with a short description of the task. Most of the practical tasks consist of running the programme and analysing the results. Although the course does not focus on programming, the code can easily be reused in individual projects.

The course is aimed at humanities students at beginner level. Though not required, some experience in running Python code is desirable.

Read more

Event: ‘Learn by Playing? Upskilling Linguistics and Languages HE Students with the Aid of Educational Games’

30 May 2023, Valletta, Malta

The aim of the UPSKILLS project is to tackle the gaps in skills in existing higher education programmes and to develop supporting teaching and learning materials, with the ultimate goal of better preparing linguistics and language students for today’s jobs in the language industry. In this one-day multiplier event, the organisers present the creation and implementation of educational games in tertiary education linguistics and language-related courses.

The first part of the programme will introduce the audience to the UPSKILLS project, and its guidelines and best practices for research-based teaching. The second part will zoom in on the use of games within its remit. The event targets educators, lecturers and curriculum designers, but students and other stakeholders interested in educational games and/or languages and linguistics are also very welcome to join. 

Read more


UPSKILLS Learning Resources

By exploring the outputs of projects, catalogues by European research consortia, public platforms that host open educational resources and web searches, UPSKILLS has compiled a list of available training resources for academics. The list is categorised into three broad topics (research skills, data acquisition skills and data handling skills) and two cross-cutting components (linguistic theory and research, and data management). The list can be browsed online or downloaded in various formats and includes a search function.

Read more


Meet CLARIN Ambassador Paul Rayson

The CLARIN Ambassadors Programme encourages participation in CLARIN ERIC in disciplines and communities that are not yet fully integrated in CLARIN. CLARIN ambassador Paul Rayson is a Professor of Natural Language Processing at Lancaster University, UK, and Director of the UCREL interdisciplinary research centre, which carries out research in corpus linguistics and natural language processing. A long-term focus of his work is semantic multilingual NLP in extreme circumstances where language is noisy, e.g. in historical, learner, speech, email, txt and other CMC varieties. Along with domain experts, he has applied his research in the areas of dementia detection, mental health, online child protection, cyber security, learner dictionaries, and text mining of biomedical literature, historical corpora, and financial narratives. He was a co-investigator of the five-year ESRC Centre for Corpus Approaches to Social Science (CASS), which is designed to bring the corpus approach to bear on a range of social sciences. He is also a member of the multidisciplinary Institute Security Lancaster, the Lancaster Digital Humanities Hub, and the Data Science Institute.



CLARIN Café: A New CLARIN Resource Family for Lexical Semantic Change Research

5 July 2023, 14:00-16:00 (CEST), virtual event

The field of lexical semantic change (LSC) has attracted growing interest recently. However, the available resources and tools to conduct this type of research are currently scattered across different CLARIN Resource Families (CRF) or are not included in any CRF at all. In this CLARIN Café we will present our groundwork for the creation of a new CRF for LSC research. As our preliminary work mainly focuses on historical languages, and Latin in particular, we would like to solicit contributions from the CLARIN community to cover further languages. The event will be accompanied by a tutorial document for the CLARIN website, explaining how the components of the CRF can be used for LSC research. This event is suitable for the entire CLARIN community, and especially for researchers in the fields of diachronic semantics, annotation of word senses, and the construction of resources and tools for research on lexical semantics and LSC.

Read more

MEDAL Summer School

19 - 23 June 2023, Tartu, Estonia

The MEDAL summer school aims to expose post-graduate students and early career researchers to top-notch research and teaching concerning modern, cutting-edge methodology in corpus linguistics. Each of the first four days of the summer school will include a plenary lecture and parallel workshops given by the plenary speakers and other MEDAL affiliated colleagues, including CLARIN ambassador Satu Saalasti. In addition, participants will have the opportunity to meet one-on-one with instructors to discuss their own specific research topics. PhD students are also invited to create a poster presenting their research, as well as summarise their poster in a 3-minute flash talk. On the fifth day, a methodology session will be held in the morning and a social event in the afternoon on one of the year's most important holidays in Estonia, Midsummer.

Read more



META-FORUM 2023

27 June 2023, Brussels, Belgium

META-FORUM is the international conference series on powerful and innovative language technologies for the multilingual information society. META-FORUM is a place to learn about the most recent developments and achievements in European language technology and language-centric AI. In this year’s edition, participants can learn more about the Strategic Research and Innovation Agenda (SRIA) towards Digital Language Equality for a multilingual Europe and initiatives such as the Language Data Space (LDS) and the emerging Language EDIC.

Read more

EUDAT Summer School 2023

26-30 June 2023, Kajaani, Finland

The summer school, organised in collaboration with the DICE project, aims to strengthen the skills required to excel in data management throughout the full research-data lifecycle, including data discovery, data processing, data analysis, data preservation and publishing. Experienced trainers will guide the students, combining traditional theoretical lessons with hands-on sessions.

The summer school will offer two tracks. The Providers Track is aimed at research or academic institution IT system administrators as well as community or data managers looking to provide research data services for their communities. The Users Track will provide early-career or Bachelor, Masters, PhD or Postdoc programme students currently engaged in complex research environments with the latest best practices in research data management.

Read more

Tutorial at DH2023: ‘Put Them In to Get Them Out: the ParlaMint Corpora for Digital Humanities and Social Sciences Research’

11 July 2023, Graz, Austria

This hands-on half-day tutorial aims to explore the potential of CLARIN’s ParlaMint corpora – openly available collections of parliamentary records, which are uniformly sampled, annotated and rich in individual speaker and institutional group metadata. ParlaMint facilitates research into specific European parliaments and allows for transnational comparisons.

Read more



10th Conference on Scholarly Communication in the Context of Open Science (PUBMET)

13 - 15 September 2023, Zadar, Croatia

PUBMET2023 focuses on the plurality of approaches to scholarly communication, scholarly publishing and assessment. With a pre-conference day on 13 September 2023 scheduled for workshops, the conference will present innovative approaches, best practice discussions and take on future challenges. The conference encompasses two main themes: the first is directed towards scholarly communication and publishing as its most visible channel (PUB), and the second towards metrics and assessment (MET).

Read more


Third Workshop on Language Technology for Equality, Diversity, Inclusion (LT-EDI-2023) at RANLP 2023

7 - 8 September 2023, Varna, Bulgaria

Equality, Diversity and Inclusion (EDI) are important factors in every field throughout the world. Today’s internet community uses language technology (LT), so it is important to build LT that ensures equality, diversity and inclusion. Recent results have shown that big data and deep learning are entrenching existing biases. The workshop provides a platform where researchers can focus on creating LT that is more inclusive of all genders, races and sexual orientations, and of persons with disabilities. The workshop focuses on creating speech and language technology to address EDI not only in English, but also in less-resourced languages.

Deadline for submissions: 1 July 2023

Read more




Call for Papers: Austrian Meeting on Digital Linguistics (ÖLT2023)

8 - 10 December 2023, Graz, Austria

Digital linguistics is a growing interdisciplinary field at the intersection of traditional linguistics, information technology, and social sciences. A central focus of digital linguistics is language data, including social media content, parliamentary transcripts, newspapers and medieval manuscripts. Such data is processed, annotated, analysed, curated, shared, archived, and reused. Therefore, the topics covered in this field span from the creation of digital language resources (corpora, dictionaries, etc.) and their analysis, to the use of standards and research infrastructures, as well as methods for long-term archiving or reuse of language data. The aim of this workshop is to highlight recent developments in the research landscape in Austria and to connect different projects from the Austrian research community that work with or on methods in digital linguistics, as well as the researchers involved. The workshop will facilitate the exchange of methodological insights and the creation of synergies through the mutual sharing of digital language resources, also within the framework of CLARIAH-AT.

Deadline for submissions: 10 September 2023

Read more

Call for Papers: LREC-COLING 2024

20-25 May 2024, Turin, Italy (hybrid event)

The ELRA Language Resources Association and the International Committee on Computational Linguistics (ICCL) will jointly organise the 2024 International Conference on Computational Linguistics, Language Resources and Evaluation, or LREC-COLING 2024.

Organised as a hybrid event, the conference brings together researchers and practitioners in computational linguistics, speech, multimodality, and natural language processing, with special attention on the evaluation and development of resources that support work in these areas. 

The organisers invite the submission of long and short papers featuring substantial, original, and unpublished research in all aspects of natural language and computation, language resources and evaluation, including spoken and sign language and multimodal interaction. Submissions are invited in five broad categories: theories, algorithms, and models; NLP applications; language resources; NLP evaluation; and topics of general interest. Submissions that span multiple categories are particularly welcome.

Deadline for submissions: 13 October 2023

Read more


PhD Position in Natural Language Processing: Verbal Expression of Quantities

Utrecht University, Utrecht, The Netherlands

Vagueness is the pervasive phenomenon in which words have imprecisely defined boundaries that are applied differently depending on the context and user. For example, when a patient report describes a baby’s blood pressure as ‘too high’, or someone’s condition as ‘stable’, these terms are interpreted differently by different clinicians. This PhD project involves the examination of mathematical and computational models of uncertainty and vagueness. In recent years, the literature in this area has shifted away from 2-valued towards multi-valued models, based on modern versions of Zadeh-style fuzzy set theory, Gärdenfors-style conceptual spaces, or probabilistic models. However, these models have rarely been tested with real data. This PhD project will use existing ‘big’ datasets to find out which models best predict and explain the data.

This PhD position is one of six inter-connected PhD positions focusing on uncertainty in Natural Language Processing (NLP), as part of Utrecht University’s AiNed project ‘Dealing with Meaning Variation in NLP’, led by Professor Massimo Poesio.

Deadline for applications: 29 May 2023

Read more 

Project-Funded PhD Position(s) in Natural Language Processing

University of Gothenburg, Gothenburg, Sweden

Accessibility of research data is critical for advances in many research fields, but textual data often cannot be shared due to the presence of personal and sensitive information, such as names or political opinions. GDPR suggests pseudonymisation as a solution, but more needs to be known about this before adopting it for the manipulation of research data. The Grandma Karl research environment targets several aspects of pseudonymisation, aiming to advance Sweden’s work on open access to research data. The primary focus of this position will be the development of algorithms to automatically detect, label and pseudonymise personal identifiers in freely written texts (essays/blogs), focusing on linguistic challenges such as spelling errors, ambiguous entities and semantic constraints.

Deadline for applications: 27 June 2023

Read more

Call for Applications: RESILIENCE Transnational Access Fellowships

The European Research Infrastructure for Religious Studies, RESILIENCE, has launched its second call for applications for Transnational Access (TNA) Fellowships. Scholars and researchers from across Europe are invited to apply for a research fellowship, to visit the special collections, archives, and libraries of RESILIENCE’s TNA hosts. The TNA fellowships can address the need of scholars to have direct, fast, and effective access to research objects, which are located in different countries.

Deadline for applications: 1 July 2023


Visiting Fellowship Program 

Institute of Contemporary History, Ljubljana, Slovenia

The Institute of Contemporary History is the central scientific institution in Slovenia conducting humanities, contemporary history, and digital humanities research, with an emphasis on historiography. It is the national coordinating institution for the national digital infrastructure for arts and humanities DARIAH-SI, and also a member of CLARIN ERIC’s Slovene national node CLARIN.SI. The fields of research currently conducted at the institute include digital humanities, as well as contemporary political, cultural, economic, social, and ecological history. 

The institute invites applications from scholars working in the field of contemporary political, social and economic history, as well as related disciplines and scholars working in the field of digital humanities. The duration of the fellowship can be two weeks, one month, or two months, and must take place between 1 October 2023 and 1 October 2024.

Deadline for applications: 1 July 2023

Read more