CLARIN Newsflash February 2022

Antal van den Bosch Joins the CLARIN ERIC Board of Directors

We are pleased to announce that Prof Antal van den Bosch will join the CLARIN ERIC Board of Directors (BoD) from 1 March 2022. He will take over from Andreas Witt, who was a member of the CLARIN BoD from September 2019 until February 2022 and has now been appointed national coordinator of CLARIN-D.

Antal van den Bosch will work with the other BoD members and the executive director, Franciska de Jong, on the development and implementation of the CLARIN strategy and policy. In particular, he will focus on strengthening the ties between CLARIN and the broader AI world.

Read more

Meet the New CLARIN Ambassadors

The CLARIN Ambassadors Programme aims to raise awareness of CLARIN, and to encourage participation in it, among disciplines and communities that are not yet fully integrated in CLARIN ERIC.

In consultation with the national coordinators and representatives of the User Involvement Committee, three new experts have been selected for the role of CLARIN Ambassadors from September 2021 for two years: Eva Soroli, Paul Rayson and Satu Saalasti. The new ambassadors cover diverse areas such as corpus linguistics and neurolinguistics, semantic multilingual natural language processing, as well as multimodal communication in clinical populations.

Read more

New Impact Story: IceTaboo Database with Commercial Application

The IceTaboo database is a novel resource for processing offensive words in Icelandic. Designed in close collaboration with an industry partner in commercial software development, the database is intended to be used as part of an automatic proofreading tool. IceTaboo can be used to flag contextually inappropriate words in texts, and is already being used as part of the automatic proofreading software by an Icelandic online news website.

Read more here

CLARIN Resource Families: Tools for Named Entity Recognition

The CLARIN Resource Families provide a user-friendly overview of the available language resources in the CLARIN infrastructure for researchers from the digital humanities, social sciences and human language technologies.

This month, CLARIN highlights: Tools for named entity recognition.

Named entity recognition (NER) is an information extraction task that identifies mentions of named entities in unstructured text and classifies them into predetermined categories, such as person names, organisations, locations, dates and times, and monetary values.

The CLARIN infrastructure offers twenty-four tools for NER. Fifteen tools are aimed at processing texts in a single language, while the rest have a very broad multilingual scope. Sixteen tools are dedicated exclusively to NER, while eight are part of tool pipelines that also provide functionalities such as PoS-tagging, lemmatisation and syntactic parsing.

See the overview

BLOGS & REPORTS

New publication: The ParlaMint Corpora of Parliamentary Proceedings

In this new open-access journal publication, Tomaž Erjavec and colleagues present the ParlaMint corpora, which include the transcriptions of sessions of seventeen European national parliaments. The content of parliamentary debates has become increasingly important for research in the social sciences and humanities, and the ParlaMint corpora open up the possibility for transnational analysis on societally relevant topics. The paper outlines the corpora’s compilation, quantitative data, encoding and distribution. ParlaMint corpora have been used in several studies, including in the 2021 Helsinki Digital Humanities Hackathon, where a team focused on the identification of differences and similarities in parliamentary debates around the COVID-19 pandemic in Italy, Poland, Slovenia and Great Britain. The corpora are openly available through CLARIN.SI.

Read the paper

TRAINING AND EDUCATION

New Guide on Text Analysis in Python by Dirk Hovy

Text prediction algorithms are powerful, yet applying these techniques to specific research questions usually requires profound programming expertise. This guide by Dirk Hovy is part of the Elements in Quantitative and Computational Methods for the Social Sciences series. It offers an overview of the most common methods for text classification, their applicability, and Python code to execute them. It covers both the ethical foundations of such work and the emerging potential of neural network methods. A printed version is available through Cambridge University Press. Hovy’s guide is available for free download until 2 March.

Read more

The Open Handbook of Linguistic Data Management

A new guide on linguistic data management has been published. The guide, The Open Handbook of Linguistic Data Management, offers best practices for the management, archiving, sharing, and citing of linguistic research data. The handbook is accompanied by a free open-access course.

Access the course

Teaching FAIR-Related Skills in Higher Education

FAIRsFAIR has published ‘How to be FAIR with your data. A teaching and training handbook for higher education institutions’. The handbook aims to support higher education institutions in integrating FAIR-related content into their curricula and teaching. It provides practical material, such as competence profiles, learning outcomes and lesson plans.

Read the handbook

WATCH THIS!

CLARIN2021 Panel: The Role of Corpora for the Study of Language Use and Mental Health Conditions

In this video, the panellists Gloria Gagliardi, Stefan Goetze, Saturnino Luz and Khiet Truong, moderated by Henk van den Heuvel, discuss infrastructural and strategic issues related to the resources needed for automatic detection of mental health conditions from text and speech. The panel took place on 29 September during the third day of CLARIN2021. For more detailed information and slides, please visit the dedicated web page.

EVENTS & CALLS

Call for Papers: ParlaCLARIN III Workshop at LREC2022

20 June 2022, Marseille, France

The ParlaCLARIN III workshop at LREC2022 will focus on the topic of ‘Creating, Enriching and Using Parliamentary Corpora’. Parliamentary (language) data serves as a communication channel between elected political representatives and members of society, thus reflecting socio-politically relevant information. The development of accessible, comprehensive and well-annotated parliamentary corpora is crucial for a number of disciplines, such as political science, sociology, history, and (socio)linguistics. The workshop will bring together developers, curators and researchers of regional, national and international parliamentary debates from across diverse disciplines in the humanities and social sciences.

Deadline for submissions: 15 March 2022

Read more

Registration now open: ENRIITC Your Industry Outreach: Workshop for Social Sciences and Humanities

11 March 2022, online event

CLARIN and DARIAH, with support from the H2020 project ENRIITC, are jointly organising an online industry outreach workshop. Leading research infrastructures and networks from the social sciences and humanities are invited to exchange experiences of past and present innovation and collaboration activities with non-academic and/or commercial partners.

The aim of the event is to identify common challenges regarding the skills needed to reach out beyond academia (e.g. more effective communication, or how to translate the value of the infrastructure to non-academic and/or commercial entities), as well as a set of common SSH training topics that are relevant for non-academic entities when considering the research infrastructure service offer.

Read more

Registration now open: DHd2022 Conference on ‘Cultures of Digital Memory’

7 - 11 March 2022, online event

Cultures are rooted in memory, as well as in traditional practices of conservation and transmission. This conference discusses what consequences digitisation might have for these cultural practices and for infrastructures in archives, libraries and museums. Hosted by the University of Potsdam and the Potsdam University of Applied Sciences, the DHd2022 conference programme is predominantly in German, though some sessions will be held in English.

Read more

Call for Abstracts: Language Technologies and Humanities Conference

15 - 16 September 2022, Ljubljana, Slovenia

This conference will bring together researchers from various backgrounds and methodological frameworks. Topics will include speech and language technologies, digital linguistics, and digital humanities in research, education and publishing. The organisers invite submissions on research, good practices and projects in these areas.

Deadline for extended abstracts: 15 May 2022

Read more

Call for Papers: NexusLinguarum Workshop on ‘Discourse Studies and Linguistic Data Science’

24 May 2022, Jerusalem, Israel (hybrid event)

This workshop will cover current research advances in discourse analysis and representation in the context of multilinguality from a linguistic and computational perspective. The organisers invite submissions addressing challenges such as interoperability, linguistic linked open data (LLOD), as well as language processing and analysis. The workshop offers a forum for researchers interested in discussing challenges and advancing practices in discourse studies and linguistic data science.

Deadline for extended abstracts: 20 March 2022

Read more

Call for Papers: NexusLinguarum Workshop on ‘PROfiling LINGuistic KNOWledgE gRaphs (ProLingKNOWER)’

24 May 2022, Jerusalem, Israel (hybrid event)

This workshop will foster multidisciplinary discussion on novel approaches, methodologies and frameworks around profiling Linguistic Linked Data (LLD), such as corpora, lexicons and ontologies. A focus will be on highlighting tools and user interfaces that can effectively assist different use cases for profiling such data. The organisers invite application-oriented papers, as well as more theoretical papers and position papers.

Deadline for papers: 25 March 2022

Read more

Netherlands eScience Center Fellowship Programme

This programme is aimed at those who want to promote and improve the use of research software within their organisation or discipline. To this end, fellows are expected to carry out a project (twelve months, from June 2022 to the end of May 2023), which could include creating a tutorial or an info series, or hosting a hackathon. Applicants are welcome to use the fellowship to boost existing initiatives that fit the purpose of the programme.

Deadline for applications: 11 April 2022

Read more

Call for Volumes: Phraseology and Multiword Expressions (PMWE)

This series of volumes highlights conventionalised, idiosyncratic combinations of words. These are sometimes called phrasemes, or multiword expressions (MWEs), and include multiword compounds, idioms, collocations and proverbs. The editors invite contributions from experts from different disciplines, including psycholinguistics, computer science and natural language processing. Proposals should address topics related to theoretical, computational, and empirical approaches to phraseology.

Read more