CLARIN Newsflash: February 2023

CLARIN2023

Call for Nominations: Steven Krauwer Awards 2023

Annually, the Steven Krauwer Awards are given in recognition of outstanding contributions towards CLARIN goals in the following areas: language resource building, tool or service development, exemplary use cases, user involvement, knowledge sharing and socio economic impact. This year, awards will again be awarded in the categories of CLARIN Early Career Researcher and CLARIN Achievement.

Deadline for nominations: 28 April 2023

Read more

Call for Abstracts

The call for extended abstracts for the CLARIN Annual Conference 2023 (CLARIN2023) remains open until 14 April 2023.

CKL2CORPORA Certified as New CLARIN Knowledge Centre for Learner Corpora

We are proud to announce that the CKL2CORPORA centre at the Institute of Language and Communication at UCLouvain (Belgium) has been officially recognised as a CLARIN Knowledge Centre (K-centre).

The CLARIN Knowledge Centre for Learner Corpora offers expert knowledge on the collection and use of learner corpora (i.e. electronic collections of language data, in written or spoken form, produced by second or foreign language learners) for theoretical and applied purposes. Sharing of expertise can take various forms, from answering (theoretical, methodological, technical) questions sent via the helpdesk to sharing resources and providing training services.

Visit the CKL2CORPORA K-centre

New Impact Story: Networks of Power – Gender Analysis in European Parliaments

Using the ParlaMint dataset, this project examines different aspects of power in three European parliaments, with a particular focus on gender distribution in the debates. The project began at the Helsinki Digital Humanities Hackathon 2022 (DHH22), which took place in May 2022. After the hackathon ended, five members of the interdisciplinary team - Larissa Leiminger, Jure Skubic, Alexandra Bruncrona, Jan Angermeier and Bojan Evkoski - continued to work together on the topic and recently published their findings in the paper ‘Networks of Power: Gender Analysis in Selected European Parliaments’.

Read the full Impact Story

Two CLARIN B-Centres Certified: BAS and CLARIN-LV

We are pleased to announce that two B-centres have been successfully assessed and have received a B-centre certificate: Bavarian Archive for Speech Signals (BAS) and the CLARIN Centre of Latvian language resources and tools (CLARIN-LV).

The Bavarian Archive for Speech Signals (BAS) has been a CLARIN B-centre since 2013 and was re-certified in late 2022. BAS makes speech resources from contemporary spoken German, as well as tools for the processing of digitised speech available to research and speech technology communities..

CLARIN-LV has recently received its Core Trust Seal certification, a prerequisite to becoming a CLARIN B-centre. The CLARIN-LV repository was set up in March 2020 and focuses on collecting, documenting, curating and providing easy and sustainable long-term access to the digital Latvian language data and tools for a wide group of users.

Read more

Tour de CLARIN: Austria

This month, we showcase CLARIAH-AT. Last featured in 2017, the consortium provides an update about its members and latest strategy and work plan, and highlights a noteworthy tool for sentiment analysis, the Austrian Media Corpus, and a Twitter showcase event. In the interview, researcher Amelie Dorn illustrates how the Austrian Media Corpus is used in the large-scale project ‘German in Austria’.

Read more

UPDATES FROM THE NATIONAL CONSORTIA

CLARIN-DK Introduces New Eye-Tracking Corpus

The Copenhagen Corpus of Eye-Tracking Recordings from Natural Reading (CopCo) is an eye-tracking corpus tailored to both psycholinguistics and natural language processing. The corpus enables researchers to investigate the reading behaviour among different population groups when presented with Danish texts. The corpus contains data of the eye movements of native Danish speakers, both with and without dyslexia, as well as a set of participants whose native language is not Danish. The corpus contains 32 texts, 36888 characters and 1943 sentences. The project has been approved by the Ethics Commission at the Faculty of Humanities at the University of Copenhagen.

Read more

CLARIN-CH Supports Swiss Projects in Linguistics

After having joined CLARIN as observer in January 2023, Switzerland has begun investing in Open Research Data (ORD) programmes and implementing national working groups (WGs).

Three projects in linguistics supported by the CLARIN-CH consortium were approved by the Swiss Open Research Data (ORD) Strategy Council. These projects will address issues such as data-sharing skills in corpus-based research, ORD practices for applied sciences, and upgrading the linguistic ORD-ecosystem. The projects will take place within the framework of a dedicated CLARIN-CH working group.

CLARIN-CH WGs are groups of researchers (i.e. CLARIN-CH members, other national and European scholars) who are interested in topics related to language, resources and infrastructure. They offer the opportunity of working together with peers in a formalised and sustainable environment. They are supported by the CLARIN-CH consortium and work in a strategic area that is defined by the CLARIN-CH scientific community.

Read more

TRAINING AND EDUCATION

Training Resource: Introduction to Speech Analysis

This course, developed by Mietta Lennes (FIN-CLARIN), offers a general introduction to speech corpora management and the methods that are available for the acoustic-phonetic study of speech. During the course, students use the speech analysis program Praat and learn to apply its main features in their own work with speech recordings. Students also learn the basics of ELAN, a program that can be used for transcribing and annotating audio as well as video material.

The contents of the course, which is split into six lessons, are offered in Finnish and English. In addition, the materials can be used for self-study and completed independently, and can also be embedded into other courses.

Read more

UPSKILLS Learning Resources

By exploring the outputs of projects, catalogues by European research consortia, public platforms that host open educational resources and web searches, UPSKILLS has compiled a list of 276 available training resources for academics. The list is categorised into three broad topics (research skills, data acquisition skills and data handling skills) and two cross-cutting components (linguistic theory and research, and data management). The list can be browsed online or downloaded in various formats and includes a search function.

Read more

Survey: Digital Humanities Courses for University Student Exchange

The Digital Humanities in Higher Education Working Group has created a survey with the aim of producing a list of potential courses related to the digital humanities for international student exchanges (bachelor-level and up). With the input, the working group will work towards enriching the learning opportunities for all our students, promote local teaching initiatives, and foster new levels of cooperation in DH education.

Deadline for participation: 28 February 2023

Take the Survey

UPSKILLS Multiplier Event: Linguists in Tech: Closing the Skills Gap

21- 22 April 2023, Geneva, Switzerland

This two-day event brings together members of academia and industry across Switzerland to work on improving the employability of language and linguistics students. After identifying the gaps in the educational system, the UPSKILLS project created an open source repository of didactic materials designed to bridge the mismatch between what the job market needs and what is currently offered to language and linguistics students. The event will present the needs analysis that led to the design of the didactic materials. The event will involve several social moments to facilitate informal exchanges between guests from industry and academia.

Read more

VIDEOS & PODCASTS

CLARIN2022 Keynote: Barbara Plank – Is Human Label Variation Really so Bad for AI?

In her keynote at CLARIN2022, Barbara Plank (LMU Munich) challenges the fact that human variation in labelling is typically considered noise. Annotation projects in computer vision and natural language processing usually try to minimise human label variation in order to maximise data quality and in turn optimise and maximise machine learning metrics. However, is human variation in labelling really noise, or could it be turned into signal for machine learning? Plank illustrates the problem and then discusses approaches to tackle this fundamental issue.

Data Streams: The Research Data Alliance Podcast

Data Streams is a collection of conversations between members of the community about the challenges they face as researchers and data experts in managing the massive quantities of research data, as well as the process of finding solutions and illustrating the value of open research data sharing and reuse.

Read more

EVENTS & CALLS

CLARIN Café: Exploring the Potential of Digital Tools for Online Learning

3 March 2023, 14:00-16:00 (CET), virtual event

This CLARIN Café offers a crash course for academics, researchers and teaching assistants who are interested in learning how to conceptualise and design their own online training material on a topic of their choice. By the end of this course, participants will be able to identify different implementation forms of eLearning, categorise digital tools for learning online, describe the basic steps of planning and design for online learning, and choose the right online learning methods and tools based on specific educational goals and contexts. Participant numbers are limited, so register soon!

Read more

Call for Papers: International CLaDA-BG Conference

10 - 12 May 2023, Sofia, Bulgaria

CLaDA-BG is the Bulgarian national research infrastructure for resources and technologies for linguistic, cultural and historical heritage, integrated within CLARIN EU and DARIAH EU. The topic of this year’s conference is ‘Language Technologies and Digital Humanities: Resources and Applications’. Modelling and linking of various types of knowledge and its contexts is crucial for the successful interdisciplinary research related to language, culture and history. The CLaDA-BG conference brings together developers, linguists, digital humanitarians, and others interested in knowledge modelling and linking data for research. The organisers invite submissions on topics including the role of digital libraries, archives and museums, best practices in knowledge modelling and linking, and resources and technologies for historical texts, parliamentary records and speech and multimodal corpora.

Deadline for submissions: 24 February 2023

Read more

Workshop on Profiling Second Language Vocabulary and Grammar

20 - 21 April 2023, Gothenburg, Sweden

This workshop, organised by Språkbanken Text and HumInfra, focuses on tools and resources relevant to research on profiling second language vocabulary and grammar. The creation of electronic resources helps to analyse learner language, which in turn feeds into research and development projects on topics such as teaching and didactics, second language acquisition (SLA) and assessment, lexicography, course book writing and various applied products, including learning apps. These resources, called second language profiles, cover descriptions of vocabulary, grammar and morphology, which learners acquire at different levels of proficiency. Since such resources are rare and new to users and researchers, this workshop provides a forum where resources can be discussed, practically tested and explored. The organisers invite submissions for presentations, discussions, demos and hands-on sessions.

Deadline for registration and submissions: 10 March 2023

Read more

Fifth Summer Datathon on Linguistic Linked Open Data (SD-LLOD-23)

11 - 16 June 2023, Castle Lužnica, Croatia

The SD-LLOD datathon provides practical knowledge to people from industry and academia in the application of Linked Open Data (LOD) technology to linguistics and language technology. The goal of the datathon is to enable participants to migrate their own (or others’) linguistic data and publish them as linked data online and/or develop applications on top of Linguistic Linked Data (LLD). One of the main focus points of this year’s datathon will be the use of deep learning and neural approaches to/from LLD. At least 10 travelling grants for students, covering accommodation, meals and travel expenses, will be provided by the Nexus Linguarum COST Action.

Read more

Call for Submissions: Second International Workshop on Profiling Linguistic Knowledge Graphs

12-15 September 2023, Vienna, Austria (co-located with the LDK Conference 2023)

The focus of this workshop is to showcase novel approaches, methodologies and frameworks on profiling Linguistic Linked Data (LLD, such as corpora, lexicons and ontologies), as well as to highlight tools and user interfaces that can effectively assist different use cases for profiling such data. In addition, the workshop seeks methodologies that help effective profiling in building real-world linked data applications leveraging linguistic data, as well as use cases that show success stories or aspects that have been neglected so far. The workshop will be a forum for researchers and practitioners to come together and discuss common challenges, and to identify synergies for joint initiatives. The organisers welcome contributions describing technical approaches, as well as use cases.

Deadline for submissions: 19 May 2023

Read more