UPSKILLS Learning and Teaching Materials

UPSKILLS (UPgrading the SKIlls of Linguistics and Language Students), an Erasmus+ partnership project, came to a successful close on 31 August 2023. This three-year project, launched in September 2020, sought to identify and tackle the gaps and mismatches in skills for linguistics and language students. To this end, consortium members from eight partner institutions, including CLARIN, worked towards a new curriculum component aimed at developing the technical and transferable skills that undergraduate students of language-related subjects need for industry and academic research. Below, you can find an overview of CLARIN's contribution to the first three intellectual outputs: (1) needs analysis, (2) research-based teaching, and (3) learning content.


Needs Analysis

In the first intellectual output, CLARIN contributed to several surveys, such as a survey of linguistics and language-related degrees in Europe, a survey of business sectors hiring linguists and language professionals, and focus interviews with selected job market stakeholders. In addition, CLARIN helped curate existing open-source educational and training materials, which can be adapted and reused to develop new learning content related to data acquisition skills (text and speech processing) and data handling skills (data standards and repositories).


Best Practices and Guidelines

Research-Based Teaching: Best Practices and Guidelines

These guidelines, developed by the consortium partners from the University of Graz, contain a template for designing research-based teaching (RBT) courses and defining learning outcomes, examples of courses piloted by the consortium partners together with testimonials, and other valuable resources for teachers (e.g. a research report template and a survey for course evaluation by students). CLARIN contributed to formulating the learning outcomes related to using research infrastructures and to data collection and archiving techniques.

Integrating Research Infrastructures into Teaching

CLARIN compiled an additional resource to show how teachers and trainers can leverage the CLARIN research infrastructure services (Virtual Language Observatory, CLARIN Resource Families, Federated Content Search and Language Resource Switchboard) to help students enhance their data collection, processing and analysis, and archiving skills. The guide also shows lecturers how to navigate the knowledge infrastructure to find language resources relevant to their students’ projects, and it points to relevant UPSKILLS learning content on Moodle that can be further adapted for classroom use. Furthermore, our partners from the University of Zurich have developed and included an interactive research tracking tool template in this practical guide. The research tracker enables students to track their progress during projects, while teachers can use it to provide feedback at intermediate stages throughout the project.

Intermediate versions of the above guidelines were discussed with lecturers from linguistics and language-related programmes during the third Multiplier Event, which CLARIN organised on 4 November 2022 in Utrecht, the Netherlands. The discussions concluded that the actual implementation of research-based teaching, including the use of research infrastructures and language resources, depends on several factors: the lecturers’ ability to balance teaching fundamental research with teaching more practical skills (e.g. corpus-based pedagogy and data-driven learning using corpora and tools); the students’ background, level of digital literacy, and study load; and the flexibility of the curriculum. Additionally, the lecturers acknowledged the benefits of using existing infrastructures, such as the Resource Families of open corpora and integrated concordancers (e.g. NoSketch Engine, Korp), to create a safe environment for students to experiment and discover new corpora and tools on their own. These benefits were also noticed during the UPSKILLS Summer School for Students of Languages and Linguistics, organised from 15 to 21 July 2023 in Petnica, Serbia. For a full overview of the research-based and infrastructure guidelines, please see the browsable versions on the project website; the PDF versions can be found on Zenodo. The main outcomes of this event were also summarised in this blog post.


Learning Content

CLARIN developed two learning blocks, which are accessible on Moodle via the project website. These materials can be integrated into any course or programme related to language research and teaching.

Introduction to Language Data: Standards and Repositories

This learning block, designed by Iulianna van der Lek and Darja Fišer, introduces learners to research data repositories and their role in the linguistic research data lifecycle in the context of open science and FAIR data principles. It consists of 6 units, which count as 6 ECTS if taught as a whole:

  • Unit 1. Introduction to the Language Resource Lifecycle and Management
  • Unit 2. How Research Data Repositories Help Make Language Data FAIR
  • Unit 3. Finding and (Re)using Language Resources in the CLARIN Repositories
  • Unit 4. Citing Language and Linguistic Data
  • Unit 5. Legal and Ethical Issues in Language Data Collection, Sharing and Archiving
  • Unit 6. Student Project
  • Glossary.

Teachers can teach and adapt the whole learning block or cherry-pick only those presentations and learning activities that match the learning outcomes of a specific programme, course or student project. Some presentations and assignments can also be used as self-study materials. Most of the learning content in this block has been created in H5P and can therefore be reused outside the UPSKILLS Moodle platform in any content or learning management system that supports this format. Teachers who do not use Moodle can edit the H5P presentations and activities using the freely available Lumi education app.

Parts of this learning block were piloted during the second semester of the 2022-2023 academic year in the Corpus Analysis course at Leiden University, the Netherlands, in the Research Methods and Analysis Techniques in Digital Linguistics programme at the University of Ljubljana, Slovenia, and in two workshops organised at the MEDAL Summer School in Corpus Linguistics and the Lancaster Corpus Linguistics conference. In the second semester of 2023-2024, parts of this learning block will be used in the linguistics curricula of the UPSKILLS consortium partners (Belgrade, Bologna, Malta and Rijeka). Finally, via a survey, lecturers from five other universities (KU Leuven, University of Zagreb, Lithuania and Ljubljana) expressed their interest in reusing this learning block in their curricula.

Automatic Speech Recognition and Forced Alignment 

This learning block, designed by Louis ten Bosch and Henk van den Heuvel, amounts to 6 ECTS and introduces learners to the basics of automatic speech recognition (ASR). It consists of 10 modular units, which lecturers can adapt further as required. Each unit contains quizzes, which should be updated regularly to keep pace with the rapid developments in the ASR field.

  1. The speech signal
  2. Acoustic features
  3. Bayes and Viterbi
  4. Architectures of ASR (I)
  5. Architectures of ASR (II)
  6. Forced alignment as a special case of ASR
  7. Data selection criteria/justification
  8. Dialogue
  9. Language models
  10. Student project (3 ECTS).

Furthermore, in collaboration with the consortium partners from the University of Graz, CLARIN co-developed Guidelines for the Students’ Projects and Research Reporting Formats and coordinated the showcasing of the student research projects on the UPSKILLS project website. The student research projects aimed to complement the 11 learning blocks developed by the consortium partners. CLARIN designed two student projects in collaboration with the University of Bologna project partners and included learning outcomes related explicitly to using research data repositories for data collection and archiving. One of the projects, Building, Sharing and Archiving Corpora, was adapted for the level of the students enrolled in the text processing track at the Petnica Summer School in July 2023. The NoSketch Engine concordancer implemented by the CLARIN.SI research infrastructure was used to teach students how to query corpora.

Further, CLARIN co-developed the dissemination plan for the learning content with the consortium partners from the University of Rijeka, Serbia and Malta. As part of this task, CLARIN evaluated the suitability of existing registries and repositories (DARIAH-CAMPUS, DH Course Registry, SSH Open Marketplace, OER platform, CLARIN.SI repository, Zenodo) for disseminating and archiving the UPSKILLS learning content at the end of the project. The learning content has been archived in the CLARIN.SI repository for long-term preservation, and an entry has been created in the SSH Open Marketplace for broader dissemination in SSH. Additionally, CLARIN conducted interviews with students and teachers during the Petnica Summer School, which can be viewed on YouTube.

UPSKILLS Summer School 2023

Please visit the UPSKILLS website for more information about the final project deliverables. See the full press release about the completion of the project.

For questions regarding the CLARIN infrastructure guide and learning content blocks, please contact Iulianna van der Lek.



First of all, we would like to thank Silvia Bernardini, Maja Milicevic Petrovic, Marko Simonovic, Stavros Assimakopoulos, Genoveva Puskas, Francesca Frontini, Pawel Kamocki, Esther Hoorn, Alexander König, Mietta Lennes, Jurgita Vaičenonienė, Tanja Wissik, and Willem Elbers for their valuable contributions to the CLARIN learning content and guidelines. We are also grateful to Carole Tiberius, Satu Saalasti, and Martin Wynne, who piloted parts of the Moodle learning content in their programmes and workshops. Special thanks to Tanja Samardzic and Marie Berthouzoz, who helped build the research tracker prototype. Furthermore, we would like to thank Tomaž Erjavec for helping us archive all the learning content in the CLARIN.SI repository, Karina Berger for proofreading and editing the materials produced in the project, and David Bordon for providing technical support during the UPSKILLS events. Finally, we are grateful to all the lecturers, teachers and students who participated in our UPSKILLS events.