UPSKILLS (UPgrading the SKIlls of Linguistics and Language Students), an Erasmus+ partnership project, successfully drew to an end on 31 August 2023. This three-year project, launched in September 2020, sought to identify and tackle the gaps and mismatches in skills for linguistics and language students. To reach this aim, consortium members from eight partner institutions, including CLARIN, worked towards a new curriculum component, which targets developing technical and transferable skills needed for industry and academic research among undergraduate students of language-related subjects. Below, you can find an overview of CLARIN's contribution to the first three intellectual outputs: (1) needs analysis, (2) research-based teaching, and (3) learning content.
In the first intellectual output, CLARIN contributed to several surveys, such as a survey of linguistics and language-related degrees in Europe, a survey of business sectors hiring linguists and language professionals, and focus interviews with selected job market stakeholders. In addition, CLARIN helped curate existing open-source educational and training materials, which can be adapted and reused to develop new learning content related to data acquisition skills (text and speech processing) and data handling skills (data standards and repositories).
Best Practices and Guidelines
These guidelines, developed by the consortium partners from the University of Graz, contain a template for designing research-based teaching (RBT) courses and defining learning outcomes, examples of courses piloted by the consortium partners together with testimonials, and other valuable resources for teachers (e.g. a research report template and a course evaluation survey for students). CLARIN contributed to formulating the learning outcomes related to using research infrastructures and data collection and archiving techniques.
CLARIN compiled an additional resource to show how teachers and trainers can leverage the CLARIN research infrastructure services (Virtual Language Observatory, CLARIN Resource Families, Federated Content Search and Language Resource Switchboard) to help students enhance their data collection, processing and analysis, and archiving skills. The guide also shows lecturers how to navigate the knowledge infrastructure to find language resources relevant to their students’ projects, and it points to relevant UPSKILLS learning content on Moodle that can be further adapted for classroom use. Furthermore, our partners from the University of Zurich have developed and included an interactive research tracking tool template in this practical guide. The research tracker enables students to track their progress during projects, while teachers can use it to provide feedback at intermediate stages throughout the project.
Intermediate versions of the above guidelines were discussed with lecturers from linguistics and language-related programmes during the third Multiplier Event, which CLARIN organised on 4 November 2022 in Utrecht, the Netherlands. The discussions with the lecturers concluded that the actual implementation of research-based teaching, including the use of research infrastructures and language resources, depends on the lecturers’ ability to find the balance between teaching fundamental research and more practical skills (e.g. corpus-based pedagogy and data-driven learning using corpora and tools), the students’ background, level of digital literacy, and study load, as well as the flexibility of the curriculum. Additionally, the lecturers acknowledged the benefits of using existing infrastructures, such as the Resource Families of open corpora and integrated concordancers (e.g. NoSketch Engine, Korp), to create a safe environment for students to experiment and discover new corpora and tools on their own. These benefits were also noticed during the UPSKILLS Summer School for Students of Languages and Linguistics, organised from 15 until 21 July 2023 in Petnica, Serbia. For a full overview of the research-based and infrastructure guidelines, please see the browsable versions on the project website, while the PDF versions can be found on Zenodo. The main outcomes of this event were also summarised in this blog post.
CLARIN developed two learning blocks accessible on Moodle from the project website. These materials can be integrated into any course or programme related to language research and teaching.
This learning block, designed by Iulianna van der Lek and Darja Fišer, introduces learners to research data repositories and their role in the linguistic research data lifecycle in the context of open science and FAIR data principles. It consists of 6 units, which count as 6 ECTS if taught as a whole:
- Unit 1. Introduction to the Language Resource Lifecycle and Management
- Unit 2. How Research Data Repositories Help Make Language Data FAIR
- Unit 3. Finding and (Re)using Language Resources in the CLARIN Repositories
- Unit 4. Citing Language and Linguistic Data
- Unit 5. Legal and Ethical Issues in Language Data Collection, Sharing and Archiving
- Unit 6. Student Project
Teachers can teach and adapt the whole learning block or cherry-pick only those presentations and learning activities that match the learning outcomes of a specific programme, course or student project. Some presentations and assignments can also be used as self-study materials. Most of the learning content in this block has been created in H5P, which can be reused outside the UPSKILLS Moodle platform in any content or learning management system supporting this format. Teachers who do not use Moodle can edit the H5P presentations and activities using the Lumi education app, which is freely available.
Parts of this learning block were piloted during the second semester of the 2022-2023 academic year in the Corpus Analysis course at Leiden University, the Netherlands, in the Research Methods and Analysis Techniques in Digital Linguistics programme at the University of Ljubljana, Slovenia, and in two workshops organised at the MEDAL Summer School in Corpus Linguistics and the Lancaster Corpus Linguistics conference. In the second semester of 2023-2024, parts of this learning block will be used in the linguistic curricula of the UPSKILLS consortium partners (Belgrade, Bologna, Malta and Rijeka). Finally, via a survey, lecturers from five other universities (KU Leuven, University of Zagreb, Lithuania and Ljubljana) expressed their interest in reusing this learning block in their curricula.
This learning block, designed by Louis ten Bosch and Henk van den Heuvel, amounts to 6 ECTS and introduces learners to the basics of automatic speech recognition (ASR). The learning block consists of 10 modular units, which lecturers can adapt further as required. Each unit contains quizzes, which need to be updated regularly to keep pace with the rapid developments in the ASR field.
- The speech signal
- Acoustic features
- Bayes and Viterbi
- Architectures of ASR (I)
- Architectures of ASR (II)
- Forced alignment as a special case of ASR
- Data selection criteria/justification
- Language models
- Student project (3 ECTS).
Furthermore, in collaboration with the consortium partners from the University of Graz, CLARIN co-developed Guidelines for the Students’ Projects and Research Reporting Formats and coordinated the showcasing of the student research projects on the UPSKILLS project website. The student research projects aimed to complement the 11 learning blocks developed by the consortium partners. CLARIN designed two student projects in collaboration with the University of Bologna project partners and included learning outcomes related explicitly to using research data repositories for data collection and archiving. One of the projects, Building, Sharing and Archiving Corpora, was adapted for the level of the students enrolled in the text processing track at the Petnica Summer School in July 2023. The NoSketch Engine concordancer implemented by the CLARIN.SI research infrastructure was used to teach students how to query corpora.
Further, CLARIN co-developed the dissemination plan for the learning content with the consortium partners from the University of Rijeka, Serbia and Malta. As part of this task, CLARIN evaluated the suitability of existing registries and repositories (DARIAH-CAMPUS, DH Course Registry, SSH Open Marketplace, OER platform, CLARIN.SI repository, Zenodo) for disseminating and archiving the UPSKILLS learning content at the end of the project. The learning content has been archived in the CLARIN.SI repository for long-term preservation and an entry has been created in the SSH Open Marketplace for broader dissemination in SSH. Additionally, CLARIN conducted interviews with students and teachers during the Petnica Summer School, which can be viewed on YouTube.
For questions regarding the CLARIN infrastructure guide and learning content blocks, please email Iulianna van der Lek at email@example.com.