2022 was another big year for CLARIN!
Not only did we celebrate CLARIN’s 10th anniversary as a research infrastructure and publish a book about its achievements over a decade of work, but the community was also able to come together once again, both in person and virtually, at the hybrid edition of CLARIN’s Annual Conference in Prague. Over the course of the year, many CLARIN Cafés were held, research was showcased in the Impact Stories, CLARIN training materials reached new user groups, and CLARIN launched its first promotional video.
On this year’s holiday card, we proudly present some of these highlights with six hidden links, including a final message from Franciska de Jong looking back at her time as Executive Director. As of 1 January 2023, Darja Fišer will be the new Executive Director of CLARIN.
Thank you for a wonderful year – happy holidays and best wishes for 2023!
Switzerland Is Joining CLARIN as Observer
We are pleased to announce that Switzerland will join CLARIN as an observer from 1 January 2023, with a commitment for three years. The Swiss consortium CLARIN-CH was already founded in December 2020 and consists of a growing number of Swiss academic institutions.
CLARIN-CH is hosted by the Zurich Center for Linguistics. Its operational governance is ensured by Prof. Dr. Marianne Hundt (UZH) as National Coordinator, Prof. Dr. Anita Auer (UNIL) as President of the Consortium and Dr. Cristina Grisot (UZH) as Scientific Coordinator. The representing entity will be the State Secretariat for Education, Research and Innovation (SERI), with Katharina Eggenberger appointed to represent Switzerland as observer at the General Assembly meetings.
New Impact Story: Voices of Ravensbrück: Multilingual Oral History
This impact story showcases the Voices from Ravensbrück project, which has produced a curated set of multilingual oral history interviews with survivors of the Ravensbrück concentration camp for women. The new ‘Ravensbrück Oral History Resource Family’ comprises 38 audio interviews from different countries and presents a unique opportunity to study and compare these historical sources. The compelling topic and the way in which the interviews are presented make them attractive and easy to work with, opening up many avenues for cross-disciplinary and multilingual research in a broad variety of fields. In addition, the collection is a valuable resource for schools, higher education, the media and society more widely.
CLARIN Resource Families: Sign Languages
CLARIN’s Resource Families (CRF) are designed to facilitate comparative research. During the 2020 workshop for all CLARIN Knowledge Centres (K-centres), the K-centres working with Sign Language resources were able to get together to explore possible ways of collaborating. With the aim of making Sign Language resources more visible, findable and accessible for users across the globe, four K-centres joined forces to create a Resource Family portal and to inventory existing Sign Language resources. The results of this collaboration are now visible on CLARIN’s CRF page as a separate Family for Sign Language Resources. In total, this page connects scholars to 51 corpora in 24 different sign languages, as well as 24 lexical resources in 10 different sign languages.
UPDATES FROM THE NATIONAL CONSORTIA
Open Event of CLaDA-BG: National Research Infrastructure for Bulgarian Language and Cultural Heritage Resources integrated within CLARIN and DARIAH
Historians, philologists, ethnologists, specialists in informatics and 3D technologies, and other partners of the CLaDA-BG consortium gathered for the Open Event of CLaDA-BG on 16 November 2022 and presented the results of their work on the infrastructure during the four-year preparatory phase. This work unites the domains of DARIAH and CLARIN (history, culture and language resources), and the results presented can benefit researchers from diverse backgrounds, as well as teachers and students, and are also available to the general public.
A total of 18 presentations were given, and several discussion sessions and interactive poster sessions took place. The topics were divided into three sessions: Demo Session: Presentation of Software and Services; Session for Technologies and Resources of CLaDA-BG; and Session for Resources: Collections, E-books, Artefacts. The programme ended with a discussion on the role of infrastructures in the humanities and social sciences, and on the advantages of open science as the science of the future.
A recording of the event can be found here (in Bulgarian).
Gijsbert Rutten Takes Over from Jan Odijk as National Coordinator for the Netherlands
As of this fall, Gijsbert Rutten (Leiden University) has taken up the role of CLARIN National Coordinator for the Netherlands. Gijsbert, who is a professor of historical sociolinguistics, also coordinates the CLARIN work package in the Dutch nationally funded CLARIAH-NL.
Both the CLARIAH-NL project and the CLARIN network are saying goodbye to Jan Odijk, a very familiar face to both communities. Jan was a long-time Dutch National Coordinator and an active member of the CLARIN community from the start. He was also instrumental in obtaining national funding for the CLARIAH-NL projects (2013-2023) and for the CLARIN-NL project, carried out in cooperation with CLARIN Flanders (2009-2014). He combined directing these projects and serving as National Coordinator with a professorship at Utrecht University, where, among other topics, he worked on grammatical formalisms and structures and on retrieving grammatical information and knowledge from treebanks. Jan has retired from his professorship; his retirement will be marked with a farewell lecture at Utrecht University on 30 January 2023.
BLOGS & REPORTS
A Recap on the CLARIN Café on Text and Data Mining Exceptions a Year After - Has the Pony Become a Horse?
The CLARIN Café on Text and Data Mining (TDM) Exceptions a Year After took place on 8 November 2022 and was organised by the CLARIN Legal and Ethical Issues Committee (CLIC). Around 25 participants attended, including language researchers, lawyers and legal experts from both CLARIN institutions and the private sector. The aim of the event was to discuss how the copyright exceptions for TDM introduced by the recent Directive on Copyright in the Digital Single Market have affected language research and technology so far. The event featured presentations from distinguished guest speakers: Thomas Margoni, research professor of intellectual property law at KU Leuven, and Toby Bond, data lawyer and partner at Bird & Bird London, as well as Antal van den Bosch (Utrecht University, and CLARIN Board of Directors) and Jan Hajič (Charles University Prague).
CLARIN Compatible NER and Geoparsing Web Services for Italian and Serbian Parallel Text (It-Sr-Ner)
Named entity recognition (NER) is essential for many applications, such as identifying clients in business transcripts, determining locations in social media posts, anonymising sensitive documents, and automatically classifying electronic media articles and topics. Prof Olja Perisic’s It-Sr-Ner project was funded as part of CLARIN’s call scheme. The aim was to address the lack of tools and resources for the annotation, exploration and analysis of bilingual aligned Italian-Serbian texts, and to connect CLARIN to external language technology tools by building web services that enable NER annotation of such aligned texts.
In the future, Olja Perisic will work on increasing the corpus size, improving the performance of the NER model for Serbian and of geolocation recognition, and linking entities to the knowledge base. In the university context, the services developed will be integrated into the teaching of Italian at the University of Belgrade and of Serbian as a foreign language at the University of Torino.
TRAINING AND EDUCATION
Training Resource: Topic Modelling Parliamentary Debates Before and During the COVID-19 Pandemic
This training resource was developed by Ajda Pretnar Žagar in collaboration with Kristina Pahor de Maiti and Darja Fišer. It introduces basic text mining concepts to digital humanities beginners by applying Latent Dirichlet Allocation (LDA) topic modelling to a specific use case. Trainees will learn to perform topic modelling independently on new data, understand its pitfalls and know when to apply the method. The materials include links to everything needed for independent work (workflows, data and software references). All the procedures used in the tutorial are language-agnostic, so no additional changes are needed for non-English corpora.
UPSKILLS Blog: Guidelines and Best Practices for Research-Based Teaching
One of the main goals of the UPSKILLS project is to compile and publish guidelines and best practices that teaching staff can easily follow when they decide to integrate their own research into their teaching. During the UPSKILLS Multiplier event, organised by CLARIN ERIC on 4 November in Utrecht, an overview of the guidelines and best practices for Research-Based Teaching (RBT) was presented. The guidelines contain a template describing the structure of a comprehensive RBT course and 16 examples of RBT courses on different topics, ranging from the acquisition of English as a second language to automatic speech recognition and multilingualism. They will be complemented by a short guide for teachers on using the CLARIN research infrastructure in teaching and a collection of best practices for integrating industry-based research projects into the curriculum.
Andreas Witt and Darja Fišer on ‘CLARIN. The Infrastructure for Language Resources’
De Gruyter Acquisitions Editor Svetoslava Antonova Baumann interviews Andreas Witt and Darja Fišer about the motivation for ‘CLARIN. The Infrastructure for Language Resources’, the use of language data outside of linguistics, as well as the importance of the diversity of languages and language resources for the digital humanities.
EVENTS & CALLS
Call for Papers: LIBER 2023
5 - 7 July 2023, Budapest, Hungary
Research libraries facilitate flows of scientific information and encourage individuals and groups to find their own way to data and information. The theme of the LIBER 2023 conference is ‘Open and Trusted – Reassessing Research Library Values’, relating to the central position of research libraries as trusted hubs that reliably connect communities. Critical responsibilities, such as engagement, adaptation, and sustainability, enable libraries to set up spaces, services and collections that can benefit their communities. Among others, the organisers welcome submissions on the topics of Open Science, building sustainable infrastructures, and data management.
Deadline for submissions: 8 January 2023
Call for Papers: DH Benelux 2023
31 May - 2 June 2023, Brussels, Belgium
The annual DH Benelux conference serves as a platform for the community of interdisciplinary Digital Humanities researchers to meet, present and discuss their latest research findings and to demonstrate tools and projects. The central theme is ‘Crossing Borders: Digital Humanities Research across Languages and Modalities’. The call is open to scholars in the arts and humanities, the (social) sciences, and the heritage sector, as well as developers and computer scientists with an interest in the application and use of digital technologies. Of particular interest are contributions that consider multilingualism and/or integrated processing of sources in different forms, such as images, maps, sounds, texts and datasets.
Deadline for submissions: 31 January 2023
Call for Papers: SwissText 2023
12 - 14 June 2023, Neuchâtel, Switzerland
SwissText is an annual conference for text analytics and natural language processing (NLP) experts from industry and academia. SwissText features demonstrations of available NLP solutions, as well as research results, projects and survey papers that describe new concepts, innovative research, systems and standards in the areas of NLP. The organisers (SwissNLP and partners) welcome submissions to both the applied track and the scientific research track, which bring together experts from industry and academia.
Deadline for submissions: 15 March 2023
Call for Abstracts: Programming and Data Infrastructure in Digital Humanities
27 - 29 March 2023, Évora, Portugal (virtual event)
Programming in Digital Humanities (DH) is one of the most challenging topics in this field of knowledge and research. Many scholars from fields such as literature, philology or history experience great difficulties when it comes to using programming languages to handle and display their data. This online conference aims to address these challenges and to present the latest developments in programming in DH by bringing together scientists from different areas.
Deadline for submissions: 20 March 2023
Save the Date: International Conference on CMC and Social Media Corpora for the Humanities
14 - 15 September 2023, Mannheim, Germany
This conference is the 10th anniversary edition of an annual conference series dedicated to the collection, annotation, and exploration of corpora of computer-mediated communication (CMC) and social media for research in the humanities. The conference brings together language-centred research on CMC and social media in linguistics, philology, media and communication sciences, and social sciences with research questions from the fields of corpus and computational linguistics, language technology, text technology, and machine learning.
Postdoctoral Researcher in Computational Linguistics
Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
Corpus linguists have long been studying recurring patterns in digitised texts with the help of concordances. However, because a well-established methodology is lacking, the art of reading concordances has not yet realised its full potential. This project, a collaboration between the University of Birmingham and Friedrich-Alexander-Universität Erlangen-Nürnberg, proposes an innovative approach to reading concordances. Using two case studies on English and German data sets, the successful applicant will work on an approach that not only brings innovation to corpus linguistics but also has wider implications for the analysis of textual data at scale, while still retaining a humanities perspective.
Deadline for application: 16 December 2022
PhD in Digital Humanities Research
King’s College London, UK
A fully funded PhD position is now available on the project ‘Lost for Words: Semantic Search in the Find Case Law Service of The National Archives’, a Collaborative Doctoral Award received by King’s College London in collaboration with The National Archives and funded by the London Arts & Humanities Partnership (LAHP). This interdisciplinary project is an exciting opportunity to work in natural language processing (particularly computational semantics and information retrieval) applied to legal texts and digital humanities. It will study how individuals without legal training use language to navigate court judgments, and it will develop tools to facilitate this navigation. Recent advances in natural language processing unlock new possibilities for querying documents via state-of-the-art semantic search, which is crucial for democratising access to digital collections, helping to expose the social impact of how the law is written.
Deadline for application: 27 January 2023