The UK has been an observer of CLARIN since 2015 and is now almost halfway through its second three-year period. Countries are admitted as observers so that they can prepare a proposal for full membership, build a national consortium, and develop infrastructure at the national level, and these are the tasks currently at the centre of CLARIN activity in the UK. CLARIN-UK is a loose consortium of researchers and providers of language resources and tools, working in tandem with the Arts and Humanities Research Council, part of the recently founded body UK Research and Innovation (UKRI).
The CLARIN-UK consortium currently consists of eleven centres, which have stepped forward to express their interest in, and commitment to, being part of CLARIN. Other centres are welcome to get in touch and to participate in meetings and other activities. The current members of the CLARIN-UK consortium are:
- Bodleian Libraries (University of Oxford)
- British Library
- Centre for Corpus Research (University of Birmingham)
- Centre for Translation Studies (University of Leeds)
- Endangered Languages Archive (SOAS University of London)
- Centre for Corpus Approaches to the Social Sciences and UCREL (Lancaster University)
- National Centre for Text Mining (University of Manchester)
- Natural Language Processing Group (University of Sheffield)
- Research Group in Computational Linguistics (University of Wolverhampton)
- School of Critical Studies (University of Glasgow)
- School of Humanities (Coventry University)
The main criteria for including an institution in CLARIN-UK are a strong connection to digital language data, tools, or research, and a commitment to sharing and connecting data and tools to support research. The particular strengths of digital language research in the UK are reflected in the centres, resources and events featured on the CLARIN-UK website. Some of the most widely used resources are the Spoken BNC2014, the Historical Thesaurus of English, the CliC interface for the works of Dickens and other literary corpora, and the CQPweb concordancer at Lancaster. Prominent CLARIN-UK tools include GATE, an open-source software suite that performs a wide range of computational tasks; CLAWS, a powerful part-of-speech tagger for English; and Wmatrix, a corpus analysis tool that, among other things, provides a web interface to the CLAWS tagger.
The numerous training events include annual summer schools in corpus linguistics and digital humanities in Birmingham and Lancaster, and training in language documentation offered by SOAS. UK members have co-organized workshops on Oral History, NLP for Historical Texts, and Analyzing Social Media in East Asian Studies, and have participated in workshops on language resources in teaching and on dealing with GDPR, among others.
The Oxford Text Archive (OTA) is registered as a CLARIN-certified repository; it uses the CLARIN single sign-on system and makes its language resources available through the Virtual Language Observatory. The OTA offers more than 60,000 digital resources, including the British National Corpus and a wide variety of Old English, Middle English (such as the first printed edition of Geoffrey Chaucer's Canterbury Tales, from 1476) and Modern English historical texts digitized as part of Early English Books Online (EEBO).
SOAS University of London, whose Endangered Languages Archive is involved in CLARIN-UK activities, is part of the CLARIN Knowledge Centre for Linguistic Diversity and Language Documentation (CKLD). Knowledge centres such as this could play an important role in CLARIN's distributed infrastructure. The UK is currently developing a national research infrastructure roadmap, and CLARIN-UK is contributing to the consultation on requirements and existing infrastructure services. CLARIN-UK is featured on InfraPortal, the UK's Research and Innovation Infrastructure Portal.
Participation in CLARIN is not dependent on EU membership, so Brexit does not represent a direct threat to our activities and plans. On the contrary, CLARIN, along with other European research infrastructures, offers an excellent opportunity for continued and growing collaboration and cooperation with our European partners.
Martin Wynne, the national coordinator of the UK consortium
Blog post written by Martin Wynne, edited by Darja Fišer and Jakob Lenardič