Tour de CLARIN: Norway

Submitted by Jakob Lenardič on 23 December 2019

Blog post written by Koenraad De Smedt and Kristin Hagen, edited by Darja Fišer and Jakob Lenardič

Norway joined CLARIN in 2015 after having been an observer since 2013. The Norwegian national infrastructure for language resources and technology is called CLARINO. Its five-year construction phase, which began in 2012, was funded through the eponymous national project, and the infrastructure is now in full operation. From 2020, the infrastructure will enter a three-year upgrade phase, which will also be funded nationally.

The Norwegian national coordinator is Prof. Koenraad De Smedt at the University of Bergen. Other institutions that currently comprise the Norwegian CLARIN consortium are the following:

In the upgrade phase, the Norwegian Centre for Research Data (NSD) will join the consortium as well.

CLARINO offers its infrastructure services through four centres.

The CLARINO Bergen Centre at the University of Bergen (UiB), in cooperation with the Norwegian School of Economics (NHH), has been certified as a CLARIN type B centre. The centre is operated by the Department of Linguistic, Literary and Aesthetic Studies (LLE) and the University Library (UBB). It offers a repository based on CLARIN DSpace with LINDAT extensions, where researchers can upload and download digital language datasets. The centre also offers the following online tools: the INESS treebanking platform, with treebanks for more than 70 languages; the Corpuscle corpus exploration system, which provides access to corpora currently covering 18 languages; and the COMEDI metadata editor, which is used worldwide. These systems run on a dedicated high-performance computer. The centre will also operate the Terminology Portal after its migration from Oslo.

Språkbanken, the Language Technology Resource Collection for Norwegian at the National Library of Norway (NB), has been certified as a CLARIN type C centre. Its mission is to provide language resources, primarily for Norwegian, that are not only useful to academic researchers but also suitable for research and development in applied language technology. NB also provides an online n-gram counter and exploration tools for its digitised library holdings. NB maintains the CLARINO National Metadata Registry (CNMR), a national catalogue of language resources based on metadata harvested from all CLARINO nodes.

The Text Laboratory at the University of Oslo (UiO) has been certified as a CLARIN type C centre. Through the centre and its homepage, a wide range of corpora is available via the corpus exploration system Glossa (33 corpora in a number of languages, including several African languages). Glossa supports CLARIN Federated Content Search (FCS). The Glossa corpus system itself can also be downloaded, together with the Oslo-Bergen Tagger for morphosyntactic analysis and annotation of written Norwegian. A variety of databases and word lists based on frequency and text genre are also offered. Other units at UiO have developed an Interactive Dynamic Presentation (IDP) system for digital editions and the CLARINO Language Analysis Portal, which provides workflows for language analysis.

TROLLing, the Tromsø Repository of Language and Linguistics at UiT The Arctic University of Norway, has been certified as a CLARIN type C centre. Based on Dataverse, the repository specialises in packages of replication-enabling datasets bundled with statistical code, presentations and other documentation, including scientific articles. UiT also hosts Giellatekno, which offers datasets and tools for Sami and other Arctic minority languages.

CLARINO has two certified Knowledge Centres:

  1. The CLARINO Bergen Centre has, together with the CLARIN/LINDAT Centre in Prague, established a K-Centre for treebanking, which has already been featured in Tour de CLARIN. Read the introduction and the interview with Helge Dyvik, one of the main developers at the Centre.
  2. The Norwegian Centre for Research Data (NSD) operates a K-Centre on data management (including legal issues).

A recent Norwegian Parliamentary Paper on the Humanities describes CLARINO as the common infrastructure for language databases in Norway. According to the paper, its impact is primarily on the language sciences, but it also offers substantial research potential for other SSH disciplines, as well as for industrial research and development, for instance through multilingual technologies.

From 2020, the three-year CLARINO+ project will upgrade the technical infrastructure, promote uptake by researchers and further improve sustainability. As a result, the upgraded CLARINO will be a more up-to-date, dynamic and user-centred infrastructure, providing better services for more language data stemming from past, present and future research. It will be better adapted to new state-of-the-art CLARIN core services and standards, and will better serve an extended target audience based on a sustainable business model. An updated portal will give easier access to the distributed services located in the four Norwegian centres and to central CLARIN services.

The Text Laboratory group (Anders Nøklestad, Joel Priestley, Janne Bondi Johannessen and Kristin Hagen) at LREC 2016 in Portorož, Slovenia.
