The central role of the Centre Registry

Submitted by Thorsten Trippel on 2 August 2013

The CLARIN Centre Registry lists all CLARIN centres together with their contact persons, addresses and levels of service. But why is the Centre Registry so essential in a distributed infrastructure such as CLARIN? In this article I will review a couple of reasons why I believe the Centre Registry to be one of the least visible, but also one of the most important parts of the CLARIN infrastructure: it is the foundation of searching for resources, searching in resources, reporting on resources and maintaining resources. I am not the developer of the Centre Registry, just a user, and I am glad to have it; hence this article is biased.

Centre registry in the process of searching for resources

Resources are often distributed: they are held at the institutions that created them or at locations providing archiving and depositing facilities. Often resources cannot be made freely available for redistribution by others. The reason can be an unclear legal situation at the time of creation or for reuse in different contexts, the fact that a resource emerged from work in progress, or simply a restriction on use. But information on the existence of the resource and on how to access it can still be made available. For this, it is essential that an application knows the possible locations where resources are available. The Centre Registry is that point of access: it lists the known resource centres, the places where resources can be found. Though the Centre Registry does not contain information on the resources themselves, it provides the point of access for applications. These applications use it to find information on the resources, that is, the metadata for the resources stored at a centre. To this end, the Centre Registry provides the OAI-PMH access point for each centre. The OAI-PMH server at a centre then provides the metadata to be used by OAI-PMH-enabled applications.
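To make the harvesting step more concrete, here is a minimal sketch in Python of how an application might turn a list of centres into OAI-PMH requests. The centre names and URLs are placeholders I made up, and the dictionary layout is only an assumption about what a registry entry could look like, not the actual format of the Centre Registry. The verb `ListRecords` and the metadata prefix `oai_dc` come from the OAI-PMH protocol itself; a CLARIN harvester would probably request a different metadata format.

```python
from urllib.parse import urlencode

# Hypothetical registry entries: the names and OAI-PMH base URLs are
# placeholders, not real CLARIN endpoints.
CENTRES = [
    {"name": "Example Centre A", "oai_pmh": "https://a.example.org/oai"},
    {"name": "Example Centre B", "oai_pmh": "https://b.example.org/oai"},
]


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL for one centre."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def harvest_urls(centres, metadata_prefix="oai_dc"):
    """Map each centre name to the first URL a harvester would request."""
    return {
        centre["name"]: list_records_url(centre["oai_pmh"], metadata_prefix)
        for centre in centres
    }


if __name__ == "__main__":
    for name, url in harvest_urls(CENTRES).items():
        print(name, url)
```

The point of the sketch is that the application itself knows nothing about individual centres: everything it needs in order to start harvesting comes from the list of access points.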

Centre registry in the process of searching in resources

In corpus linguistics and many other areas, the analysis of a resource begins with locating specific constructs or words within it. With federated content search it is possible to offer at least a full-text search across those centres that provide this functionality for their resources. But a federated search requires knowing where to find the endpoints, that is, which address at a centre an application can use to submit the search.

The Centre Registry does not only provide a list of centres; it also provides the relevant URLs for searching within resources. This is especially relevant where, for licensing reasons, a resource cannot be downloaded by a user, but fragments of it may be provided nevertheless. The Centre Registry hence needs to provide the URL of the service that can be accessed remotely. Before this information is passed on to other services, the functionality of the endpoint has to be tested to check that it handles requests adequately. Additionally, the search service has to be robust enough to provide meaningful responses if an endpoint is temporarily unavailable.
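As an illustration of the robustness requirement, the following Python sketch sends one query to several search endpoints and records failures instead of aborting. It assumes the endpoints speak SRU, on which the federated content search is based as far as I know; the parameter names follow SRU 1.2. The endpoint URLs, the function names and the `fetch` callable are hypothetical and exist only for this example.

```python
from urllib.parse import urlencode


def search_url(endpoint, query, max_records=10):
    """Build an SRU searchRetrieve URL (parameter names as in SRU 1.2)."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": query,
        "maximumRecords": max_records,
    }
    return f"{endpoint}?{urlencode(params)}"


def federated_search(endpoints, query, fetch):
    """Send the query to every endpoint and keep going when one fails.

    `fetch` is any callable that takes a URL and returns the response body,
    raising OSError on failure (urllib's URLError is a subclass of OSError).
    """
    results, unavailable = {}, {}
    for name, endpoint in endpoints.items():
        try:
            results[name] = fetch(search_url(endpoint, query))
        except OSError as err:
            # Report the failure to the caller instead of aborting the search.
            unavailable[name] = str(err)
    return results, unavailable
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP library and makes it easy to test against endpoints that are deliberately "down". A real client would also parse the responses and apply timeouts.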

Dissemination of the CLARIN infrastructure

Another area where I see the Centre Registry as especially central is in sharing the resources and infrastructure of CLARIN with the community. Everybody who would like to build tools and services using the CLARIN infrastructure can use the information from the Centre Registry to find out which services are available and supported by the infrastructure, and where. For the dissemination of the infrastructure as such, the Centre Registry is always a core component: it tells you something about the size and structure of the infrastructure.

The Centre Registry may not be the most fascinating service to users, but for me it is one of the core infrastructural components of CLARIN, if not the core component. This also means that the information provided to the Centre Registry needs to be valid, well-formed and well maintained, as all other infrastructural services rely on the information contained in the registry.