
Repositories

Storing language resources and related datasets requires sound organisation and attention to digital sustainability. After all, one of the important aims of CLARIN is to ensure that digital language resources are made available to a broad community on a long-term basis. This is achieved by establishing data repositories at the centres, which host digital files and the associated metadata. For reference purposes, these repositories also assign persistent identifiers to the resources, so that, for example, a specific dataset can easily be cited in a paper.

Users can inspect the data in such a repository through its local interface. In addition, the metadata is shared with the rest of the CLARIN community by means of metadata harvesting.

Although CLARIN is a strong advocate of open access, in some cases resources have to be password-protected, if only to respect legal, privacy and ethical constraints. Even then, federated login makes it easier to request access to these protected collections and to log in once access has been granted. The decision to grant access stays with the resource owner: authorisation, too, is truly distributed in the CLARIN infrastructure.

Repository assessment

The quality of the CLARIN repositories, along with their organisational and technical background, is subject to an assessment procedure. Repositories that have successfully undergone this procedure are granted the CLARIN B centre label.

Software and configuration

The Centre Registry lists URIs that you can use to query the OAI-PMH endpoints run by CLARIN centres. You can query these endpoints simply by clicking on the hyperlinks, or programmatically, for instance to find out which OAI-PMH server software they run. To learn about practical experiences with particular OAI-PMH server software, you are advised to contact the endpoint operators (the centre's technical contact is a good starting point); in other words, to 'query' the people as well. Centres are free to choose which repository system they use, as long as it complies with the centre requirements (support for metadata harvesting, component metadata, persistent identifiers and federated login). Popular options are Fedora Commons and DSpace. Some centres have provided manuals on how to set up a CLARIN-compliant repository, and relevant information was also presented at a repository tutorial.
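As an illustration of querying an endpoint programmatically, the following is a minimal sketch in Python that uses only the standard library. It builds OAI-PMH request URLs and reads a few fields from an Identify response. The function names and the base URL https://repo.example.org/oai are hypothetical and chosen for illustration only; real endpoint URLs should be taken from the Centre Registry. Note that the mandatory Identify fields do not name the server software; some endpoints may reveal it in optional description elements, which this sketch does not parse.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# XML namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_request_url(base_url, verb, **params):
    """Return an OAI-PMH request URL for the given verb and arguments."""
    query = urlencode({"verb": verb, **params})
    return f"{base_url}?{query}"


def parse_identify(xml_data):
    """Extract a few fields from an OAI-PMH Identify response (str or bytes)."""
    root = ET.fromstring(xml_data)
    node = root.find("oai:Identify", OAI_NS)
    if node is None:
        raise ValueError("not an Identify response")

    def text(tag):
        # Returns None when the element is absent.
        return node.findtext(f"oai:{tag}", default=None, namespaces=OAI_NS)

    return {
        "repositoryName": text("repositoryName"),
        "protocolVersion": text("protocolVersion"),
        "adminEmail": text("adminEmail"),
    }


def identify(base_url, timeout=30):
    """Send an Identify request to a live endpoint and parse the reply."""
    url = build_request_url(base_url, "Identify")
    with urlopen(url, timeout=timeout) as response:
        return parse_identify(response.read())


if __name__ == "__main__":
    # Hypothetical base URL: replace it with one from the Centre Registry.
    base = "https://repo.example.org/oai"
    print(build_request_url(base, "Identify"))
    print(build_request_url(base, "ListMetadataFormats"))
```

Calling identify() with a real base URL performs a network request. The ListMetadataFormats request shows which metadata formats an endpoint offers, which is one way to check whether component metadata is available for harvesting.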