Depositing Services

One of the fundamental services of the CLARIN infrastructure is making sure that language resources can be archived and made available to the community in a reliable manner. To help researchers store their resources (e.g. corpora, lexica, audio and video recordings, annotations and grammars) in a sustainable way, many CLARIN centres offer a depositing service: they store the resources in their repositories and assist with the technical and organisational details. Depositing has a wide range of advantages:

  • Long-term archiving: a storage guarantee can be given for a long period (up to 50 years in some cases).
  • Resources can be cited easily with a persistent identifier.
  • The resources and their metadata will be integrated into the infrastructure, making it possible to search them efficiently.
  • Password-protected resources can be made available via an institutional login.
  • Once resources are integrated into the CLARIN infrastructure, they can be analysed and enriched more easily with various linguistic tools (e.g. automated part-of-speech tagging, phonetic alignment or audio/video analysis).

The following certified CLARIN centres offer depositing services:

Centre | Location | Depositing offer
CLARIN Centre Vienna | Austria | Any linguistic and/or NLP data and tools
LINDAT-Clarin | Czech Republic | Any linguistic and/or NLP data and tools: corpora, treebanks, lexica, but also trained language models, parsers, taggers, machine translation systems, web services, etc.
CLARIN-DK-UCPH | Denmark | Danish language resources. The focus is on written and spoken language resources. Possible to deposit: text corpora and texts with annotations, IMDI sessions containing audio, video and annotations of these resources, together with lexicons and other data.
BAS | Germany | Corpora of spoken languages which contain at least one measured signal that is based on the physical processes of speech production (e.g. acoustic signals, videos, series of measurements, series of pictures)
BBAW | Germany | German corpora or parallel corpora, historical prints and manuscripts (in German), lexical resources (also in German)
FIN-CLARIN | Finland | All language resources related to Finnish, Finland Swedish and the Fenno-Ugric languages, as well as other language resources created in Finland.
HZSK | Germany | Mainly spoken language corpora, with a thematic focus on language acquisition, multilingualism, linguistic diversity and language documentation
IDS | Germany | Resources on the German language
IMS | Germany | Language resources (e.g. corpora, treebanks, lexical resources) and NLP tools (also language models, web services etc.); special focus on domain adaptation
UdS | Germany | Multilingual corpora (parallel, comparable) and corpora including specific registers
SFS | Germany | All language resources
Dutch Language Institute | Netherlands | Dutch and Belgian Dutch language resources
Meertens Instituut/HuC | Netherlands | Resources pertaining to Dutch language and culture
The Language Archive | Netherlands | All valuable language data, in particular data related to the languages and cultures of small and endangered speech communities
CLARINO Bergen Centre | Norway | B + K centre with depositing services for: language datasets and tools for research
Språkbanken | Norway | C-centre with depositing services for: language datasets for R&D involving Norwegian (Bokmål, Nynorsk) or official minority languages in Norway (Sami, Kven)
CLARIN-PL | Poland | Polish language resources
CLARIN.SI | Slovenia | Any linguistic and/or NLP data and tools
CMU-TalkBank | USA | TalkBank-compatible corpora

These organisations also offer reliable and largely compatible depositing services:

Centre | Location | Depositing offer
CELR | Estonia | Estonian language resources: texts, corpora, audio and video recordings, lexical data, terminologies, tools for NLP, etc.
SLDR | France | Archiving of oral and linguistic data
CLARIN:el | Greece | Greek language resources
DANS | Netherlands | All digital research data
Oxford Text Archive | UK | Electronic literary and linguistic resources