Depositing Services

One of the fundamental services of the CLARIN infrastructure is ensuring that language resources can be archived and made available to the community reliably. To help researchers store their resources (e.g. corpora, lexica, audio and video recordings, annotations, grammars) sustainably, many CLARIN centres offer a depositing service: they store the resources in their repositories and assist with the technical and organisational details. This has a wide range of advantages:

  • Long-term archiving: a storage guarantee can be given for a long period (up to 50 years in some cases)
  • Resources can be cited easily with a persistent identifier
  • The resources and their metadata will be integrated into the infrastructure, making it possible to search them efficiently
  • Password-protected resources can be made available via an institutional login
  • Once resources are integrated into the CLARIN infrastructure, they can be analysed and enriched more easily with various linguistic tools (e.g. automated part-of-speech tagging, phonetic alignment or audio/video analysis)

The following certified CLARIN centres offer depositing services:

Centre | Location | Depositing offer
ACDH-CH | Austria | Any linguistic and/or NLP data and tools.
LINDAT-CLARIAH/CZ | Czech Republic | Any linguistic and/or NLP data and tools: corpora, treebanks, lexica, but also trained language models, parsers, taggers, machine translation systems, web services, etc.
LINDAT-CLARIAH/CZ | Czech Republic | Language Resource Inventory: An easy-to-use inventory for language resources (and tools), which allows you to browse through submissions and submit metadata. It differs from other depositing services in that it does not require users to upload (meta)data and that it can be used immediately, without contacting the host first.
CLARIN-DK-UCPH | Denmark | Danish language resources: The focus is on written and spoken language resources. Deposits may include text corpora and annotated texts, IMDI sessions containing audio, video and annotations of these resources, as well as lexicons and other data.
CELR | Estonia | Estonian language resources: Texts, corpora, audio and video recordings, lexical data, terminologies, tools for , etc.
ORTOLANG | France | Archiving of oral and linguistic data.
FIN-CLARIN | Finland | All language resources related to Finnish, Finland Swedish and the Fenno-Ugric languages, as well as other language resources created in Finland.
BAS | Germany | Corpora of spoken languages which contain at least one measured signal that is based on the physical processes of speech production (e.g. acoustic signals, videos, series of measurements, series of pictures).
BBAW | Germany | German corpora or parallel corpora, historical prints and manuscripts (in German), lexical resources (also in German).
IDS | Germany | Resources on the German language.
CLARIN:el | Greece | Greek language resources.
ILC4CLARIN | Italy |
IMS | Germany | Language resources (e.g. corpora, treebanks, lexical resources) and NLP tools (also language models, web services, etc.); special focus on domain adaptation.
UdS | Germany | Multilingual corpora (parallel, comparable) and corpora including specific registers.
SFS | Germany | All language resources.
Dutch Language Institute | Netherlands | Dutch and Belgian Dutch language resources.
Meertens Instituut/HuC | Netherlands | Resources pertaining to Dutch language and culture.
The Language Archive | Netherlands | All valuable language data, in particular data related to the languages and cultures of small and endangered speech communities.
CLARINO Bergen Centre | Norway | B + K centre with depositing services for language datasets and tools for research.
CLARIN-PL | Poland | Polish language resources.
CLARIN.SI | Slovenia | Any linguistic and/or NLP data and tools.
CMU-TalkBank | USA | TalkBank-compatible corpora.

These organisations also offer reliable and largely compatible depositing services:

Centre | Location | Depositing offer
Language Archive Cologne | Germany | All audio and audio-visual language resources, in particular from endangered and under-resourced languages, as well as recordings of oral literature.
 | Netherlands | All digital research data.
Språkbanken | Norway | C-centre with depositing services for language datasets for R&D involving Norwegian (Bokmål, Nynorsk) or official minority languages in Norway (Sami, Kven).
TROLLing | Norway | The Tromsø Repository of Language and Linguistics (TROLLing), a repository of data, code, and other related materials used in linguistic research.
Oxford Text Archive | UK | Electronic literary and linguistic resources.