Depositing Services

One of the fundamental services of the CLARIN infrastructure is making sure that language resources can be archived and made available to the community in a reliable manner. To help researchers store their resources (e.g. corpora, lexica, audio and video recordings, annotations, grammars) in a sustainable way, many of the CLARIN centres offer a depositing service: they store the resources in their repositories and assist with the technical and organisational details. This has a wide range of advantages:

  • Long-term archiving: a storage guarantee can be given for a long period (up to 50 years in some cases).
  • Resources can be cited easily with a persistent identifier.
  • The resources and their metadata will be integrated into the infrastructure, making it possible to search them efficiently.
  • Password-protected resources can be made available via an institutional login.
  • Once resources are integrated into the CLARIN infrastructure, they can be analysed and enriched more easily with various linguistic tools (e.g. automated part-of-speech tagging, phonetic alignment or audio/video analysis).

The following certified CLARIN centres offer depositing services:

| Centre | Location | Depositing offer |
|---|---|---|
| ACDH-CH | Austria | Any linguistic and/or NLP data and tools |
| LINDAT-CLARIAH/CZ | Czech Republic | Any linguistic and/or NLP data and tools: corpora, treebanks, lexica, but also trained language models, parsers, taggers, machine translation systems, web services, etc. |
| CLARIN-DK-UCPH | Denmark | Danish language resources. The focus is on written and spoken language resources. Possible to deposit: text corpora and texts with annotations, IMDI sessions containing audio, video and annotations of these resources, together with lexicons and other data. |
| CELR | Estonia | Estonian language resources: texts, corpora, audio and video recordings, lexical data, terminologies, tools, etc. |
| FIN-CLARIN | Finland | All language resources related to Finnish, Finland Swedish and the Fenno-Ugric languages, as well as other language resources created in Finland |
| BAS | Germany | Corpora of spoken languages that contain at least one measured signal based on the physical processes of speech production (e.g. acoustic signals, videos, series of measurements, series of pictures) |
| BBAW | Germany | German corpora or parallel corpora, historical prints and manuscripts (in German), lexical resources (also in German) |
| HZSK | Germany | Mainly spoken language corpora, with a thematic focus on language acquisition, multilingualism, linguistic diversity and language documentation |
| IDS | Germany | Resources on the German language |
| CLARIN:el | Greece | Greek language resources |
| ILC4CLARIN | Italy | |
| IMS | Germany | Language resources (e.g. corpora, treebanks, lexical resources) and NLP tools (also language models, web services, etc.); special focus on domain adaptation |
| UdS | Germany | Multilingual corpora (parallel, comparable) and corpora including specific registers |
| SFS | Germany | All language resources |
| Dutch Language Institute | Netherlands | Dutch and Belgian Dutch language resources |
| Meertens Instituut/HuC | Netherlands | Resources pertaining to Dutch language and culture |
| The Language Archive | Netherlands | All valuable language data, in particular data related to the languages and cultures of small and endangered speech communities |
| CLARINO Bergen Centre | Norway | B + K centre with depositing services for: language datasets and tools for research |
| CLARIN-PL | Poland | Polish language resources |
| CLARIN.SI | Slovenia | Any linguistic and/or NLP data and tools |
| CMU-TalkBank | USA | TalkBank-compatible corpora |

These organisations also offer reliable and largely compatible depositing services:

| Centre | Location | Depositing offer |
|---|---|---|
| ORTOLANG | France | Archiving of oral and linguistic data |
| Language Archive Cologne | Germany | All audio and audio-visual language resources, in particular from endangered and under-resourced languages, as well as recordings of oral literature |
| | Netherlands | All digital research data |
| Språkbanken | Norway | C-centre with depositing services for: language datasets for R&D involving Norwegian (Bokmål, Nynorsk) or official minority languages in Norway (Sami, Kven) |
| TROLLing | Norway | The Tromsø Repository of Language and Linguistics (TROLLing), a repository of data, code, and other related materials used in linguistic research |
| Oxford Text Archive | UK | Electronic literary and linguistic resources |