
One of the fundamental services of the CLARIN infrastructure is ensuring that language resources can be archived and made available to the community in a reliable manner. To help researchers store their resources (e.g. corpora, lexica, audio and video recordings, annotations, grammars) in a sustainable way, many of the CLARIN centres offer a depositing service. These centres store the resources in their repositories and assist with the technical and organisational details. This has a wide range of advantages:
- Long-term archiving: a storage guarantee can be given for a long period (up to 50 years in some cases).
- Resources can be cited easily with a persistent identifier.
- The resources and their metadata will be integrated into the infrastructure, making it possible to search them efficiently.
- Password-protected resources can be made available via an institutional login.
- Once resources are integrated into the CLARIN infrastructure, they can be analysed and enriched more easily with various linguistic tools (e.g. automated part-of-speech tagging, phonetic alignment or audio/video analysis).

The following certified CLARIN centres offer depositing services:

Centre | Location | Depositing offer |
---|---|---|
ACDH-CH | Austria | |
LINDAT-CLARIAH/CZ | Czech Republic | Any linguistic and/or NLP data and tools: corpora, treebanks, lexica, but also trained language models, parsers, taggers, machine translation systems, web services, etc. |
CLARIN-DK-UCPH | Denmark | Danish language resources. The focus is on written and spoken language resources. Possible to deposit: text corpora and texts with annotations, IMDI sessions containing audio, video and annotations of these resources, together with lexicons and other data. |
CELR | Estonia | Estonian language resources: texts, corpora, audio and video recordings, lexical data, terminologies, tools for , etc. |
FIN-CLARIN | Finland | All language resources related to Finnish, Finland Swedish and the Fenno-Ugric languages, as well as other language resources created in Finland. |
BAS | Germany | corpora of spoken languages which contain at least one measured signal that is based on the physical processes of speech production (e.g. acoustic signals, videos, series of measurements, series of pictures) |
BBAW | Germany | German corpora or parallel corpora, historical prints and manuscripts (in German), lexical resources (also in German) |
HZSK | Germany | mainly spoken language corpora, with a thematic focus on language acquisition, multilingualism, linguistic diversity and language documentation |
IDS | Germany | |
CLARIN:el | Greece | Greek language resources |
ILC4CLARIN | Italy | B-centre with depositing services for language datasets and tools for research, especially for Italian and classical languages |
IMS | Germany | language resources (e.g. corpora, treebanks, lexical resources) and NLP tools (also language models, web services etc.); special focus on domain adaptation |
UdS | Germany | multilingual corpora (parallel, comparable) and corpora including specific registers |
SFS | Germany | all language resources |
Dutch Language Institute | Netherlands | Dutch and Belgian Dutch language resources |
Meertens Instituut/HuC | Netherlands | Resources pertaining to Dutch language and culture |
The Language Archive | Netherlands | all valuable language data, in particular data related to the languages and cultures of small and endangered speech communities |
CLARINO Bergen Centre | Norway | B + K centre with depositing services for: language datasets and tools for research |
CLARIN-PL | Poland | |
CLARIN.SI | Slovenia | |
CMU-TalkBank | USA | TalkBank-compatible corpora |

These organisations also offer reliable and largely compatible depositing services:

Centre | Location | Depositing offer |
---|---|---|
ORTOLANG | France | archiving of oral and linguistic data |
Language Archive Cologne | Germany | All audio and audio-visual language resources, in particular from endangered and under-resourced languages, as well as recordings of oral literature. |
| | Netherlands | |
Språkbanken | Norway | C-centre with depositing services for: language datasets for R&D involving Norwegian (Bokmål, Nynorsk) or official minority languages in Norway (Sami, Kven) |
TROLLing | Norway | The Tromsø Repository of Language and Linguistics (TROLLing), a repository of data, code, and other related materials used in linguistic research. |
Oxford Text Archive | UK | |