Frequently Asked Questions - CLARIN in general
'CLARIN' is an acronym for Common LAnguage Resources and Technology INfrastructure.
CLARIN is a European Research Infrastructure for the social sciences and humanities, focusing on language resources (data and tools). It is being implemented and constantly improved at leading institutions in a large and growing number of European countries, aiming at improving Europe's multi-lingual competence. CLARIN provides several services, such as access to language data and tools to analyse data, the possibility to deposit research data, as well as direct access to knowledge about relevant topics in relation to (research on and with) language resources.
CLARIN ERIC is the consortium of countries and intergovernmental organisations which participate in and contribute to CLARIN. The central CLARIN Office coordinates the CLARIN activities in the participating countries and associated centres. ERIC stands for "European Research Infrastructure Consortium"; it is a European type of legal entity created specifically for transnational research infrastructures such as CLARIN, and indeed CLARIN was one of the first ERICs to be established. The countries are usually represented in the ERIC by their ministries or funding agencies.
CLARIN ERIC is the name for the overall consortium of countries and intergovernmental organisations which participate in and contribute to CLARIN. In each of the member countries, the local CLARIN node takes a localised name, e.g. FIN-CLARIN in Finland, clarin:el in Greece, or CLARINO in Norway.
In a number of countries, the national funding for CLARIN is combined with the funding for sister infrastructures that have a partly overlapping mission. Such overlap is sometimes reflected in the name, e.g. CLARIAH (NL) in the Netherlands and CLARIAH-DE in Germany.
There are many valuable language-related resources and much relevant expertise at individual institutions across Europe. These resources can be fruitfully re-used in new research, and ongoing and future research continues to create more of them. However, they are often hard to find, especially across national borders, and difficult to access, and it can be hard to combine data and tools from different sources that use different formats. It is also often unclear who takes care of resources once a project has ended.
By integrating existing and new valuable language resources into the CLARIN infrastructure, CLARIN provides easy and direct access to such resources and expertise. The main means to do so are central services, such as the CLARIN portal. Most of CLARIN's work and CLARIN's services, however, are located at the many CLARIN centres across and even outside of Europe.
CLARIN is for all humanities and social science disciplines, as far as they work with digital language resources (text, multimedia, lexical information etc.). Although the focus of CLARIN activities may vary somewhat from country to country, and although many of those who participate in building and providing CLARIN have a background in linguistics, language technology or computer science, CLARIN is broadly used by scholars of many disciplines, for instance literature studies, history, political science, linguistics, sociology, psychology, computational linguistics, philosophy and ethnology.
Yes, most local CLARIN centres function as archives and data repositories. Usually, centres offer to support research projects from the planning stage, helping for instance with the development of a data management plan. CLARIN can help you to identify and to get into contact with the most suitable centre for archiving your data (see the page on depositing services).
Although the main target group is researchers in Europe, CLARIN is in principle open to everybody, including laypeople and those from outside Europe. Access to individual resources at certain CLARIN centres, however, can be restricted (e.g. to academic users) due to copyright, intellectual property, or privacy protection requirements.
CLARIN offers access to language data, tools to work with the data, and expertise about (research on and with) such resources. Several services and tools can be used online without downloading the data. This is achieved through components and services such as the CLARIN portal, discovery tools, federated identity, virtual collections, persistent identifiers, workspaces and online tool chains, along with many more individual services at the CLARIN centres.
The resources are hosted at CLARIN centres (see FAQ on Centres), which are certified according to high quality standards and which are connected via a central registry. Via central services such as the CLARIN portal, scientists and others can easily find resources (data and tools) and expertise that are offered at any of the participating CLARIN centres.
According to the federated identity concept, it is possible to access the CLARIN centres with one’s own (academic) credentials, based on a trust network of academic organisations.
Many tools can be used online or even combined in workflow chains for analysing and further processing the data, which can come from different centres. Unavoidably, different data formats are likely to remain problematic in certain situations. CLARIN aims for interoperability at all levels and therefore promotes existing standards, including XML-based formats.
Research data created or collected in a research project should be archived for testing and verification of the research, and for re-use in other research projects. CLARIN centres support research projects in the broader digital humanities and function as repositories for new research data and tools which are seamlessly integrated into CLARIN.
In order to be useful (findable, identifiable, usable), digital data needs to be described by metadata. There are many different needs and formats for metadata in different fields of research. CLARIN develops and promotes a metadata standard that can accommodate very different needs in a unified framework, called the Component Metadata Infrastructure (CMDI).
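The idea of assembling a metadata description from reusable components can be sketched in a few lines of Python. This is only an illustration of the general principle: the class names, component names and fields below are hypothetical and do not correspond to actual CMDI components, profiles or schemas.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A reusable block of metadata fields (hypothetical, not a real CMDI component)."""
    name: str
    fields: dict = field(default_factory=dict)


@dataclass
class Profile:
    """A metadata description assembled from reusable components."""
    name: str
    components: list = field(default_factory=list)

    def flatten(self):
        """Return all fields as 'Component.field' -> value pairs."""
        return {
            f"{c.name}.{key}": value
            for c in self.components
            for key, value in c.fields.items()
        }


def general_info(title, language):
    """Build the shared component that every profile in this sketch reuses."""
    return Component("GeneralInfo", {"title": title, "language": language})


# Two very different resource types share one component and add their own.
text_profile = Profile(
    "TextCorpusProfile",
    [general_info("Example text corpus", "nl"),
     Component("TextCorpusInfo", {"tokens": 1000000})],
)
sign_profile = Profile(
    "SignLanguageProfile",
    [general_info("Example sign language recordings", "sgn"),
     Component("SignLanguageInfo", {"video_hours": 12})],
)

print(text_profile.flatten())
print(sign_profile.flatten())
```

Both descriptions expose the same `GeneralInfo` fields, so a catalogue could search across them uniformly, while each keeps the fields specific to its resource type.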
Often, different terms are used to describe similar properties of and in data, which makes it challenging to integrate these data. CLARIN uses a concept registry where each concept can be explicitly defined. This registry is closely connected to the component metadata infrastructure.
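As a rough illustration of why explicitly defined concepts help with integration, the following Python sketch maps differently named fields from different datasets onto one shared concept. The identifiers, definitions and field names are invented for this example and are not taken from the actual CLARIN registry.

```python
# Hypothetical concept registry: concept identifier -> explicit definition.
CONCEPTS = {
    "concept:part-of-speech": "The grammatical category of a word.",
    "concept:lemma": "The canonical dictionary form of a word.",
}

# Hypothetical mapping from project-specific field names to shared concepts.
FIELD_TO_CONCEPT = {
    "pos": "concept:part-of-speech",
    "PoS-tag": "concept:part-of-speech",
    "wordclass": "concept:part-of-speech",
    "lemma": "concept:lemma",
    "baseform": "concept:lemma",
}


def same_concept(field_a, field_b):
    """Return True if both field names are mapped to the same registered concept."""
    concept_a = FIELD_TO_CONCEPT.get(field_a)
    concept_b = FIELD_TO_CONCEPT.get(field_b)
    return concept_a is not None and concept_a == concept_b


def definition_for(field_name):
    """Look up the explicit definition behind a field name, or None if unmapped."""
    return CONCEPTS.get(FIELD_TO_CONCEPT.get(field_name))


print(same_concept("pos", "wordclass"))
print(same_concept("pos", "baseform"))
print(definition_for("baseform"))
```

Once two datasets agree on the concept behind a field, their differing labels no longer prevent them from being searched or combined together.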
No. There are many humanities and social science disciplines with a wide range of needs for central infrastructures. It is difficult to imagine a single research infrastructure to cater for all these needs, especially considering how many European research infrastructures there are just for physics or the life sciences. With its clear focus on language resources, CLARIN targets a large group of potential users in Europe (and even worldwide) who share the need for language data and tools. Wherever possible (for instance, in knowledge sharing or more generic aspects of the technical infrastructure), CLARIN collaborates with other infrastructures, such as DARIAH. In some countries, this collaboration has been organised in one combined funding process.
You can get involved with CLARIN on several different levels. Please see this page for more information.
Citing CLARIN helps us, among other things, to report on impact and use of the CLARIN infrastructure, and thus contributes to supporting CLARIN. Therefore, whenever you use data, tools or another service from CLARIN, please make sure to refer to the CLARIN website: www.clarin.eu and/or a national CLARIN webpage.
For a publication, please cite as follows:
Hinrichs, Erhard & Steven Krauwer (2014): 'The CLARIN Research Infrastructure: Resources and Tools for E-Humanities Scholars.' In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC-2014), May 2014, 1525–1531.
Lingua, Special Issue on CLARIN (open access), with seven articles that present original research conducted by using one or more of the services that CLARIN offers. Lingua, volume 178, pp. 1–126 (July 2016).
Jong, Franciska de, Bente Maegaard, Koenraad De Smedt, Darja Fišer & Dieter Van Uytvanck (2018): 'CLARIN: Towards FAIR and Responsible Data Science Using Language Resources.' In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), May 2018, 3259–3264.
When writing papers and reports, we would appreciate it if you could explicitly mention CLARIN in the acknowledgements section in one of the following ways:
- Specialised metadata information on language resources (e.g. on text corpora, lexica or sign language recordings)
- Specific options to explore the deposited data and the metadata of services to language tools as well as several corpus query engines (PML-TQ, Kontext) for a parallel corpus
- Connections with CLARIN gateway services that allow processing or citing/bundling of the data. Examples include:
In addition to the language resource-specific enrichments, there are also a few infrastructural perks available via CLARIN repositories, such as federated login for academic users, leading to a higher trust level than logins with ad-hoc accounts.
However, Invenio would certainly be an option when choosing a repository for a CLARIN centre, although a few customisations would be needed to make it fully CLARIN-compliant. For example, Kai Wörner at the University of Hamburg presented last year how they are thinking of using InvenioRDM (specific variant) as their institutional repository, including some CLARIN materials.