Frequently Asked Questions - CLARIN in general
"CLARIN" is an acronym for Common LAnguage Resources and Technology INfrastructure.
CLARIN is a European Research Infrastructure for the Social Sciences and Humanities, focusing on language resources (data and tools). It is being implemented and continuously improved at leading institutions in a large and growing number of European countries, with the aim of strengthening Europe's multilingual competence. CLARIN provides several services: access to language data and to tools for analysing it, facilities for depositing research data, and direct access to knowledge about topics related to (research on and with) language resources.
CLARIN ERIC is the consortium of countries and intergovernmental organizations which participate in and contribute to CLARIN. The central CLARIN Office coordinates the CLARIN activities in the participating countries and associated centres. ERIC stands for "European Research Infrastructure Consortium"; it is a new European type of legal entity created specifically for transnational research infrastructures such as CLARIN, and indeed CLARIN was one of the first ERICs to be established. The countries are usually represented in the ERIC by their ministries or funding agencies.
CLARIN ERIC is the name for the overall consortium of countries and intergovernmental organizations which participate in and contribute to CLARIN.
In a number of countries the national funding for CLARIN is combined with the funding for sister infrastructures that have a partly overlapping mission. Such overlap is sometimes reflected in the name, e.g. CLARIAH (NL) in the Netherlands and CLARIAH-DE in Germany.
There are many valuable language-related resources and much relevant expertise at many individual institutions across Europe. These resources can be fruitfully re-used in new research, and ongoing and future research continues to create such valuable resources. However, they are often hard to find, especially across country borders, they are hard to get access to, it is hard to combine data and tools from different sources using different formats, and it is usually not obvious who takes care of resources after completion of projects.
By integrating existing and new valuable language resources into the CLARIN infrastructure, CLARIN provides easy and direct access to such resources and expertise. The main means to do so are central services such as the CLARIN portal. Most of CLARIN's work and CLARIN's services, however, are located at the many CLARIN centres across and even outside Europe.
CLARIN is for all Humanities and Social Science disciplines, insofar as they work with digital language resources (text, multimedia, lexical information etc.). Although the focus of CLARIN activities may vary somewhat from country to country, and although many of those who participate in building and providing CLARIN have a background in linguistics, language technology or computer science, CLARIN is widely used by scholars from many disciplines, for instance literature studies, history, political science, linguistics, sociology, psychology, computational linguistics, philosophy, ethnology, etc.
YES, most local CLARIN centres function as archives and data repositories. The centres usually offer to support research projects from the planning stage onward, for instance by helping with the development of a data management plan. CLARIN ERIC can help you identify and get in contact with the most suitable centre for archiving your data; see the page on depositing services.
Although the main target group is researchers in Europe, CLARIN is in principle open to everybody, including non-specialists and users from outside Europe. Access to individual resources at certain CLARIN centres, however, can be restricted (e.g. to academic users) due to copyright, intellectual property, or privacy protection requirements.
CLARIN offers access to language data, tools to work with the data, and expertise about (research on and with) such resources. Several services and tools can be used online even without downloading the data. This is achieved by several components or services, such as the CLARIN Portal, discovery tools, federated identity, virtual collections, persistent identifiers, workspaces, online tool chains, etc., and many more individual services at the many CLARIN centres.
The resources are hosted at CLARIN centres (see FAQ on Centres), which are certified according to high quality standards and which are connected via a central registry. Via central services such as the CLARIN portal, scientists and others can easily find resources (data and tools) and expertise that are offered at any of the participating CLARIN centres.
Thanks to federated identity, users can access the CLARIN centres with their own (academic) credentials, based on a trust network of academic organisations.
Many tools can be used online or even combined in workflow-chains for analysing and further processing the data, which can come from different centres. Unavoidably, different data formats are likely to continue to be problematic in certain situations. CLARIN aims at interoperability on all levels and therefore CLARIN promotes existing standards such as TEI and other XML-based formats.
Research data created or collected in a research project should be archived -- for testing and verification of the research, and for re-use in other research projects. CLARIN centres support research projects in the broader Digital Humanities and function as repositories for new research data and tools which are seamlessly integrated into CLARIN.
In order to be useful (findable, identifiable, usable), digital data needs to be described by metadata. There are many different needs and formats for metadata in different fields of research. CLARIN develops and promotes a metadata standard that allows for accommodating very different needs in a unified framework, called the component metadata infrastructure, CMDI.
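The component idea can be illustrated with a small sketch. This is not the CMDI schema or any CLARIN API; the component names, field names and profiles below are hypothetical and only show how reusable components might be combined into different profiles within one framework.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical, simplified model of component-based metadata.
# None of these names are taken from an actual CLARIN registry.
@dataclass(frozen=True)
class Component:
    name: str
    required: tuple[str, ...]
    optional: tuple[str, ...] = ()


@dataclass(frozen=True)
class Profile:
    name: str
    components: tuple[Component, ...]

    def missing_fields(self, record: dict[str, dict[str, str]]) -> list[str]:
        """Return 'Component.field' for every required field absent from record."""
        missing = []
        for comp in self.components:
            values = record.get(comp.name, {})
            for field_name in comp.required:
                if not values.get(field_name):
                    missing.append(f"{comp.name}.{field_name}")
        return missing


# Reusable building blocks, shared between profiles.
GENERAL = Component("GeneralInfo", required=("title", "language"))
ACCESS = Component("Access", required=("licence",), optional=("contact",))
AUDIO = Component("AudioTechnical", required=("sample_rate_hz",))

# Two profiles with different needs, built from the same components.
TEXT_CORPUS = Profile("TextCorpus", (GENERAL, ACCESS))
SPEECH_CORPUS = Profile("SpeechCorpus", (GENERAL, ACCESS, AUDIO))

record = {
    "GeneralInfo": {"title": "Example interviews", "language": "nl"},
    "Access": {"licence": "CC BY 4.0"},
}

print(TEXT_CORPUS.missing_fields(record))
print(SPEECH_CORPUS.missing_fields(record))
```

In this sketch the same record is complete for the text profile but lacks the audio component required by the speech profile, which is the kind of difference a component-based framework is meant to accommodate.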
Often different terms are used to describe similar properties of and in data, which makes it challenging to integrate these data. CLARIN uses a concept registry where each concept can be explicitly defined. This registry is closely connected with CMDI.
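As a rough illustration of why explicitly defined concepts help with integration, the following sketch maps differently named metadata fields onto shared concept identifiers. The identifiers, field names and records are invented for the example and do not come from the CLARIN concept registry.

```python
# Hypothetical mapping from local field names to shared concept identifiers.
CONCEPT_OF_FIELD = {
    "lang": "concept:language",
    "language": "concept:language",
    "Sprache": "concept:language",
    "title": "concept:title",
    "Titel": "concept:title",
}


def harmonise(record: dict[str, str]) -> dict[str, str]:
    """Re-key a record by concept identifier; unmapped fields keep their own name."""
    return {CONCEPT_OF_FIELD.get(key, key): value for key, value in record.items()}


# Two invented records that describe the same properties with different terms.
centre_a = {"title": "Parliamentary debates", "lang": "sl"}
centre_b = {"Titel": "Zeitungskorpus", "Sprache": "de", "Umfang": "2 Mio. Tokens"}

merged = [harmonise(centre_a), harmonise(centre_b)]
languages = [entry["concept:language"] for entry in merged]
print(languages)
```

Once both records are keyed by the same concepts, a single query over the language concept covers both sources, while fields without a registered concept (here "Umfang") are left untouched.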
NO. There are many Humanities and Social Science disciplines with a wide range of needs for central infrastructures. It is difficult to imagine a single research infrastructure that could meet all these needs, especially considering how many European research infrastructures exist just for Physics or the Life Sciences. With its clear focus on language resources, CLARIN targets a large group of potential users in Europe (and even worldwide) who share needs for language data and tools. Wherever possible (for instance in knowledge sharing or more generic aspects of the technical infrastructure) CLARIN collaborates with other infrastructures, such as DARIAH. In some countries, this collaboration has been organized in one combined funding process.
Engagement with CLARIN can happen on several different levels. Please see this page for more information on how to get involved with CLARIN.
Citing CLARIN helps us, among other things, to report on impact and use of the CLARIN infrastructure, and thus contributes to supporting CLARIN.
Therefore, whenever you use data, tools or another service from CLARIN, please make sure to refer to the CLARIN website, www.clarin.eu, and/or a national CLARIN webpage.
For a publication, please cite:
Hinrichs, Erhard & Steven Krauwer (2014): "The CLARIN Research Infrastructure: Resources and Tools for E-Humanities Scholars." In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC-2014), May 2014, 1525-1531.
Lingua - special issue on CLARIN (open access), with seven articles that present original research conducted by using one or more of the services that CLARIN offers. Lingua, volume 178, pages 1-126 (July 2016).
Jong, Franciska de, Bente Maegaard, Koenraad De Smedt, Darja Fišer & Dieter Van Uytvanck (2018): "CLARIN: Towards FAIR and Responsible Data Science Using Language Resources." In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), May 2018, 3259-3264.
When writing papers and reports, we would highly appreciate it if you explicitly mentioned CLARIN in the Acknowledgements section in one of the following ways:
<General case>: (Part of) the work reported here was made possible by using the CLARIN infrastructure.
<CLARIN national funding template>: The work reported here has received funding through <CLARIN national consortium member, e.g. CLARIN.SI>, <XYZ> project, grant no. <XYZ>.
<EC funded work template (secondment via CLARIN ERIC)>: The work reported here has received funding (through CLARIN ERIC) from the European Union’s Horizon 2020 research and innovation programme under grant agreement No <0-9> for project <XYZ>. (E.g. No 676529 for project CLARIN-PLUS.)
The first entry point is the CLARIN portal at www.clarin.eu, and here in particular the FAQ, or the portal of one of the national CLARIN initiatives. There you can also directly access the services, and find links to several national helpdesks that answer questions related to CLARIN and language resources.