CLARIN vision: All digital language resources and tools from all over Europe and beyond are accessible through a single sign-on on-line environment for the support of researchers in the humanities and social sciences.
CLARIN’s mission: Create and maintain an infrastructure to support the sharing, use and sustainability of language data and tools for research in the humanities and social sciences.
The mission and vision are the foundations of the long-term, overall strategy of CLARIN. In addition, every three years, the Board of Directors, in consultation with CLARIN ERIC’s General Assembly and various other stakeholders, develops a medium-term strategy: concrete policies and measurable action lines that, on top of the mission and vision, take recent trends and developments into account. A two-page summary of the adopted strategy for 2021-2023 can be found here. A more elaborate summary of the strategy is presented below. The overview contains a section, ‘CLARIN strategy at a glance’, plus separate sections for each of four priority areas.
CLARIN strategy at a glance
- Language is a carrier of cultural content and information. Language also plays a role as a reflection of scientific and societal knowledge, as an instrument for human communication and persuasion, as one of the central aspects of the identities of individuals, groups, cultures and nations, as an instrument for human cognition and creative expression, and as a formal system. Moreover, language materials form a considerable part of the historical records which are considered cultural heritage. The multifaceted nature of language is reinforced by its internal dynamics, which have both synchronic and diachronic dimensions.
- Language is omnipresent. As language plays a predominant role in many fields of study, CLARIN is a crucial pillar for the support of Social Sciences and Humanities (SSH) research.
- The application of novel, data-driven methods in the SSH domain has been further stimulated by the advance of the digital humanities and computational social sciences.
- Moreover, language resources offer a rich source of research opportunities for fields such as data science, language technology, and artificial intelligence.
- The vital importance of language data for discovering new ways for machines to interact with humans, and for humans to interact with machines, is just one example that shows how language is key to a wide range of academic disciplines.
- The advance of data-driven methods in academia and the promotion of paradigms for open access to research data has increased the need for data registries and data management services based on the FAIR principles: Findable, Accessible, Interoperable, Reusable. Providing data in open access and the sharing of resources in order to allow reuse have been central to the approach adopted.
- Interoperability guidelines have affected integration and collaboration at a range of layers, most prominently in the adoption of a common metadata standard. CLARIN thematic services, the Virtual Language Observatory (VLO), the Federated Content Search and the Language Resource Switchboard, partially derive their added value from the distributed and multilingual nature of the CLARIN data offer.
- The vision of borderless and seamless interoperability between data and services is further realised through alignment with emerging cloud platforms such as the European Open Science Cloud and the SSH Open Marketplace.
- The increased interoperability is beneficial for a number of the research agendas for which CLARIN aims to provide infrastructural support, in particular in the domains that aim at innovation roadmaps through multidisciplinary collaboration and data-driven methodologies, such as the digital humanities, artificial intelligence, computational social sciences and political studies.
- The datafication of society calls for an increased level of data literacy among employees, and, more generally, among all citizens. As a research infrastructure (RI) for digital language data and technology, CLARIN has an important mission to help increase the level of data literacy in its member countries.
- CLARIN aims to take a role in supporting the development of use cases and guidelines for responsible data science practices that can help further the understanding of the pitfalls of data-driven methods.
- CLARIN’s potential for societal impact is also reinforced by the growing attention for support measures for language equality, and by use cases and proofs of concept for CLARIN tools in non-academic contexts, e.g. in the context of health practitioners and the workflow of journalists working with interview data. An illustration of this potential could be derived from the mapping of CLARIN’s service offer with regard to the needs of research that can contribute to the United Nations’ Sustainable Development Goals.
- Outreach to existing and new user categories amongst academic parties is pursued through the CLARIN Ambassador Programme and the CLARIN Trainer Network, and through links with RIs from outside of Europe.
- The collaboration with industry, governmental organisations, and the GLAM sector (Galleries, Libraries, Archives, and Museums) is taking shape both at the national level and at the level of CLARIN ERIC. Some commercial companies and GLAM organisations have even become part of a national CLARIN consortium.
- In recent years, CLARIN ERIC has forged a number of formal and informal links with parties from the GLAM sector, in particular with Europeana and LIBER.
- Several units of the European Commission have technology innovation agendas on language-intensive topics for which CLARIN can bring interesting expertise and support, e.g. in the form of data and tools for fake news detection and multilingual access to digital cultural heritage.
The primary mission of CLARIN is to support and accelerate research excellence. Within CLARIN, a knowledge infrastructure has been developed as the ‘glue’ for the various communities engaged with CLARIN, and as the structure that aims at securing a continuous transfer of knowledge between diverse parties involved in the construction, operation and use of the infrastructure. The deployment and further development of this knowledge infrastructure is a crucial pillar for optimising the exchange of expertise and the creation of a rich offer of online materials that can be used for instruction, explanation and teaching regarding the use of CLARIN as a service hub for easy and open access and the long-term storage of language materials.
- A network of CLARIN Knowledge Centres (K-centres) that bring together expertise in a certain domain, topic, data modality, and so on.
- Tour de CLARIN: A thorough overview of the richness of the network and the ongoing activities by featuring national consortia and centres and highlighting their prominent tools, resources, and, most importantly, the rich community of researchers developing and using the infrastructure.
- Funding schemes for:
- workshops, webinars and training sessions;
- events aimed at stimulating the uptake of CLARIN in specific disciplines and regions;
- mobility grants to enable the exchange of expertise among individual researchers, educators and technical experts.
- A programme for reaching out to new communities, supported via:
- A website that is better suited to cater for the diversity in the communities served by CLARIN and that offers a more contemporary user experience.
- More diversified communication about the technical infrastructure and its use, aimed at both non-technical and more technically sophisticated audiences.
- Better visibility of the training and educational initiatives and the services offered by K-centres.
- Contextualisation of the CLARIN Resource Families by providing links to publications, guides and tutorials.
- A wider range of topics covered by the K-centres.
- Strengthening of the CLARIN networks:
- Incentives for closer interaction among the K-centres;
- A sustainable network of CLARIN trainers, to better enable the education of end-users of the infrastructure.
- Agile support:
- Adjustment of the mobility grants model and other funding instruments to better support cross-country technical development, research, teaching, and documentation.
- Central expertise and support for the CLARIN consortia to address the increased need for virtual formats for events and meetings.
Over the past few years, CLARIN has constructed a sound and robust technical basis to enable the sharing and reuse of language data and tools across institutional, disciplinary and international borders. In line with the distributed nature of the network, the data and services provided by CLARIN have been designed to be highly interoperable. Capitalising on the federated nature of the infrastructure has proven a critical precondition for remaining at the forefront of technology. The time is now ripe to work towards the next level of interoperability within the CLARIN ecosystem. This requires efforts on the side of the tool and data providers, which will be closely coordinated with the National Coordinators’ Forum and the Standing Committee for CLARIN Technical Centres, and close monitoring of the dynamics in the wider research infrastructure landscape.
- The FAIR principles are followed in the design of all data services.
- Guidance and consultancy are provided on the use of persistent identifiers. This includes topics such as when to use Handles and Digital Object Identifiers, and the connection to machine-actionability.
- Procedures for quality assurance and curation, for both tools and (meta)data, are in place. This includes automated checks where possible (e.g. to ensure technical interoperability) and manual intervention where needed (e.g. to assess and improve semantic interoperability).
- Readiness for integration in the European Open Science Cloud (EOSC): CLARIN has demonstrated its technical maturity by being one of the first RIs that could offer integrated services through the EOSC Portal, which underlines the high level of interoperability that has been achieved.
- Improved discoverability of the tools provided via CLARIN centres and thematically related platforms, such as the European Language Grid and the SSH Open Marketplace. To this end the synchronisation between relevant sources of information will be enhanced, including the website, the Language Resource Switchboard, the VLO, and tool-tailored search functionality.
- Enhancement of CLARIN Resource Families by integrating multiple resources and tools into an innovative single virtual environment for data processing, analysis and comparative research.
- Innovative data architecture models with a particular focus on advanced connections between corpus search engines and long-term archiving.
- "CLARIN for Programmers":
- Natural Language Processing services will be advertised more prominently to programming scientists, e.g. with well-documented application programming interfaces and example snippets in popular development environments.
- The results and innovations will be disseminated via web-based tool registries and tutorials for programmers.
- FAIRness and especially interoperability are furthered at several levels:
- among CLARIN centres (e.g. through core metadata recommendations and a general CLARIN gateway service for FAIR digital objects),
- with GLAM partners and other data providers and users that can benefit from CLARIN's processing services.
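The guidance on persistent identifiers mentioned above touches on machine-actionability: software should be able to turn an identifier into a resolvable link without human help. The following is a minimal sketch of that idea, not a description of any CLARIN service. It assumes the two public resolvers `https://hdl.handle.net/` (Handles) and `https://doi.org/` (DOIs); the function name `pid_to_url`, the accepted `hdl:` and `doi:` prefixes, and the example identifiers are hypothetical and chosen for illustration only.

```python
# Illustrative sketch: map a persistent identifier (PID) to a resolver URL.
# The function name, the accepted short forms, and the sample identifiers
# are assumptions made for this example, not part of any CLARIN API.

HANDLE_RESOLVER = "https://hdl.handle.net/"
DOI_RESOLVER = "https://doi.org/"


def pid_to_url(pid: str) -> str:
    """Return a resolver URL for a Handle or DOI given in a common short form."""
    value = pid.strip()
    lowered = value.lower()

    # An explicit "doi:" or "hdl:" prefix decides which resolver is used.
    if lowered.startswith("doi:"):
        return DOI_RESOLVER + value[4:].strip()
    if lowered.startswith("hdl:"):
        return HANDLE_RESOLVER + value[4:].strip()

    # DOIs are Handles whose prefix starts with "10.".
    if value.startswith("10."):
        return DOI_RESOLVER + value

    # Any other "prefix/suffix" string is treated as a plain Handle.
    if "/" in value:
        return HANDLE_RESOLVER + value

    raise ValueError(f"not a recognisable Handle or DOI: {pid!r}")


if __name__ == "__main__":
    # Fictitious identifiers, used only to show the mapping.
    for example in ("hdl:1234/example-1", "doi:10.1234/example", "10.1234/example"):
        print(example, "->", pid_to_url(example))
```

The sketch only builds URLs and performs no network requests; whether a given identifier actually resolves depends on the resolver and on the organisation that registered it.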
For the realisation of the mission of CLARIN and the implementation of the strategic agenda, a governance model and central support organisation are in place that have gradually evolved from a relatively small project organisation for the coordination of activities in around ten countries, along with a technical development team, into a professional organisation with responsibility for a range of central tasks plus support for the coordination activities in 24 countries.
- Current capacity:
- Coordination support: A good model has been implemented for collaboration and sharing of responsibilities among the Office team members who work from a service-oriented mindset that contributes to overall trust building.
- Technical development: A well-organised team of technical developers is in place, adequately balanced in terms of individual skill profiles, and capable of maintaining the high technical standard of the technical services.
- Financial organisational support: Expertise on rules and procedures for ERICs and European projects is well covered by the financial team.
- Communication: A range of communication channels for outreach and dissemination is in place (newsflashes, websites, social media, video channels, printed materials). Dedicated mailing lists exist for specific communities, such as the network of people with an interest in parliamentary data and the teachers’ network.
- Creation and editing of the audiovisual dissemination materials: There is adequate capacity for the basic generation and processing of audiovisual materials and design tasks.
- Strengthening the existing coordination framework:
- Reinforced models for coordination and collaboration between the various bodies and task forces active in CLARIN.
- Diversification of communication with stakeholders and communities outside of the national consortia.
- In alignment with sister infrastructures: ensuring that instruments are in place for capacity development for the building, maintenance and management of the RI nodes, including a common reward system for professionals with a career in the RI landscape.
- Increased awareness and understanding of the central and decentral roles in ensuring responsibility for resource quality.
- Enriching the Office capacity:
- Training & education coordination
- Technology watch
- A trusted and sustainable model for collaboration with professional designers for front-end website design, development, and updates.
Given the stage of maturity that CLARIN has reached, greater attention to consolidation and sustainability along various axes has become of vital importance in technical, financial, and organisational terms.
- CLARIN is continuously monitoring the need for the evolution and adjusted articulation of the CLARIN value proposition.
- The CLARIN value proposition is well aligned with the strategy of the key stakeholders.
- Members and partners:
- Emerging national consortia are offered workshops on how to set up a CLARIN infrastructure at the national level, and a series of best practice papers has been prepared that helps leverage the visibility of the results from national investments.
- GLAM sector: Structural collaboration with the GLAM sector has been established at the European level.
- Connection to industry: In many CLARIN member countries one or more models for collaboration with industry and public-sector parties are in place or foreseen.
- Financial and organisational sustainability:
- Consolidation and extension of the current membership base (critical for both organisational and financial sustainability).
- Diversification of the financial portfolio in ways compatible with the RI role as a distributor of resources created with public funds.
- Development of a decommissioning scenario for decentral nodes that lose their local funding, including objective criteria for assessing what constitutes crucial infrastructural resources for which fall-back service options in the wider CLARIN network could be offered.
- Integrating CLARIN within scientific communities:
- Extension of the number of disciplinary communities in which CLARIN services are adopted.
- Pro-active stimulation of compatibility and incorporation of disciplinary results (e.g. from ERC grants and mission-oriented projects) in the CLARIN infrastructure.
- Alignment of the services offered by CLARIN with disciplinary research agendas, in particular those of social sciences and humanities.
- Strengthening international collaborations:
- The collaboration with non-academic parties and the potential for sustainable relationships are better aligned with successful models at the national level, and emerging local initiatives are better supported by CLARIN ERIC.
- Collaboration with actors from the GLAM sector will be consolidated and new alliances will be sought aiming at opening up cultural heritage sources with digital methods.
- Viable models for more structural information exchange and collaboration with industrial parties will be articulated and promoted among the CLARIN nodes, partly based on the recommendations to be expected from the H2020 project ENRIITC.
- Joint development of formal models for collaboration among RIs beyond Europe.