The CLARIN & SSHOC Vocabulary Initiative

Submitted by Elisa Gorgaini on 16 June 2021

About the Initiative

Given the breadth of the Social Sciences and Humanities (SSH) sector, it is no surprise that researchers face not only a multitude of theoretical and empirical approaches to research but also a large pool of vocabularies and vocabulary service tools and platforms intended to help them in their endeavours. This diversity can significantly limit the positive impact of these vocabularies and tools on data aggregation, discovery and access.

To address this issue and to further the SSH Open Cloud project (SSHOC) objective of building a sustainable model for sharing and optimising research data and services across the SSH domains, CLARIN and SSHOC are investigating ways to collect, register and harmonise domain-specific controlled vocabularies, thesauri and taxonomies, as well as options for a suitable platform to manage them. To this end, SSHOC and CLARIN organised a series of three online awareness-raising sessions in September 2020, where experts from SSHOC and other related Horizon 2020 projects, as well as end-users, shared their experience by presenting several use cases. The insights collected during these sessions were elaborated further during a virtual workshop held in early November 2020. The workshop brought together a diverse group of speakers from SSHOC and other H2020 projects such as TRIPLE, who discussed whether controlled vocabularies and vocabulary hosting platforms could be made more interoperable by following the FAIR principles.

The virtual events were well attended. About 32% of the participants identified themselves with ‘University and/or Research Performing Organisations’, 23% with ‘Individual researcher’ and about 20% with ‘Research and e-Infrastructure/thematic clusters’, while the rest were citizen scientists and industry players.


The conclusions of the vocabulary initiative can be summarised as follows:

Requirements for Vocabulary Platforms

More discussions are needed to come to a final recommendation for SSHOC vocabulary platforms that would host and publish SSH vocabularies. Although none of the evaluated platforms matches all the requirements, it was agreed that further coordination towards sharing a vocabulary infrastructure would be most useful. Furthermore, partly as a result of the discussions, there is a definite move towards using Skosmos as the vocabulary platform of choice, although the use of alternative tools remains an option.

Finally, close collaboration with the Skosmos development team is needed to implement the desiderata of the SSH community. All stakeholders should work together towards better and easier sharing and discovery of relevant SSH vocabularies.

Requirements for Managing Vocabularies

Semantic and technical interoperability seem to be the most important requirements. They could be fulfilled by ensuring that vocabularies provide comprehensive coverage of the domain they represent through structured concept definitions and examples. Furthermore, vocabularies should be made FAIR and published with their own metadata. Standardisation could be achieved using Linked Open Data formats (e.g. RDF, SKOS, OWL).
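To make these requirements more concrete, the following minimal Python sketch shows what a vocabulary with structured definitions, examples and its own metadata might look like when written out as SKOS in Turtle syntax. The class names, the `to_turtle` helper, the example.org URIs and the sample concept are all hypothetical and serve only as an illustration; the SKOS and Dublin Core terms used (skos:ConceptScheme, skos:Concept, skos:inScheme, skos:prefLabel, skos:definition, skos:example, dct:title, dct:license) are standard.

```python
from dataclasses import dataclass, field

# Standard namespace URIs for SKOS and Dublin Core Terms.
SKOS = "http://www.w3.org/2004/02/skos/core#"
DCT = "http://purl.org/dc/terms/"


@dataclass
class Concept:
    """One vocabulary entry with a label, a definition and an example."""
    uri: str
    pref_label: str
    definition: str
    example: str


@dataclass
class Vocabulary:
    """A vocabulary carrying its own metadata (title and licence)."""
    uri: str
    title: str
    licence: str
    concepts: list = field(default_factory=list)


def _lit(text, lang="en"):
    """Format a language-tagged Turtle literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def to_turtle(vocab):
    """Serialise the vocabulary and its concepts as a Turtle document."""
    lines = [
        f"@prefix skos: <{SKOS}> .",
        f"@prefix dct: <{DCT}> .",
        "",
        f"<{vocab.uri}> a skos:ConceptScheme ;",
        f"    dct:title {_lit(vocab.title)} ;",
        f"    dct:license <{vocab.licence}> .",
    ]
    for c in vocab.concepts:
        lines += [
            "",
            f"<{c.uri}> a skos:Concept ;",
            f"    skos:inScheme <{vocab.uri}> ;",
            f"    skos:prefLabel {_lit(c.pref_label)} ;",
            f"    skos:definition {_lit(c.definition)} ;",
            f"    skos:example {_lit(c.example)} .",
        ]
    return "\n".join(lines) + "\n"


# Hypothetical sample data, for illustration only.
example_vocab = Vocabulary(
    uri="https://example.org/vocab/methods",
    title="Example SSH research methods",
    licence="https://creativecommons.org/licenses/by/4.0/",
)
example_vocab.concepts.append(
    Concept(
        uri="https://example.org/vocab/methods/oral-history",
        pref_label="Oral history",
        definition="Collection of historical information through recorded interviews.",
        example='A recorded interview labelled "life story".',
    )
)
```

Calling `to_turtle(example_vocab)` returns a Turtle document that a SKOS-aware tool should be able to load. A production workflow would more likely rely on an RDF library than on string formatting, which is used here only to keep the sketch free of dependencies.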

Since SSH is a very heterogeneous domain with diverging traditions, there are different ways to describe the same phenomena. Consequently, many different schemas, each using its own concepts (represented in vocabularies and ontologies), are in use. This leads to a semantic interoperability challenge when trying to convert between descriptions or to analyse collections whose data use different description schemas. Mapping everything to a single schema with a single set of concepts is a costly affair. Recently, the SEMAF EOSC project has proposed adopting a flexible semantic mapping framework in which mappings are targeted at specific interoperability goals only and are all registered for sharing, possible reuse and extension.
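The idea of goal-specific mappings that are registered for reuse can be illustrated with a small Python sketch. This is not the design proposed by SEMAF; the `Mapping` and `MappingRegistry` names, the goal labels and the example.org URIs are assumptions made purely for illustration. Only the relation names are taken from the standard SKOS mapping properties (such as skos:exactMatch and skos:closeMatch).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """A single registered link between two concepts, created for one goal."""
    source: str    # URI of the concept in the source vocabulary
    target: str    # URI of the concept in the target vocabulary
    relation: str  # SKOS mapping property, e.g. "skos:exactMatch"
    goal: str      # hypothetical label for the interoperability goal served


class MappingRegistry:
    """Hypothetical in-memory registry; a real one would be a shared service."""

    def __init__(self):
        self._mappings = []

    def register(self, mapping):
        """Record a mapping once so that others can find and reuse it."""
        if mapping not in self._mappings:
            self._mappings.append(mapping)

    def lookup(self, source, goal):
        """Return the mappings for a source concept that serve the given goal."""
        return [
            m for m in self._mappings
            if m.source == source and m.goal == goal
        ]


# Hypothetical sample data, for illustration only.
registry = MappingRegistry()
registry.register(Mapping(
    source="https://example.org/schemaA/interview",
    target="https://example.org/schemaB/oral-history",
    relation="skos:closeMatch",
    goal="cross-collection-search",
))
registry.register(Mapping(
    source="https://example.org/schemaA/interview",
    target="https://example.org/schemaC/recording",
    relation="skos:broadMatch",
    goal="metadata-conversion",
))

search_links = registry.lookup(
    "https://example.org/schemaA/interview", "cross-collection-search"
)
```

Keeping the goal on each mapping means that a link that is adequate for search does not have to be precise enough for lossless conversion, which is what keeps the cost below that of a single all-purpose schema.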

