A CLARIN Resource Family for Sign Languages

Submitted by e.gorgaini@uu.nl on 28 November 2022

The initiative that led to this project was the result of a workshop for all CLARIN Knowledge Centres (K-centres), held from 30 November to 1 December 2020. This workshop, which was established by Bente Maegaard and now takes place annually, brought together all K-centres working with Sign Language (SL) resources in order to explore ways of collaboration. One of the first agreed priorities was to create a Resource Family portal that would make SL resources more visible and accessible for users across the globe.

CLARIN’s Resource Families (CRF) are designed to facilitate comparative research. We embarked on a mission of making a CLARIN Resource Family for Sign Languages (CRF-SL), since we as joint K-centres for sign languages have the expertise for this. Four of the centres wrote a proposal for CLARIN, which was approved for funding in November 2021.

The K-centres involved were:

The Plan

The team planned to establish a CLARIN Resource Family for Sign Languages along the following line of action:

Establish a CRF for SL resources on this CLARIN website, covering both corpora and lexicons, to make SL resources more visible, findable and accessible. The basis for this is the SL resources that K-centres offer as part of the CLARIN infrastructure, but the inventory will go beyond K-centres and take into account everything that is available through the VLO and other (B- and C-) centres’ repositories, so that it is comparable to the other Resource Families. We distinguish three subtasks:

  • Make an inventory of the material (datasets and resources) offered by K-centres with expertise in SL.
  • Make an inventory of other datasets in the VLO which may qualify as members of the new CRF by contacting the rights holders.
  • Make an inventory of any other material (e.g. new datasets, annotation tools, manuals) not yet accessible through the CLARIN Infrastructure by sending out questionnaires to SL communities.

Since our K-centres valued the collaborations in the SignON and EASIER projects, we proposed to use the overview provided by the EASIER project in deliverable 6.1 as a sound basis for our work.

Finally, we proposed to explore and implement an extra metadata field for ‘modality’ in current profiles, so that SL resources can be marked as belonging to the visual-gestural modality and other providers are encouraged, or required, to specify the modality in their metadata.

The Result

Sign language (SL) corpus resources contain transcriptions and annotations of spontaneous or elicited discourse (dialogues, narratives, and various other genres). All resources are in a video format because of the gestural/spatial-visual modality, a vital characteristic of signed languages (sign languages as used by Deaf-blind signers can also be received in the tactile modality). SL corpora are crucial resources for various types of linguistic research, such as lexicography, phonology, syntax, and pragmatics, as well as for language typology, sign language teaching as L1 or L2, deaf education, and the development of language technologies for signed languages.

The team worked efficiently and in great harmony in bringing all the resources together, which resulted in a long list of resources available through the CLARIN infrastructure, as well as resources that are not yet available but for which contact points could be established.

In the overview of resources that we created and present on the CRF page, we distinguished four categories:

  1. Sign language resources in the CLARIN infrastructure: Corpora
  2. Sign language resources in the CLARIN infrastructure: Lexical Resources
  3. Other sign language resources: Corpora
  4. Other sign language resources: Lexical Resources.

The results of our detective work are now visible on CLARIN’s CRF page as a separate Family for Sign Language Resources: https://www.clarin.eu/resource-families/sign-language-resources.

In total, this page connects you to 51 corpora (in 24 different sign languages) and 24 lexical resources (in 10 different sign languages).

In order to classify datasets on a key dimension for the sign linguistics and sign language technology field, we also created a new component, ModalitySL, in the CMDI Component Registry, with the values:

  • Sign language 
  • Sign language (tactile)
  • Spoken language
  • Sign system
  • Silent gesture 
  • Embodied speech
  • Sign-supported speech
  • Unspecified
  • Unknown
  • Other.

From now on, this component can be used in metadata records to indicate the modality of the resource involved.
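
For metadata providers who want to check their records before submission, the closed value list above can be mirrored in a small script. The following is a minimal sketch in Python, not part of the CMDI tooling: the value list is copied from this article, while the function name `normalise_modality` and the whitespace and case handling are our own assumptions, and the registry may apply stricter matching.

```python
# Illustrative only: the values are copied from the list in this article.
# The helper name and the normalisation rules are assumptions, not CMDI API.
MODALITY_SL_VALUES = (
    "Sign language",
    "Sign language (tactile)",
    "Spoken language",
    "Sign system",
    "Silent gesture",
    "Embodied speech",
    "Sign-supported speech",
    "Unspecified",
    "Unknown",
    "Other",
)

# Lookup table from a case-folded value to its canonical spelling.
_CANONICAL = {value.casefold(): value for value in MODALITY_SL_VALUES}


def normalise_modality(raw: str) -> str:
    """Return the canonical ModalitySL value for `raw`.

    Surrounding and repeated whitespace and letter case are ignored.
    Raises ValueError if `raw` is not one of the listed values.
    """
    key = " ".join(raw.split()).casefold()
    try:
        return _CANONICAL[key]
    except KeyError:
        raise ValueError(f"not a ModalitySL value: {raw!r}") from None
```

A record whose modality field reads "sign language (TACTILE)" would be normalised to "Sign language (tactile)", while a value outside the list, such as "Written", would be rejected with a ValueError.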

Presentation of Results

We presented the CRF-SL project at the Bazaar of the CLARIN Annual Conference in Prague on 11 October 2022.