
South Africa joins CLARIN ERIC as observer

Submitted by Linda Stokman

We are very pleased to announce that South Africa joined CLARIN ERIC as an observer in October 2018.

South Africa will be represented by the North-West University (Potchefstroom Campus), where the South African Centre for Digital Language Resources (SADiLaR) is based.

Professor Attie de Lange, Director of SADiLaR, has been appointed to represent South Africa as observer at the General Assembly meetings. Dr. Roald Eiselen, Technical Manager of SADiLaR, has been appointed as the contact person for the national consortium of South Africa, which is currently being set up.

SADiLaR is a national centre supported by the Department of Science and Technology (DST). SADiLaR has an enabling function, with a focus on all eleven official languages of South Africa, and supports research and development in the domains of language technology and language-related studies in the humanities and social sciences. The Centre aims to create, manage and distribute language resources, as well as applicable software, which are freely available for research purposes through the Language Resource Catalogue.

SADiLaR and CLARIN

One of the main aims of SADiLaR is to integrate both its resources and technologies into infrastructures and web services, so that researchers can access and use them in an integrated fashion. Over the medium term, SADiLaR hopes to make all of its resources available for discovery via CLARIN's VLO, and to make some of the automatic processing tools developed for the South African languages, such as part-of-speech taggers, named entity recognisers, and OCR engines, available for use through the Switchboard, developed by CLARIN. Furthermore, SADiLaR is working on making some of the technologies developed within CLARIN available to South African researchers, focusing on data that is available for the South African languages. The technologies SADiLaR plans to release include corpus search and processing tools, Wordnet viewers, stylometry tools, and web annotation tools that integrate some of the language-specific technologies already developed for these languages.

SADiLaR Activities

SADiLaR runs two programmes:

  1. A digitisation programme, which entails the systematic creation of relevant digital text, speech and multi-modal resources related to the official languages of South Africa. The development of appropriate natural language processing software tools for research and development purposes is included as part of the digitisation programme.
  2. A Digital Humanities programme, which facilitates the building of research capacity by promoting and supporting the use of digital data and innovative methodological approaches within the Humanities and Social Sciences.

SADiLaR’s Vision

SADiLaR envisions: 

  • becoming a major resource centre by creating, managing and distributing digital language resources related to the official languages of South Africa;
  • developing awareness and supporting academic scholarship in Digital Humanities;
  • playing a vital role in building research capacity; and
  • extending its scope beyond the official languages of South Africa to other languages on the African continent, becoming a digital language resources hub for Africa.

Partners

SADiLaR includes several member institutions that are centrally integrated into the functioning of the centre. Each of the partner institutions has a specific specialisation area:

  • Department of African Languages, University of Pretoria - Digitisation
  • Department of African Languages, University of South Africa - Wordnet and terminology development
  • CSIR Meraka Institute: Human language technology group - Speech data and technologies
  • Centre for Text Technology, North-West University - Text data and technologies
  • Inter-institutional Centre for Language Development and Assessment - Language teaching and assessment