Tour de CLARIN: France

Submitted by Jakob Lenardič on 13 January 2021

Written by Nicolas Larousse and Christophe Parisse

France has been an observer of CLARIN since 2017. CLARIN-FR is coordinated by Huma-Num, with Nicolas Larousse as national coordinator. It involves several national partners.

The French consortium is mainly composed of French linguists involved through the CORLI expert group, which promotes good practices for corpus creation, focusing on how to maximize corpus reuse and disseminate corpus data. CORLI is also active in promoting the application of the FAIR principles. Moreover, French researchers are involved via the ATALA association and the GDR TAL group. Additionally, CLARIN-FR has recently started to establish contacts with the French cognitive sciences community, in particular with Institut Carnot pour la Cognition for its “Cognition & Langage” research project.

CLARIN-FR has so far established three C-Centres:

  • The COCOON Centre provides a data repository with access to oral resources (with a focus on dialectal texts) and an interactive web portal that offers a chain of navigational and analytical tools to the French digital research community. A unique feature of COCOON is that all the oral resources are tagged with precise geolocational metadata so they can be searched geographically on a state-of-the-art interactive worldwide map offered on the web portal.
  • The ORTOLANG Centre provides a general-purpose repository for secure long-term storage of language data, mostly pertaining to the languages spoken in France, although resources from other origins are accepted as well, especially when the data come from countries where no public repositories like ORTOLANG are available. For example, COMERE, one of the most visible corpora in ORTOLANG, consists of computer-mediated language resources such as tweets and text messages.
  • The MMSH's Sound Archives Center (Phonothèque) preserves, archives and disseminates recordings of the sound heritage relating to the ethnology, languages, history, music and literature of the Mediterranean area. For example, the Phonothèque makes accessible, in compliance with ethical and legal rules, recordings of different dialects of Occitan and varieties of colloquial Arabic (Syria, Lebanon, Sudan, Algeria, Yemen).

In 2020, CLARIN-FR established the French K-Centre for Corpora, Languages and Interaction. The K-Centre focuses on providing information, tools and continuing education to help PhD students and professional linguists work in corpus linguistics. It is run by a panel of corpus linguists who provide their expertise to the community. CLARIN-FR has also successfully added two endpoints to the Federated Content Search, from the ORTOLANG and COCOON C-Centres.

Following the Work Plan for the 2019 renewal of its status as a CLARIN observer, and after establishing the K-Centre, CLARIN-FR now aims to obtain CoreTrustSeal certification for the ORTOLANG repository so that the latter can become a CLARIN B-Centre. After the observership period ends in 2021, CLARIN-FR aims to continue discussions with the French ministry of research, the French National Centre for Scientific Research (CNRS) and national communities about the opportunity for France to become a full CLARIN member.

Nicolas Larousse, coordinator of CLARIN-FR