
Tour de CLARIN: Estonia


The Estonian CLARIN consortium, officially called Center of Estonian Language Resources (CELR), is a founding member of CLARIN ERIC. It is a B-certified centre that involves four Estonian research institutions: University of Tartu, Tallinn University of Technology, Institute of the Estonian Language and Estonian Literary Museum. The National Coordinator of CLARIN ERIC in Estonia is Kadri Vider. Aleksei Kelli, an Estonian legal expert, is the chair of the CLARIN Legal and Ethical Issues Committee (CLIC).

CELR provides access to Estonian language resources and language technology software (dictionaries, text and speech corpora, language databases, language software) for everyone working with digital language materials. The consortium also coordinates and organises the registration and archiving of the resources, and draws up the necessary legal contracts and licences for different types of users.

The CELR LR META-SHARE registry currently contains 152 registered and published records in Estonian as well as 24 in other languages with VLO-harvestable metadata, each of them with a DataCite DOI. These comprise 59 lexical-conceptual resources, 66 corpora, 25 tools and services and 2 language descriptions. Among the many resources provided by CELR are several monolingual as well as multilingual dictionaries, such as the Dictionary of Standard Estonian and the dynamically updated English-Estonian Machine Translation Dictionary, all of which can be queried online. The language tools include text and speech processing services, such as the Android Newsreader, which reads news articles aloud in Estonian, and a comprehensive rule-based morphology toolkit, which consists of modules for syllabification, paradigm recognition, morphological analysis and synthesis.

In addition to collecting, registering and archiving language resources, CELR also introduces the resources to potential users. The most successful outreach events in recent years were the workshops and tutorials about Estonian text corpora in KORP and lexical resources from the Institute of the Estonian Language. Through the promotion of KORP usage we have reached out to the broader community of DH researchers in Estonia. Literary scholars have become interested in data analysis methods for literary studies, which has resulted in a collaborative project whose aim is to compile a corpus for literary studies. The collaboration is significant for the current stage of digitization of Estonian DH data, which needs to become more machine-analyzable so that close reading of digitized texts and more sophisticated searching for tendencies in larger data collections become possible for DH scholars.

The centre is also involved in the National Programme for Estonian Language Technology, whose aim is to support the development of new language technologies for Estonian and associated initiatives. CELR is responsible for archiving the outcomes of the projects and introducing the resulting developments in language technology to the widest possible audience.


The Estonian CLARIN team


Blog post written by Kadri Vider and edited by Darja Fišer and Jakob Lenardič