Blogs

 

In this Tour de CLARIN blog post, we present an in-depth interview with Kaja Dobrovoljc, a Slovenian corpus linguist who works at the Centre for Language Resources and Technologies, regularly collaborates with CLARIN.SI, and uses its infrastructure.

 
 

In this Tour de CLARIN blog post, we present an in-depth interview with Nan Bernstein Ratner, who, along with Brian MacWhinney, is one of the PIs of FluencyBank, a shared database for the study of the development of fluency in typical and disordered populations.

 
 

CLARIN Slovenia (CLARIN.SI) has contributed to several user involvement events which presented the results of the project to different user groups.

 
 

Read about the CSMTiser, a supervised machine learning tool that performs word normalization by using Character-level Statistical Machine Translation.

 
 

TalkBank, which was recognized as a CLARIN Knowledge Centre in 2016, is the world’s largest open access integrated repository for spoken language data. It provides language corpora and other audio resources to support researchers in Psychology, Linguistics, Education, Computer Science, and Speech Pathology.

 
 

In 2015, researchers from the Jožef Stefan Institute in Ljubljana, Slovenia released the first emoji sentiment lexicon, called Emoji Sentiment Ranking 1.0, and published it as a resource in the public language resource repository CLARIN.SI. With 78,500 downloads to date, the lexicon is the most downloaded resource in the CLARIN.SI repository.

 
 

CLARIN.SI joined CLARIN ERIC in 2015 and is a B-certified centre which offers a LINDAT/D-Space repository that currently contains around 110 language resources for Slovenian as well as for other languages, especially Croatian and Serbian.

 
 
     
 
 

The collection Grundtvig’s Works is published by the Grundtvig Center at the University of Aarhus and, when finalized in 2030, will contain 1000 text-critical and commented editions of the printed works of N.F.S. Grundtvig. Since the Grundtvig Center itself does not offer the option of downloading the underlying files, CLARIN-DK was approached as a repository provider.

 
 

Lemmatizers generalize over the different forms of a word used in free text and provide its lemma, which is the base or dictionary look-up form. The CST lemmatizer learns lemmatization rules from more than just word endings and recognizes a wide variety of derivational patterns, e.g., prefixation, infixation, and suffixation.