What is happening at... CLARIN Slovenia

Submitted by Tomaž Erjavec on 16 December 2015

November and December were busy months for CLARIN.SI, especially in the area of user engagement. In cooperation with the Slovenian research project JANES (Resources, Tools and Methods for the Research of Nonstandard Internet Slovenian), the Slovenian Language Technologies Society, and the Swiss project ReLDI (Regional Linguistic Data Initiative), we organised the Conference Slovenian on the Web and in New Media on 25–27 November 2015 in Ljubljana. A pre-conference tutorial, 'Beyond example extraction: Quantitative analysis of the JANES corpus', was given by Maja Miličević from the University of Belgrade, in which Slovenian linguists were introduced to statistical modelling and the R package.

The conference opened with an invited talk by Michael Beißwenger from the Dortmund Technical University titled 'Linguistic annotation of social media corpora: To what extent do we have to adapt existing encoding standards and tag sets?'. This was followed by 15 regular talks and a round table on Slovenian user-generated content. The proceedings with full papers are available on the web page of the conference. The invited lecture and the plenary talk were recorded for Videolectures.net and should soon be available online.

The second round of events, a series of lectures and tutorials in Ljubljana, Zagreb and Belgrade, was also co-organised by CLARIN.SI, JANES and ReLDI and went under the name Janes Express. In Ljubljana, on 13 and 14 November, Kaja Dobrovoljc spent the first day giving an introduction to using WebAnno for linguistic annotation tasks, while the second day was devoted to showing prospective annotators how to use WebAnno and the associated linguistic guidelines for annotating a reference corpus of Slovenian tweets. Both events were well attended: the lecture on WebAnno drew 30 professors and students from the University of Ljubljana, and the practical tutorial was attended by 11 prospective annotators.

In Zagreb, on 4 December, we gave an introduction to on-line Slovene language resources to students of Slovene at the University of Zagreb, followed by an introduction to WebAnno and to applying the guidelines developed in the scope of ReLDI to annotating a reference corpus of Croatian tweets. The lecture on language resources for Slovene was given in two groups, with an attendance of over 70 students of Slovene, while the tutorial on WebAnno drew 15 prospective annotators, teachers and researchers from the University of Zagreb. In Belgrade, on 10 December, we followed the same scenario as in Zagreb, again with the talk on on-line Slovene language resources, this time for 50 students of Slovene, and the tutorial on WebAnno for annotating the reference corpus of Serbian tweets, attended by 20 annotators, teachers and researchers from the University of Belgrade.

In summary, very successful conferences were held in Slovenia in November and December. Altogether, 180 students and professors from Slovenia, Croatia and Serbia were introduced to working with Slovene language corpora and other internet-based resources or were given an introduction to WebAnno. Furthermore, the first steps were taken towards producing a manually annotated multilingual comparable corpus of tweets in three South Slavic languages, which will be an extremely useful resource for studying user-generated content in a multilingual and multicultural setting. These events also highlighted the importance of cooperation between CLARIN.SI and related initiatives in the region, which produced strong synergies.