Tour de CLARIN: CLaDA-BG Dissemination Activities

Submitted by Jakob Lenardič on 21 November 2019

Blog post written by Petya Osenova and Kiril Simov, edited by Darja Fišer and Jakob Lenardič

With partners belonging to both CLARIN and DARIAH, the CLaDA-BG consortium is very heterogeneous. For that reason, it regularly organizes seminars and dissemination activities aimed at presenting the infrastructure to researchers with backgrounds in the Humanities, such as history, ethnography, library science, and museology. These are ‘hosted events’, which means that CLaDA-BG experts visit consortium institutions and their teams. During CLaDA-BG’s first year of operation, we mainly presented the goals of the infrastructure to interested audiences and partners. At these meetings we received valuable feedback and learnt a lot about the needs of potential users. The main events are summarized below.

The first awareness-raising event was called “Open Science Infrastructures for Big Cultural Data: International Advanced Masterclass”. It was organized by UCL Qatar in collaboration with DARIAH-EU and the National Library Ivan Vazov - Plovdiv, and was held in Plovdiv, Bulgaria, from 13 to 15 December 2018. Members of CLaDA-BG presented the available resources and tools. Dimitar Iliev from Sofia University presented Telamon, a corpus of Greek inscriptions found in Bulgaria, while Dimitar Minev, Director of the Plovdiv National Library, talked about how museology datasets can be turned into Linked Open Data resources. A feedback session followed, in which colleagues from the British Library offered interesting comments, especially on the integrated processing of language and image information. The event concluded with a panel discussion on the Next Steps in Bulgarian Open Humanities, chaired by CLaDA-BG coordinator Kiril Simov. The panellists were Roumiana Preshlenova (the Institute of Balkan Studies and Centre of Thracology, BAS, CLaDA-BG) and Georgios Papaioannou (UCL Qatar).

On 15 February 2019, a one-day seminar was organized at the Institute of Balkan Studies with Centre of Thracology. At the event, Kiril Simov presented the mission, constitution and organization of the infrastructure, and Petya Osenova presented the resources and tools it offers, after which researchers from the Institute presented their work. In the second part of the seminar, the participants discussed with the lecturers how to make their data and dictionaries machine-readable and searchable, and how to OCR and process the old books and newspapers held at the Institute. The lecturers explained the basic principles of constructing structured data and processing it with the language processing pipeline for Bulgarian. In addition, it was decided to use the data provided by the Institute to create a normalization model that modernizes old texts, making them processable by the existing NLP modules.

At the 12th National Conference “Education and Research in the Information Society”, held on 30 and 31 May 2019, Kiril Simov delivered a dissemination lecture titled “Integrated Language and Knowledge Resources for CLaDA-BG”. The participants were representatives of Bulgarian libraries, universities and educational institutions. The libraries were especially interested in improving and speeding up the digitization of their data. In response to this interest, CLaDA-BG started working on an appropriate normalization model for older texts.

On 23 August 2019, CLaDA-BG experts attended an informal seminar at the Cyrillo-Methodian Research Centre, BAS, which has a rich collection of medieval manuscripts in Old Bulgarian, Russian, German and other languages. The main challenge for researchers who study Cyril and Methodius is the proper handling of old lexica and their compilation into searchable online dictionaries. This first requires OCR and editing, which calls for the reuse of existing services available at CLARIN centres in other European countries.
