Tour de CLARIN: the Knowledge Centre for Polish Language Technology

Submitted by Jakob Lenardič on 18 December 2019

Blog post written by Jan Wieczorek, edited by Darja Fišer and Jakob Lenardič

The Knowledge Centre for Polish Language Technology (PolLinguaTec) is a CLARIN K-Centre that is part of the Language Technology Centre CLARIN-PL (LTC) at Wrocław University of Science and Technology. The main aim of PolLinguaTec is to provide knowledge on the application of tools and systems for natural language analysis, especially for Polish, within Digital Humanities and Social Sciences. PolLinguaTec provides extensive documentation, such as instructions, guidelines, and tutorials, as well as access to experienced experts who can solve problems related to the use of the language processing tools (e.g., Morpho, Tager, WSD, Chunker, Parser, Spejd) and resources (e.g., Walenty, plWordNet - Słowosieć) that are developed at LTC. PolLinguaTec also helps researchers use language services tailored to interdisciplinary research in Digital Humanities and Social Sciences, such as the LEM Literary Exploitation System and the WebSty text similarity analysis system (and its multilingual counterpart WebstyML), as well as various tools for extracting information from text, such as TermoPL and LEM, and tools for sentiment analysis and topic modelling. Since May 2019, PolLinguaTec has helped plan the implementation of the CLARIN infrastructure in about 10 research projects. The authors of six of these projects also asked for help in preparing the grant applications they submitted.

PolLinguaTec disseminates knowledge about the use of language technology in humanities and social research at conferences and other scientific events. To this end, PolLinguaTec members have organised a number of User Involvement events, such as workshops focusing on NLP research infrastructure (e.g. ten editions of the “CLARIN-PL in Research Practice” workshop), as well as seminars for smaller research teams, such as the seminar “CLARIN-PL Tools in Scientific Research in Psychology”.

Participants of the workshops "CLARIN-PL in research practice"

Ever since PolLinguaTec was founded in 2017, it has been involved in the implementation of language services in research projects, many of which have resulted in publications with significant results for Digital Humanities and Social Sciences. For instance, Geller (2019) collaborated with PolLinguaTec on a diachronic linguistics study of the lexical and semantic effects that arise from long-term language contact, looking at Polish lexical borrowings in Yiddish. The research team needed a tool to create a digital lexical database that resembled a wordnet in its conception. PolLinguaTec developed a specially adapted application, a modified version of WordNet Loom, the program used to edit plWordNet. It later became clear that the new version of WordNet Loom was also very useful to the African Wordnet development team, which was looking for a tool to speed up and improve the editing of their database. The results of the cooperation between PolLinguaTec and the African Wordnet team were presented in Griesel, Bosch, and Mojapelo (2019).

In relation to corpus building, Jerzy Malinowski collaborated with PolLinguaTec to construct the Corpus of Henryk Siemiradzki Paintings. In addition, Mariusz Zięba collaborated with PolLinguaTec in his work on “The search for meaning and sense of life and personal growth in the consequence of trauma: prospective studies”, which resulted in the paper by Zięba, Wiecheć, Biegańska-Banaś, and Mieleszczenko-Kowszewicz (2019). Finally, the PolLinguaTec team has trained researchers to conduct a study of people with post-traumatic syndrome using NLP methods; the training and cooperation covered topics such as the creation of word frequency lists, sentiment analysis, and the application of stylometry.

Members of PolLinguaTec and LTC
