
Tour de CLARIN: Finland

Submitted by Linda Stokman on

'Tour de CLARIN: Finland' blog post written by Darja Fišer and Jakob Lenardič 


The Finnish national consortium FIN-CLARIN has been a CLARIN member since 2015. The members of the consortium are the University of Helsinki, the University of Eastern Finland, the University of Jyväskylä, the University of Oulu, the University of Tampere, the University of Turku, the University of Vaasa, the Institute for the Languages of Finland, the Helsinki Institute of Technology and the IT Center for Science (CSC). The national coordinator for FIN-CLARIN is Research Director Krister Lindén.

FIN-CLARIN has been actively engaged in developing tools and resources that have become a staple for Finnish researchers working with language data. Through the Language Bank of Finland (Kielipankki), a certified CLARIN B-Centre, researchers can access dozens of Finnish corpora, most of which are available through online interfaces such as Korp.

A flagship resource provided by the Finnish consortium is the Suomi 24 Sentences Corpus, which compiles texts from the discussion forums of Suomi24, an extremely popular online networking website. The corpus data is currently being analyzed in the framework of the Citizen Mindscape project, which seeks to uncover “trends and shifts in attitudes in connection to societal phenomena” in Finland. This makes the corpus an extremely important resource and shows how corpus-based linguistics can lead to a greater understanding of society at large.

The Finnish consortium works closely with ground-breaking researchers in the Digital Humanities and Social Sciences who use the consortium’s resources and tools. The Language Bank hosts a “Researcher of the Month” archive, intended to highlight both the work of prominent researchers and the tools and resources that may be of use to other researchers. In April, the researcher of the month is Professor Jarmo Jantunen, who has used the above-mentioned Suomi 24 Sentences Corpus in his work on Queer Linguistics.

In 2016, the Finnish consortium organised 22 User Involvement events. A particularly successful one was the Roadshow celebrating Kielipankki’s 20th anniversary, which consisted of a series of seminars held at all the member organizations of the FIN-CLARIN consortium.

Four language tools hosted by the Finnish Language Bank are publicly available. Finnish Parse is a powerful dependency parser that performs tokenisation, sentence splitting, morpho-syntactic tagging and parsing, and can be applied to plain Finnish text with very high accuracy. Aalto-ASR is a continuous speech recognizer that can handle a large Finnish vocabulary. The Helsinki Finite-State Transducer Technology (HFST) provides software for the morphological analysis of various European languages. Finally, the Proto-Indo-European Lexicon is a generative etymological dictionary that provides data on word origins and historical changes for the hundred most ancient Indo-European languages.


FIN-CLARIN Team (each row from left to right):
Back row: Atro Voutilainen, Senka Drobac, Pekka Kauppinen; middle row: Maria Palolahti, Erik Axelson, Jyrki Niemi; front row: Krister Lindén (Research Director), Mietta Lennes, Jussi Piitulainen; not in the photo: Tero Aalto, Imre Bartis, Ute Dieckmann, Sam Hardwick, Martin Matthiesen.


