Tour de CLARIN: Hungary


The national CLARIN consortium for Hungary, HunCLARIN, joined CLARIN ERIC in 2016. The Research Institute for Linguistics was one of the founding partners of CLARIN and took an active role in its preparatory phase. The members of the consortium are the Research Institute for Linguistics, the MOKK Centre for Media Research and Education and the Speech Communication and Smart Interactions Laboratories of the Budapest University of Technology and Economics, the University of Szeged, the University of Debrecen, the Pázmány Péter Catholic University, MorphoLogic LLC, the Institute for Computer Science and Control, and the Institute of Cognitive Neuroscience and Psychology. The national coordinator for HunCLARIN is Tamás Váradi.

As can be seen from the above list, the consortium covers a wide range of complementary expertise and research interests. It represents most of the leading research centres in Hungarian language and speech technology, which have closely cooperated in various national and international projects for more than a decade. 

The resources developed by HunCLARIN members include corpora that are indispensable to research on the use of the Hungarian language, such as the Hungarian National Corpus, which has recently been upscaled to giga size, the Hungarian WebCorpus, which was the first of its kind in Hungarian, and the Szeged Treebank, the reference treebank for Hungarian. Bilingual resources include the Hunglish Corpus, a sentence-aligned Hungarian-English parallel corpus of about 120 million words in 4 million sentence pairs. A truly unique resource is the HuComTech Corpus, a large-scale multimodal corpus which offers a rich dataset on 47 annotation levels and was presented to the CLARIN community at the CLARIN 2018 Conference.

As regards tools, the Hun* set of tools developed by the MOKK Centre (such as HunAlign, HunTag, HunMorph, etc.) has acquired recognition beyond Hungary for its versatility and free availability for languages other than Hungarian as well. A major recent achievement is the comprehensive processing chain e-magyar, which was developed through widespread collaboration among HunCLARIN members. This open and modular toolset was developed to suit the needs of digital humanities researchers and application developers alike and is therefore available both as a web service and as source code for download from GitHub repositories.

Although severely limited by a lack of funding for national activities, HunCLARIN is nevertheless making successful efforts to reach out to the humanities and social science communities. It has established cooperation with the Centre for Digital Humanities at Eötvös Loránd University as well as the Centre for Social Sciences. Last year HunCLARIN embarked on a roadshow among Hungarian universities showcasing central HunCLARIN tools and resources as well as local research projects. The three events held so far at the universities of Szeged, Debrecen and Pécs have proved so popular that a second event is already being organised this autumn at the University of Szeged at the university's request.

In 2017 HunCLARIN hosted the CLARIN Annual Conference in Budapest. In the future, HunCLARIN plans to establish a K-Centre for Hungarian, continue its outreach efforts and, subject to securing national funding, set up and operate a B-Centre as well.


The HunCLARIN team


Blog post written by Tamás Váradi, edited by Darja Fišer.