Natural Language Processing

Tour de CLARIN: Interview with Sidsel Boldsen

12 April 2022

Sidsel Boldsen is a PhD student in Natural Language Processing (NLP) and digital humanities, with a special interest in historical languages and linguistic knowledge representation. She has successfully collaborated with the Danish CLARIN K-Centre DANSK.

Xenophobia on Greek Twitter during and after the Financial Crisis

11 January 2022
The Project, The project presents a replication of a data-driven and linguistically inspired verbal aggression analysis framework that was designed to examine verb
‘This study is an example of how a language technology-based method can be used as a complementary research instrument in order to address broader soc
Methodology, The methodology that was initially designed and applied to 2013-2016 Twitter data as part of the XENO@GR project was reapplied to 2019 Twitter data in
‘This information is useful for researchers, such as political and social scientists, journalists, and, given the high correlation between physical an
Outcome, During the first study (2013-2016), the most discussed groups in the Twitter collections were refugees and Germans, reflecting the ongoing refugee cri
Publications and Future Plans, The project team is currently working on the extension of the framework to other targets and domains through two case studies in the context of the SS
Views on CLARIN, ‘The natural language processing tools and workflows that you can build are extremely useful for several semantic annotation analysis tasks. And in ge
Maria Pontiki, PhD, Scientific Associate at the Institute for Language and Speech Processing, Athena Research Center, Athens, Greece Maria Gavriili
Access the ILSP suite of NLP tools for Greek via CLARIN:EL

plWordNet 3.0 – Słowosieć 3.0

plWordNet is a lexico-semantic network which reflects the lexical system of the Polish language. plWN currently contains 178 000 nouns, verbs, adjectives, and adverbs, 259 000 word senses, and over 600 000 relations and 240 000 inter-lingual relations between lexical units. It is now the largest wordnet in the world and is still growing.

Senses in plWordNet are interconnected by relations. In the resulting network, each word is defined implicitly in reference to other words. For example, samochód 'car' is a kind of pojazd drogowy 'road vehicle'; it is a whole consisting of silnik 'engine', spryskiwacz 'windscreen washer', podwozie 'chassis' and so on; its close counterpart is the colloquial fura 'wheels'.
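The idea that each word is defined implicitly by its relations can be illustrated with a small labelled graph. The sketch below is only a toy model built from the example in the paragraph above: the `ToyWordnet` class and the relation names (`hypernym`, `has_part`, `colloquial_counterpart`) are hypothetical and do not reflect plWordNet's actual data format, relation inventory or API.

```python
from collections import defaultdict


class ToyWordnet:
    """A toy lexical network: labelled edges between lexical units.

    Hypothetical illustration only; not the plWordNet data model.
    """

    def __init__(self):
        # Maps (source unit, relation name) to the set of target units.
        self._edges = defaultdict(set)

    def add_relation(self, source, relation, target):
        self._edges[(source, relation)].add(target)

    def related(self, source, relation):
        # Sorted list of units linked to `source` by `relation`.
        return sorted(self._edges.get((source, relation), set()))


net = ToyWordnet()
# Relations taken from the samochód 'car' example; relation names are assumed.
net.add_relation("samochód", "hypernym", "pojazd drogowy")
for part in ("silnik", "spryskiwacz", "podwozie"):
    net.add_relation("samochód", "has_part", part)
net.add_relation("samochód", "colloquial_counterpart", "fura")

print(net.related("samochód", "hypernym"))
print(net.related("samochód", "has_part"))
```

Nothing in the toy network states what samochód means directly; its meaning is recoverable only from the units it is linked to, which is the point made above.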

plWordNet has numerous applications. One is its use as a Polish-English and English-Polish dictionary, made possible by its mapping onto Princeton WordNet (the first and for many years the largest wordnet in the world). plWordNet is also an important resource in natural language processing and in artificial intelligence research. For example, it is used by Google Translate for the purposes of machine translation.

The University has made plWordNet available free of charge for all applications, including commercial ones, on a licence modelled on the Princeton WordNet licence. Users may browse plWordNet via a mobile version and via WordNetLoom-Viewer (an application for displaying plWN entries), as well as download the source files. Programmers may access plWordNet via a web service.

We provide (currently only in the download version) 31 000 lexical units marked with their sentiment values: positive, negative, ambiguous or neutral.

 

CLARIN Centre
CLARIN-PL
Project leader
dr. Maciej Piasecki
Contact email
Acknowledgements

Wroclaw University of Technology, Ministry of Science and Higher Education (Poland)

Lärka (English LARK) - Language Acquisition Reusing Korp

Lärka - “LÄR språket via KorpusAnalys” - with its English equivalent “Lark” (Language Acquisition Reusing Korp) is the ICALL platform of Språkbanken (the Swedish Language Bank). ICALL – Intelligent Computer-Assisted Language Learning – has as its main aim to draw on the opportunities offered by language resources, such as corpora, lexicons and natural language processing (NLP) components including lemmatizers, parsers, etc., to build more sophisticated and flexible applications for language learners and students of grammatical theory.

The work on Lärka started in the project ‘Systems Architecture for ICALL’, financed by NordPlus Sprog from 2011 to 2013. Specified as a modular web-based exercise generator that reuses available annotated corpora and lexical resources, Lärka is freely available, targeting primarily learners of Swedish as a second/foreign language and students of Swedish linguistics. Being web-based, Lärka has the advantages of accessibility and ease of use.

Lärka is designed as a Service Oriented Architecture based on web services. The platform comprises two main components – user interface and web services – where the web services can be reused by other applications. Web services take care of exercise generation whereas the user interface collects user input, formats the web service output, and assigns behavior to buttons and menus.
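The division of labour described above can be sketched in a few lines. This is a minimal illustration, not Lärka's code: the function names, the JSON payload fields and the sample sentence are all assumptions made for the example, and the "web service" is simulated by a local function that returns a JSON string.

```python
import json


def generate_exercise_service(sentence_tokens, target_index, options):
    """Stand-in for a web service: builds the exercise, no presentation logic.

    Hypothetical payload layout; returns a JSON string as a service would.
    """
    payload = {
        "sentence": sentence_tokens,
        "target": target_index,
        "options": options,
    }
    return json.dumps(payload, ensure_ascii=False)


def render_exercise(service_output):
    """Stand-in for the user interface: formats the service output for display."""
    data = json.loads(service_output)
    tokens = list(data["sentence"])
    # Mark the word the learner is asked about.
    tokens[data["target"]] = "[" + tokens[data["target"]] + "]"
    lines = [" ".join(tokens)]
    lines += [f"  {i}. {opt}" for i, opt in enumerate(data["options"], start=1)]
    return "\n".join(lines)


response = generate_exercise_service(["Hunden", "sover", "."], 1, ["noun", "verb"])
print(render_exercise(response))
```

Because the generator returns plain data and knows nothing about buttons or layout, another application could call it and present the same exercise in its own way, which is the reuse the architecture is designed for.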

At the moment Lärka offers exercises for two target groups: students of linguistics and learners of Swedish*. Students of linguistics can train parts of speech, syntactic relations and semantic roles, whereas second language learners of Swedish can train spelling, vocabulary and inflection patterns. Available exercises share some common features, namely:

  • Training context: sentence. The objective of the Lärka-based exercise generator has, from the outset, been to use real-life language examples from corpora. Possible copyright issues are avoided by using only a single-sentence context. We are actively searching for alternatives for working with full texts.
  • Reference materials. Relevant articles are looked up in Wikipedia, Wiktionary and Karp, while a text-to-speech module provided by SitePal offers pronunciation of relevant words and sentences. Reference materials are shown in a separate field that can be hidden when not wanted.
  • Training modes: self-study, test and timed test. The self-study mode reveals all clues (e.g. reference articles, syntactic tree structure, pronunciation, etc.) and also provides a possibility to try several answer options. In the test modes, the clues are not revealed until the answer is provided; and users cannot change their answer.
  • Feedback is offered in the form of immediate correct/incorrect symbols and a result tracker where information on correct/total number of answers is shown.
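The behaviour of the training modes and the result tracker can be summarised as a small state model. The sketch below is an assumption-laden illustration rather than Lärka's implementation: the class and field names are hypothetical, timing is not modelled, and it assumes that every attempt in self-study mode counts towards the total, which the description above does not specify.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    SELF_STUDY = "self-study"
    TEST = "test"
    TIMED_TEST = "timed test"


@dataclass
class Exercise:
    sentence: str
    correct: str
    answered: bool = False


@dataclass
class Session:
    mode: Mode
    correct: int = 0  # result tracker: correct answers
    total: int = 0    # result tracker: all answers given

    def clues_visible(self, exercise):
        # Self-study shows clues at once; test modes only after an answer.
        return self.mode is Mode.SELF_STUDY or exercise.answered

    def answer(self, exercise, choice):
        # In the test modes an answer, once given, cannot be changed.
        if exercise.answered and self.mode is not Mode.SELF_STUDY:
            raise ValueError("answers cannot be changed in test modes")
        is_correct = choice == exercise.correct
        exercise.answered = True
        self.total += 1  # assumption: every attempt is counted
        self.correct += int(is_correct)
        return is_correct  # immediate correct/incorrect feedback


session = Session(Mode.SELF_STUDY)
item = Exercise("Hunden sover.", "verb")
session.answer(item, "noun")   # first attempt, incorrect
session.answer(item, "verb")   # second attempt is allowed in self-study
print(f"{session.correct}/{session.total}")
```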

Recently, a text assessment function has been added to Lärka, with which reading comprehension texts or learner essays can be tested for their CEFR level, i.e. a level of language proficiency according to the Common European Framework of Reference (A1, A2, B1, B2, C1, C2).

There is ongoing work on diagnostic testing and learner modeling.

* The previous version of Lärka is being migrated to new technology, and the newer version does not yet offer all the functionality of its predecessor.

 

CLARIN Centre
SWE-CLARIN (Språkbanken)
Project leader
Elena Volodina, Lars Borin, Markus Forsberg
Acknowledgements

Språkbanken (UGOT), CLT (UGOT), Department of Swedish (UGOT), Lars Borin, Markus Forsberg, Jonatan Uppström, Ildikó Pilán, David Alfter