
Tour de CLARIN: CLARIN-SMS

Submitted by Karina Berger on

Written by Arne Jönsson

 

The CLARIN Knowledge Centre for Swedish in a Multilingual Setting (CLARIN-SMS) is primarily directed at researchers in the Social Sciences and Humanities (SSH) and beyond with a need for analysis, annotation or data mining of Swedish or multilingual texts, and of Swedish Sign Language.

CLARIN-SMS makes tools for linguistic processing and corpora available for research in the Humanities and Social Sciences. The resources include monolingual (mainly Swedish) and multilingual corpora across several domains, and tools for basic text processing, including tokenisation, morphological analysis, part-of-speech tagging, syntactic parsing, and named entity recognition.


Main Areas of Expertise

CLARIN-SMS offers special expertise for four groups:

For researchers interested in exploring Swedish texts, it supports the creation and processing of Swedish texts with a variety of computational methods, such as linguistic annotation at different levels and sentiment analysis.

For researchers interested in comparative analyses, it supports the creation and processing of parallel and comparable corpora, including alignment and machine translation, as well as cross-linguistically consistent annotation within the framework of Universal Dependencies, which makes comparison across languages straightforward.

For researchers interested in education and content accessibility, it supports the computation and evaluation of measures of text complexity.

For researchers and users of Swedish Sign Language (SSL), it supports the creation of lexicons and corpora for SSL, and the annotation of SSL (including glosses, part-of-speech tagging and syntactic structure).

The support is provided by several partners participating in the CLARIN-SMS distributed Knowledge Centre:

  • Linköping University, Department of Computer and Information Science
  • Stockholm University, Department of Linguistics
  • Uppsala University, Department of Linguistics and Philology

Each CLARIN-SMS node works as a separate unit and promotes its services and resources in its own way, through promotion tours at universities, web pages presenting projects and resources, and presentations at CLARIN-related events. The K-centre itself, however, is a common resource. CLARIN-SMS is a vibrant community and, in accordance with CLARIN’s general mission of creating and promoting language resources, the nodes have carried out a variety of activities, including the development of tools and resources for language analysis, both multilingual and Swedish only.
 

An Active Research Hub

A number of activities focus specifically on promoting the use of language technology in SSH. For instance, one project analyses the development of the concept of ‘handicapped’ from a Swedish parliamentary perspective. In this project, we help researchers process and analyse the Swedish Government’s official reports from the early 1900s to the present day with a variety of SweClarin resources and language technology tools, such as the SPARV pipeline.

 

Two example word clouds for the ISO adoption analysis type called Stewards. The left word cloud is from the web pages of companies with indirect economic benefits resulting from preventive communication adoption, and the right from those with direct economic benefits.

 

Another example is the analysis of the protocols of the Swedish National Bank (Sw. Riksbanken), where we compare protocols from the period when they were anonymous with protocols from the period when they were not. One goal of this study is to see whether individual speakers can be identified in the anonymous protocols. Another is to provide the National Bank with information about differences and similarities in argumentation between the two types of protocols. To this end, we use a variety of SweClarin resources, such as the sensaldo-v02 sentiment lexicon and the SPARV pipeline for parsing, in combination with, for instance, topic and sentiment analysis models.
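The lexicon-based side of such a comparison can be sketched as follows. This is a minimal illustration only, not the project's actual method: the tiny lexicon, its polarity values and the function names are hypothetical, and no assumption is made about the file format of sensaldo-v02. A real analysis would also lemmatise the text first (for example with a pipeline such as SPARV) rather than match surface forms.

```python
import re
from statistics import mean

# Hypothetical toy lexicon (word -> polarity in [-1, 1]); the values are invented
# for illustration and are not taken from any published resource.
TOY_LEXICON = {"bra": 1.0, "stark": 0.5, "svag": -0.5, "oro": -1.0}


def sentiment_score(text, lexicon):
    """Mean polarity of the lexicon words found in the text (0.0 if none occur)."""
    # Lowercase and keep runs of letters only; no lemmatisation is performed.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    hits = [lexicon[t] for t in tokens if t in lexicon]
    return mean(hits) if hits else 0.0


def compare_periods(docs_a, docs_b, lexicon):
    """Average document-level score for each of two non-empty document collections."""
    return (
        mean(sentiment_score(d, lexicon) for d in docs_a),
        mean(sentiment_score(d, lexicon) for d in docs_b),
    )
```

Calling `compare_periods(anonymous_docs, attributed_docs, TOY_LEXICON)` with two lists of document strings (both names hypothetical) returns one average score per period, which can then be compared alongside topic models or other measures.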

A further example is a project carried out in cooperation with management researchers, in which we analyse Swedish companies’ adherence to and adoption of the information security standard ISO 27001. The aim of the project is to examine the communicative constitution of preventive innovation in organisations. For this project, we helped create a corpus and analyse it from multiple interdisciplinary perspectives using SweClarin tools and resources, such as the sensaldo-v02 sentiment lexicon and the SPARV pipeline for parsing, as well as other language technology tools, including word clouds.

 

Some Signature CLARIN-SMS Tools and Resources

Tools and models:

  • SWEGRAM: A tool for text analysis in Swedish and English. You can upload one or several texts and annotate them at different linguistic levels with morphological and syntactic information. The annotated texts can then be used to extract statistics about text properties, such as text length, number of words, readability measures, part-of-speech distributions, and much more.
The SWEGRAM annotation workflow (by Beáta Megyesi).
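To give a sense of what a readability measure computes, here is a small sketch of LIX, a readability index widely used for Swedish. It is an illustration under stated assumptions, not SWEGRAM's implementation: which measures SWEGRAM reports and how it tokenises are not specified here, and the sentence splitting below is a crude heuristic. LIX is the average sentence length in words plus the percentage of words longer than six letters.

```python
import re


def lix(text: str) -> float:
    """LIX = words / sentences + 100 * long_words / words (long = more than 6 letters)."""
    # Heuristic sentence split on ., !, ? and :, not a real sentence segmenter.
    sentences = [s for s in re.split(r"[.!?:]+", text) if s.strip()]
    # Words are runs of letters; digits and punctuation are ignored.
    words = re.findall(r"[^\W\d_]+", text)
    if not words:
        raise ValueError("text must contain at least one word")
    long_words = [w for w in words if len(w) > 6]
    return len(words) / len(sentences) + 100 * len(long_words) / len(words)


sample = "Det här är en kort mening. Riksdagen diskuterade propositionen."
print(round(lix(sample), 1))  # 37.8 (9 words, 2 sentences, 3 long words)
```

Higher values indicate more demanding text; dedicated tools add proper tokenisation and sentence segmentation, which can shift the score noticeably on real documents.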

 

 

Corpora: