Blog post written by Rickard Domeij, edited by Darja Fišer and Jakob Lenardič
CLARIN Knowledge Centre for the Languages of Sweden
The SWELANG knowledge centre is an information service offering advice on the use of digital language resources and tools for Swedish and other languages spoken in Sweden, as well as other parts of the intangible cultural heritage of Sweden.
The centre is based at the Language Council of Sweden (Stockholm) and is run in cooperation with the other sections of the Institute of Language and Folklore (ISOF) in Uppsala and Gothenburg. The institute is sanctioned by the Swedish government to collect, preserve, process and disseminate scientific knowledge and material concerning the Swedish language, the national minority languages, the Swedish sign language and Swedish dialects, as well as other parts of the intangible cultural heritage of Sweden.
Development of Digital Tools and Services
The SWELANG knowledge centre cooperates closely with SWE-CLARIN and the National Language Bank of Sweden (Nationella språkbanken). The knowledge centre focuses on developing methods for collecting two types of data:
- Official texts and terminology for research in official communication and social conditions. The material is multilingual, with parallel texts in Swedish and translations into easy-to-read plain language, the five national minority languages (Finnish, Sami, Romani, Yiddish and Meänkieli), and other minority languages used in official communication.
- Folk narratives, as well as other text and speech material from the dialect and folklore archives. The material consists of inventories, dialect word databases, letters, recordings, transcriptions, etc. It is important both in terms of content and linguistic quality, as it includes a large number of geographical, social, and stylistic varieties.
In addition, the centre is developing methods to manage contextualized digital archive material and make it widely available through a map-based research interface called Digitalt kulturarv (Digital Cultural Heritage). The interface is connected to a database of 16,000 complete records. Apart from text material, consisting of records that were scanned and transcribed using OCR or HTR, the database also contains metadata, such as year of recording, categories and location, as well as information about the person who made the recording and about the informants (i.e. name, year of birth, gender). The interface not only lists search results but also visualizes statistics from the metadata. For example, a map illustrates the geographical distribution of the records. A limited public version of Digitalt kulturarv called Sägenkartan (Map of Legends) can be accessed on the web (in Swedish only). A log-in version with richer content for researchers is on its way.
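To make the record structure more concrete, here is a minimal sketch in Python of how a record with the metadata fields listed above could be represented, searched and summarized. It is not the actual Digitalt kulturarv implementation: the class name, field names, helper functions and sample data are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ArchiveRecord:
    """Hypothetical record type; the fields mirror the metadata named in the text."""
    text: str                            # transcribed text of the record
    year: int                            # year of recording
    category: str
    location: str                        # place name; a real map would also need coordinates
    collector: str                       # person who made the recording
    informant_name: str
    informant_birth_year: Optional[int]
    informant_gender: Optional[str]


def search(records, term):
    """Return the records whose text contains the term (case-insensitive)."""
    needle = term.lower()
    return [r for r in records if needle in r.text.lower()]


def count_by(records, field):
    """Count records per value of one metadata field, e.g. 'location' or 'year'."""
    return Counter(getattr(r, field) for r in records)


# Invented sample data, for illustration only.
SAMPLE = [
    ArchiveRecord("A legend about a troll under the bridge.", 1927, "legend",
                  "Uppsala", "Collector A", "Informant 1", 1860, "female"),
    ArchiveRecord("The troll took the farmer's cattle.", 1931, "legend",
                  "Gothenburg", "Collector B", "Informant 2", 1872, "male"),
    ArchiveRecord("A song sung at midsummer.", 1927, "song",
                  "Uppsala", "Collector A", "Informant 3", None, None),
]

hits = search(SAMPLE, "troll")
per_location = count_by(hits, "location")
print(per_location)
```

Counts per location of this kind are what a map view would need in order to show the geographical distribution of the search results; counting by year or category would feed other statistics in the same way.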
The knowledge centre is also developing an infrastructure for dictionaries, the aim of which is to store and make available official terminology and dialect words in collaboration with Språkbanken (a CLARIN B centre). Resources offered by the SWELANG knowledge centre are already available through the SWE-CLARIN catalogue. These are mostly bilingual dictionaries that pair Swedish with other languages spoken in the country, such as the Swedish-Bosnian dictionary and the Swedish-Azerbaijani dictionary.
Interdisciplinary collaboration within the Tilltal project
In the Tilltal project, we work with data holders, researchers and speech and language technologists to examine how speech and language technology methods can make historical speech recordings more accessible for research. For instance, there are immense amounts of recorded interviews that currently have to be played back in real time in order to be analysed. These materials hold a wealth of information of great interest for the humanities and social sciences.
With digital tools we see possibilities to explore the recordings in new ways. For example, we are exploring methods to visualize and browse large amounts of audio data together with the CLARIN Knowledge Centre for Speech Analysis at KTH (Malisz et al. 2017). This is done by projecting sound segments onto a two-dimensional plane with a technique used to find similarities in images, so that representations of similar sounds are clustered together. We hope that this will make it possible to find interesting features in audio files without actually listening to them one by one, for example to distinguish applause and singing from speech, or even to find similar vowel pronunciations. Our archives also include a wide range of information in written form, including descriptions of recording situations and manual transcripts, which we use to provide further pathways into the speech materials (Domeij et al. 2019).
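The general idea of projecting sound segments onto a plane can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the post does not name the projection technique used in the project, so the sketch substitutes plain principal component analysis (PCA), computed by power iteration, as a simple stand-in. The feature vectors are invented; in practice each segment would be described by acoustic features extracted from the audio.

```python
import math


def center(rows):
    """Subtract the per-dimension mean from every feature vector."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(rows):
    """Sample covariance matrix of already centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_eigenvector(m, iterations=200):
    """Power iteration: converges towards the dominant eigenvector of m."""
    d = len(m)
    v = [float(i + 1) for i in range(d)]       # arbitrary non-zero start vector
    norm = math.sqrt(dot(v, v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = matvec(m, v)
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:                        # no variance left to explain
            break
        v = [x / norm for x in w]
    return v


def project_2d(rows):
    """Project feature vectors onto their first two principal components."""
    centered = center(rows)
    cov = covariance(centered)
    v1 = top_eigenvector(cov)
    lam = dot(v1, matvec(cov, v1))
    # Deflation: remove the first component so the next one can be found.
    d = len(cov)
    deflated = [[cov[i][j] - lam * v1[i] * v1[j] for j in range(d)]
                for i in range(d)]
    v2 = top_eigenvector(deflated)
    return [(dot(r, v1), dot(r, v2)) for r in centered]


# Invented feature vectors: three "speech-like" and three "applause-like" segments.
FEATURES = [
    [1.00, 0.20, 0.10], [1.10, 0.25, 0.05], [0.90, 0.15, 0.12],
    [0.10, 1.00, 0.90], [0.15, 1.10, 0.85], [0.05, 0.90, 1.00],
]

points = project_2d(FEATURES)
for label, (x, y) in zip(["speech"] * 3 + ["applause"] * 3, points):
    print(f"{label:8s} {x:6.2f} {y:6.2f}")
```

In the resulting plane, segments with similar features land close to each other, which is what makes it possible to browse a large collection visually rather than listening to each file. Non-linear projection methods would follow the same pattern of features in, two-dimensional points out.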
Associated project collaborations
The K-Centre is part of the following national and international language infrastructure collaborations:
- CLARIN: the European research infrastructure for language resources and technology
- ELRC: European Language Resource Coordination
- eTranslation TermBank: collection and provision of terminological resources for machine translation within the EU
- META-NET: a Network of Excellence of 60 research centres from 34 countries, dedicated to building the technological foundations of a multilingual European information society
- Tilltal project
Collection of resources
List of online databases at ISOF (in Swedish only)
Sägenkartan – Map of legends (in Swedish only)
Selected publications and presentations
- Borin, Lars, Forsberg, Markus, Edlund, Jens & Domeij, Rickard. 2018. Språkbanken 2018: Research Resources for Text, Speech, & Society. Poster, DHN. In: Mäkelä, Eetu, Tolonen, Mikko & Tuominen, Jouni (eds.) Digital Humanities in the Nordic Countries 3rd Conference. 504–506. Retrieved from: http://ceur-ws.org/Vol-2084/poster7.pdf
- Berg, Johanna, Domeij, Rickard, Edlund, Jens, Eriksson, Gunnar, House, David, Malisz, Zofia, Nylund Skog, Susanne & Öqvist, Jenny. 2016. Tilltal – making cultural heritage accessible for speech research. Paper presented at CLARIN Annual Conference 26–28 October 2016, Aix-en-Provence, France.
- Berg, Johanna, Domeij, Rickard, Edlund, Jens, Eriksson, Gunnar, House, David, Malisz, Zofia, Nylund Skog, Susanne & Öqvist, Jenny. 2017. Involving users and collaborating between disciplines in making cultural heritage accessible for research. Paper presented at CLARIN Annual Conference 18–20 September 2017, Budapest, Hungary.
- Dagsson, Trausti & Skott, Fredrik. 2018. Digital Cultural Heritage — a Digital Folklore Archive [Blog post]. Retrieved from: https://sweclarin.se/eng/digital-cultural-heritage-%E2%80%94-digital-fo…
- Domeij, Rickard & Eriksson, Gunnar. 2018. Språkbanken Sam. A Clarin knowledge center for the languages of Sweden. Poster presented at SLTC 2018, 20–22 November at Stockholm University.
- Domeij, Rickard, Eriksson, Gunnar, Lindström, Eva, Magnusson Petzell, Erik, Nylund Skog, Susanne, Skott, Fredrik, Öqvist, Jenny. 2019. Text as an entry point to speech – a journey into the most inaccessible areas of the archives. Book of abstracts 4th Conference of The Association Digital Humanities in the Nordic Countries Copenhagen, March 6-8 2019.
- Nylund Skog, Susanne. 2018. From personal letters to scientific knowledge: The creation of archived records in a tradition archive. In: Visions and Traditions: Knowledge Production and Tradition Archives. Lauri Harvilahti, Audun Kjus, Cliona O’Carroll, Susanne Österlund-Pötzsch, Fredrik Skott and Rita Treija (eds.) Helsinki: Academia Scientiarum Fennica, FFC 315.
- Malisz, Zofia, Öqvist, Jenny, Fallgren, Per, Edlund, Jens & House, David. 2017. Visualising vocalic variability in space and time – automatic exploration of “found data”. Paper presented at the 47th Poznań Linguistic Meeting, 18–20 September 2017, Adam Mickiewicz University, Poznań, Poland.