Written by Johanna Berg
This autumn, CLARIN Ambassador Johanna Berg arranged a CLARIN workshop at the Swedish parliament, focusing on the opportunities for new research on parliamentary data opened up by several ongoing infrastructure projects. The Swedish initiatives are also linked to broader European collaboration structures in the CLARIN ParlaMint project. It is important that people in charge of safeguarding the huge amounts of text data in museums and archives understand the fast and dynamic development in methods, so that they can make sure their important collections stay relevant to research.
Parliamentary data is a good example. The Swedish records are voluminous and coherent from the 1860s onwards. They also fall outside of copyright, and thus give a good picture of what can be done with text resources under the best of circumstances. On top of that, the debates in parliament cover a broad spectrum of topics, so the minutes may be of interest to researchers from many fields.
This CLARIN ERIC event was co-organised with the Swerik project (Swedish Riksdag 1867–2022: An Ecosystem of Linked Open Data) and the Riksdag Library. Some 40 researchers gathered, representing more than 10 universities and several disciplines connected to digital humanities: economic history, intellectual history, linguistics/NLP, media/communications, political science, speech, statistics and more.
The workshop opened with a couple of presentations from an infrastructure perspective. The Riksdag administration presented its current work on open data, and how best to make it work both in-house and externally. Then we got glimpses of two ongoing projects building research infrastructure on data from the Swedish parliament: Swerik and Roll-call votes.
Måns Magnusson, assistant professor in statistics, Uppsala University, discussed the challenges of dealing with such a huge amount of data. These challenges can be met only by machine learning, statistical processing and iterative curation on the go. Further input on the work was shared by Fredrik Mohammadi Norén, assistant professor in media and communication, Malmö University, who told us about the ongoing annotation work and the linking of metadata to Wikidata. Jan Teorell, professor of political science, Stockholm University, is leading another, independent project that will neatly complement Swerik. He is building a complete and linked dataset of 37 000 roll-call votes in the Swedish parliament (from 1925 onwards), to open up this crucial information source for further research with digital methods.
From this starting point in infrastructure, we proceeded to look deeper into a few of the research projects that make actual use of the free access to parliamentary data, often complementing it with other text resources.
Claes Ohlsson, associate professor in Swedish, Linnaeus University, shared findings from ongoing research in Market Language Over Time, an interdisciplinary project combining corpus linguistics with historical discourse analysis. Nina Tahmasebi, associate professor in NLP, University of Gothenburg, shared findings from Change is Key, discussing how semantic change in language over time can be detected by text-based AI. Pelle Snickars, professor of digital cultures, Lund University, shared some experiences from the Westac (Welfare State Analytics) text mining project and presented its close connections to the Swerik infrastructure. Jens Edlund, professor in speech communication, KTH Royal Institute of Technology, works on SweTerror together with, among others, Mats Fridlund, associate professor and deputy director of GRIDH (Gothenburg Research Infrastructure in Digital Humanities). Both were present to tell us about ongoing research into the Swedish discourses on ‘terror’ and how they have changed over time. They use a mixed-methods approach and build on both text and speech data.
The event closed with an inspiring discussion on further possibilities. A recurring theme was the value of building a research infrastructure in close connection with the research initiatives that test it. We also touched on the importance of building bridges between the research community and the rich sources of ‘found data’ in the sector comprising galleries, libraries, archives and museums. There is still much to be done!