
Tour de CLARIN: Interview with Andrea Fried and Arne Jönsson

Submitted by Karina Berger on

The conversation was led by Kristina Pahor de Maiti Tekavčič.

Please start by briefly introducing yourself and your research background.

Andrea Fried

Andrea: I am a Biträdande (Senior Associate) Professor in Business Administration, and am affiliated with Linköping University, Sweden. My areas of expertise include organisation studies, innovation research, and management control.

Arne Jönsson

Arne: I am a Professor Emeritus in computer science at Linköping University, and actively involved in the activities of the CLARIN-SMS K-centre. My research focuses on language technology, with a current emphasis on text analysis and text adaptation.

You used CLARIN resources in a study on the communication activities of Swedish companies regarding the implementation of information security standards. Can you briefly explain what the study was about?

Andrea: The study examines the theoretical underpinnings and limitations of Rogers’ concept of preventive innovation communication, using computational sentiment analysis to promote a more situational and meaning-based understanding of this concept. Preventive innovations typically call for action at one point in time to prevent an undesirable future scenario. Therefore, preventive innovations such as vaccines, occupational safety measures or cybersecurity measures spread slowly, partly because the rewards of adoption are postponed. As suggested in the literature, the economic benefits of preventive innovation for organisations (for example, measures adopted to prevent pollution in the production cycle, protect workers’ health, or ensure information security) are mainly intangible, often delayed, and tied to incidents such as cybersecurity attacks, virus infections, or work accidents that may never occur.

Drawing on the discourse perspective of organisational communication, and using the information security standard ISO/IEC 27001 as an example of preventive innovation, my colleagues and I extended Rogers’ view, noting that preventive innovations influence, and are influenced by, the adopting organisation. Thus, organisations take different approaches to adopting preventive innovations: (i) using the security standard as an entrepreneurial inspiration and offering training about information security, (ii) reorganising their processes accordingly, and (iii) calling in auditors who certify that their work complies with the ISO standard. Our findings further extend Rogers’ view by providing evidence that organisations not only receive indirect recognition, but can also gain direct economic benefits from implementing preventive innovation practices. Last but not least, we show that the organisational communication of preventive innovations varies across the three adoption approaches, and also depends on whether organisations receive direct or indirect economic benefits from engaging in preventive innovation.

What are some practical implications of your study on the communicative practices linked to preventive innovation? 

Andrea: These findings are important for advancing the concept of preventive innovation, as well as for different stakeholders. For policymakers, the findings help them understand how preventive innovation spreads into organisational practice and sustains social welfare. For adopters of preventive innovations, the study provides insights into the best dissemination practices, for example choosing the right communication channels, and shows that communication content matters in organisational reality. In addition, the study highlights the potential economic benefits involved in adopting preventive innovations. Finally, for creators of preventive innovations, the study is valuable because it outlines a variety of ways in which preventive innovations are adopted, and how their adoption shapes organisational communication. This, in turn, informs the evaluation of the proposed preventive innovation.

What motivated you to integrate CLARIN tools into this specific study?

Andrea: I contacted the Department of Computer Science at our university because I was aware that the digitisation of the humanities was progressing. Specifically, I wanted to understand what computational text analysis makes possible for qualitative researchers, and where its limitations lie. When I was introduced to Arne Jönsson's work and expertise, he mentioned that funding for his work on computational text analysis would be possible via SWE-CLARIN, so we decided to collaborate on this study.

Arne: A large part of the CLARIN-SMS K-centre's activities involves assisting researchers in the humanities and social sciences in using language technology tools and resources. Andrea and I met through a member of the SWE-CLARIN board, and during our discussions we saw how a computational approach, in particular quantitative data analysis, could facilitate the more in-depth studies that Andrea’s research group focuses on.

Which specific CLARIN resources and tools have you used in your research?

Arne: We used tools and data sources offered by SWE-CLARIN, such as a sentiment lexicon and the Sparv pipeline, to process the data. We also used a variety of publicly available tools in the study: for instance, a web crawler to collect the data, Google Translate, and fastText. In this study, we experimented with several techniques, such as word clouds, topic modelling, and sentiment analysis, with the preprocessing steps performed using SWE-CLARIN tools.
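Lexicon-based sentiment analysis of the kind mentioned above can be sketched as follows. This is a minimal illustration, not the code used in the study: it assumes the text has already been tokenised and lemmatised (for example by a pipeline such as Sparv), and the lexicon entries, their scores, the function names and the category labels are all hypothetical stand-ins for the SWE-CLARIN sentiment lexicon and the study's real analysis categories.

```python
# Hypothetical toy lexicon: Swedish lemma -> polarity score in [-1, 1].
# The real SWE-CLARIN sentiment lexicon and its scores are NOT reproduced here.
TOY_LEXICON = {
    "säker": 1.0,       # "secure"
    "skydda": 0.5,      # "protect"
    "förtroende": 1.0,  # "trust"
    "attack": -1.0,     # "attack"
    "risk": -0.5,       # "risk"
}


def sentence_sentiment(lemmas, lexicon=TOY_LEXICON):
    """Average polarity of the lemmas found in the lexicon; 0.0 if none match."""
    scores = [lexicon[lemma] for lemma in lemmas if lemma in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


def category_sentiment(sentences_by_category, lexicon=TOY_LEXICON):
    """Mean sentence-level sentiment per analysis category (hypothetical labels)."""
    result = {}
    for category, sentences in sentences_by_category.items():
        values = [sentence_sentiment(s, lexicon) for s in sentences]
        result[category] = sum(values) / len(values) if values else 0.0
    return result


if __name__ == "__main__":
    # Each sentence is a list of lemmas, as a lemmatising pipeline might output.
    example = {
        "certification": [["vi", "skydda", "information"], ["risk", "attack"]],
        "training": [["säker", "förtroende"]],
    }
    print(category_sentiment(example))
```

Averaging per sentence first keeps long sentences from dominating a category's score; weighting every matched word equally would be an equally plausible design choice.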

What was the added value of complementing your methodology with computational methods?

Andrea: The first added value was the ability to access data through SWE-CLARIN that would otherwise not have been available to us. Secondly, it was a learning process to understand how these methods work, and that the smaller the database, the more important it becomes to limit the amount of data ‘noise’. In our case, we had to clean the database of the ‘noise’ present in the web-scraped files. We learned that it is not only about ‘big’ data, but also about the quality of the data. Finally, we obtained results showing differences in the communication behaviour of companies. Even though we were not working with ‘big data’, the amount of data was still too large for humans to handle easily.

Overview of the process (source).

What were some technical challenges with the type of data you used for this study and how did you solve them?

Arne: Web data can be quite messy. For instance, Swedish websites may contain English text, navigation information, lists with no punctuation, various links, etc. We partly solved this by filtering out English text, very long sentences, and duplicates, but the data is still not perfect. At the same time, we can assume that the messiness is similar across analysis categories and thus does not affect the comparisons, which are what we mainly do.
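The three filters described above (English text, very long sentences, duplicates) can be sketched as a single cleaning pass. This is a hypothetical illustration, not the pipeline used in the study: the English-detection heuristic (the share of common English function words in a sentence), the 60-token cut-off for a "very long" sentence, and all names are assumptions; a real setup would more likely rely on a trained language identifier.

```python
import re

# Hypothetical set of common English function words, used as a crude language cue.
ENGLISH_STOPWORDS = {"the", "and", "of", "to", "is", "in", "for", "with", "that", "on"}
MAX_TOKENS = 60  # hypothetical threshold for a "very long" sentence


def looks_english(tokens, threshold=0.2):
    """Crude heuristic: flag a sentence if enough of its tokens are English stopwords."""
    if not tokens:
        return False
    hits = sum(1 for token in tokens if token.lower() in ENGLISH_STOPWORDS)
    return hits / len(tokens) >= threshold


def clean_sentences(sentences, max_tokens=MAX_TOKENS):
    """Drop empty, English-looking, overly long, and duplicate sentences, keeping order."""
    seen = set()
    kept = []
    for sentence in sentences:
        # \w matches Unicode letters, so Swedish å, ä and ö stay inside tokens.
        tokens = re.findall(r"\w+", sentence)
        if not tokens or looks_english(tokens):
            continue
        if len(tokens) > max_tokens:
            continue
        # Duplicates are detected after lower-casing and stripping punctuation.
        normalised = " ".join(token.lower() for token in tokens)
        if normalised in seen:
            continue
        seen.add(normalised)
        kept.append(sentence)
    return kept
```

Because the same filters are applied to every document, any noise they miss should affect all analysis categories roughly equally, which matches the reasoning given above for why comparisons remain meaningful.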

Interdisciplinary work is certainly inspiring, but combining different terminologies, methodological approaches, etc. can be very challenging. Do you have any advice that might be valuable when starting an interdisciplinary study? 

Andrea: My advice would be to take the time to become familiar with each other's scientific language and publication traditions, and to check whether the editorial boards of scientific journals have the expertise to assess the quality of studies using methods such as computational text analysis. This is crucial for a successful publication, because editorial boards, reviewers of scientific journals, and colleagues in the social sciences and humanities are not yet familiar with CLARIN tools and methods. Unfortunately, this unfamiliarity can lead to rejections, which happened to us at least once.

Arne: I have found that it is helpful to use examples when discussing what CLARIN can offer researchers in the social sciences. Looking at early results that were not yet necessarily perfect (e.g., before any lemmatisation had been performed on the data) served as a starting point for discussion, and it helped us decide which analyses to conduct.

Do you have experience with, or plans to use, any other CLARIN tools, resources, services, or initiatives?

Andrea: I would definitely use CLARIN resources or CLARIN-related expertise again if possible. This research collaboration was a valuable learning experience and helped us explore communication research topics in a new way. Previously, as a social science researcher, I was not aware of the advanced level of computational text analysis capabilities achieved at the Department of Computer and Information Science at Linköping University.

How do you see the role of researchers, especially those from the Social Sciences and Humanities, in supporting CLARIN’s mission 'to contribute to the innovation potential of the advanced models for interaction between people, data, and tools'?

Andrea: As researchers in the social sciences and humanities, we play a key role in assessing the data input and analysis output of computational text analysis. We act as quality and relevance verifiers: we are sensemakers, so to speak. But the active participation of all parties involved necessitates active knowledge transfer. This study was a wonderful opportunity for us to learn about new methodological developments. For instance, without the possibility of integrating CLARIN resources, Arne Jönsson might not have been able to devote so much time to our study. More importantly, however, it would have been impossible for the rest of the team from the social sciences and humanities to explore all the latest tools in the field of computational text analysis and find ways to include them in the study.

Where do you see challenges in using CLARIN resources for research? How would you like CLARIN to evolve?

Andrea: At this stage, the expertise in CLARIN tools and possibilities can mainly be found among CLARIN researchers in the Department of Computer Science at Linköping University. Some researchers in the social sciences and humanities might be reluctant to try out CLARIN collaborations due to a general scepticism regarding AI, a lack of knowledge about CLARIN, and a lack of understanding of the type of data that lends itself to computational text analysis. I would like to see CLARIN become an important hub for bridging disciplines beyond computer science, thus broadening the methodological canon of those disciplines.