Interview conducted by Jakob Lenardič
Olga Czeranowska is a sociologist working in the field of migrant studies. She and her team have performed sentiment analysis of Polish Twitter posts with the cooperation of CLARIN-PL.
Please introduce yourself – your academic background and current position.
I am a sociologist. I did my PhD at the University of Warsaw in the Institute of Applied Social Sciences. The topic of my thesis was occupational prestige from an individual perspective, and I was interested in how people with highly prestigious positions on the labour market feel their prestige and how it affects their professional and private biographies. Since April 2020 I have been working at the SWPS University of Social Sciences and Humanities. My research interests lie primarily in the area of the sociology of work and migration studies. In the near future, I plan to focus more on the interconnections between geographical and occupational mobility, migrants’ careers, and the concept of success in the migrants’ occupational trajectories. As for research methodologies, I have experience with both qualitative and quantitative data, as well as using a mixed-methods approach. I am very much interested in trying new methods of gathering and analysing data in order to see social realities through new lenses.
Your research is rooted in the social sciences and focuses on the field of migrant studies. What are some of the topics you are currently exploring in this field? Could you briefly present them?
We are currently analysing data obtained in the project (IT)Mobility. Immobility of the mobile, mobility of the immobile – migrants in the times of pandemics and new information/communication technologies. In this project, we are examining the effects of the COVID-19 pandemic on the different spheres of the lives of Polish migrants. The project is based on the broad understanding of mobility, including not only geographical mobility, but also virtual mobility. We are also analysing occupational mobility in connection with geographical mobility. Since the beginning of the pandemic, geographical mobility has in many ways been restricted in a way that is unprecedented in recent decades. Some of this ‘mobility energy’ has been shifted towards virtual mobility as many spheres of everyday lives have moved online. This applies not only to private lives, such as meeting family and friends via Zoom or Google Meet, but also to the work sphere and services, such as online classes, working from home, and telehealth. Many of those possibilities existed before, but we can safely assume that the pandemic was an accelerator of their development and more general use.
We assumed that migrants are a kind of extreme case for the analysis of different forms of mobility and immobility during the pandemic – not only did they move to another country, but they generally had more experience with virtual mobility as well. Even before the pandemic, they were keeping in touch with their family and friends and using services like Zoom to participate in cultural events or use services in their country of origin. In addition, the migrants’ situation during the pandemic itself, especially with lockdowns and other restrictions, was unique, as many of them relied heavily on mobility between their country of residence and country of origin to visit their loved ones or use services.
You are currently performing sentiment analysis of Twitter posts with the cooperation of CLARIN-PL. How did you start collaborating with CLARIN-PL?
Our team took part in a CLARIN-PL online workshop on natural language processing in November 2020 that included general information about their tools as well as some real-life examples of their application in other research projects. After the event, we contacted CLARIN-PL via their online form for researchers who need consultation. We then met with the CLARIN-PL team and discussed what would be the best way to approach our data. It was perfect timing for us, because our project started in September 2020 and at this point, within the qualitative component, we were just beginning to collect the Twitter data while simultaneously looking for the best way to perform the quantitative analysis. Using CLARIN-PL tools gave us a solid methodological foundation for this part of our project.
How are you performing the sentiment analysis? Which CLARIN-PL tools are you using for this task? How is the sentiment labelled? How are you sampling the Tweets?
For our first paper, we are using CLARIN-PL’s MultiEmo. MultiEmo is a tool for sentiment analysis that is available in eleven different languages, including Polish. It uses a manually annotated corpus of consumer reviews as its training set and labels Tweets in terms of four sentiment values: positive, negative, ambivalent, and neutral.
We are working with two datasets. The first one consists of Tweets gathered on the basis of hashtags that were identified as being relevant for the project. The hashtags mostly relate to pandemic concepts, such as #lockdown, #stayhome, and #workfromhome, and are both in Polish and English. The second dataset consists of Tweets by users who were identified as Polish migrants. Sampling was one of the biggest challenges for our project, as we specifically wanted to access the Tweets of Polish migrants.
Unfortunately, only a small percentage of Twitter data is geotagged, so we had to find another way to filter it. Our strategy was to gather Tweets from users whose location (given by the user and annotated manually by our team) is in a country other than Poland, but who use Twitter in Polish. Additionally, we filtered the database to exclude bots. We are aware that this is not a perfect solution, but we think that, taking into consideration the database-related constraints (missing values, people giving fictional places as their location, etc.), it still gives us a solid sample of at least one kind of Polish migrant – a person who considers their stay abroad permanent enough to change their Twitter location, but who still has some ties with the home country (for instance, such a migrant uses Polish because of who will be reading their feed) or has not learned the receiving country’s language well enough to feel comfortable using it on social media.
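The filtering strategy described above can be illustrated with a minimal Python sketch. It is not the project's actual pipeline: the `Tweet` record, its field names, the `is_bot` flag, and the sample Tweets are all hypothetical, and it assumes that the manual location annotation and the bot check have already been done in earlier steps.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    user: str
    text: str
    lang: str          # language code of the Tweet, e.g. "pl" (assumed field)
    user_country: str  # manually annotated country of the profile location, "" if unknown
    is_bot: bool       # outcome of a separate bot check (hypothetical flag)


def select_migrant_tweets(tweets, home_country="PL", language="pl"):
    """Keep Tweets in Polish from non-bot users whose annotated location is outside Poland."""
    return [
        t for t in tweets
        if t.lang == language
        and t.user_country                  # drop users with a missing location
        and t.user_country != home_country  # drop users located in Poland
        and not t.is_bot
    ]


# Invented sample data, for illustration only.
sample = [
    Tweet("a", "Pozdrowienia z Londynu", "pl", "GB", False),
    Tweet("b", "Dzień dobry", "pl", "PL", False),
    Tweet("c", "Hello", "en", "GB", False),
    Tweet("d", "Kup teraz", "pl", "DE", True),
    Tweet("e", "Cześć", "pl", "", False),
]
selected = select_migrant_tweets(sample)
print([t.user for t in selected])
```

In this sketch only user "a" passes all four conditions. Fictional locations would still have to be caught during the manual annotation step.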
What does the sentiment of Tweets reveal about migrant mobility in relation to the ongoing COVID-19 pandemic?
We are still at the stage of data analysis, but what we are hoping to see are some longitudinal patterns. With data covering a period that is relatively long in relation to social media (we started gathering Tweets in January 2021), we want to analyse how the Twitter discourse has changed over the last year. We will be looking both at the popularity of particular hashtags connected with the pandemic and its consequences, such as the aforementioned #lockdown and #workfromhome, and also at the sentiment connected with those hashtags. In some further steps, I hope that the two (IT)mobility datasets will also be analysed together with some project-external datasets that include the number of COVID-19 cases in particular countries, the number of people vaccinated, or data on mobility restrictions in Poland or the receiving countries. This would give us an opportunity to analyse how changes in the offline world, such as the introduction of the COVID-19 vaccine, influenced the attitudes of the online Twitter community. We would also like to compare the Twitter presence of our subsample of Polish migrants with that of the general population of users, as well as – if the subsamples turn out to be large enough – to compare the subsamples of Polish migrants in various locations.
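As a rough illustration of the kind of longitudinal view described here, the following Python sketch counts sentiment labels per month and hashtag and turns them into shares. It is only a sketch under stated assumptions: the input format (date, hashtag, label), the function name, and the sample rows are invented for illustration, and only the four label names come from the MultiEmo description above.

```python
from collections import Counter, defaultdict

# The four sentiment values mentioned in the interview.
LABELS = ("positive", "negative", "ambivalent", "neutral")


def monthly_sentiment_shares(rows):
    """Map (month, hashtag) to the share of each sentiment label.

    rows: iterable of (iso_date, hashtag, label) tuples (assumed format).
    """
    counts = defaultdict(Counter)
    for iso_date, hashtag, label in rows:
        month = iso_date[:7]  # "YYYY-MM" prefix of an ISO date
        counts[(month, hashtag.lower())][label] += 1  # hashtags compared case-insensitively

    shares = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        shares[key] = {label: counter[label] / total for label in LABELS}
    return shares


# Invented sample rows, for illustration only.
rows = [
    ("2021-01-05", "#lockdown", "negative"),
    ("2021-01-20", "#lockdown", "negative"),
    ("2021-01-22", "#lockdown", "neutral"),
    ("2021-01-25", "#Lockdown", "positive"),
    ("2021-02-03", "#lockdown", "positive"),
    ("2021-02-10", "#workfromhome", "ambivalent"),
]
shares = monthly_sentiment_shares(rows)
print(shares[("2021-01", "#lockdown")])
```

The number of Tweets per month and hashtag (the `total` in the sketch) would serve as a simple popularity measure, and the resulting table could be joined by month with external data such as case counts.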
How is CLARIN-PL supporting you in this task? Who are you collaborating with?
The CLARIN-PL team is supporting us with the use of the sentiment analysis tool MultiEmo by adding annotations to the tables in which our data is stored. Further analysis and data visualisation are carried out by our team with Power BI. We are currently in the process of preparing the first paper based on the sentiment analysis of the Polish migrants’ Tweets, with the help of Krzysztof Hwaszcz, Jan Kocoń, Piotr Miłkowski and Jan Wieczorek. We certainly hope to work together more in the future, both with the (IT)mobility dataset and within other, upcoming projects.
Why is it important to take a computational approach, such as sentiment analysis (or more broadly text analytics/natural language processing), in the social sciences?
From a practical standpoint, sentiment analysis methodologies are extremely useful in dealing with big datasets, such as social media datasets. Social media are now a crucial part of the everyday lives of people living in contemporary society, so, naturally, they are becoming a more and more important source of data for social researchers. Social media datasets capture real-time reactions to important events, so they can both provide us with data relatively quickly and enable us to analyse social phenomena over time. In the case of migration studies, what is very useful is the possibility of gathering the data internationally and within the context of the various locations of the social media platforms (although this can still be difficult because of the location issues that I have mentioned before). However, this kind of data comes with its own set of challenges, such as the large size of the datasets. Luckily, with sentiment analysis tools such as MultiEmo, we are able to overcome such problems and analyse social media discourse in a standardised way.
What can CLARIN-PL do to further support digital humanities and social sciences researchers working with topics in migrant studies?
I think that further events such as the December webinar are crucial so that social researchers know that such possibilities exist and are within reach. The (IT)mobility team took part in a CLARIN-PL online conference 'CLARIN-PL-Biz – Language Technologies for Learning and Business II' in July 2021 to present our project as an example of research using CLARIN-PL tools. This conference was aimed at presenting academic and commercial uses of the CLARIN-PL infrastructure. I was very happy that we could contribute to this event, and I hope that our example may have inspired some other migration researchers to take a look at what CLARIN-PL has to offer in terms of both tools and research support.