Tour de CLARIN: Interview with Dominika Hadro

Submitted by Jakob Lenardič on 22 January 2020

Dominika Hadro is Assistant Professor at Wroclaw University of Economics and Business. She has successfully collaborated with the PolLinguaTec CLARIN Knowledge Centre, whose experts have helped her apply topic modelling in her recent research on corporate finance and accounting. The interview was conducted via e-mail.


1. Please describe your academic background and current position.

I received my PhD in Economics in 2010 at Wroclaw University of Economics and Business and my work now mainly deals with corporate finance and accounting. In my research, I generally take a qualitative as well as quantitative approach to the analysis of financial reporting. I also study general disclosure and communication practices aimed at stakeholders of publicly listed companies and universities. I am currently an Assistant Professor at Wroclaw University of Economics and Business, but I also collaborate a lot with other universities. I was a research fellow at Bocconi University and the University of Manchester and a visiting researcher at the University of Bologna, the University of Navarra and Pablo de Olavide University.

2. How did you get involved with the Polish PolLinguaTec CLARIN Knowledge Centre?

Since 2015, I have been the coordinator of the research project Transparency of listed companies. At the beginning of the project, I was interested in finding language tools with which my teammates and I could perform a more automatic textual analysis of financial reports in Polish. Because I received my Master’s degree at Wroclaw University of Technology, I knew that there was a research group working on machine learning. While checking the profiles of the researchers involved in the group, I found Maciej Piasecki and the Polish PolLinguaTec CLARIN Knowledge Centre. I contacted him and we immediately started collaborating and using their tools in this and related research projects.

3. Could you briefly describe the goals and results of this research project?

This project is divided into two research streams. In the first stream, my colleagues and I study impression management (discursive strategies used by company owners to influence stakeholders’ impressions) and we apply content analysis to identify which impression management techniques are used in letters to shareholders. The letters are written in Polish by the 60 largest companies listed on the Warsaw Stock Exchange in 2008 and 2013. We identify patterns of these techniques with the use of k-means clustering, a statistical method that helps us to group the letters into four different categories based on the types of impression management techniques that prevail.
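As an illustration only, the grouping step described above can be sketched with a plain k-means (Lloyd's algorithm) in Python. This is not the project's actual code or data: the feature vectors, their three dimensions (counts of self-praise, defensive and negative-outcome passages per letter) and the initialisation rule are all assumptions made for the example.

```python
import math

def kmeans(points, k, iters=100):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    # Deterministic farthest-point initialisation (an assumption for this
    # sketch): start from the first point, then repeatedly add the point
    # that lies farthest from the centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))

    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # no change means the algorithm has converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical letters: [self-praise, defensive, negative-outcome] counts.
letters = [
    [9, 1, 0], [8, 0, 1],  # mostly praise of management
    [1, 9, 1], [0, 8, 0],  # mostly defensive arguments
    [1, 0, 9], [0, 1, 8],  # mostly discussion of negative outcomes
    [0, 0, 0], [1, 0, 0],  # short, formal letters with few techniques
]

labels, centroids = kmeans(letters, k=4)
print(labels)
```

Each letter receives a cluster label, and letters with a similar mix of techniques end up under the same label. In practice the number of clusters and the features would come from the content analysis itself.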

One of the main findings, which are presented in this paper, is that the more concentrated the ownership of a company is, the shorter the letters are, which indicates that management invests less effort in communicating with investors. This is particularly visible in companies held by insiders (that is, directors who own more than 10% of a company’s voting shares), who tend to produce short, formal letters, devoid of impression management techniques. In contrast, companies controlled by foreign shareholders prepare letters that are longer and are more likely to present defensive arguments, while institutional non-controlling shareholders favour extensive disclosure. The results show that the largest cluster includes letters that praise the management, while the rest include defensive arguments, discussions of negative outcomes and short, formal letters.

The second stream focuses on the use of tone. We examine letters to shareholders as an example of a mandatory textual disclosure in which managers are free to choose the tone. Our goal is to develop a model that ties their choices to situational incentives (e.g. how attractive a given company is for specific investors) and subsequently test the model with empirical data. The results of our analysis show that managers are on average sincere in their use of tone and that tone is correlated with company performance.

4. How do PolLinguaTec experts help you with regard to the content analysis of the letters and related materials? Which tools, resources or services offered by PolLinguaTec do you use or plan to use with respect to the impression-management analysis? What are the advantages of the collaboration?

Before we started collaborating with PolLinguaTec, we had to annotate by hand the textual features required for our content analysis of the impression management techniques. This manual approach required that each text be hand-coded by two individuals. Because we wanted to have as few mistakes as possible and were annotating for many variables, the manual annotation process was extremely time-consuming. Now, thanks to PolLinguaTec, we are able to use their automatic stylometric and topic modelling tools such as WebSty, Topic, and LEM. These tools have not only significantly simplified and streamlined our annotation process, but have also greatly increased the number of texts we can analyse: from around 200 pages of text, which was our limit when we were annotating by hand, to a practically unlimited number.

The PolLinguaTec tools are great because they allow us to detect many latent textual features which would go unnoticed in manual annotation, so their tools have proven to be crucial for increasing the validity of our research from a qualitative perspective. The PolLinguaTec tools can also be easily customized to a very specific task at hand; for instance, LEM allows us to add a readability index to our texts, which helps us ascertain the complexity of the financial statements and the degree of voluntary disclosure. Similarly, the tool Topic allows us to perform topic analysis on our corpus almost instantaneously, whereas in our previous manual approach we had to do topic analysis in several time-consuming steps.
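The interview does not say which readability index LEM computes. Purely as an illustration of what such an index looks like, the Python sketch below implements LIX, a well-known formula that adds the average sentence length to the percentage of words longer than six letters. The choice of LIX, the function name and the simple regex-based tokenisation are assumptions for this example, not a description of LEM.

```python
import re

def lix(text):
    """LIX readability: words/sentences + 100 * long_words/words.

    Assumption for this sketch: sentences end at '.', '!' or '?',
    and a word is any run of letters (Unicode-aware, so Polish
    diacritics count as letters).
    """
    words = re.findall(r"[^\W\d_]+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return 0.0
    long_words = [w for w in words if len(w) > 6]
    return len(words) / len(sentences) + 100 * len(long_words) / len(words)

sample = "The cat sat. Comprehensive disclosure requirements increased."
score = lix(sample)
print(round(score, 2))
```

A higher score indicates longer sentences and more long words, which is one rough proxy for how complex a financial text is to read.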

5. Has collaboration with PolLinguaTec helped you advance the state-of-the-art in your discipline?

Together with PolLinguaTec, we are currently developing a tool for analyzing the tone/sentiment of financial text in Polish, so in this sense our current collaboration is highly relevant for our work in the project on impression management techniques. This sentiment analysis tool is based on a massively overhauled version of the Loughran-McDonald word list, which is an English sentiment wordlist that is widely recognised as the best resource to measure tone in finance and accounting.

We have identified three major areas for improvement of the Loughran-McDonald method. First, the method is weakly supported from a theoretical perspective: although it has proven itself to be empirically effective, it was first and foremost developed with the pragmatic goal of obtaining the best results in empirical research, so theoretical considerations were largely disregarded. Researchers apply these standard empirical tools repeatedly, but they cannot tackle key questions about the nature of the textual characteristics that they want to measure. Second and relatedly, the lack of a theory limits the Loughran-McDonald method to one language. With our tool, we want to remedy this linguistic limitation by combining the Loughran-McDonald list with the Princeton WordNet, which provides data that are significantly more conducive to multilingual applicability. Third, the Loughran-McDonald wordlist relies on the bag-of-words approach, which has proven itself to be severely limited in the case of sentiment analysis in finance. This is because wordlists inherently provide noisy measures, so we have to disregard a lot of data if we want to achieve good results in our field.
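To make the bag-of-words limitation concrete, here is a minimal Python sketch of wordlist-based tone scoring. The two toy word sets are hypothetical placeholders, not entries quoted from the Loughran-McDonald list, and the net-tone formula (positive minus negative, divided by their sum) is one common choice assumed here for illustration.

```python
import re

# Hypothetical toy wordlists, for illustration only.
POSITIVE = {"growth", "profitable", "strong"}
NEGATIVE = {"loss", "decline", "weak"}

def tone(text):
    """Net tone in [-1, 1]: (pos - neg) / (pos + neg); 0.0 if no hits."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

# Word order and negation are invisible to a bag of words, so this
# clearly negative sentence still receives the maximum positive score.
negated = tone("The company was not profitable and growth was not strong")
print(negated)
```

Because only word counts matter, "not profitable" is scored exactly like "profitable". This is the kind of noise that a context-aware approach is meant to reduce.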

Furthermore, it is now recognized that sentiment analysis should be driven more by dynamic machine-learning methods and approaches rooted in artificial intelligence than by sentiment lexica, which are in essence static wordlists. This sort of AI-driven approach is exactly what we aim to achieve with the tool that we are developing together with PolLinguaTec. One of the main advantages in this sense will be that the tool is going to take context and discursive features of the text into consideration in order to improve accuracy, which is in line with the current research trends in computational linguistics.

6. Are there any other ongoing projects that also involve collaboration with PolLinguaTec that you would like to highlight?

Yes, my colleagues and I are also collaborating with the K-Centre on three other projects. In the first one, we are performing textual analysis of non-financial information of publicly listed companies from an ethical perspective. We are using PolLinguaTec’s topic modelling tools like Topic to analyse the companies’ annual reports to determine if and to what extent the companies engage in the Corporate Social Responsibility (CSR) model. With topic modelling, we try to unveil textual patterns that link the company’s CSR strategies to broader ethical issues and trends.

In the second project, we are also using PolLinguaTec’s topic modelling tools, but this time to perform textual analysis of sustainability reporting in public universities. We are trying to determine the most prevalent topics that university administrators present in their sustainability reports as well as what influences the structure of the reports.

The third and most recent project, which is currently in its initial stages, has to do with textual analysis of the differences between annual financial reports before and after the change from cash to accrual accounting at public universities. Specifically, we want to determine how the universities’ communication with stakeholders has changed as a result of this switch to accrual accounting. To this end, we plan to use PolLinguaTec’s tools to look for textual characteristics such as differences in readability and types of topics in the reports of Italian universities. Later on, we want to extend our scope beyond Italy and look at the situation in other countries as well, and possibly also look at institutions other than universities.

7. What is in store for your future collaboration with PolLinguaTec?

After the development of the sentiment analysis tool is completed, we would like to extend its functionality so that it could be used to identify other discursive strategies in finance. Together with Walter Aerts from the Antwerp Management School, my team and I are mainly interested in having a tool that can be used to automatically analyse attributional framing, which refers to the framing techniques with which managers explain the performance of a company in a financial report. Research on this topic generally shows that good performance is explained as the result of good management, while bad performance is often attributed to external factors. However, attributional framing can vary depending on company culture and the individual incentives of a particular manager, such as the manager’s salary and whether the manager is also the owner of the company. To the best of my knowledge, computational tools that can code attributional framing do not exist yet, so I hope that we can fill this technological gap together with PolLinguaTec.
