Tour de CLARIN: CLARIAH-AT Presents Tool Chain for Sentiment Analysis


Written by Martina Scholger

Thanks to CLARIAH-AT project funding, dedicated sentiment dictionaries and a freely available tool chain were developed for conducting sentiment analysis on the Italian, French, and Spanish Spectator periodicals in the digital scholarly edition The Spectators in the International Context.

The Spectator press is an 18th-century journalistic-literary genre that popularised Enlightenment ideas among a non-academic audience in an entertaining way. Values, social norms, positive and negative behavioural patterns, and (character) traits were conveyed through emotional engagement with the audience. This makes the material particularly interesting to study with sentiment analysis.

Highlighting of sentiment words.

Initial experiments in the project ‘Distant Spectators. Distant Reading for Periodicals of the Enlightenment’ (DiSpecs) showed that freely available sentiment dictionaries and tools are tailored mainly to modern languages, mostly English, and are widely used for social media analysis. They are therefore ill-suited to 18th-century literary texts, which show, for example, orthographic variation and shifts in meaning.

The project therefore created its own sentiment dictionaries. First, a set of seed words was selected from the corpus and manually annotated by experts for polarity (positive, negative, or neutral). Word embeddings were then trained, and a machine learning classifier used the annotated seed words to transfer sentiment scores to other words occurring in similar contexts. In this way, the seed list was expanded computationally, and the time-consuming manual annotation effort was reduced.
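The expansion step can be illustrated with a minimal sketch. It assumes hand-made three-dimensional toy vectors in place of trained word embeddings, discrete polarity labels (1, -1, 0) instead of continuous scores, and a plain majority-vote k-nearest-neighbour classifier over cosine similarity. All words, vectors, and function names (`expand_lexicon`, `knn_polarity`) are hypothetical and are not taken from the DiSpecs notebooks.

```python
import math
from collections import Counter

# Hypothetical toy "embeddings": in a real pipeline these vectors would come
# from word embeddings trained on the corpus, not be written by hand.
EMBEDDINGS = {
    "bello":    [0.9, 0.1, 0.0],
    "virtù":    [0.8, 0.0, 0.2],
    "vizio":    [0.1, 0.9, 0.0],
    "crudele":  [0.0, 0.8, 0.2],
    "tavola":   [0.1, 0.0, 0.9],
    "casa":     [0.0, 0.2, 0.8],
    "onesto":   [0.85, 0.05, 0.1],
    "malvagio": [0.05, 0.85, 0.1],
    "sedia":    [0.05, 0.1, 0.85],
}

# Hypothetical expert-annotated seed words: 1 = positive, -1 = negative, 0 = neutral.
SEEDS = {"bello": 1, "virtù": 1, "vizio": -1, "crudele": -1, "tavola": 0, "casa": 0}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def knn_polarity(word, seeds, embeddings, k=3):
    """Majority polarity among the k seed words most similar to `word`."""
    vec = embeddings[word]
    nearest = sorted(seeds, key=lambda s: cosine(vec, embeddings[s]), reverse=True)[:k]
    votes = Counter(seeds[s] for s in nearest)
    # Ties resolve to the label encountered first, i.e. the nearest neighbour's.
    return votes.most_common(1)[0][0]


def expand_lexicon(seeds, embeddings, k=3):
    """Copy the seed labels and add a predicted polarity for every other word."""
    lexicon = dict(seeds)
    for word in embeddings:
        if word not in seeds:
            lexicon[word] = knn_polarity(word, seeds, embeddings, k)
    return lexicon


lexicon = expand_lexicon(SEEDS, EMBEDDINGS, k=3)
print({w: lexicon[w] for w in ("onesto", "malvagio", "sedia")})
# → {'onesto': 1, 'malvagio': -1, 'sedia': 0}
```

With three polarity classes, an odd k does not rule out a 1/1/1 split, which is why the tie-breaking behaviour is worth noting; a similarity-weighted vote would be one alternative.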

In addition to the reusable sentiment dictionaries, a freely and publicly available tool chain based on Jupyter Notebooks was developed. It enables researchers to apply 1) the dictionary creation process and 2) the actual sentiment analysis methods (i.e., import of dictionaries, computation of sentiment, data preparation, and various visualisations) to their own material. The notebooks contain executable code as well as tutorial-style introductions to concepts such as word embeddings, k-nearest neighbor classification, and dictionary-based sentiment analysis. They can thus be used in teaching and training, but also as a basic framework for researchers in the field, who can replace individual components with more sophisticated methods as needed.
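Dictionary-based sentiment computation can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical word-to-polarity dictionary, a crude regular-expression tokenizer, and a simple score of (positive hits - negative hits) divided by the number of tokens; the names `LEXICON`, `tokenize`, and `sentiment` are hypothetical and not the notebooks' actual API.

```python
import re

# Hypothetical sentiment dictionary (word -> polarity), e.g. the output of a
# lexicon-expansion step; not the published DiSpecs dictionaries.
LEXICON = {"bello": 1, "virtù": 1, "onesto": 1, "vizio": -1, "crudele": -1, "malvagio": -1}


def tokenize(text):
    """Crude tokenizer: lower-case and keep runs of Unicode word characters."""
    return re.findall(r"\w+", text.lower())


def sentiment(text, lexicon):
    """Count positive/negative dictionary hits and a length-normalised score."""
    tokens = tokenize(text)
    pos = sum(1 for t in tokens if lexicon.get(t, 0) > 0)
    neg = sum(1 for t in tokens if lexicon.get(t, 0) < 0)
    score = (pos - neg) / len(tokens) if tokens else 0.0
    return pos, neg, score


print(sentiment("L'uomo onesto fugge il vizio e ama la virtù.", LEXICON))
# → (2, 1, 0.1)
```

Normalising by text length keeps scores comparable across periodical issues of different sizes; for historical texts, a real pipeline would also need to handle orthographic variants that a plain dictionary lookup misses.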

Overview of the Dictionary Creation Pipeline.

The project was supplemented by a three-day online workshop on ‘Sentiment Analysis for Literary Studies’, which introduced the methods and the tool chain. The workshop materials, including slides, exercises, and videos of the evening lectures, are available on the DiSpecs website. The dictionaries and the tool chain were published in Melusina Press's new ‘code experiment’ publication format as part of the virtual DHd 2021 conference.