CLARIN-DK presents: Teaching the teachers – an interactive workshop for the Voyant Tools

In the teaching of literary studies, digital methods are only slowly gaining ground in Denmark. While many lecturers are interested in introducing digital methods to their students, they often lack knowledge of the existing tools. From previous workshops, CLARIN-DK learnt that neither traditional NLP tools like lemmatizers, POS-taggers, and named entity recognizers nor simple command line scripting were suitable in such teaching scenarios. This is why CLARIN-DK started to explore more high-level language technologies, such as data visualization tools, that could serve as a better and easier entry point to digital methodologies for non-computational researchers and teachers.

We opted for the Voyant Tools, an online environment that performs automatic text analysis with functionalities such as word frequency lists, frequency distribution plots, and KWIC displays (Figure 1). CLARIN-DK experts organized an interactive workshop where they presented the use of this environment to lecturers and researchers at the Department of Nordic Studies and Linguistics at the University of Copenhagen. The event took place on 21 November 2018 and was attended by 12 teachers and researchers.

Figure 1: the Voyant Tools

To tailor the tutorial to the needs of the participants, CLARIN-DK surveyed them in advance about the literary works most relevant to showcase and about the research questions that could be investigated and discussed during the event. The participants opted for novels written around the Modern Breakthrough, a period in Scandinavian literature at the end of the 19th century during which naturalism replaced romanticism. CLARIN-DK preprocessed these novels and uploaded them to the Voyant Tools.

One of the questions submitted by the attendees was the following: Is it possible to see changes over time in the use of verbs like feel/think vs. verbs like see/hear? The emotional verbs feel/think were chosen since they are typical of the era of romanticism, while the observational verbs see/hear better characterize the era of naturalism. By using the Trends tool in Voyant, the CLARIN-DK experts showed that the use of feel and think stays relatively constant across this chronologically ordered corpus, while an increase in the use of see and hear can be detected (Figure 2).

Figure 2: The chronological distribution of see and hear (blue line) and feel and think (green line) for the period between 1826 and 1899 taking into account 54 novels.
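The kind of measurement behind such a trend line can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how the Trends tool is implemented: the function names `relative_frequency` and `trend` are hypothetical, the two short English texts are toy stand-ins for the Danish novels, and the sketch counts exact word forms only, so inflected forms such as "sees" or "heard" are missed.

```python
import re
from collections import Counter


def relative_frequency(text, words):
    """Share of tokens in `text` that belong to `words` (case-insensitive)."""
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(counts[w] for w in words) / len(tokens)


def trend(corpus, words):
    """Return (year, relative frequency) pairs in chronological order."""
    return [(year, relative_frequency(text, words)) for year, text in sorted(corpus)]


# Toy stand-in texts (assumption): not taken from the workshop corpus.
corpus = [
    (1899, "i see the street and hear the carts"),
    (1826, "i feel sorrow and think of home"),
]
observational = {"see", "hear"}
emotional = {"feel", "think"}

print(trend(corpus, observational))
print(trend(corpus, emotional))
```

Dividing by the number of tokens in each text makes novels of different lengths comparable, which is why the sketch reports shares rather than raw counts.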

Another researcher asked whether it is possible to visualize the different narrative points of view in the novels. To do this, the CLARIN-DK experts used the ScatterPlot tool to visualize which novels are closest to each other on the basis of simple criteria such as the use of 1st, 2nd, and 3rd person pronouns. The ScatterPlot indicated that Herman Bang used pronouns in a traditional way in his novels, while H.C. Andersen used them more creatively, for instance by directly addressing inanimate objects with a 2nd person pronoun.

Figure 3: The similarity between the novels of Herman Bang and H.C. Andersen as shown by the ScatterPlot tool.
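The idea of comparing novels by their use of grammatical person can be illustrated with a small Python sketch. It is not a reimplementation of the ScatterPlot tool: it simply turns each text into the proportions of 1st, 2nd, and 3rd person pronouns and measures the Euclidean distance between two such profiles. The pronoun lists are English stand-ins chosen for readability (a Danish corpus would need Danish lists), and the names `PRONOUNS`, `person_profile`, and `distance` are hypothetical.

```python
import math
import re

# English stand-ins (assumption); Danish texts would need Danish pronoun lists.
PRONOUNS = {
    "first": {"i", "me", "my", "we", "us", "our"},
    "second": {"you", "your"},
    "third": {"he", "him", "his", "she", "her", "they", "them", "their"},
}


def person_profile(text):
    """Proportions of 1st, 2nd, and 3rd person pronouns among all pronouns found."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = [
        sum(token in PRONOUNS[person] for token in tokens)
        for person in ("first", "second", "third")
    ]
    total = sum(counts)
    return [count / total if total else 0.0 for count in counts]


def distance(profile_a, profile_b):
    """Euclidean distance between two pronoun profiles; smaller means more similar."""
    return math.dist(profile_a, profile_b)


# Toy stand-in sentences (assumption): not quotations from the novels.
addressing_object = person_profile("You are my lamp, and you shine")
third_person = person_profile("He took his hat and she took her coat")
print(distance(addressing_object, third_person))
```

A text that addresses an object directly ends up with a high 2nd person share and therefore lies far from a purely 3rd person narrative.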

The participants soon realized that studying text through isolated word forms was limiting and that there was a clear need for lemmatization. Moreover, the need for POS-tagged texts became evident, since some researchers were interested in investigating adjectives expressing emotions, while others were interested in analyzing events, which requires the automatic extraction of verbs. Despite these limitations, the Voyant Tools proved to be very illustrative and useful for getting a first quantitative overview of a collection of novels, and they allowed two or more novels to be compared.

As a follow-up to this event, the CLARIN-DK team will organize a workshop introducing corpus tools and corpus querying techniques in linguistically annotated texts for Literary Studies. The event will also showcase how automatic linguistic annotation is performed on texts from before and after the Danish orthography reform in 1948 and discuss how to circumvent the problems encountered when applying NLP tools to older texts.

Blog post written by Dorte Haltrup Hansen, Costanza Navarreta, and Lene Offersgaard, edited by Darja Fišer and Jakob Lenardič.