This showcase illustrates how language technology can be used in humanities research. It is based on Gesta Danorum, written around 1200 by the Danish historian Saxo. Gesta Danorum is written in High Latin and describes, in 16 books, the period from King Dan to Canute VI of Denmark. Traditionally, the work is divided into two main sections: books 1-9, which deal with Norse mythology, and a historical second part, books 10-16, which describes the introduction of Christianity in Denmark. In 1969, a competing thesis was put forward (cf. Skovgaard-Petersen 1969), in which the composition of Gesta Danorum is split into books 1-8 and books 9-16. These two competing interpretations can be condensed into one question: is it book 9 or book 10 that marks the transition from the heathen to the Christian period in Gesta Danorum? To find evidence bearing on this question, a platform with embedded linguistic information and advanced search facilities was used to identify subject-specific vocabulary in the individual books of Gesta Danorum and to display the search results in a manageable way.
The procedure was to take a Danish translation of Gesta Danorum and then compute part-of-speech (PoS) and lemma information automatically. As an example of the output of the automatic processing, the sentence “Kongen blev kronet på slottet” (“the King was crowned at the castle”) was annotated as follows, with each token given as word/lemma/PoS tag:
Kongen/konge/NN_COM_SING_DEF blev/blive/V_INDIC_PAST kronet/krone/V_PARTC_PAST på/på/PREP slottet/slot/NN_NEUT_SING_DEF
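The annotated line above can be read back into structured tokens with a few lines of Python. This is only a minimal sketch for illustration: the `Token` type and the `parse_annotated` function are hypothetical names, not part of the showcase platform, and the code assumes that neither the lemma nor the PoS tag contains a slash.

```python
from typing import NamedTuple


class Token(NamedTuple):
    word: str
    lemma: str
    pos: str


def parse_annotated(line: str) -> list[Token]:
    """Split a 'word/lemma/tag word/lemma/tag ...' line into tokens."""
    tokens = []
    for item in line.split():
        # Split from the right so that a slash inside the word form
        # itself does not shift the lemma and tag fields.
        word, lemma, pos = item.rsplit("/", 2)
        tokens.append(Token(word, lemma, pos))
    return tokens


sentence = (
    "Kongen/konge/NN_COM_SING_DEF blev/blive/V_INDIC_PAST "
    "kronet/krone/V_PARTC_PAST på/på/PREP slottet/slot/NN_NEUT_SING_DEF"
)
tokens = parse_annotated(sentence)
for token in tokens:
    print(token.word, token.lemma, token.pos, sep="\t")
```

Keeping word form, lemma and tag as separate fields is what later allows a query to address each layer independently.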
The final step was to upload the annotated version of Gesta Danorum into the IMS Open Corpus Workbench (open source software). This platform makes it possible to run queries that exploit both the linguistic information and the search facilities of the Corpus Query Processor (CQP) embedded in the platform.
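The Corpus Workbench reads corpora in a one-token-per-line ("verticalized") text format, with tab-separated annotation columns and XML-like tags marking text divisions. The sketch below shows how annotated tokens might be written in that shape. The function name `to_vertical`, the `book` element and its `n` attribute are assumptions made for illustration; the showcase does not state how the books were actually marked up.

```python
def to_vertical(books: dict[int, list[tuple[str, str, str]]]) -> str:
    """Render {book number: [(word, lemma, pos), ...]} as vertical text."""
    lines = []
    for number, tokens in books.items():
        # Hypothetical structural tag; the real corpus may use other names.
        lines.append(f'<book n="{number}">')
        for word, lemma, pos in tokens:
            # One token per line, columns separated by tabs.
            lines.append("\t".join((word, lemma, pos)))
        lines.append("</book>")
    return "\n".join(lines) + "\n"


vrt = to_vertical(
    {
        9: [
            ("Kongen", "konge", "NN_COM_SING_DEF"),
            ("blev", "blive", "V_INDIC_PAST"),
        ]
    }
)
print(vrt, end="")
```

Wrapping each book in its own division is one way a query result could later be broken down by book, as the Distribution view does.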
- Visit the website: http://cst.dk/cgi-bin/dighumlab/Saxo/form-query.pl?mode=cqp.
- Choose Run Query with the default search pattern, [pos!="RESID_SIGN"], which counts all the words in Gesta Danorum.
- Choose Distribution and check how many words occur in books 8, 9 and 10 respectively.
- Insert the following search pattern, representing Christian language usage:
[lemma="helgen"] | [word="krist.*"] | [word="synd.*" & pos="N.*"] | [word="Herren"] | [word="ang(re|er)"] | [word="hellig.*"] | [word="Gud"]
- Press Run Query and then choose Distribution.
- Check how the words defined as belonging to the Christian register are distributed across books 8, 9 and 10.
- Does the frequency of Christian language usage differ between books 8, 9 and 10?
- Which thesis do the observations support: the traditional approach, which places the split between books 9 and 10, or the competing thesis, which divides Gesta Danorum between books 8 and 9?