Tour de CLARIN: Resource from Austria - ABaC:us Corpus


Blog post written by Darja Fišer and Jakob Lenardič

ABaC:us Corpus

The Austrian Baroque Corpus (ABaC:us) is a digital collection of printed texts from the Baroque era, with the bulk of the data from the period between 1650 and 1750. Since 2015, the core corpus has been freely available through the ABaC:us web application, which is provided by the Austrian Centre for Digital Humanities and serves as the first corpus-based application for viewing well-documented language data from the Baroque period. The texts within the collection are predominantly characterized by religious topics and include morality lectures by Abraham a Sancta Clara, who was one of the most successful preachers in the German-speaking area in the 17th century.




Figure 1: The ABaC:us web application


The collection consists of 200,000 tokens. Its relatively small size is due to the fact that computer-generated annotations of Baroque texts require copious amounts of additional manual editing, since the stylised orthography and variable spelling of these texts cause existing taggers to produce a large number of mismatches. However, the data in the corpus are richly annotated, with markup applied to chapters, headings, paragraphs, and named entities. In addition to PoS tagging, the corpus is annotated with lemma information, which means that each word form is linked to its base form. Lemma information is a crucial part of the corpus, as it enables researchers to easily identify all occurrences of a word despite the existence of many competing spelling variants and inflected forms.
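To illustrate why lemma information matters for retrieval, here is a minimal Python sketch of a lemma-based lookup. It is not the ABaC:us implementation or its data format: the `Token` structure, the `index_by_lemma` helper, the part-of-speech labels and the sample spellings are all invented for illustration and are not taken from the corpus.

```python
from collections import defaultdict
from typing import NamedTuple


class Token(NamedTuple):
    """One annotated token: surface form, lemma and part-of-speech tag."""
    form: str
    lemma: str
    pos: str


# Invented sample data (assumption): these spellings and tags are
# illustrative only and are not drawn from the ABaC:us corpus.
TOKENS = [
    Token("Todt", "Tod", "NN"),
    Token("und", "und", "KON"),
    Token("Tod", "Tod", "NN"),
    Token("Todts", "Tod", "NN"),
    Token("Mensch", "Mensch", "NN"),
    Token("Menschen", "Mensch", "NN"),
]


def index_by_lemma(tokens):
    """Map each lemma to the (position, surface form) pairs where it occurs."""
    index = defaultdict(list)
    for position, token in enumerate(tokens):
        index[token.lemma].append((position, token.form))
    return dict(index)


index = index_by_lemma(TOKENS)

# A single query on the lemma retrieves every spelling variant and
# inflected form, which a plain string search for "Tod" would miss.
for position, form in index["Tod"]:
    print(position, form)
```

Searching by lemma rather than by surface string is what lets one query cover all competing spellings of the same word.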

Thanks to ABaC:us, scholars are for the first time able to explore the vocabulary and linguistic structures of the works attributed to Abraham a Sancta Clara using a corpus-based approach. Moreover, the detailed linguistic annotation allows for unbiased research on Sancta Clara, who is portrayed in literary history as a linguistically talented writer.

The ABaC:us project has broken new ground, since it allows scholars to combine methods from historical studies (whether literary, theological or purely historical) with a corpus-based approach that gives access to richly annotated data. Indeed, reactions from literary scholars, (computational) linguists, religious scholars and historians have shown how interdisciplinary the interest in ABaC:us is and how many different fields of research across the Digital Humanities hope to benefit from the free availability of the enriched source.





Figure 2: Tweets on ABaC:us by humanities scholars


The following is a selection of recently published papers on the ABaC:us corpus:

Claudia Resch: »Etwas für alle« – Ausgewählte Texte von und mit Abraham a Sancta Clara digital. In Zeitschrift für digitale Geisteswissenschaften 2017.

Claudia Resch and Ulrike Czeitschner: Morphosyntaktische Annotation historischer deutscher Texte: Das Austrian Baroque Corpus. In Digitale Methoden der Korpusforschung in Österreich (= Veröffentlichungen zur Linguistik und Kommunikationsforschung Nr. 30). Vienna: Verlag der Österreichischen Akademie der Wissenschaften 2017, pp. 39-62.

Claudia Resch, Ulrike Czeitschner, Eva Wohlfarter, and Barbara Krautgartner: Introducing the Austrian Baroque Corpus: Annotation and Application of a Thematic Research Collection. In Proceedings of the Third Conference on Digital Humanities in Luxembourg with a Special Focus on Reading Historical Sources in the Digital Age. Aachen 2016.

Claudia Resch and Wolfgang U. Dressler: Zur Pragmatik der Diminutive in frühen Erbauungstexten Abraham a Sancta Claras. Eine korpusbasierte Studie. In Linguistische Pragmatik in historischen Bezügen. Berlin/Boston: de Gruyter 2016, pp. 235-249.

