Tour de CLARIN: Interview with Amelie Dorn

Please introduce yourself. What is your background? What is your current position?

I have a background in philology, pedagogy and linguistics. After completing my PhD in Ireland, I held a postdoc position at the Austrian Centre for Digital Humanities and Cultural Heritage (ACDH-CH) at the Austrian Academy of Sciences in Vienna. I was involved in different Digital Humanities projects that revolved around historic language collections. Right now, I'm a researcher and lecturer for the research programme ‘German in Austria’ (‘Deutsch in Österreich’ (DiÖ)), and I work on lexical variation.

What is the ‘Deutsch in Österreich’ project about? What is your role in the project?

The project ‘German in Austria. Variation – Contact – Perception’ is a Special Research Programme (SFB) funded by the Austrian Science Fund (FWF). It started in 2016 and is led by Alexandra N. Lenz, Professor of Linguistics at the University of Vienna and Director of the ACDH-CH at the Austrian Academy of Sciences. The project comprises six institutes at four research institutions across Austria: the University of Vienna, the University of Graz, the University of Salzburg, and the Austrian Academy of Sciences. Each institution focuses on different parts of the SFB. Overall, the aim of the project is to investigate the entire spectrum of variation and varieties of German in Austria, bringing together expertise from the fields of variationist linguistics, contact linguistics and multilingualism research, as well as from socio-linguistically based research on language perception and attitudes.

I joined the project in 2020, and am part of the Viennese project teams at the University of Vienna, where we look at different linguistic system levels and the different varieties along the dialect-standard continuum of the German language in Austria. In our analyses, we consider both horizontal, or geographical, variation and vertical variation, which includes social aspects such as age or gender. I work on lexical variation, which is concerned with the differences in meaning and use of words in a person’s vocabulary.

Why is it important to study lexical variation?

Research into lexical variation has a long tradition, particularly in the field of dialectology in German. Until now, the research focus has mainly been on the dialectal spectrum and this is why, in the project, we're taking a broader look at different registers along the dialect and standard axis. Many of the existing analyses have typically been concerned with the geographic distribution and variation of dialect features, and have somewhat neglected the vertical, or social, dimensions within the entire repertoires of speakers. In the project, we're bringing these together to offer a more comprehensive picture.

In addition to the work on lexis, my colleagues are working on other language system levels, such as phonetics/phonology, morphology, syntax and pragmatics. So, our analyses are also located at the interface of these levels.

What is the research methodology?

By means of triangulation and a combination of approaches, both established and innovative methods of qualitative and quantitative data collection, handling, analysis and presentation are used. As far as the survey of lexical data is concerned, we combine digital and analogue methods, for instance in face-to-face survey settings on-site (including oral conversations among friends, interviews or special lexical experiments) and also by means of ‘indirect’ surveys via questionnaires. In order to also analyse lexical variation within the written standard language in Austria, we draw on corpora such as the Austrian Media Corpus (amc). The amc data is complementary to the questionnaires and to the spoken data we have in the project. It helps to corroborate findings with larger amounts of data.

The spoken language data is stored in a database, where it is transcribed and annotated with the specific features for each of the language system levels, so that we can investigate lexical variation, but also phonetic features, syntactic or pragmatic phenomena or a combination of these.

You are using the Austrian Media Corpus as one of your main data sources. Why?

The amc is unique in that it is, as far as we know, the only digital resource that covers almost the entire print media landscape of Austria over more than two decades. Its content is rich and varied and covers a relatively large timeframe, from 1986 up to 2022. It also counts among the largest collections of digital German journalistic prose data at the moment, that is, it has representative texts for written standard German. That’s why we draw on the amc data as a complementary data source in addition to the spoken language data collected in the project. Additionally, the material is digitally well prepared: it has, for example, different tag sets, dependency parsers and named entity recognisers. The corpus can be consulted for academic linguistic research through an interface that is hosted by the ACDH-CH, a CLARIN B-centre and a partner in the CLARIN-AT consortium.

The project is still ongoing, but what are some preliminary results regarding lexical variation of German in Austria?

In terms of lexical variation, we can see that variation is present on the geographical dimension, on the one hand, but also vertically, or socially. For instance, we might find differences between younger and older speakers, but also between registers, that is, between dialect, colloquial language and standard German. Often, the level of variation may also depend on the specific lexical phenomenon we look at.

*Variation of the word 'potato' on the geographical dimension.*

One type of phenomenon we have investigated so far concerns so-called ‘Austrianisms’. These are, in very simplified and general terms, variants or features that occur typically, or with a very high frequency, in Austria. They can be grammatical, phonetic, or lexical. An example of a lexical feature would be using different words for the same concept: the word for tomato, for instance, could either be the German word Tomate, or Paradeiser, which is often seen as an Austrianism (even if it is not distributed across the entire Austrian language area). In public discourse, it is often the famous food names from the so-called Accession Protocol No. 10 that are discussed in the context of Austrianisms, e.g. Erdäpfel for potato, Karfiol for cauliflower, or Marillen for apricots.

We have looked at these in the amc corpus as one part of the data, and also in online questionnaires. In the amc, we investigated a certain timespan (from 2001 to 2020) and defined areas in the corpus for more than 100 such items, examining whether and how they vary.

*Use of the words 'Erdapfel' and 'Kartoffel' based on amc data.*

Overall, we can say that the Austrianisms, also in the amc corpus, show a relatively high degree of stability over the past 20 years. This was unexpected, because in the past and in the literature, they have often been linked to threat scenarios – the fear that these Austrian peculiarities are being either replaced or diminished by German-German variants, for example.

Is there a correlation between the use of Austrianisms and age or other social factors?

Our data have shown that it is difficult to generalise results in this respect, and that it tends rather to depend on the individual lexical phenomenon that is being investigated. While Austrianisms in the amc data have been used fairly stably in the press releases of the past 20 years, spoken data from individuals in Austria can show a somewhat different picture. The comparison of generations in particular indicates that many Austrianisms are still being used more frequently by older than by younger people. So, for example, with Tomate and Paradeiser, our questionnaire data show that both variants are used in Austria across various areas and their dialects, colloquial and standard registers, though to varying degrees. Overall, Tomate is used to a higher degree in standard registers by both older and younger speakers, and by younger speakers also in non-standard registers. The Austrianism Paradeiser, however, occurs with a relatively higher frequency with older than with younger speakers across dialect, colloquial and standard registers. We are currently collecting further data in our latest questionnaire ‘Wort-Schätze’, which can be accessed here.

The project is continuing for another three years. What will be the project’s output?

Firstly, the SFB produces comprehensive and detailed analyses of language/s in Austria – including manifold forms of contact between varieties and languages in Austria – as a result of the close collaboration between members of the research team, who are specialised in diverse linguistic subdisciplines, i.e. variationist linguistics, socio-linguistics, dialectology, historical linguistics, research on language contact, language acquisition, multilingualism and German as L2, research on language attitudes and perception, corpus linguistics, computational linguistics and language technology. The analyses are disseminated in various formats (monographs, anthologies, journal articles) in internationally renowned and peer-reviewed publication outlets (book series and journals).

The second outcome of the SFB is that the data collected and processed are made available via an online digital research infrastructure on German in Austria. Via the SFB research platform, the SFB data (which are of course handled according to strict ethical standards) will be systematised, processed for the use of search tools and published online. The data will be made accessible to linguists, language learners, language teachers and the general public.

What has been your experience working with CLARIN resources?

I have used CLARIN’s resources before in my own Digital Humanities projects, and I am also familiar with the infrastructure, which I find very helpful for researchers who work with corpora. In addition, I incorporate CLARIN into my teaching in the context of Digital Humanities lectures and I introduce students to the infrastructure. The Virtual Language Observatory is particularly popular.

Another important aspect of using CLARIN is the soft infrastructure. I find the ACDH Help Desk very useful. In my previous projects, for example, I have been able to draw on the experience of colleagues working at the CLARIN B-centre in ARCHE (A Resource Centre for the HumanitiEs). The help desk in this area functions like one of the local K-centres. So, you can send a request, and they will help you to find the information.

What are your plans for the future?

I will continue my research on language variation and Digital Humanities at the University of Vienna, and also at the Austrian Centre for Digital Humanities and Cultural Heritage, working with language data, corpora and with different digital tools and methods. In our current research we are just seeing the tip of the iceberg and there is a lot more interesting research to be done! I am also looking forward to using different CLARIN resources, including in teaching contexts.

An introduction to the project ‘German in Austria. Variation – Contact – Perception’ by Professor Alexandra N. Lenz can be found here.