Tour de CLARIN: Interview with Peter Andorfer, Stephan Kurz, and Martin Anton Müller

Submitted by Jakob Lenardič on 24 May 2021

The following Tour de CLARIN interview is about ARCHE, the Austrian B-centre that is run by the Austrian Centre for Digital Humanities and Cultural Heritage (ACDH-CH). The interview features Peter Andorfer, who is a research software engineer at the ACDH-CH; Stephan Kurz, who is a German studies scholar working with digital scholarly editing; and Martin Anton Müller, who is a German philologist.

L–R: Peter Andorfer, Martin Anton Müller, Stephan Kurz

1. Please introduce yourself (your academic background and current position).

Peter Andorfer: I am a historian by training, and I have been working as a research software engineer at the ACDH-CH since 2015 – with a focus on developing data-driven applications, data wrangling and archiving.

Stephan Kurz: A German studies scholar by training, I developed a genuine interest in digital scholarly editing, also in connection with making lesser-known epistolary novels available in the context of my dissertation. At the Austrian Academy of Sciences, I work with digital and hybrid scholarly editions – from archival sources to the long-term preservation of digital representations. The Institute for Habsburg and Balkan Studies currently has two ongoing major digital edition projects that I am involved in: The Minutes of Ministers’ Councils of Austria and the Austro-Hungarian Monarchy, and an upcoming edition of sources of Ottoman–Habsburg diplomatic exchanges.

Martin Anton Müller: I am currently leading a research project on the online edition of the Austrian dramatist and author Arthur Schnitzler’s (1862–1931) professional correspondence. I studied German philology and art education and then did my doctorate in philosophy. In the course of working on various scholarly edition projects over the last thirteen years – the latest two as project leader – the digital approach has become increasingly important.

2. How are you involved with ARCHE?

Andorfer: I’m part of the ARCHE core team, which means most of the time I request new features, play around with those new features, find bugs and/or more things I’d like to have. On the other hand, I’m also a user of ARCHE, someone who wants to see data archived in some safe haven for eternity.

Kurz: ARCHE is already an important infrastructural part of the Austrian Digital Humanities field. Most projects I work with will eventually end up in ARCHE. This is also because of the projected sustainability of this solution in comparison with other long-term data storage solutions. I am simply an end-user of this system who deposits data there. Together with Karin Schneider, I deposited a mid-size scholarly edition of protocols, treaties and other genres related to the follow-up negotiations after the Vienna Congress, entitled Mächtekongresse 1818–1822, which you can find on ARCHE using a practical Handle link. Of course, I am always curious to discover new additions to the repository – there’s lots to discover!

Müller: My self-image and my ideas of what I have to do as an editor are shaped by book production. When I make a book, I can assume with some certainty that in the year 2500 it will still be possible to read it. This is crucially different in the digital realm. The integration of my research data in ARCHE is the first time that I am trying to hand over data in such a way that it can be re-used for as long as possible and as well as possible, so that in the best case scenario it will still be accessible in 2500 as well.

3. Which ARCHE resource and/or tools have you used? How, concretely, did you integrate them in your existing research?

Andorfer: I mainly use the ARCHE API (https://app.swaggerhub.com/apis/zozlak/arche) and the ARCHE Dissemination Services. A core feature of ARCHE is its granularity. Whereas solutions such as Zenodo – which I like and use a lot myself – provide stable and resolvable IDs mainly on the collection level, ARCHE exposes each of its resources, be they collections, binaries, or entities such as persons or organisations, through resolvable URIs. This makes it possible to integrate ARCHE resources into other applications or to develop your own clients that interact with ARCHE. To give an example, a previous version of ARCHE used a triple store as a storage layer, which made it possible to query ARCHE through SPARQL. For recent ARCHE versions, we switched to a PostgreSQL database, mainly for performance and maintainability reasons, and by doing so we lost the SPARQL feature. But thanks to the existing ARCHE API, with its default response format being RDF, it is quite easy to write an API client in your favourite programming language that fetches the data and loads it into a triple store.
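Editorial illustration: the idea of a client that takes RDF and loads it into a triple store can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not ARCHE code. The sample data, the example.org URIs and the names `parse_ntriples` and `TinyTripleStore` are all hypothetical; the inline N-Triples string stands in for an API response, and a small in-memory set stands in for a real triple store. A real client would fetch the data over HTTP and would more likely rely on an established RDF library and a SPARQL-capable store.

```python
import re

# Hypothetical N-Triples sample standing in for an API response.
# The URIs are invented for illustration; they are not ARCHE identifiers.
SAMPLE_NTRIPLES = """\
<https://example.org/resource/1> <http://purl.org/dc/terms/title> "Letters 1890" .
<https://example.org/resource/1> <http://purl.org/dc/terms/isPartOf> <https://example.org/collection/9> .
<https://example.org/collection/9> <http://purl.org/dc/terms/title> "Correspondence" .
"""

# Simplified pattern: <subject> <predicate> (<object URI> | "plain literal") .
TRIPLE_RE = re.compile(
    r'^<([^>]+)>\s+<([^>]+)>\s+(?:<([^>]+)>|"((?:[^"\\]|\\.)*)")\s*\.$'
)


def parse_ntriples(text):
    """Parse a small subset of N-Triples.

    Blank nodes, datatypes and language tags are not supported, and
    escape sequences inside literals are kept as written.
    """
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip empty lines and comments
        match = TRIPLE_RE.match(line)
        if match is None:
            raise ValueError(f"unsupported line: {line!r}")
        subject, predicate, obj_uri, obj_literal = match.groups()
        obj = obj_uri if obj_uri is not None else obj_literal
        triples.append((subject, predicate, obj))
    return triples


class TinyTripleStore:
    """In-memory stand-in for a triple store (hypothetical helper)."""

    def __init__(self):
        self._triples = set()

    def add_all(self, triples):
        self._triples.update(triples)

    def match(self, subject=None, predicate=None, obj=None):
        """Return sorted triples matching the pattern; None is a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


store = TinyTripleStore()
store.add_all(parse_ntriples(SAMPLE_NTRIPLES))

# Roughly what a SPARQL query for all titles would ask for.
titles = store.match(predicate="http://purl.org/dc/terms/title")
for subject, _, title in titles:
    print(subject, title)
```

The pattern-matching `match` method only hints at what SPARQL offers; the point of the sketch is that RDF returned by an API can be re-loaded into a queryable store with little code.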

Kurz: ARCHE to me is only part of the digital infrastructure that keeps evolving around our digital editions workflows – by nature and by design, it’s concerned with the later stages in the life cycle of digital objects. For data generation and data curation, we use various other tools. But the necessity of archiving data in ARCHE in a meaningful way, with metadata that goes beyond the coarse surface level that is normally collected with bibliographic records, does in fact help conceptually already at a data modelling step that precedes all data collection. ARCHE’s web interface shows exactly what metadata fields you have to fill out on the levels of project, collection and resource. In addition, there are several well-documented API endpoints that return metadata in various formats. It’s a satisfying experience to have tediously generated metadata in one format and then have ARCHE return it through its interface to another web application in another format. Furthermore, the semantic integration of statements (e.g. A said that person B has this other identifier C) makes a lot of sense to me, especially when those statements become identifiable as resources that have their own handle.net handles. With potentially infinite versions of a single data set, dealing with data resources otherwise quickly becomes more complex than a human can handle. That said, ARCHE helps a lot, both before and after creating a digital resource. Yet the most important factor is the ARCHE team, experts who have truly thought through what it means to host digital objects for the long term. They truly are there to help, to cross-check, and to integrate digital resources.

Müller: I have only been working with primarily digital data for just under three years. For their display, I have a web app from Peter, the DSE-base app, which makes it much simpler to make the data – in my specific project, letters, telegrams and postcards – accessible. And with the curation of Schnitzler’s diary, Peter has implemented a related use case that I can follow and copy. Frankly, ARCHE to me is more like one of several end goals next to the website and preparations for a possible print edition. I’m learning all the time; at the moment I know how the data gets into ARCHE, but I’m not yet sure what I'm going to do with it.

4. Are there any specific features of ARCHE tools and resources that make them especially well-suited for Digital Humanities Research, especially for non-technical users?

Andorfer: In my opinion, digital humanities research needs some technical understanding or some tiny bit of data literacy. The beauty of ARCHE is its consistent use of the RDF data model. All data ingested or requested is described in RDF. So in the end I don’t need any fancy graphical user interface to read or process ARCHE’s (meta)data; I just need to know RDF or at least how to read RDF – a well-established W3C standard.

Kurz: I particularly like that ARCHE forces you to think about your metadata schema in a way that makes metadata comparable across the resources that are stored. So my take on this is that the scholarly communities of various disciplines really need to think about the nature and state of their digital resources. This is a tedious process, but it helps to clear up lots of questions that researchers may eventually be asked about the provenance, creation, legal situation, metadata curation, etc. of their data. Kudos to the colleagues at ACDH-CH who have enabled us to rethink our metadata! The need for sufficient and in-depth metadata not only helps users discover and explore one’s data, but it already reshapes how data creators and curators think about what they do. This may be a long process, but ultimately it’s also about transparency in the research process itself.

Müller: What is relevant for me right now at the end of the project is the question of what happens after me. Two things are relevant. One is that ARCHE offers clear guidelines for citing resources that include authorial information as well as a persistent identifier for the resource that lasts for decades. The second is that the research results can be downloaded and further developed.

Kurz: Yes, the ARCHE dissemination services that Martin just mentioned are another key feature. They enable data depositors to create custom views of their data on the fly, without a dedicated web application that they would have to maintain separately, and they allow access through stable and persistent identifiers.

5. What makes ARCHE especially important for the Austrian Digital Humanities and SSH research community?

Andorfer: It is a technically solid data repository hosted by a hopefully quite stable institution. So there is a possibility that ARCHE will not cease to exist soon and therefore can be used by the Austrian SSH research community and everyone else who wants to publish Digital Humanities research data. The ACDH-CH, as its name and mission suggest, is rather focussed only on Digital Humanities, so ARCHE has in a sense a wider scope in relation to data.

Kurz: One important thing to note is that, yes, ARCHE is an integral part of the DH community, but this is only true in conjunction with other tools and services provided by the ACDH-CH. For me, coming from another institute that is also part of the Austrian Academy of Sciences, it is a luxurious thing to have a CLARIN B Centre so close by, with all that comes with it. I suppose the ongoing digital transition of the Social Sciences and Humanities will lead to ever-increasing attention toward digital infrastructure in general, and ARCHE in particular. I can only underline what Peter said about the beautiful simplicity of the RDF data model – using such a low-level “grammar” makes it well-prepared for future challenges. ARCHE already stores a wide variety of different data formats, some of them relating to textual sources such as the collections that Martin mentioned, but others referring to 3D geometries, images etc.

Müller: I think ARCHE’s solution of having generic interfaces for multimodal data is pretty smart.

6. Do you integrate ARCHE tools and resources into your university teaching? If so, how exactly do you integrate them?

Andorfer: The linguistic resources stored in ARCHE in particular are regularly used for university teaching, for example in the Department of Near Eastern Studies or the Department of German Studies. Resources include the travel!digital Collection (a digital collection of early German travel guides on non-European countries which were released by the Baedeker publishing house between 1875 and 1914), the Facsimiles of Arthur Schnitzler's Diaries (1879–1931), the VICAV - Vienna Corpus of Arabic Varieties and the amc: Austrian Media Corpus. For many of the archived datasets, dedicated web applications developed and hosted by the ACDH-CH exist. For instance, the travel!digital Corpus allows researchers to explore the thesaurus and view the facsimiles of the travel guides in combination with their transcribed texts; Schnitzler Tagebuch provides a bespoke view on the facsimiles and the persons, places and dates mentioned in Arthur Schnitzler’s diaries; and VICAV presents further materials for Near Eastern languages research. Other use cases of data stored in ARCHE include the virtual Hackathon from 2018. Most of the data in ARCHE is public and open to be used in teaching everywhere around the world, even though we can’t tell where… By the way, DH courses can be registered in the Digital Humanities Course Registry (another CLARIN resource the ACDH-CH is providing)!

Kurz: During the COVID pandemic, the necessity of sustainable data and metadata hosting became clear as all those web services and platforms for e-learning were discussed … Luckily, I did not have any teaching duties (laughs). In all seriousness: I think the takeaway message for DH teachers, and also for people who design DH curricula, may be to include digital preservation early on – and to involve students in the actual data life cycles in a way that they really get a feeling for how important, but also how tedious, it is to produce meaningful documentation and metadata. From my own experience in making DH a desirable subject area, I see a focus on data creation and data curation, fancy web application development and the like, but the “boring” things that ultimately keep past projects accessible and reusable are often overlooked.

Müller: I did not have any teaching duties either.

7. What’s your vision for ARCHE 10 years from now?

Andorfer: That all ARCHE resources can still be requested via their URIs and that their metadata can be read and interpreted by machines as well as human beings. This may not sound like a big vision, but after all, that's what a long-term data repository is built for: storing data in a persistent, findable and resolvable manner.

Kurz: I’m sure that ARCHE will be alive and kicking in one way or another – that’s why it has been created. Web applications may come and go, but data repositories are intended and designed to be around for a longer term. Its backend may change in the future, maybe even to a successor system that could deal faster with the vastly larger amounts of data that I’d expect to be stored in this infrastructure; its frontend and API may change and improve, ARCHE may be adapted to new file and metadata formats – but exactly such processes of regularly updating and curating are core functions of such a system – apart from versioning and redundant data storage under transparent circumstances and policies.

Müller: I always find it difficult to predict the future because I always fall back on analogies. In this case, I hope that the handling of research data in ARCHE will become easier in the same way that the internet went from being something for nerds to a mass medium. It is not so important whether ARCHE develops into something simpler or whether more and more users acquire the necessary skills. I imagine both will take place.