Reflections on CLARIN workshop on Oral History in Arezzo

Submitted by karolina@clarin.eu on 23 May 2017

As a follow-up to the CLARIN workshops on Oral History (OH) archives in Oxford and Utrecht in 2016, the Arezzo workshop (10-12 May 2017) was meant to finalize the setup of a transcription chain for OH interviews. For more information visit the workshop website.

Stefania Scagliola published a blog post, 'Catching Speech in Arezzo: A Clarin workshop for developing a transcription-chain for Oral History', on the blog site of the Luxembourg Centre for Contemporary and Digital History (C²DH) at the University of Luxembourg. You can read it here. Another participant, Bianca Pastori, has also written a blog post, which you can read below.

The envisaged transcription chain for OH-interviews, conceived as a set of web-based services turning recorded speech into a textual representation that is as close as possible to what has been uttered, can be visualized as follows:

Blog post written by Bianca Pastori

The recent CLARIN workshop took place in Arezzo on 10-12 May 2017 at the Department of Education, Human Sciences and Intercultural Communication – Siena University, Campus ‘Il Pionta’. The topic of the workshop was the Oral History Transcription Chain.

As a member of the Italian Oral History Association (AISO), I was invited by CLARIN to discuss technologies that could help scholars and researchers manage oral sources, which I found an attractive and compelling opportunity.

The participants were invited to test the tools used in the Oral History Transcription Chain, which deal with automatic speech recognition, transcription, alignment of audio and transcriptions, adding metadata, etc.

Discovering some of these tools and their potential was certainly a highlight of the workshop. Furthermore, I appreciated the collaborative approach of the seminar: “What are your needs? Which kind of technologies can improve your work?” were some of the questions asked by the technology experts and then explored with participants.

Piero Cosi, from the Institute of Cognitive Sciences and Technologies, presented the Italian Aligner, which I had the opportunity to test with my audio files and transcriptions. What I understood was that in order to perfect automatic speech recognition for Italian, the whole system has to be trained with a large number of good-quality audio files and associated transcriptions. This necessity invites scholars, researchers and institutions dealing with different kinds of oral sources to collaborate and share their work in order to develop tools and standards that can support sharing and accessing data.

The presentation given by Arjan Van Hessen on “Proposal for Transcription Chain” and the discussion that followed focused on the goal of having a portal that offers all these services in one place. This would allow scholars from different disciplines to manage and browse collections of different data. At the same time, the participants stressed the urgent need to protect the research material and its contributors with regard to ethical matters and sensitive data.

When I think about the significant intangible heritage stored in numerous archives in Italy and elsewhere, the aims and final proposals of the workshop seem to be of great interest and value, both for saving and preserving sources and for disseminating them and contributing to knowledge sharing.

Photographs from the workshop taken by Leon Wessels.

For the full list of participants click here.