Oral Archives for Sociolinguistic Research

Goals and Objectives  

The goal of this sociolinguistics course is to show students the possibilities and challenges that oral history archives offer for (socio)linguistic research. The course is intended as a research framework that will guide students in their future research work. The lectures acquaint students with the CLARIN infrastructure and introduce software tools that will allow them to carry out their own thesis research independently. The course offers guidance on the following steps, which must be addressed in any (research) project dealing with oral archives: i) reviewing the ethical and legal issues arising from using and reusing legacy data; ii) using metadata to provide the appropriate level of description for the dataset; iii) transcribing the speech material automatically and manually, using the CLARIN infrastructure; iv) selecting and using the appropriate CLARIN software and tools depending on the research goals (phonetic, lexical, discourse analysis, etc.).


Dipartimento di scienze della formazione, scienze umane e della comunicazione interculturale
Languages for Intercultural and Business Communication, Università di Siena
Arezzo, Italy

Description of the Training Materials

(Sub)discipline, topic, language(s)
Subdiscipline: Sociolinguistics
Topic: Sociolinguistic Research and Legacy Data: From Legal and Ethical Issues to Data Analysis
Language: Italian (both standard and regional varieties)
Keywords: Legacy data, oral archives, transcription, legal and ethical issues
CLARIN resources
1. Orthographic transcripts with OCTRA
2. ASR
3. Chunker
4. G2P
5. MAUS segmentation
6. Pho2Syl
The tools can be accessed via the BAS web service interface.
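
One way to read the list above is as a chain in which each tool consumes what an earlier one produces. The sketch below is a minimal illustration in Python and is not part of the BAS web services: the `Step` class, the `plan` function and the artefact names (`audio`, `orthographic`, and so on) are hypothetical, and the ordering (G2P before the Chunker) is an assumption based on the Chunker working from a phonemic transcript. OCTRA is left out because manual transcription simply supplies the `orthographic` artefact by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One tool in the chain: what it needs and what it produces (hypothetical model)."""
    name: str
    needs: frozenset
    gives: str


# Assumed order of the chain; artefact names are illustrative labels only.
CHAIN = [
    Step("ASR", frozenset({"audio"}), "orthographic"),
    Step("G2P", frozenset({"orthographic"}), "phonemic"),
    Step("Chunker", frozenset({"audio", "phonemic"}), "chunks"),
    Step("MAUS", frozenset({"audio", "phonemic"}), "segmentation"),
    Step("Pho2Syl", frozenset({"segmentation"}), "syllables"),
]


def plan(available):
    """Return the names of the steps that can run, in order, given the available artefacts."""
    have = set(available)
    steps = []
    for step in CHAIN:
        if step.gives in have:
            continue  # already available, e.g. a manual transcript replaces ASR
        if step.needs <= have:
            steps.append(step.name)
            have.add(step.gives)
    return steps


if __name__ == "__main__":
    print(plan({"audio"}))
    print(plan({"audio", "orthographic"}))
```

For a recording that already has a manually corrected transcript, `plan({"audio", "orthographic"})` skips ASR and starts from G2P; with a transcript but no audio, only G2P can run.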
Structure and duration
The learning module is structured as an online course comprising a number of lectures and exercises with tutorials (small work groups, hands-on sessions and guided debates).
The total duration of the course is 36 hours. The first section of the course (10 hours) introduces students to the basic notions of sociolinguistic analysis (variation across time and space, gender, age, style, social network, community of practice, etc.). As the first section of the course is not relevant to the CLARIN call, it will not be discussed further in this document.
The course uses Anna Maria Bruzzone’s archive, now at the University of Siena, as its case study. The archive is well suited to addressing most of the relevant issues that arise when conducting research with oral history archives. Some of the problems posed by the archive have already been presented to the CLARIN community, which has in turn helped to disentangle relevant issues, such as the legal issues involved in reuse and the ethical and legal issues involved in providing metadata.
The course is structured as follows:
1. Introduction to Sociolinguistics (10 h). The first lectures deal with basic sociolinguistic concepts.
2. Oral Archives (2 h). The lecture introduces students to the basic notion of oral archives.
  • What is an oral archive?
  • What is the difference between oral and material archives?
  • What types of oral archives can be used in linguistic analysis?
  • What are the typical problems encountered when dealing with oral archives (carrier decay, digitisation, obsolescence, relationship between the document and the intellectual unit, etc.)?
3. Anna Maria Bruzzone’s Archive (4 h). The lecture is devoted to the description of Anna Maria Bruzzone’s archive.
  • The first part of the lecture describes the Ci chiamavano matti collection, which consists of interviews that Bruzzone conducted with the patients of the neuropsychiatric hospital in Arezzo in the summer of 1977. Some space is also devoted to the Gorizia corpus (interviews collected in 1968 without a recording device in the Gorizia psychiatric hospital), published for the first time in 2021 (A.M. Bruzzone, Ci chiamavano matti. Voci dal manicomio (1968-1977), edited by M. Setaro and S. Calamai, Milano, Il Saggiatore 2021).
  • The second part of the lecture focuses on the Le donne di Ravensbrück collection. It consists of about 19 hours of recordings with five survivors of Ravensbrück. The interviews formed the basis for the book that Bruzzone first published in 1978 (L. Beccaria Rolfi, A.M. Bruzzone, Le donne di Ravensbrück. Testimonianze di deportate politiche italiane, Torino, Einaudi 1978 and 2020). During this lesson, possible lines of linguistic research are also presented (for example, pause duration analysis, specific phonetic phenomena, the relationship between vernacular and standard Italian, etc.).
4. Ethical and Legal Issues in the Realm of Oral Archives (2 h). The lecture covers the GDPR and its implementation in Italian law, with a focus on voice as biometric data. Attention is also devoted to responsibility and accountability.
  • What are the problems in managing oral archives?
  • How can we provide FAIR (findable, accessible, interoperable and reusable) data when managing historical records?
  • What is the minimum level of description that we can provide for oral history data?
  • Special attention is devoted to the experience of the Italian 'Vademecum per il trattamento delle fonti orali'.
5. Oral vs. Written Data (2 h). The lecture addresses the differences between spoken and written language. Transcripts of oral recordings are a crucial source of data for the study of linguistic varieties, but is it possible to transcribe oral speech material (popular oral literature, autobiography, beliefs and superstitions, folktales, jokes, legends, rhymes and riddles, etc.) while remaining faithful to the oral source? Scribal conventions are presented and discussed. Although they may seem trivial, these conventions are in fact matters of linguistic interpretation that can significantly affect both the results of linguistic analysis and our general impression of how different one variety is from others.
6. Transcribing Oral Data (2 h). The lecture introduces theoretical and practical issues encountered during transcription. What is transcription? How has the way interviews are transcribed changed over the years and across academic fields (anthropology, oral history, dialectology, etc.)?
7. The Transcription Chain (2 h). The lecture offers practical guidance for transcription. Students are introduced to the transcription chain using CLARIN tools. The lecture focuses on the possibilities offered by ASR and by online open-source software such as OCTRA. In particular, students become familiar with the BAS web service interface.
8. Tutorial with Exercises (2 h). Hands-on practice with the oral transcription chain using CLARIN tools. Students are provided with interviews from the two Bruzzone corpora and are asked to transcribe the data, from chunking the audio file to the actual verbatim transcription. Students use different pipelines and perform transcription using ASR and manual correction. In the following days, students usually work in pairs, transcribing the interviews. They acquire a detailed knowledge of the oral histories and are able to discuss the interview dynamics in greater detail.
9. Second Tutorial with Exercises (2 h). In this tutorial, students are introduced to the CLARIN portal. They are invited to explore the available corpora and to see different levels of transcription.
10. In the last lectures (8 h), students are invited to share their results in the form of a brief summary, showing all the steps they followed for transcription and listing all the corpora that they judge similar to their own research material.
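
Pause duration analysis, mentioned above as one possible line of research on the Ravensbrück collection, could start from a time-aligned segmentation such as the one produced at the end of the transcription chain. The following is a minimal sketch in Python under stated assumptions: the segmentation has already been read into a list of `(start, end, label)` tuples in seconds, pauses carry the label `<p:>` (assumed here to be the pause symbol in the segmentation output), and the 0.2 s threshold is only an illustrative value. The function name and the sample data are hypothetical and do not come from the Bruzzone corpora.

```python
from statistics import mean

PAUSE_LABEL = "<p:>"  # assumed label for silent intervals in the segmentation


def pause_durations(segments, min_duration=0.2):
    """Return the durations (in seconds) of pauses at least `min_duration` long."""
    durations = []
    for start, end, label in segments:
        length = end - start
        if label == PAUSE_LABEL and length >= min_duration:
            durations.append(round(length, 3))
    return durations


if __name__ == "__main__":
    # Invented example intervals: (start, end, label)
    segments = [
        (0.0, 0.35, "<p:>"),
        (0.35, 0.8, "ci"),
        (0.8, 0.9, "<p:>"),   # shorter than the threshold, so ignored
        (0.9, 1.6, "chiamavano"),
        (1.6, 2.4, "<p:>"),
    ]
    pauses = pause_durations(segments)
    print("number of pauses:", len(pauses))
    print("mean pause duration (s):", mean(pauses))
```

The threshold separates hesitation pauses from very short silences such as stop closures; the right value depends on the research question and would need to be justified for each study.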
Target audience
Undergraduate students 12 CFU in Linguistics (L-LIN/01).
Basic knowledge of the International Phonetic Alphabet (IPA)
Facilities required
Personal computer
University of Siena mail account

Learning module in sociolinguistics with lectures, practical work at home and final reports from students.

Course(s) in which the training material was used
The material has already been used in the following courses:
Licence and (re)use: Data from Bruzzone’s archive can be reused only with the permission of the University of Siena. Some of the slides can be used with the permission of the authors.
Creation date: Summer 2020
Last modification date: February 2021

Experience with Using CLARIN Resources in Teaching 

Using CLARIN tools in an online sociolinguistics course has made it possible to bring students closer to the real workflow of a sociolinguist. In particular, CLARIN tools can be used online, which allows lecturers to follow the students during their exercises. Students can annotate with OCTRA or perform chunking in real time, and they can share the results during the lesson. Additionally, CLARIN tools come with several FAQs that help students solve problems. However, many students at the beginning of their career encounter difficulties because most CLARIN tutorials are offered in English. For this reason, we have adapted and translated parts of the tutorial presented at this CLARIN event: Christoph Draxler, Florian Schiel (Ludwig-Maximilians-Universität München IPS). Creating, Managing and Analysing Speech Databases using BAS Services and Emu: a Hands-On Tutorial.

Students' Testimonials

This course has enriched my knowledge and improved my learning ability in history and cultural studies. I was given the possibility to listen to several recordings and oral histories. Thanks to these documents dense in customs and traditions at risk of disappearing, I have acquired considerable knowledge in this area. (Anonymous student)
I immediately felt involved and enthusiastic about the course, as I was given the responsibility of transcribing and preserving the voices of those who left their life testimony about imprisonment in psychiatric hospitals. During the course, I learned a lot, not only from a historical perspective. In fact, the former neuro-psychiatric hospital in Arezzo was among the first to make changes that would then lead to the closing down of mental hospitals in Italy. I also learned a lot from a personal point of view, as there is nothing more precious than listening to the voice of those who find hope again and can return to feel like "people" after having been classified as "madmen" for so long. (Anonymous student)

Download Information

Zenodo:  10.5281/zenodo.5226953

Additional Information and Resources

Between 2017 and 2021, the discovery and reuse of Bruzzone’s oral archive by our research group received extensive coverage in the Italian press, at both the national and the regional level.

Cite this Work

Silvia Calamai, & Rosalba Nodari. (2021, August 20). Oral archives for sociolinguistic research. Zenodo. https://doi.org/10.5281/zenodo.5226954

Contact Information

Teachers who reuse and adapt this training material are invited to share their feedback via training@clarin.eu