Goals and Objectives
The goal of this sociolinguistics course is to show students the possibilities and challenges that oral history archives offer for (socio)linguistic research. The course is intended as a research framework that will guide students in their future research work. The lectures acquaint students with the CLARIN infrastructure and present software tools that will allow them to carry out their own thesis research independently. The course offers guidance on the following steps, which must be addressed in any (research) project dealing with oral archives: i) reviewing the ethical and legal issues that arise from using and reusing legacy data; ii) using metadata to provide the appropriate level of description for the dataset; iii) transcribing the speech material automatically and manually, using the CLARIN infrastructure; iv) selecting and using the appropriate CLARIN software and tools depending on the research goals (phonetic, lexical, discourse analysis, etc.).
Dipartimento di scienze della formazione, scienze umane e della comunicazione interculturale
Languages for Intercultural and Business Communication, Università di Siena
Description of the Training Materials
|(Sub)discipline, topic, language(s)||
Topic: Sociolinguistic Research and Legacy Data: From Legal and Ethical Issues to Data Analysis
Language: Italian (both standard and regional varieties)
|Keywords||Legacy data, oral archives, transcription, legal and ethical issues|
1. Orthographic transcripts with Octra
5. MAUS segmentation
The tools can be accessed via the BAS web service interface.
|Structure and duration||
The learning module is structured as an online course comprising a number of lectures and exercises with tutorials (small work groups, hands-on sessions and guided debates).
The total duration of the course is 36 hours. The first section of the course (10 hours) introduces students to the basic notions of sociolinguistic analysis (variation across time and space, gender, age, style, social network, community of practice, etc.). As the first section of the course is not relevant to the CLARIN call, it will not be discussed further in this document.
The case study for the course is Anna Maria Bruzzone’s archive, now held at the University of Siena. Bruzzone’s archive is an appropriate case study for addressing most of the relevant issues in conducting research with oral history archives. Some of the problems posed by the archive have already been presented to the CLARIN community.
The community itself has helped to disentangle relevant issues, such as the legal issues involved in reusing the data and the ethical and legal issues involved in providing metadata.
The course is structured as follows (the names of the corresponding .pdf files are shown in green):
1. Introduction to Sociolinguistics (10 h). The first lectures deal with basic sociolinguistic concepts.
2. Oral Archives (2 h). The lecture introduces students to the basic notion of oral archives.
3. Anna Maria Bruzzone’s Archive (4 h). The lecture is devoted to the description of Anna Maria Bruzzone’s archive.
4. Ethical and Legal Issues in the Realm of Oral Archives (2 h). The lecture covers the GDPR and its implementation in Italian law, with a focus on voice as biometric data. Attention is also devoted to responsibility and accountability.
5. Oral vs. Written Data (2 h). The lecture addresses the differences between spoken and written language. Transcripts of oral recordings are a crucial source of data for the study of linguistic varieties, but is it possible to transcribe oral speech material (popular oral literature, autobiography, beliefs and superstitions, folktales, jokes, legends, rhymes and riddles, etc.) while remaining faithful to the oral source? Scribal conventions are presented and discussed. Although they may seem trivial, these conventions are in fact matters of linguistic interpretation that can significantly affect both the results of linguistic analysis and our general impression of how different a variety is from other varieties.
6. Transcribing Oral Data (2 h). The lecture introduces theoretical and practical issues encountered during transcription. What is transcription? What changes have occurred in the way interviews have been transcribed over the years and in different academic fields (anthropology, oral history, dialectology, etc.)?
7. The Transcription Chain (2 h). The lecture offers practical guidance for transcription. Students are introduced to the transcription chain using CLARIN tools. The lecture focuses on the possibilities offered by ASR and by online open-source software such as OCTRA. In particular, students become familiar with the BAS web service interface.
8. Tutorial with Exercises (2 h). Hands-on practice with the oral transcription chain using CLARIN tools. Students are provided with interviews from the two Bruzzone corpora and are asked to transcribe the data, from chunking the audio file to the actual verbatim transcription. Students use different pipelines and perform transcription using ASR and manual correction. In the following days, students usually work in pairs to transcribe the interviews. They gain a detailed knowledge of the oral histories and are able to discuss the interview dynamics in greater detail.
9. Second Tutorial with Exercises (2 h). In this tutorial, students are introduced to the CLARIN portal. Students are invited to explore the available corpora and to see different levels of transcription.
10. In the last lectures (8 h), students are invited to share their results in the form of a brief summary, showing all the steps they have carried out for transcription and listing all the corpora that they judge similar to their own research data.
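The chunking step mentioned in the first tutorial (item 8) can be illustrated with a small sketch. It is only an illustration under stated assumptions: it cuts a recording into fixed-length windows with a short overlap, whereas the CLARIN tools used in the course may segment the audio differently (for example, at pauses). The function name `chunk_boundaries` and its default values are hypothetical and are not part of OCTRA or the BAS web services.

```python
from typing import List, Tuple


def chunk_boundaries(
    duration_s: float,
    chunk_s: float = 30.0,
    overlap_s: float = 1.0,
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds for fixed-length chunks.

    Hypothetical helper for illustration only: consecutive chunks share
    `overlap_s` seconds so that a word cut at a boundary can be recovered
    from the neighbouring chunk during manual correction.
    """
    if chunk_s <= 0 or not 0 <= overlap_s < chunk_s:
        raise ValueError("need chunk_s > 0 and 0 <= overlap_s < chunk_s")
    duration_s = float(duration_s)
    if duration_s <= 0:
        return []

    step = chunk_s - overlap_s  # distance between consecutive chunk starts
    chunks: List[Tuple[float, float]] = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)  # last chunk may be shorter
        chunks.append((start, end))
        if end >= duration_s:
            break
        start += step
    return chunks


if __name__ == "__main__":
    # Example: a 70-second excerpt cut into 30-second chunks, 1 s overlap.
    for start, end in chunk_boundaries(70, chunk_s=30, overlap_s=1):
        print(f"{start:6.1f} s -> {end:6.1f} s")
```

Each pair of boundaries could then be assigned to a student or a pair of students for transcription. Setting `overlap_s=0` gives adjacent, non-overlapping chunks.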
Undergraduate students 12 CFU in Linguistics (L-LIN/01).
Basic knowledge of the International Phonetic Alphabet (IPA)
University of Siena mail account
Learning module in sociolinguistics with lectures, practical work at home and final reports from students.
|Course(s) in which the training material was used||
The material has already been used in the following courses:
|Licence and (re)use||Data from Bruzzone’s archive can be reused only with permission from the University of Siena. Some of the slides can be used with permission from the authors.|
|Creation date||Summer 2020|
|Last modification date||February 2021|
Experience with Using CLARIN Resources in Teaching
Using CLARIN tools in an online sociolinguistics course has made it possible to bring students closer to the real workflow of a sociolinguist. In particular, CLARIN tools can be used online, which allows lecturers to follow the students during their exercises. Students can, in fact, try to annotate with OCTRA or perform chunking in real time, and they can share the result during the lesson. Additionally, CLARIN tools come with several FAQs that help students solve problems. However, many students at the beginning of their career may encounter difficulties because most CLARIN tutorials are offered in English. For this reason, we have adapted and translated parts of the tutorial presented at this CLARIN event: Christoph Draxler, Florian Schiel (Ludwig-Maximilians-Universität München IPS). Creating, Managing and Analysing Speech Databases using BAS Services and Emu: a Hands-On Tutorial.
This course has enriched my knowledge and improved my learning ability in history and cultural studies. I was given the possibility to listen to several recordings and oral histories. Thanks to these documents dense in customs and traditions at risk of disappearing, I have acquired considerable knowledge in this area. (Anonymous student)
I immediately felt involved and enthusiastic about the course, as I was given the responsibility of transcribing and preserving the voices of those who left their life testimony about imprisonment in psychiatric hospitals. During the course, I learned a lot, not only from a historical perspective. In fact, the former neuro-psychiatric hospital in Arezzo was among the first to make changes that would then lead to the closing down of mental hospitals in Italy. I also learned a lot from a personal point of view, as there is nothing more precious than listening to the voice of those who find hope again and can return to feel like "people" after having been classified as "madmen" for so long. (Anonymous student)
Additional Information and Resources
Between 2017 and 2021, the discovery and reuse of Bruzzone’s oral archive by our research group received extensive coverage in the Italian press, at both the national and the regional level:
- Giornata della memoria
- Bright Night – La notte dei ricercatori
- Italian newspaper La Repubblica
- TGR RAI TOSCANA, 26 September 2017, 7:30 pm edition, report by Adele De Francisci
- TELETRURIA, 26 September, report by Luigi Alberti
- TSDTV, 26 September, report by Riccardo Ciccarelli
- Repubblica.it (also in the newspaper's Florence edition)
- Venerdì di Repubblica 17.11.2017
- Corriere della Sera – La Lettura 19.11.2017
- Further links and documents can be accessed on Google Drive.