1. Could you describe your academic background, your research interests, and your current position? What inspired you to focus your research on speech prosody and the creation of corpora for speech technologies?
Katarzyna Klessa (KK): My academic background is primarily in experimental phonetics and applied linguistics. I work as a university professor at the Department of Multimodal Communication, Institute of Applied Linguistics at Adam Mickiewicz University in Poznań, Poland. My research interests originate in my fascination with two areas: human communication and technology. My formal education was technical only in secondary school, where I graduated from a technical automation class. Afterwards, I continued with a more humanities-oriented university education at the Faculty of Modern Languages and Literatures here in Poznań.
In my research, I am interested in the amazing phenomena that occur during human communication when people produce speech, hear and perceive sound, and interact with one another. I am excited about technology because it gets us closer to inspecting the nature of these phenomena by allowing us to precisely measure the properties of the sounds that we generate and decode. Importantly, communication also goes beyond the sounds themselves, since we communicate through the way we shape the form of words and use melody. We also use certain special sounds such as hesitation markers, as well as gestures and facial expressions.
I have been involved in a number of different projects that focused on designing, creating, annotating and exploring speech corpora. At some point, my colleagues and I also decided to develop our own tool called Annotation Pro to support such tasks. Because speech corpora are used to verify assumptions about spoken language and are also fundamental for the development of speech synthesizers or various speech or speaker recognition applications, they are highly appreciated in disordered speech analysis, speech therapy and training. Corpus-based technologies may be helpful in both research-educational and clinical practice. These disciplines have been the common ground for Anita and me since the times of our PhD studies when we first met at the Department of Psycholinguistics in Poznań.
Anita Lorenc (AL): Yes, we met in Poznań, where I wrote my PhD thesis under the supervision of Prof. Piotra Łobacz. The thesis was about the acoustic analysis of the speech of hearing-impaired children (e.g. the VOT parameter). Currently, I work at Warsaw University, at the Institute of Applied Polish Studies, as a phonetician, a Polish philologist and a speech, language and hearing pathologist. I am the head of the Maria Przybysz Piwko Laboratory of Applied Phonetics at Warsaw University.
As the head of a research project funded by the Polish National Science Center, I initiated a study of contemporary Polish pronunciation based on electromagnetic articulography, which is the very first large-scale study of this type in Poland. As an important outcome of the project, we established an interdisciplinary research and development team that consists of Łukasz Mik and Daniel Król from the University of Applied Sciences in Tarnów in addition to Katarzyna Klessa and myself. Katarzyna, Łukasz and I have also collaborated on a smaller project, an articulography study of infant- and adult-directed speech in the Maria Przybysz Piwko Laboratory.
My habilitation thesis was dedicated to the normative pronunciation of Polish vowels and the lateral consonant /l/, studied using electromagnetic articulography and a newly developed microphone array. In recent years, I have been involved in several other projects where corpora of speech disorders were created. Such corpora are difficult and expensive to collect and can be tricky to distribute because of privacy issues. This is why both Katarzyna and I have been great supporters of the DELAD group since 2017. DELAD is an initiative to establish a digital archive of disordered speech in a GDPR-compliant way, hosted at secure repositories in the CLARIN infrastructure.
2. How did you get involved with the ACE CLARIN Knowledge Centre?
KK: The involvement with ACE started from the DELAD group and Henk van den Heuvel’s proposal to share with ACE some of the spoken language data collected by individual DELAD members. We discussed data sharing issues during our meetings in Cork and Utrecht.
AL: Yes, and it was the DELAD group that inspired me to share my PhD dissertation data related to the speech of hearing-impaired children. I did so with the help of Henk, Katarzyna and the ACE centre, which took care of the technical aspects of preparing the data for publication.
3. Could you describe this corpus? Which kinds of speech disorders does your corpus contain? What was ACE’s involvement in the creation process?
AL: The corpus is based on read and elicited speech recordings that I collected as part of my doctoral dissertation around 15 years ago. The utterances were produced by hearing-impaired Polish children. Their pronunciation included various kinds of disorders specific to that group of speakers, for example voicing disorders. I described a number of those disorders in publications that are listed in the corpus’s public profile. Aside from the speech recordings, the original version of the data includes some basic metadata such as information about the speakers’ age and gender. The elicited recordings come from a picture naming test. The orthographic transcription that is accessible along with the recordings is based on the prompts presented to the children during the recording sessions.
KK: We received practical guidelines on how to prepare Anita’s collection and how to properly describe its metadata for distribution. ACE helped us by quickly answering all kinds of questions and also with technical support before we could actually share the data. For example, the original material in Anita’s collection was saved as separate files. The utterance transcriptions (prompts) were available from audio file names. I extracted the transcriptions into text file format. Then ACE helped us combine the multiple isolated files into a single larger one which can be easier to handle in many cases.
4. What kind of research can be carried out with such a corpus?
KK: The corpus can be used for the analysis of certain paralinguistic components of the utterances, such as the acoustic and perceptual correlates of disordered speech, maybe even the speaker’s vocal effort or voice quality. It could also be useful in contrastive studies of speech intelligibility in hearing vs. hearing-impaired speakers. I believe the material might also be a great sample for teaching and demonstration purposes for students of phonetics, phonology and, above all, speech pathology and speech therapy.
AL: The children whose disordered speech is presented in the corpus were educated using the Cued Speech method, so it would be interesting to compare the phonetic features of their pronunciation with the features of a similar group of hearing-impaired children educated with a different method. In addition to voicing disorders, which I described in my publications, there are noticeable disorders of speech fluency and of the rhythmic structure of speech, expressed by chanting and syllabifying utterances. Perhaps they are caused by the need to make gestures illustrating syllables and their particles in the Cued Speech method? A subjective, perceptual assessment of this group of children also suggests a non-standard realization of certain speech sounds – for instance, whereas a healthy speaker would pronounce certain sounds as just one consonant or vowel, a hearing-impaired child seems to pronounce them as a diphthong or even a more complex sound.
5. What kind of speech technologies can be developed on the basis of this corpus?
KK: The corpus is not very large, so considering the fact that contemporary solutions often require large amounts of data, it might not be sufficient by itself as a basis for speech technology development. But I still believe that it can become a valuable resource for this purpose. I imagine it could become a subset of a larger resource dedicated to testing or training tools for speech or speaker recognition. Another example might be educational applications where various speech samples are contrasted to demonstrate certain phonetic features of utterances. In addition, the recordings of isolated words in the corpus could possibly serve as perception test stimuli. And perception tests are often a practical tool used to evaluate various speech applications.
6. Which GDPR issues did you encounter when developing the corpus and how did ACE assist you on this matter?
KK: The DELAD team and ACE are experts on legal or administrative issues. We discussed GDPR in much detail during the 2019 DELAD/CLARIN workshop in Utrecht and while preparing the DELAD group publications. The GDPR issues that emerge when sharing disordered speech corpora are often even more problematic than those observed for corpora including speech of healthy persons. The main reason is that for some studies it is indispensable to include medical information and highly sensitive personal data. It is also important to note that the data for our corpus were collected 15 years ago, long before GDPR, when the conditions for licensing and participant consent were quite different and less strict than they are today. So, for the current corpus, the data had to be anonymized and limited. The metadata in the corpus now includes speakers’ gender and age but no names or other personal information.
7. Aside from ACE, which helped you with the curation process, the corpus is also associated with another CLARIN Knowledge Centre, TalkBank, which will release the corpus. Why did you choose TalkBank as the depositing service? Why was it important for you to involve several K-Centres in your work?
KK: TalkBank is one of the more popular on-line services dedicated specifically to disordered speech, so it offers a highly visible platform for reaching out to a very important group of possible users of the corpus. My participation in DELAD and discussions with researchers dealing with corpus-based studies of disordered speech have only reinforced my conviction of the importance of releasing the corpus through TalkBank. Additionally, the idea behind DELAD as well as CLARIN in general is to share as much data and disseminate it as broadly as possible for the sake of progress in research and technology. As a member of the programme board for the Polish CLARIN consortium, I have observed very intensive efforts towards not only developing the technological infrastructure, which of course is very important, but also towards reaching out to as many potentially interested parties as possible. Importantly, such efforts necessarily involve several institutions, each with its own specialization. This is also why it was so important to involve several K-Centres: ACE helped us with the creation and documentation of the corpus, while TalkBank helped with its dissemination.
8. Do you plan to continue collaborating with CLARIN K-Centres in the future? When would you advise your fellow researchers to seek help from K-Centres?
KK: Yes, definitely! I see the collaboration with CLARIN K-Centres as a very successful and instructive experience. First of all, I am happy that the data have received a chance for a “second life” with the help of ACE and other centres. Secondly, I learnt a lot about the procedures and data flow, which is remarkably interesting to me as a person involved in the creation and maintenance of several digital databases and repositories. I am therefore grateful for the opportunity to participate in the data curation process and hope for more. We will probably continue with some other Polish datasets of disordered speech or other types of recordings, either archival or maybe new ones. As for advice, I would say that if you need help or expertise related to language tools or data, CLARIN K-Centres are a great choice. The experience with ACE has shown me both that it’s easy to start working with a K-Centre and that you can expect a lot of support and encouragement.