The K-Centre for Atypical Communication Expertise (ACE for short) is run by the Centre for Language and Speech Technology (CLST) at Radboud University. The mission of ACE is to support researchers investigating what can be characterised as atypical communication, an umbrella term used here to denote language use by second language learners and by people with language disorders or language disabilities, as well as languages that pose particularly difficult issues for analysis, such as sign languages and languages spoken in a multilingual context. Atypical communication involves multiple modalities, such as text, speech, sign and gesture, and encompasses different developmental stages. The target audience for ACE includes linguists, psychologists, neuroscientists, computer scientists, speech and language therapists and education specialists.
ACE offers the following services through its website:
- Information and guidelines about:
  - consent (forms)
  - hosting corpora and datasets containing atypical communication
  - where to find corpora and datasets containing atypical communication
- Helpdesk/consultancy for questions on the above topics
- Technical assistance for designing, creating, annotating, formatting and adding metadata to resources of atypical communication
- Outreach: publications, workshop contributions, etc.
Data originating in a context of atypical communication are particularly sensitive with regard to privacy and ethical issues. When collecting, storing, processing and using such data, researchers are bound by strict rules and procedural requirements imposed by ethical committees and the GDPR. For hosting data and corpora of atypical communication and making these accessible in a FAIR- and GDPR-compliant manner, CLST has established close collaboration with The Language Archive (TLA) at the Max Planck Institute for Psycholinguistics (MPI) in Nijmegen. TLA, which is a CLARIN B-Centre, offers storage of sensitive data (audio, video and transcripts) in a CMDI-supported repository that provides strong authentication procedures, layered access to data, and persistent identification. TLA focuses on collecting spoken and signed language materials in audio and video form, along with transcriptions, (speech) analyses, (linguistic) annotations and other types of relevant material such as photos and accompanying notes.
For corpora of speech from people with language disorders, ACE works closely with the DELAD initiative, whose goal is to facilitate the sharing of disordered speech corpora among researchers in a GDPR-compliant manner. For these types of resources in particular, there is close collaboration with TalkBank and its clinical banks at Carnegie Mellon University. This collaboration makes it possible for our corpora and datasets to be registered at TalkBank and to have their metadata and landing pages on the TalkBank website. By contrast, the storage of the 'raw' data (commonly audio and video) and the authentication of access to them are handled at TLA.
At all stages, appropriate measures must be in place to prevent unwanted disclosure. In some cases, this requires that the original data remain stored in a dark archive so that they are not accessible to users and cannot be copied or distributed in any form. To this end, ACE provides a helpdesk through its website to advise resource owners and users on how they can preserve sensitive data safely, from the point where the raw data come into existence up to the moment where the data and the information obtained from them are shared with others. Assistance in designing and collecting corpora containing atypical communication with GDPR-compliant consent forms is usually considered of great value, as are references to available guidelines and tools for annotating such resources. How to make the resources accessible and share them with other researchers is another issue for which special expertise is requested.
Atypical communication data are also special when it comes to the methods and tools for processing and using them, if only because specific requirements apply in the light of the GDPR. Guidelines and tools that have been developed for standard data often cannot be used for atypical communication data, or require adaptations or special settings; in other cases, dedicated tools are available. For example, automatic speech recognisers, which are used to generate transcriptions from audio files, perform far worse on the speech of language learners or of people suffering from dysarthria. ACE is therefore well positioned to inform researchers who want to work with language development data, data of adults and children with speech disorders, or data of sign language users about the availability of such tools and guidelines.
Van den Heuvel, H., Oostdijk, N., Rowland, C., Trilsbeek, P. (2020). The CLARIN Knowledge Centre for Atypical Communication Expertise. Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC2020).
Blog post written by Henk van den Heuvel, edited by Darja Fišer and Jakob Lenardič.