Outcomes of the fifth DELAD Workshop

Submitted by Elisa Gorgaini on 17 February 2021

Blog post written by Henk van den Heuvel about the online DELAD workshop held on 27-28 January 2021 and organised with the support of CLARIN ERIC and SSHOC.


What is DELAD? DELAD stands for Database Enterprise for Language And speech Disorders, and is also Swedish for SHARED. DELAD is an initiative to share corpora of speech of individuals with communication disorders (CSD) among researchers.

Although originally planned to take place in Helsinki in June 2020, the fifth DELAD workshop eventually took place as an online event due to the COVID-19 pandemic. So the venue was a Zoom room that CLARIN kindly offered and professionally hosted. It was truly like feeling at home! (Well, in fact, it was at home.)


What was the workshop about?

This workshop was the fifth of a series that started in 2015, and it was the third organized under the CLARIN umbrella. About 30 participants registered and attended the meeting. The attendees came from all over Europe, including the Netherlands, Finland, Ireland, Poland, Italy, France, Estonia and the UK, with backgrounds in language and speech pathology, linguistics and phonetics, speech technology, data archiving, ICT, and law. This is exactly the mix that makes DELAD attractive and suited for discussing and sharing CSD.

The workshop was organized by the DELAD steering group together with Esther Hoorn from the CLARIN Legal and Ethical Issues Committee (CLIC).

Goal

The aim of this workshop was to:

  • Extend the DELAD network with new participants;
  • Explore with the participants the potential of the new CLARIN K-Centre on Atypical Communication Expertise (ACE) for hosting CSD of DELAD members;
  • Exchange deeper insights on Data Protection Impact Assessments (DPIAs); 
  • Discuss voice conversion as a means to pseudonymize speech.

An overview of the workshop can be found here

Day 1

On the first day, the workshop started with four presentations:

  • “Croatian written and spoken corpora of speech with communication disorders” (Gordana Hržica)
  • “Oral and written documents of mental health patients” (Silvia Calamai & Rosalba Nodari)
  • “Using electromagnetic articulography for the purpose of studying speaking styles and speech disorders” (Katarzyna Klessa, Anita Lorenc, & Łukasz Mik)
  • “Parkinson’s disease: A French corpus collected using MonPaGe protocol” (Veronique Delvaux)

Apart from discussing the research itself, the presenters considered how the resulting CSD could be shared. This applied especially to Calamai & Nodari and Delvaux, who were still looking for a suitable shelter for their corpora, where DELAD could play a role.

The afternoon session was devoted to exactly that topic: how can DELAD, in cooperation with the CLARIN Knowledge Centre for Atypical Communication Expertise, assist in realizing GDPR-compliant access to such CSD?

  • “Help from DELAD and CLARIN Centre for Atypical Communication Expertise (ACE) in sharing CSD” (Henk van den Heuvel)
  • “How to access & deposit existing data at CLARIN centres, profiles of metadata, licenses” (Paul Trilsbeek)

Day 2

In the morning session of the second day, a role play devised by Esther Hoorn and her team was scheduled. The role play led to lively discussion of the various elements that need to be taken into account when documenting your considerations in a Data Protection Impact Assessment (DPIA). Such a game is an entertaining way to touch upon aspects that are relevant when sharing your CSD, and it led to several eye-openers in the discussions! Here, the breakout rooms in Zoom served well to split the group into two.

In the afternoon, Rob van Son gave a keynote talk on “Use voice conversion for pseudonymisation?”. The intriguing idea behind the method he presented is, on the one hand, to retain the linguistic and paralinguistic features of the speech and, on the other hand, to remove the identity of the speaker. If that could be done successfully, the speech could count as anonymised and would not be subject to the GDPR. Rob van Son presented results from the Voice Privacy Challenge 2020 and concluded that speaker-identifying information can be removed from speech, but also noted that all systems had issues with naturalness and intelligibility. A relevant issue for DELAD, of course, is whether pathological speech, e.g. dysarthric speech, can still be studied after pseudonymisation. Further evidence is needed to answer this question. During the workshop, several participants expressed interest in a case study for part of their material.


Lessons learnt, points taken

It was great to have the support of CLARIN staff in organizing an online workshop like this via Zoom. CLARIN took care of the Zoom addresses and mailings, and of the breakout rooms for the coffee breaks and the role play. This relieved us as workshop organizers of a serious organizational burden, so that we could concentrate on the content of the workshop.

We were satisfied with the long lunch breaks of two hours. They gave participants the opportunity to digest the content of the morning session and to do something (relaxing) in between. We also stopped quite early (at 16:00) to avoid our participants becoming “Zoombies” at the end of the day. This experience led us to think about how to take advantage of online meetings in the future, in addition to face-to-face meetings.

We were happy with the new researchers who registered for the workshop. Their presentations were very interesting and diverse. The workshop in general provided a good mix of research- and data-oriented presentations and presentations focusing on the legal and ICT support that DELAD can offer to share such data. The role play on the Data Protection Impact Assessment (DPIA) was a highly valued interactive aspect of the program.


Action points

As relevant action points for our DELAD network we have identified:

  • Set up a number of case studies on pseudonymization of (multilingual) pathological speech data and organize a workshop around this;
  • Look into ways to promote DELAD and its benefits for sharing CSD on national levels (mentioned were a slide deck for promotion, a social media campaign, and folders);
  • Partners are interested in curating and sharing their corpora via DELAD;
  • This will yield further case studies on sharing CSD. DPIAs should be integrated;
  • Share reference DPIAs with the community via the DELAD website;
  • Share consent form templates via the DELAD website, including how to structure them, relevant aspects (must-haves), and checklists with examples;
  • Topics for next workshops:
    • Progress on pseudonymization 
    • Sharing data via remote secure access option
    • Experiences on sharing datasets (make it more concrete)

All in all, the workshop was very inspiring (again!). We noticed that participants profited from the collaborative spirit and collegial environment in DELAD.


Details

  • If you are interested in becoming a member of DELAD, you can subscribe here.

This event was supported by CLARIN and by the SSHOC Project (Grant Agreement 823782 under H2020).