Re-evaluating Child Language Assessment Measures for Research and Clinical Use

The Project

CLASP – the Child Language ASsessment Project – sets out to re-evaluate the most commonly used measures in child language sample analysis. CLASP makes use of datasets at the CLARIN Knowledge Centre TalkBank, which provides integrated repositories for spoken language data in more than 14 research domains. Additionally, the project will use the CHILDES and FluencyBank databases to assess and develop new algorithms for the bias-free assessment of speech samples from children who speak non-mainstream dialects of American English. In doing so, CLASP hopes to provide a stronger evidence base for child language assessment for both research and clinical purposes and to reduce biases in assessment. This, in turn, will support clinicians in mapping out the most effective treatments for children with language disorders.

The five-year project is spearheaded by Dr Nan Bernstein Ratner, an applied developmental psycholinguist whose research primarily centres on child language and fluency development and disorder. ‘We have two major concerns’, says Bernstein Ratner. ‘One question is, are any of the measures that we’ve adopted for child language assessment good? But we’re also trying to answer the question whether or not children who speak other dialects, such as African American English, could be misdiagnosed using any of these measures.’ Currently, no large-scale normative scores are available for any of the most commonly used language development measures. Bigger, more diverse datasets are needed in order to evaluate the effectiveness of the different methods.

Towards Inclusive Assessment

The two most frequently used assessment measures are mean length of utterance (MLU) and type-token ratio (TTR), a measure of vocabulary diversity. Other measures, such as Developmental Sentence Scoring (DSS) and the Index of Productive Syntax (IPSyn), are very detailed but tedious to score by hand; thus, few clinicians use them. CLASP’s findings so far suggest that, in isolation, some measures have only limited value, but that a combination is effective. Bernstein Ratner says: ‘MLU turns out to be very good at distinguishing between children with problems and children without problems. It’s a good filter.’ In contrast, CLASP has added to a growing literature suggesting that TTR should not be used in assessment (see article). TTR does not change as children mature, and children with known disorders actually outscore typically developing children on it. Other measures, such as DSS and IPSyn, are less effective filters but better for planning therapy, because they show which aspects of the language the child does not appear to use, or does not appear to use correctly. This means that a clinician can quickly see what the child needs help with.
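To make the two most common measures concrete, the following is a minimal Python sketch of how MLU and TTR can be computed from a list of transcribed utterances. It is an illustration only and not CLASP's or CLAN's implementation: the function names (`tokenize`, `mlu_words`, `ttr`) and the sample utterances are hypothetical, and MLU is counted here in words rather than morphemes, which is a simplifying assumption.

```python
def tokenize(utterance: str) -> list[str]:
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(".,!?;:\"'") for w in utterance.lower().split())
    return [w for w in words if w]


def mlu_words(utterances: list[str]) -> float:
    """Mean length of utterance in words (assumption: words, not morphemes)."""
    lengths = [len(tokenize(u)) for u in utterances]
    lengths = [n for n in lengths if n > 0]  # ignore empty utterances
    return sum(lengths) / len(lengths) if lengths else 0.0


def ttr(utterances: list[str]) -> float:
    """Type-token ratio: distinct word forms divided by total word tokens."""
    tokens = [w for u in utterances for w in tokenize(u)]
    return len(set(tokens)) / len(tokens) if tokens else 0.0


# Hypothetical three-utterance sample, for illustration only.
sample = ["the dog is running", "he running", "I want the ball"]
print(round(mlu_words(sample), 2))  # 10 words over 3 utterances
print(round(ttr(sample), 2))        # 8 distinct words over 10 tokens
```

Because TTR divides by the total number of tokens, it tends to fall as a sample gets longer, which is a commonly cited reason for treating it with caution.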

Measures most commonly used to assess children’s language development are likely to be biased against children who do not speak mainstream American English. CLASP aims to work against such biases, but progress has been slow because very few data are available to address this question. To resolve this, the CLASP team has acquired additional data and is currently re-transcribing them, which will help to create a robust database with which to explore the question of bias when assessing preschool-age children in the US.

‘As automatic speech recognition improves, this will give us more accurate transcription […] Then we can perform more accurate and informative assessments. They may not realise that they're working together fully, but industry and research are working in tandem with one another.’

Nan Bernstein Ratner

During additional data acquisition, CLASP has found that in order to measure children’s IPSyn scores, a much smaller sample than previously thought can be used to arrive at a reliable result, which may significantly increase clinical uptake of the instrument. Bernstein Ratner says: ‘Clinicians hesitated to use this particular routine, especially if they don’t use computers to assess language. But now we know that half the number of words is enough. Whether or not you use a computer, getting and transcribing 50 utterances rather than 100 will take literally half as long. We think that has great clinical impact.’

While this makes the manual approach easier, the CLASP team is ultimately of the view that computer-assisted assessment will further improve the outcome for children. ‘Often it takes months for children with developmental language disorders to get therapy because the assessment process takes so long,’ says Bernstein Ratner. ‘The goal of CLASP is to get clinicians through the process of assessment quickly and accurately, so that they can get on with helping the kids.’ In Bernstein Ratner’s view, computer-assisted analysis of language samples can be more helpful to clients. ‘[…] This is important work clinically, [and] this is going to eventually be the way we go.’

Distinguishing Difference from Disorder

Improving existing data includes back-annotating a number of different datasets, for instance by adding information on whether a child’s non-standard forms are actual errors or features of a language variety, although that can be difficult to determine for samples from very young children.

In addition, the team is working closely with Barbara Zurer Pearson at the University of Massachusetts Amherst, who is one of the authors of a dialect-sensitive evaluation tool called the DELV, the Diagnostic Evaluation of Language Variation. Pearson has been helping the CLASP team develop annotations that are systematic and can be computed. In this way, the team can ensure that all incoming samples of children's speech are marked for a particular set of features in future.

Data that is annotated in this way could be used to provide a ‘warning shot’ to clinicians when they run diagnostic analyses, which may be particularly valuable if clinicians are not aware that a child's pattern might reflect a language difference and not a disorder.

*FluencyBank is part of TalkBank, a CLARIN K-centre.*

Bernstein Ratner says: ‘We're trying to come up with a feature that will look at a child's transcript. For certain things, for instance, absence of the third person singular marker in the present tense, lack of the auxiliary verb in progressive phrases, like “He running” for “He's running”, or the use of some dialect-specific forms, we're trying to develop what we're calling a dialect detector. Something that will pop up if a clinician doesn't annotate, because he or she doesn't know to annotate this feature of a dialect. If they just type what they hear, we're hoping to come up with an alert that essentially says: Have you considered that the child in front of you may not speak your version of American English, but that it's not in itself problematic?’
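The idea of an alert triggered by transcript features can be sketched as a small rule-based check. The Python below is a toy illustration under strong assumptions and is not the CLASP dialect detector: the regular expressions, the short verb list, the feature labels, the function names (`flag_features`, `dialect_alert`) and the threshold are all hypothetical, and simple patterns like these will produce false positives (for example, "he sing" would be flagged as a progressive).

```python
import re

# Hypothetical, deliberately tiny rule set covering two of the features
# mentioned in the text. Real annotation schemes are far richer.
SUBJECT = r"(?:he|she|it)"
BARE_VERBS = "|".join(["walk", "want", "like", "go", "run"])  # illustrative list

RULES = {
    # Subject pronoun directly followed by an -ing form, e.g. "He running"
    "zero auxiliary in progressive": re.compile(
        rf"\b{SUBJECT}\s+\w+ing\b", re.IGNORECASE
    ),
    # Subject pronoun followed by an uninflected verb, e.g. "she walk"
    "zero third-person singular -s": re.compile(
        rf"\b{SUBJECT}\s+(?:{BARE_VERBS})\b", re.IGNORECASE
    ),
}


def flag_features(utterance: str) -> list[str]:
    """Return the labels of all rules that match the utterance."""
    return [label for label, pattern in RULES.items() if pattern.search(utterance)]


def dialect_alert(utterances: list[str], threshold: int = 2) -> bool:
    """Raise an alert once at least `threshold` utterances show a flagged feature."""
    flagged = sum(1 for u in utterances if flag_features(u))
    return flagged >= threshold


transcript = ["He running", "she walk home", "the dog barks"]
if dialect_alert(transcript):
    print("Have you considered that this child may speak a different "
          "variety of American English?")
```

A count-based threshold is used here so that a single isolated form does not trigger the alert; how such a threshold should be set is an open design question that this sketch does not address.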

The dialect detector mechanism is envisaged to become part of the other big utility programs, such as CLAN’s KidEval and FluCalc routines, that can be used by clinicians and researchers, without any need for coding. Bernstein Ratner says: ‘Even for technophobic people, this is accessible. Our goal is to have a system where people just type what they hear, and then push a couple of buttons, and then the work is done for them.’

Thus far, citations to the CHILDES database tend to refer to basic research articles in journals. Bernstein Ratner sees CLASP as a bridge between basic research and real-world application: the project uses data initially collected for basic research, and turns them into a clinical asset.

Learning from Disfluencies

In addition to her own research and teaching, Bernstein Ratner also runs FluencyBank, a large, shared TalkBank database of spoken language resources for clinicians and researchers whose work focuses on stuttering and general disfluency in speech. FluencyBank’s annotations follow a uniform standard (until recently, each lab or practice made up their own), all data is open access, and easy-to-use tools are available to analyse and compare transcripts. The site is frequently used by researchers and clinicians, and the clinical impact is well documented.

In recent years, FluencyBank data have also been increasingly used by the automatic speech recognition (ASR) industry to train voice assistants to recognise speech by people who stutter. Bernstein Ratner says: ‘The one thing I've learned over the years is that we don't see the potential of our data at the time. We keep track of who's using the banks by doing literature and citation reviews and alerts. In the past five years, the stuttering data in FluencyBank has been used in more than a dozen articles by people working in automatic speech recognition, and the number keeps going up. We never saw that one coming.’

‘Wide swathes of the tech community have been bringing down samples of stuttered speech from FluencyBank and trying to train voice assistants like Alexa and Google and Siri, to recognise speech spoken by people who stutter.’

Nan Bernstein Ratner

As the data are easily available, shareable and follow a uniform standard, users know how to deal with them. Moreover, they are being used not only to train software systems to recognise speech by people who stutter, but also to identify speech problems associated with dementia, for example: ‘If you can teach a computer to look past something, you can also teach it to pay attention to something, because they're really flip sides of the same coin. We see a number of emerging research publications that are using TalkBank materials, such as FluencyBank, AphasiaBank and CHILDES, to actually measure things – for instance, trying to catch early signs of dementia by looking at the available data. It's really exciting, […] it's a great use of the data.’

Views on CLARIN and Open Science

‘I think there is more and more of a push for shareable, interchangeable and interoperable data. Without rules, without interoperability, what you wind up with is a digital trash can. Basically, there's no way to go through the data and understand any of it. It's there, but it's uninterpretable. And that's why these efforts to have communication among agencies involved in these initiatives is so critical. We really benefit from what CLARIN is doing, and vice versa. We’re all engaged in the same conceptual space.’

Contributors

Dr Nan Bernstein Ratner, Professor, Department of Hearing and Speech Sciences, University of Maryland, College Park