CMU-TalkBank to sign Third Party agreement with CLARIN

Submitted by Leon Wessels on 14 March 2017

We are happy to announce that CMU-TalkBank has officially joined CLARIN as a Third Party. CMU-TalkBank attained CLARIN B-centre status in 2015 and has recently become a CLARIN K-centre as well. It has now committed to an association at the university level, a commitment supported by the University Libraries, the Language Technologies Institute, and the Department of Psychology. CMU-TalkBank eventually hopes to extend its association to include other institutions in the United States.

TalkBank provides the largest open-access repository for spoken language data. The component databases in TalkBank include CHILDES (Child Language Data Exchange System), SamtaleBank (from DK-CLARIN), AphasiaBank, HomeBank (daylong audio recordings in the home), BilingBank, SLABank (second language learning), FluencyBank (stuttering), CABank (Conversation Analysis), ClassBank (classroom videos), LangBank (Latin and Historical German), and several smaller corpus banks. Work on expanding these and other corpora is currently supported by three grants from NIH and two grants from NSF. All TalkBank corpora use the CHAT format, which enables analysis with the CLAN and ANNIS programs as well as interoperability with ELAN, WaveSurfer, Praat, Phon, SALT, and DataVyu. CHAT files are validated against the talkbank.xsd XML schema.

TalkBank provides morphological taggers and dependency parsers for twelve languages, as well as a variety of tools supporting automated clinical diagnosis of language disorders and studies in clinical linguistics. TalkBank also provides resources for developing second language instruction. Current research emphases include the use of speech technology to link transcripts to media, harvesting data from teletherapy and second-language tutors, and development of open-source tools for daylong audio and video recording.

As a CLARIN B- and K-centre, CMU-TalkBank offers support for all types of analysis of spoken language corpora. TalkBank data and tools have been used in 8,000 published articles, and the TalkBank websites have received 6 million hits. CMU-TalkBank is happy to offer the CLARIN community access to all of these resources, as well as additional resources from the Language Technologies Institute. In the future, CMU-TalkBank hopes to achieve further integration of TalkBank within CLARIN, as well as the inclusion or mirroring of corpora from the national CLARIN repositories into TalkBank.

To find out more about CMU-TalkBank, you can watch Brian MacWhinney's presentation on our YouTube channel.