The CLARIN Resource Families initiative provides a user-friendly overview of the available language resources in the CLARIN infrastructure for researchers from digital humanities, social sciences and human language technologies.
This month we highlight the computer-mediated communication (CMC) corpora. CMC covers public and private online communication, such as posts on blogs and forums, comments on online news sites, messages on social media and networking sites such as Twitter and Facebook, messages sent through mobile phone applications such as WhatsApp, and e-mail. These corpora are of interest to a wide range of research fields, including language variation, pragmatics, and media and communication studies.
The CLARIN infrastructure offers 13 CMC corpora. Most are for Slovenian, but corpora are also available for Czech, Dutch, Estonian, Finnish, French, German and Lithuanian. Most of the corpora are richly tagged and available under public licences.