How to use TEI for the annotation of CMC and social media resources: a practical introduction
4 October 2017, 14:30–18:00, Eurac Research, Italy
The goal of the event is to give a practical introduction to annotating language data from genres of computer-mediated communication (CMC) and social media using the formats of the Text Encoding Initiative (TEI). In an introductory section, participants will learn about the general architecture of TEI encoding schemas and about the rules for creating so-called customizations, which extend TEI to textual genres and domains not yet covered by the current version of the TEI guidelines. Examples of such customizations are the representation schemas for CMC and social media genres developed in the TEI special interest group “computer-mediated communication”.
In a hands-on session, participants will learn how to use these customizations to create a basic TEI representation of their own CMC/social media data. For this purpose, participants may bring samples from their own data or corpora, or select a sample from collections of Wikipedia talk pages in several languages prepared by the instructors. Format specifications for participants’ own data will be announced in advance. For the hands-on session, participants will be asked to bring a laptop with Wi-Fi (WLAN) and a full or trial license of the oXygen XML editor.
The tutorial is funded as a CLARIN User Involvement Event and will be held in association with the 5th Conference on CMC and Social Media Corpora for the Humanities (cmccorpora17), which takes place on 3 and 4 October 2017 at Eurac Research, Italy.
There is no registration fee for the workshop. Registration opens on June 30, 2017 at https://cmc-corpora2017.eurac.edu/registration/. Participants who also want to attend the main conference can register for the workshop together with their conference registration. The registration fee for the main conference is 75 EUR and includes conference materials, coffee breaks, and lunch.
The tutorial is taught by:
- Harald Lüngen (Institute for the German Language, Mannheim, Germany)
- Michael Beißwenger (University of Duisburg-Essen, Germany)
- Laura Herzberg (University of Mannheim, Germany)
The workshop is organized by:
- Michael Beißwenger (University of Duisburg-Essen, Germany)
- Ciara R. Wigham (Université Clermont Auvergne, France)
- Egon W. Stemle (Eurac Research, Italy)