One of the central aims of the CLARIN research infrastructure is to promote the use of language resources and of language technology in the humanities and social sciences. In order to facilitate this collaboration of computational linguists with scholars in digital humanities or social science disciplines, it is important to provide appropriate fora at major international conferences where successful collaborative research of this kind can be presented and where developers of computational linguistics applications can interact with digital humanities scholars.
With this motivation in mind, CLARIN-D, the national CLARIN consortium in Germany, organized a workshop on Language Technology for Digital Humanities (LT4DH), which took place on 11 December at COLING 2016 in Osaka. COLING, the biennial conference in computational linguistics, has a long track record of promoting interdisciplinary research in computational linguistics and neighboring disciplines. The conference in Osaka, held from 11 to 16 December 2016, was the largest COLING conference to date, with about 1,100 participants from around the world.
The LT4DH workshop drew more than 40 submissions and had more than 80 registered participants, attesting to the timeliness and the high level of research activity in this interdisciplinary field. The workshop program featured an invited keynote presentation by Jonas Kuhn of the CLARIN Centre at Stuttgart University, seven oral presentations, a poster slam, a poster session with 18 presentations, and a software demonstration of the LAPPS language grid contributed by Nancy Ide of Vassar College.
In his keynote presentation, Jonas Kuhn presented four successful and diverse use cases of adapting language technology tools for on-going research in the digital humanities center at Stuttgart University. He went on to draw more general methodological conclusions about the research dynamics that unfold when language technologists and humanities scholars interact.
The oral presentations and the poster session covered a wide spectrum of digital humanities research, including the use of spoken language tools for the analysis of spoken free-verse poetry; the use of word embeddings and other machine learning techniques for spelling normalization; tracking the dynamics of word frequencies diachronically as well as across different genres; language resources and tools for multi-modal communication; the annotation and alignment of transcriptions for diachronic corpora of Japanese; the use of language technology tools for named-entity tagging of Latin texts; linguistic annotation tools for a large-scale historical Arabic corpus; web-based tools for manual annotation of text corpora; workflow platforms for automatic annotation of text corpora; language technology for low-resourced languages and for the construction and querying of diachronic computational lexica; the use of TEI for research on textbooks; and the integration of OCR and machine translation of historical documents. The full range of results of the LT4DH workshop can be found in the on-line proceedings, co-edited by the workshop organizers and available at:
The high quality of the papers and posters presented at the workshop has prompted an offer from Nicoletta Calzolari and Nancy Ide, the editors of the Journal of Language Resources and Evaluation, to the organizers of the LT4DH workshop to guest-edit a special issue, or possibly a double issue, of the journal on the theme of the workshop. This offer has been accepted, with a call for submissions to be issued in early 2017. The call will invite all presenters at the LT4DH workshop to submit expanded versions of their papers, but will also be open to other researchers working at the interface of computational linguistics, digital humanities, and social science disciplines.
Erhard Hinrichs, Marie Hinrichs, and Thorsten Trippel
Co-organizers of the LT4DH COLING 2016 workshop