Language resources and technologies for processing and linking historical documents and archives - Deploying Linked Open Data in Cultural Heritage
Recently, collaboration between the NLP community and specialists in various areas of the Humanities has become more efficient and fruitful, driven by the common aim of exploring and preserving cultural heritage data. Worth mentioning are the efforts made during the digitisation campaigns of recent years and within a series of initiatives in the Digital Humanities, especially in making old manuscripts available through digital libraries.
Given the number of contemporary languages and their historical variants, it is practically impossible to develop brand new language resources and tools for processing older texts. Therefore, the real challenge is to adapt existing language resources and tools, and to provide (where necessary) training material in the form of corpora or lexicons for a given historical period.
Another issue regarding historical documents is how they are used once they are stored in digital libraries. Historical documents are not only browsed; together with adequate tools, they may serve as a basis for the re-interpretation of historical facts and the discovery of new connections, causal relations between events, etc. To make such analyses possible, historical documents should be linked among themselves, on the one hand, and with modern knowledge bases, on the other. Activities in the area of Linked Open Data (LOD) play a major role in this respect.
Newspaper collections and archives are a particular type of historical document. Newspapers reflect what is going on in society and constitute a rich data collection for many types of humanities research, ranging from history and the political and social sciences to linguistics, both synchronic and diachronic, and both national and cross-national. They are an important resource for analysing the changes at all levels that emerged in Europe with the beginning of the industrialisation period.
The aim of this workshop is to bring together researchers working in the interdisciplinary domain of cultural heritage, specialists in natural language and speech processing working with less-resourced languages, and key players among Linked Open Data initiatives. They are expected to analyse problems and brainstorm solutions in the automatic analysis of historical documents, uni- or multimedia, and in their deep annotation and interlinking.
A special track will focus on the digital acquisition and analysis of historical newspaper collections. The workshop builds on successful previous initiatives in this domain at LREC 2010, 2012, and RANLP 2011.
We also intend to encourage the collection of resources and tools and their metadata in this field and thus to contribute to a special section on resource sharing organised by the main LREC conference.
The website for the event, including the programme, can be found at http://www.c-phil.uni-hamburg.de/view/Main/LTforHisLangArhives2014.