Language Resources and Technology

By language resources and technology we mean all knowledge sources based on language (written or spoken) and the tools for carrying out operations on such language material. The following list gives an impression of the scope of CLARIN language resources and technologies:

  • Texts of all sorts, such as digitized medieval sources, websites, newspapers, digitized books, etc.

  • Multimedia recordings (audio/video) and time series recorded during communication (data glove, eye tracking, etc.)

  • Various types of manually or automatically created annotations on texts, media streams, etc.

  • Tools such as aligners, speech recognizers, tokenizers, part-of-speech taggers, parsers, manual annotation tools, viewers, etc.

  • Various types of knowledge sources encapsulating knowledge about resources and languages, such as metadata descriptions, GIS data, lexica, concept registries, ontologies, etc.

Value
The creation of language resources and the development of suitable language technology to make use of them are costly. Establishing a network of skills to build up and maintain resources and tools is a time-consuming and expensive process, because languages are highly non-linear systems and among the most complex systems nature has produced. Languages are unique results of evolutionary processes and thus encode cultural treasures and identity. Losing this treasure means risking the loss of identity.

Language Resource and Technology Community
The language resource community consists of those scholars, engineers and computer scientists who have a deep understanding of all aspects of language resources and technology, i.e., creating, archiving, processing and serving them. By its nature this community is part of the humanities, but Human Language Technology has strong links to computer science, and the aims of this initiative have much in common with those of the Semantic Web community. What distinguishes the language resource community from the other humanities disciplines is its way of looking at language resources, its methodologies and its traditions. While other humanities disciplines work on the content of the resources and apply specific tools tailored to their research interests, the language resource community is interested in linguistic content, structures, formal semantics, etc.