CLARIN is developing a general architecture for a research infrastructure. At first glance, it follows a layered design in which each layer offers web services, as indicated in the figure. It consists of

  • a high-capacity network layer, provided by the European GEANT initiative, that is fast enough to transfer even high-resolution video streams

  • language resource and technology repositories that archive language resources and offer language technology

  • service registries in which all sorts of services can be registered, and expertise centers that help users make use of those services

  • an integration layer that is best described as Grid-type services, and an interoperability and access layer whose services will enable Semantic Web-type services

  • an application layer that makes use of all services, registries and repositories so that users can tackle the grand challenges in the humanities, including the creation of a semantically rich humanities domain

The architecture must remain open to new resources and technology, and it must be able to scale up.

Repositories & Centers
CLARIN will be built upon a network of strong centers that offer stable and persistent repositories and services for resources and technology, and that register all services for discovery purposes. It will also offer expertise centers that can help the different user communities. To a large extent these repositories and centers already exist, but a European and national formation process has to shape the landscape and identify those centers that are strong enough to overcome fragmentation and that will receive enough support.
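The idea of centers registering their services for discovery can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the record fields, the `ServiceRegistry` class, the service kinds and the example.org-style endpoints are hypothetical and do not describe an actual CLARIN interface.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ServiceRecord:
    """One registered service; all field names are illustrative assumptions."""
    name: str
    center: str      # the center hosting the service
    kind: str        # e.g. "repository", "tool", "expertise"
    endpoint: str    # hypothetical web service URL


@dataclass
class ServiceRegistry:
    """In-memory stand-in for a registry used for discovery."""
    _records: dict = field(default_factory=dict)

    def register(self, record: ServiceRecord) -> None:
        # Names act as keys, so re-registering a name replaces the old entry.
        self._records[record.name] = record

    def discover(self, kind: str) -> list:
        # Return all services of the given kind, sorted by name for stable output.
        return sorted(
            (r for r in self._records.values() if r.kind == kind),
            key=lambda r: r.name,
        )


registry = ServiceRegistry()
registry.register(ServiceRecord("corpus-archive", "center-a", "repository",
                                "https://center-a.example/archive"))
registry.register(ServiceRecord("tagger", "center-b", "tool",
                                "https://center-b.example/tagger"))
registry.register(ServiceRecord("helpdesk", "center-a", "expertise",
                                "https://center-a.example/help"))

tools = [r.name for r in registry.discover("tool")]
```

A real registry would be a persistent, distributed service rather than a dictionary, but the two operations shown, registering and discovering, are the ones the text relies on.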

Grid Services
The Grid integration services ensure that trusted servers and services can communicate with each other and that the repositories form an integrated federation. This federation gives users a domain-wide identity for authentication, supports the exchange of user credentials for authorization, and lets users navigate a joint metadata domain in which resources are identified by Unique Resource Identifiers. Such a Grid layer for language resources is currently being worked out in the DAM-LR project; its experience can be re-used, and its solutions have to be scaled up.
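As a rough illustration of the federation described above, the following sketch joins the metadata of several repositories into one lookup domain and applies a simple group-based authorization check. The `Federation` class, the identifier strings and the group names are all hypothetical; the sketch does not reflect the DAM-LR design, and real authentication and credential exchange are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    identifier: str            # unique across the whole federation (made-up scheme)
    title: str
    allowed_groups: frozenset  # groups authorized to access the resource


class Federation:
    """Joins the metadata of several repositories into one lookup domain."""

    def __init__(self):
        self._index = {}  # identifier -> (repository name, Resource)

    def add_repository(self, name, resources):
        for res in resources:
            # Identifiers must stay unique across all member repositories.
            if res.identifier in self._index:
                raise ValueError(f"duplicate identifier: {res.identifier}")
            self._index[res.identifier] = (name, res)

    def resolve(self, identifier, user_groups):
        """Return (repository, title), or None if unknown or not authorized."""
        entry = self._index.get(identifier)
        if entry is None:
            return None
        repo, res = entry
        # Authorization: the user needs at least one group in common.
        if res.allowed_groups & set(user_groups):
            return repo, res.title
        return None


federation = Federation()
federation.add_repository("repo-a", [
    Resource("id:0001", "Spoken corpus", frozenset({"linguists"})),
])
federation.add_repository("repo-b", [
    Resource("id:0002", "Video recordings", frozenset({"linguists", "historians"})),
])

hit = federation.resolve("id:0002", ["historians"])
denied = federation.resolve("id:0001", ["historians"])
```

The user never needs to know which repository holds a resource: the identifier alone is enough for the lookup, which is the point of a joint metadata domain.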

Semantic Web Enabling Services
This layer offers seamless access to all resources and tools through broker services, powerful search engines, and conversion and interoperability infrastructures. The ability to easily describe new formats and include new terminologies will play a key role. Many components are already available, yet they have to be adapted to work within the architecture.
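One way to picture a conversion broker is as a search over a table of available format converters. The sketch below is an assumption made for illustration, not a description of an existing component: the function name `find_conversion_path` and the format names are made up, and breadth-first search is simply one straightforward way to find the shortest chain.

```python
from collections import deque


def find_conversion_path(converters, source, target):
    """Breadth-first search for the shortest chain of formats from source to target.

    `converters` maps a format name to the formats it can be converted into directly.
    Returns the list of formats along the path, or None if no chain exists.
    """
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in converters.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Hypothetical format names; a new format is described by adding one entry.
converters = {
    "format-a": ["format-b"],
    "format-b": ["format-c", "format-d"],
    "format-d": ["format-e"],
}

path = find_conversion_path(converters, "format-a", "format-e")
```

Because the converters are plain data, describing a new format only means adding an entry to the table; the broker logic itself stays unchanged.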

Applications
Some web-based applications have to be created as demonstrators. However, the expectation is that researchers and other interested persons, in particular young people and increasingly often SMEs, will make use of the opportunities and easily create new applications based on the Web Services that are made available. The infrastructure, which also includes new language resources (for example monolingual resources according to the BLARK matrix, multilingual resources, multimodal resources, etc.), will allow humanities researchers, and in particular linguists, to tackle the grand challenges of cross-cultural and cross-lingual work and to create a semantically rich domain.