CLARIN Café on Text+: A New Research Data Initiative in Germany

General Information

This café is organised by Erhard Hinrichs (University of Tübingen), Thorsten Trippel (IDS Mannheim), and Andreas Witt (IDS Mannheim). It will present the new German Text+ research infrastructure, which is dedicated to text- and language-based research data. The scientific spokesperson of Text+ is Prof. Hinrichs, a long-standing former member of CLARIN's board of directors (see blog post), and this first café of the year will be an opportunity to reflect on the past and future of CLARIN in Germany.

Date: 24 January 2022

Time: 14:00-16:00 (CET)

Venue: CLARIN virtual Zoom meeting

Twitter hashtags: #CLARINcafe #TextPlus_NFDI

A full overview of the scheduled café sessions can be found on the CLARIN Café page.


Introduction to Text+ (Erhard Hinrichs)

Text+ is part of the German national research data infrastructure (German: Nationale Forschungsdateninfrastruktur, NFDI for short). Within the NFDI, Text+ focuses on language and text data and will initially concentrate on digital collections, lexical resources and editions. These are highly relevant for all language- and text-based disciplines, especially linguistics, literary studies, philosophy, classical philology, anthropology, and non-European cultures and languages, as well as for language- and text-based research in the social, economic, political and historical sciences.

The first presentation will offer an introduction to Text+ and provide the context for additional data and services that are also part of the European SSH landscape of research infrastructures.

Involving the Text+ Community: SCCs & OCC (Thorsten Trippel)

Text+ has an elaborate structure that involves the communities of interest in its development. The Scientific Coordination Committees (SCCs) of Text+ are a platform for learned societies and experts to evaluate the ongoing development of Text+ and to extend the portfolio of data and services that Text+ provides. The SCCs may recommend partner projects to be integrated into the Text+ infrastructure, with additional funding provided from within Text+. These partner projects are available according to a defined review cycle.

Using a Distributed Technical Infrastructure (Marie Hinrichs)

Text+ builds on infrastructure developed in other contexts and is organised as a distributed network of data and tools. One of these tools is a distributed architecture and environment for creating processing pipelines, in which users select the tools available from the infrastructure according to their needs. We will showcase WebLicht, which provides a web-service-based infrastructure for multiple tools and languages and allows the processing of texts provided by users.

Standards Information System (SIS) (Eliza Margaretha Illig)

In a FAIR infrastructure, interoperability and reusability depend on the use of recommended standards and formats. Standards therefore play a key role in this endeavour, but for users and data providers who are not already part of the infrastructure community they are often opaque and hard to access. This presentation will address a tool that helps users and infrastructure providers maintain an overview of standards, especially international standards such as those provided by ISO.

Sustainability by Forming a National Legal Entity (Andreas Witt)

In research, infrastructure funding is often organised through individual projects that are designed to develop data and services. Because projects run for a limited timespan, the sustainability of services is at stake. In Germany, various projects joined forces to create a sustainability perspective by establishing their own (national) legal entity, which serves as the organisational umbrella for the national nodes of European infrastructures and builds the bridge to individual providers of tools, data, and services. We will present this organisational architecture.

How to Join

You can register at this link. You will receive the Zoom meeting link on the day before the event.


14:00 - 14:05 - Opening - Franciska de Jong and Francesca Frontini (CLARIN)

14:05 - 14:25 - Introduction to Text+ - Erhard Hinrichs

14:25 - 14:40 - Involving the Text+ Communities: SCCs & OCC - Thorsten Trippel 

14:40 - 14:55 - Using a distributed technical infrastructure - Marie Hinrichs

14:55 - 15:10 - The CLARIN Standards Information System (SIS) - Eliza Margaretha Illig 

15:10 - 15:25 - Sustainability by forming a national legal entity - Andreas Witt

15:25 - 15:30 - Break for refilling coffee or tea

15:30 - 16:00 - Questions and answers


Erhard Hinrichs: scientific spokesperson of the Text+ consortium in Germany; former member of CLARIN's NCF and board of directors.

Andreas Witt: outgoing member of the board of directors of the CLARIN ERIC and new national coordinator of CLARIN in Germany. 

Marie Hinrichs: developer and major contributor to WebLicht, the pipeline and orchestration engine for language processing tools.

Eliza Margaretha Illig: developer working on data access. One major contribution is a search service, built on a database of standards, for standards relevant to language resources.

Thorsten Trippel: research data specialist reaching out to the communities. Finding solutions to the data management issues of scholars in the SSH is one of his core interests.


Recordings, Slides and Blog