A Recap on the CLARIN Café on Text+: A New Research Data Initiative in Germany

Submitted by e.gorgaini@uu.nl on 8 February 2022


The CLARIN Café on Text+: A New Research Data Initiative in Germany was a virtual event that took place on 24 January 2022. Organised by Erhard Hinrichs (University of Tübingen), Thorsten Trippel (IDS Mannheim) and Andreas Witt (IDS Mannheim), the café presented the new German Text+ research infrastructure, dedicated to text- and language-based research data. More than eighty participants from various countries and organisations attended to hear about recent national developments in Germany and the progress in building a sustainable national consortium, similar to what European Research Infrastructure Consortia are doing on a European scale.

After a short introduction by Francesca Frontini and Franciska de Jong, the Text+ team presented the new research data infrastructure. The Text+ speakers were Erhard Hinrichs, Thorsten Trippel, Marie Hinrichs, Eliza Margaretha Illig and Andreas Witt. Erhard Hinrichs is the scientific speaker of the Text+ consortium in Germany. Andreas Witt is an outgoing member of the CLARIN Board of Directors and the new national coordinator of CLARIN in Germany. Marie Hinrichs is a developer and major contributor to WebLicht, the pipeline and orchestration engine for language processing tools. Eliza Margaretha Illig is a developer working on data access. Her major recent contribution was a search service, built on a database of standards, for standards relevant to language resources. Thorsten Trippel is a research data specialist who reaches out to the communities. The large number of participants illustrates that this topic is also of interest to other CLARIN national initiatives, as well as to researchers from other research infrastructure communities.

Watch the recording of the opening of the CLARIN Café on Text+ on the CLARIN YouTube channel.


Text+: A New Research Data Initiative in Germany

The sustainability of research infrastructures is a major challenge for everyone involved in research data management, research software and service development and research quality assurance. The German National Research Data Infrastructure (Nationale Forschungsdateninfrastruktur, NFDI) forms a national framework to integrate the activities from various disciplines and contributors. Within this framework, up to thirty research-driven consortia will be funded, of which nineteen have already been approved. The field of humanities is currently represented by two consortia: NFDI4Culture and Text+. 

Within the NFDI, Text+ focuses on language data and will initially concentrate on digital collections, lexical resources and editions. These are of high relevance for all language and text-based disciplines, especially for linguistics, literary studies, philosophy, classical philology, anthropology, non-European cultures and languages, as well as language and text-based research in the social, economic, political and historical sciences.

Text+ integrates existing German national consortia of European infrastructures, i.e. CLARIN-D and DARIAH-DE, and includes new stakeholders, such as additional research universities, libraries, archives, and academies.

As a distributed infrastructure, Text+ supports researchers in every phase of the research data life cycle, addressing born-digital data as well as data at different stages of digitisation and integration into a digital research infrastructure. This general introduction was the topic of the first contribution to the café, presented by Erhard Hinrichs.

Watch the recording of the introduction to Text+ on the CLARIN YouTube channel.

Text+ is a research-driven infrastructure, and its governance ensures a strong integration of all research communities of interest. In his presentation, Thorsten Trippel introduced a crucial mechanism for achieving this aim: the Scientific Coordination Committees (SCCs), one for each data domain of Text+, and the Operations Coordination Committee (OCC) for the area of infrastructure and operations. Although the consortium members of Text+ come from the research communities, these committees constitute a structure in which the communities are represented by delegates from academic societies. The SCCs and the OCC not only act as an advisory board, but also influence project funding within Text+. For this purpose, Text+ has reserved a significant amount of its budget for the integration of new partners, additional data, services and tools into the infrastructure.

Watch the recording of the presentation Involving the Text+ Communities: SCCs & OCC on the CLARIN YouTube channel.

In her presentation, Marie Hinrichs introduced a CLARIN tool, now integrated into Text+, which facilitates easy integration of a wide range of annotation services within a service-oriented architecture. More specifically, she demonstrated the WebLicht environment for creating processing pipelines. Such tool chains allow users to select the annotation tools needed for the research task at hand. Another service already developed within CLARIN was introduced by Eliza Margaretha Illig. She presented a tool that provides users and infrastructure providers with a comprehensive overview of standards, especially international standards such as those provided by ISO. This CLARIN Standards Information System is continuously updated via input from the CLARIN Standards Committee.

Watch the recording of the presentations: Using a Distributed Technical Infrastructure and The CLARIN Standards Information System (SIS) on the CLARIN YouTube channel.

Standards, tools, community and embedding into national infrastructures are necessary prerequisites for sustainability. Organisational sustainability is equally important. This aspect was addressed by Andreas Witt: the ERIC legal framework provides a stable mechanism for sustained cooperation with its national members. These members, i.e. national consortia, ensure the integration and development of services and data for this federated infrastructure and act as contact points for the CLARIN ERIC board. Such a sustainable national node requires long-term institutional support that goes beyond temporary project funding. Andreas Witt presented how the association ‘Geistes- und kulturwissenschaftliche Forschungsinfrastrukturen e.V.’ takes up this challenge.

Watch the recording of the presentation: Sustainability by Forming a National Legal Entity and the Q&A of the Café on the CLARIN YouTube channel.

Additional information on this CLARIN Café is available on the event page.