Tour de CLARIN: Portugal

Submitted by Karina Berger on 13 August 2021

Written by João Silva

The PORTULAN CLARIN Research Infrastructure for the Science and Technology of Language is the CLARIN national consortium for Portugal. The country has been involved in CLARIN since the EU-funded preparatory project began in 2008, and in 2010 it was invited to be a founding member of what was to become CLARIN. The PORTULAN CLARIN consortium currently comprises over 20 partners from various Portuguese institutions and language-related fields. The consortium also includes Camões I.P., the official national organisation responsible for promoting the Portuguese language, and four partners from Brazil. António Branco is both the Director General and the National Coordinator of PORTULAN CLARIN.

Scientific knowledge is grounded in falsifiable predictions, so its credibility and raison d'être rely on the possibility of repeating experiments and obtaining results similar to those originally reported. Scientific knowledge is also cumulative, with recent advances building on developments that followed from earlier breakthroughs. Crucial for the scientific endeavour, and for science-based activities, is the availability of data and of companion analytical devices. Also crucial is the moral, and often physical, courage to challenge the status quo, together with the altruism to share research results. Seeking to foster this scientific ethos, the research infrastructure is named 'portulan', a term derived from portolan charts. These were the maps on which the discoveries of courageous sailors were documented, so that the discoveries and associated data could later be confirmed or corrected, the original journey could be repeated more efficiently, and new discoveries and routes could be reached beyond those already known.

Portolan chart by Jorge de Aguiar (1492), the oldest known signed chart of Portuguese origin (Beinecke Rare Book and Manuscript Library, Yale University, New Haven).

The mission of PORTULAN CLARIN is to support researchers, innovators, citizen scientists, students, language professionals and users in general whose activities rely on research results from the science and technology of language. It does so by distributing scientific resources, supplying technological support, providing consultancy, and fostering scientific dissemination. It supports activities in all scientific and cultural domains, with special attention to those most directly concerned with language (whether as their immediate subject or as an instrumental means to address their topics). These include, among others, the humanities, arts and social sciences, artificial intelligence, computation and cognitive sciences, healthcare, language teaching and promotion, cultural creativity, and cultural heritage. It serves all those whose activity requires the handling and exploration of language resources, including language data and services, in all modalities, in all types of representations, and in all types of functions.

The infrastructure pursues its mission by centring its operation on its users, seeking to bring them scientific, professional or personal advantages. Users were involved well before the infrastructure entered into operation, from the very start of its planning, and contributed to its design and implementation through the Network of Implementation Partners. Users can contact a Helpdesk when they need support in using the infrastructure, and they benefit from initiatives to enhance their engagement with it. They can also address the infrastructure at any time and are encouraged to provide advice through the Scientific Advisory Forum.

Users have free access to the infrastructure, with no registration required and without any 'members only' constraints. Through its repository, PORTULAN CLARIN gives users access to resources for working with language. These resources include language processing tools and applications, as well as corpora, lexicons and various other kinds of data sets, such as word embeddings and records of brain potentials during reading. In addition, PORTULAN CLARIN provides an online Workbench, through which users can run a variety of language processing tools. These include, among others, a corpus concordancer (CINTIL Concordancer); tools for language processing, such as dependency parsing (LX‑DepParser), named entity recognition (LX‑NER), and sub-syntactic annotation (LX‑Suite); and tools for additional textual enrichment, such as the analysis of temporal relations and events in texts (LX‑TimeAnalyzer).

Users distributing resources through the infrastructure are free to choose the distribution licenses for their resources, and they grant PORTULAN CLARIN only the non-exclusive right to distribute those resources. Users keep all rights, including the right to distribute their resources through other means and to withdraw their resources from the infrastructure. No user data related to usage of the infrastructure is retained, whether scientific or personal.

PORTULAN CLARIN ensures the preservation and fostering of the scientific heritage of the Portuguese language by supporting the preservation, promotion, distribution, sharing and reuse of language resources for this language, including text collections, lexicons and processing tools. It is an asset of the utmost importance for the technological development of the Portuguese language and for its preparation for the digital age.