Tour de CLARIN: CLARINO's involvement in the university curriculum

Blog post written by Koenraad De Smedt, edited by Darja Fišer, Elisa Gorgaini, and Jakob Lenardič

Over the past few years, the CLARINO Bergen Centre has made significant efforts to engage and train potential users of CLARINO and the wider CLARIN infrastructure in higher education. These efforts include both one-time teaching events and active involvement in regular university courses.

Staff of the CLARINO Bergen Centre have taught in several PhD researcher training courses, in particular those organized by the Norwegian Graduate Researcher School in Linguistics and Philology (LingPhil). One of these was a course at a LingPhil summer school in Northern Norway in 2016. The course focused on data management and was spread over a whole week in daily two-hour blocks. Taught by Koenraad De Smedt and Gunn Inger Lyse Samdal, it covered the creation and management of research data, repositories and standards, and documentation and metadata, and it showed how data can be found and used. Examples based on resources, tools and recommendations from CLARINO and CLARIN were used throughout the course. For instance, CLARIN licenses and the CLARIN “laundry tag” system for categorizing licenses were explained. Examples from the TROLLing repository were used to illustrate good citation practices, and examples from the ASK corpus at the CLARINO Bergen Centre were used to illustrate principles and standards for annotation.

Teaching turned out to be highly interactive, with questions coming from the lecturers as well as from the students. Besides the lectures, the course featured many individual and group exercises in which students practiced, for instance, choosing a license type and writing metadata. Since the course was part of the LingPhil summer school, data management was not treated as an isolated activity but was linked to other aspects of linguistic methodology.

In recent years, the use of data and tools has also become well integrated into several regular courses in the bachelor’s and master’s programmes in Linguistics at the University of Bergen. An introductory course in language and computers demonstrates corpus search with the Corpuscle system, while syntax courses have used online parsing with XLE-Web and materials from treebanks in INESS, which is part of the CLARIN Knowledge Centre for Treebanking (also featured in Tour de CLARIN). Both XLE-Web and INESS are web services provided by CLARINO. Data from INESS have also been used in a master’s course on computational language models to show how empirical corpus data can strengthen or challenge hypotheses about grammar. Students have proven quick to learn to use INESS in their own projects for term papers and master’s theses. In her master’s thesis, for instance, Victoria Troland used INESS to extract syntactic markers from syntactically analyzed Norwegian novels and then used these markers as the basis for an author identification model.

In our experience, students learn to use the infrastructure best when its use is well integrated into the regular curriculum. We therefore intend to introduce materials and methods based on CLARINO and CLARIN in even more courses. We will also repeat the PhD researcher training course in 2020, albeit in a more compact form.

Students at the LingPhil summer school 2016 in Northern Norway did group exercises under the supervision of Koenraad De Smedt and Gunn Inger Lyse Samdal

