The CLARIN Bazaar

The following stalls have signed up for the bazaar. Please come along and talk to the stall-holders. An outline list of stalls is also available as a PDF file.

The Bazaar is an experimental format which offers possibilities for networking and the presentation of work in progress.


Number, Name of stall-holder(s), Country, Title of stall, Purpose, Description
1 Twan Goosen; Thomas Eckart; Matej Durco; Davor Ostojic All CLARIN countries ( ) developments Work in progress, 

Software demo
A showcase of developments in the Virtual Language Observatory (VLO), in various stages of completion, concerning usability, extended search facilities and metadata curation. Apart from providing an opportunity to preview changes in the VLO, we hope to discuss the presented changes, as well as other potential improvements, with users, (meta)data providers and metadata experts.
2 Menzo Windhouwer Netherlands first aid kit Sharing experiences,

Software demo
At this stall we answer all your CMDI questions and can help you with any problems you encounter. We can also point you to lesser-known tools in the CMDI toolbox, which may make your life easier and help improve the quality of your metadata!

Audience: metadata modellers, creators, providers and curators

3 IDS Mannheim, UFAL Prague Germany License Selector Software demo
We want to officially launch our License Selector, a software tool that allows researchers to quickly choose an appropriate public license for their data and software.
4 Jan Odijk Netherlands Can we extend Alpino with flexible MWEs? An unexpected possible use of PaQu Work in progress,

Software demo
I am investigating whether and how the Dutch Alpino parser can be extended to deal appropriately with flexible multiword expressions (MWEs). Though at first I thought this would have to start with a paper exercise, I realized I could actually use PaQu to test out some initial ideas. With the help of PaQu I created a mini-treebank of sentences containing relevant multiword expressions, and I can now systematically investigate what is needed for such an extension and how Alpino should be adapted. I will explain the idea and show how PaQu aids me in this investigation.
5 LINDAT/CLARIN Czech Republic Promoting and Linking Data and Services Work in progress,

Sharing experiences,

Looking for collaborators,

Software demo
We will show our progress and discuss our experiences with linking the records for datasets, tools and services in the CLARIN repository to the instances of the data available via those services: corpus search tools, data processing tools, dictionary interfaces, etc.

We also want to show how we encourage the sharing of these items, and to discuss progress and future plans for interlinking the interfaces and tools more tightly.

We are looking for collaborators in this endeavour of enhancing popular tools such as CLARIN repositories, the KonText corpus manager and online dictionary interfaces to provide tighter integration for using (and reusing) data online and offline.
6 Dirk Roorda Netherlands Hebrew Bible: System of Data and Annotations Work in progress,

Sharing experiences,

Software demo
We have made SHEBANQ, a researchers' interface to the text of the Hebrew Bible.

A lot of background information is available at

We think we have a few good practices to share with people who design rich interfaces for text resources in the humanities.

The point we want to make is that treating text as data has turned out to be a good practice.

We have put the text in a database, and all the rest is stand-off markup (LAF = Linguistic Annotation Framework). This is a great help in accommodating data from different sources and integrating new data in a controlled way.

SHEBANQ users can contribute content by saving queries or adding manual and bulk annotations. They can instantly visualize results. Even better, they can do research and share results in various ways.

And, best of all, other parties can use the same data and build quite different tools for it. There is the Danish Bible Online Learner, with which SHEBANQ interlinks, and a new data-mining tool is underway from a research team in Israel.
7 KTH Speech, Music and Hearing Sweden Hidden resources: speech and language data in cultural institutions, governmental organizations, and public archives Work in progress,

Sharing experiences 
We are currently taking inventory of existing speech (and language) resources in governmental and cultural institutions. The most striking impression is that there are veritable masses of virtually unused data lurking in the archives. On the plus side, the resource holders are overwhelmingly positive about the prospect of their data being put to use. On the minus side, the obstacles are many and diverse, ranging from digitization and logistics through funding to issues of copyright and integrity. If you want to hear more, or to exchange experiences, drop by!
8 Andrius Utka, Darius Amilevičius, Rita Butkienė Lithuania - the Infrastructure of NLP Tools and Language Resources for Lithuanian Work in progress is the project for creating the infrastructure of NLP tools and language resources for the Lithuanian language. Some of its services and resources are planned to be included in the CLARIN-LT services. At the stall we will present the framework of the infrastructure and demonstrate different services that have been built upon it. In addition, we would like to present and discuss some issues and problems that such an infrastructure entails. We hope to gather new ideas and solutions for the further development of the infrastructure.
9 Martin Wynne, Arjan van Hessen United Kingdom and Netherlands Working with oral history archive data Looking for collaborators
As part of the CLARIN-PLUS project, a workshop on the topic of 'Working with Oral History Archives' will be held in the spring of 2016. We aim to engage language technologists with relevant skills and tools, as well as content providers and users of these resources. To start planning the programme for this workshop, we need to identify successful applications of language and speech technology in this domain. We need to make sure that we include all relevant expertise from within the CLARIN family, but good examples from outside are also relevant. The workshop should be a good way to bring leading practitioners into contact with our activities. Relevant information can include:

- articles in journals and conference proceedings,

- projects,

- datasets,

- software,

- existing tutorial materials or teaching expertise,

- ways to communicate with relevant user groups.

10 Gunnar Eriksson, Jens Edlund


Food in text and speech


One of the key obstacles to overcome in our early work in SWE-CLARIN has been getting in touch with those elusive HS researchers who are in desperate need of speech and language but live in blissful ignorance of this fact. One of the approaches we are investigating is to arrange thematic workshops on a topic that is broad and relevant to a great many fields, from many perspectives. A first attempt at this is planned for the spring of 2016, with "food" as the theme. We will place few or no restrictions on who can participate besides SWE-CLARIN members: as long as they are either resource holders or researchers with any kind of connection to food, they are welcome. Our hope is that by putting people with a broad common interest (food) together with speech people for an open discussion, new and interesting ideas for collaboration will spawn. Feel free to drop by to talk about food in research!


11 Martin Matthiesen


Access Management at Kielipankki (The Language Bank of Finland)

Sharing experiences, Software demo

We have been successfully using REMS for access management of ACA/RES resources since April 2015. Experiences so far have been positive: REMS integrates well into a SAML2-based environment, but can also be used to control Unix groups in a more traditional Linux/shell-based environment. Our setup makes extensive use of SAML2, PIDs, licenses, metadata and, of course, corpus viewers such as Korp. One point of interest from our side would be to integrate it properly into MPI/TLA's LAT software.


12 Leif-Jöran Olsson and Oliver Schonefeld


Federated content search – bringing people to your resources

Work in progress, Software demo

As a content holder, you can take part in the federated content search, which augments your search engine to give your resources greater visibility. A prototype demonstrates a basic setup that keeps the entry threshold low. We report on the ongoing extension.


13 Neeme Kahusk


software as CLARIN repository

Work in progress, 

Sharing experiences, 

Software demo

The META-SHARE node at the University of Tartu is equipped with SSO, metadata export and import, and a system for automatically registering DOIs at DataCite.


14 Mitchell Seaton, Lene Offersgaard


Reading interface for the study of Greek & Latin texts

Work in progress,

Sharing experiences

Development of a new portal and reading/learning interface for the study of Greek & Latin texts, enabling close reading and understanding of the texts with featured commentary. The work covers the structuring and format of the data, as well as its interaction with a web application.

15 Go Sugimoto Austria

Let's make the VLO better!

Looking for collaborators,  

Sharing experiences, 

Software demo, Work in progress

Are you interested in language resource discovery and access? Are you passionate about CLARIN? Then, please join us!

Austrian members would like to present and discuss the latest developments and plans for the Virtual Language Observatory (VLO). We have concrete proposals and demos for design and functionalities. We also welcome curation and developer volunteers to help us implement a better VLO.

Target audience:

- Taskforce members (VLO, curation, CMDI, CCR, etc.)

- Anybody interested in resource discovery and access

- End-users

Please see our comprehensive VLO recommendations:

16 Margaret King Austria

ACDH Research Tools and Services

Looking for collaborators

The ACDH is the newest institute of the Austrian Academy of Sciences (OEAW) and is devoted entirely to the Digital Humanities, both as a research institute and as a service centre. We invite you to come to our stall to check out our latest projects, tools and services, and to learn more about our institute and the possibilities for collaboration.


17 Maciej Piasecki, Tomasz Walkowiak


WebSty - an open web-based stylometry system

Software demo,

Work in progress

- An Open Textometric and Stylometric System

- System designed for the characteristic features of Polish

- Linking language tools and feature extraction with frameworks for stylometry and clustering, e.g. Stylo (Eder & Rybicki)

- Enabling the use of features defined on any level of linguistic structure, from word forms up to semantic-pragmatic structures

- Available as a web application and a web service

- Combining a stylometry system with a semantic classification and tagging system