
The CLARIN Bazaar 2019

Below you will find a list of the stalls that can be visited at the CLARIN Bazaar. Please go and talk to stallholders, see their wares, and share your ideas!



Bazaar presentations

Title of stall   Description

Archipelago DH: Computational Linguistics meets (more and more) Digital Humanities

Federico Boschetti

 

The Institute for Computational Linguistics “A. Zampolli” (CNR-ILC) recently activated a new research unit devoted to the collaboration with the Centre of Digital and Public Humanities - Department of Humanities, University Ca’ Foscari Venezia. In this stall we will present projects in Digital Textual Scholarship developed by the CNR-ILC, such as Euporia for linguistic, stylistic and thematic annotation, or by the Ca’ Foscari University, such as Musisque Deoque and Memorata Poetis, and we will illustrate our plans for the integration of the projects. (Poster)

     

RDM @Specialized Information Service Near Eastern-, North African and Islamic Studies

Daniel Brenn

 

The Specialized Information Service Near Eastern-, North African and Islamic Studies aims to address the need for research data management in these fields by creating a dedicated research data management service as well as an information service that raises awareness of the topic and explains how and why it benefits researchers (as opposed to being something simply required by funders). The MENA region studied in these fields poses some unique challenges: from the dozens of languages, some so small that there is no unified transliteration or no Unicode support for the script, to the political difficulties of unstable and diverse countries, there are myriad things to consider when working in these fields, many more than in, e.g., European history.

Both CLARIN and DARIAH – now growing closer as CLARIAH-DE – provide rich infrastructures of tools and repositories for the study of text materials as well as specialized data centres for the storage of sensitive data. The technological difficulties can thus be overcome rather easily, enabling researchers and RDM to concentrate on the specific problems of their specific projects.

     

Textual Analysis, Visualization and Epistemology: Presenting the Centre for Digital Humanities in Gothenburg

Centre for Digital Humanities, University of Gothenburg

 

As a member of Swe-Clarin, the Centre for Digital Humanities (CDH) at the University of Gothenburg provides tools and expertise related to computational criticism in the study of literature as well as to visualizations and analyses of text resources and of cultural heritage. CDH as a whole also focuses on the critical investigation of the consequences of digitalization in a digital humanities context. The aim of the poster is to give an overview of the multidimensional research profile of CDH. We will present some of our projects and other work, highlighting the centre's focal areas: 1) digital textual analysis, 2) data visualization and publication, and 3) digital epistemology. The poster will emphasize how the centre situates itself in between different digital research contexts.

     

King's Digital Lab: Research Software Engineering in Digital Humanities

Arianna Ciula

 

When dealing with Digital Humanities at scale and managing around 200 projects, the operational methods around the organisation of a research lab cannot be left to chance. Building on pioneering work over the last 40 years and sitting on a rich and complex estate of legacy projects, King's Digital Lab at the Faculty of Arts & Humanities, King's College London (UK) became operational in 2015 and has matured into a fully fledged Research Software Engineering laboratory with defined roles (a team of 13 permanent staff), a Software Development Lifecycle and a technical stack. Our presence at the CLARIN Bazaar will be a chance to learn more about our model, exchange best practices and identify possible partnerships.

     

Beyond WebAnno: The INCEpTION Text Annotation Platform

Richard Eckart de Castilho

 

INCEpTION is a next-generation text annotation platform currently being developed at the UKP Lab at the TU Darmstadt. It builds on the popular CLARIN WebAnno technology and introduces exciting new features including powerful annotation automation, support for RDF-based semantic resources, search capabilities, the ability to connect to external document repositories and an overall improved user experience. An easy upgrade path allows existing users of WebAnno to migrate to INCEpTION and to profit from its new functionalities. Instead of forking the WebAnno codebase, we took the approach of modularizing WebAnno to allow both INCEpTION and WebAnno to be developed side by side. As such, in addition to driving innovation within the new annotation platform, the INCEpTION project is currently also acting as the main provider of maintenance to the CLARIN WebAnno community.

INCEpTION is being actively adopted by the community as can be seen from different use-cases presented on our homepage. Monica Berti's paper "Named Entity Annotation for Ancient Greek with INCEpTION" at CAC 2019 illustrates her use of our tool and represents one of our use-cases (https://inception-project.github.io/use-case-gallery/digital-athenaeus/).

Come to our stall to ...

    •    discuss your annotation needs and use-cases with us
    •    learn how to annotate your texts using INCEpTION or WebAnno
    •    learn about the differences between INCEpTION and WebAnno
    •    learn why to upgrade to INCEpTION and how to do it
    •    discuss interoperability with text annotation services in your infrastructure

Richard Eckart de Castilho is a PI on the INCEpTION project together with Iryna Gurevych. He has designed most of the original WebAnno architecture, is currently the maintainer of WebAnno, and also maintains several other NLP-related open-source projects (e.g. DKPro Core, uimaFIT) with a focus on interoperability and re-usability. (Poster)

     

Tools and Methods for Continuous Collaboration and Curation

Anne Ferger

 

In our stall we are going to present selected tools and workflows that were developed for continuous collaboration and curation in the long-term research project INEL (Grammatical Descriptions, Corpora and Language Technology for Indigenous Northern Eurasian Languages), which aims to document and curate language resources from less documented and in many cases endangered languages of the Northern Eurasian area. The project is part of the Academies’ Programme, which is coordinated by the Union of the German Academies of Sciences and Humanities.

To ensure a consistently high quality of research data over the entire 18-year runtime of the project, a number of custom tools and workflows were implemented in the initial phase.

It thus became possible to adequately use (i.e. publish and analyze) the current state of the research data at any time using widely adopted standards and data formats. This includes, among other things, the creation of CMDI metadata and allows for delivering catalogue metadata (e.g. in the VLO) to disseminate the resources. (Poster)

     

Crowdmap the Crusades - ISACA Community Day Event at the University of Luxembourg

Emma Goodwin

 

Exploring motivation in crowdsourcing projects: Medieval French transcription competition featuring the citizen science project 'Crowdmap the Crusades'

ISACA LUXEMBOURG will be participating in ISACA’s first-ever worldwide day of volunteering, CommunITy Day, taking place on 5 October 2019. Using a live social media feed and #ISACACommunITyDay, members, staff and other engaged community members around the world will track local service projects making a difference in their communities within this 24-hour period from New Zealand to Hawaii. ISACA LUXEMBOURG CHAPTER is planning to participate by volunteering through ‘CROWDMAP THE CRUSADES’ TRANSCRIPTION PROJECT at the Maison du Savoir, Salle 3.380, Belval Campus, University of Luxembourg.

Kicking off in ISACA’s 50th anniversary year, CommunITy Day will take place annually on the first Saturday in October. With more than 220 chapters, 140,000 members, 200 staff and 460,000 engaged professionals, ISACA has a wide-reaching impact around the world in helping advance the positive potential of technology—and on ISACA CommunITy Day, the association plans to extend this impact to make a meaningful difference in wide-ranging ways in communities across the globe.  

ISACA LUXEMBOURG members will come together to volunteer in the local community in Esch-sur-Alzette by participating in a mini tutorial on medieval manuscript transcription followed by a competition with prizes for those correctly transcribing characters. 

Introduction and background

The competition will introduce people to a well-established academic digital humanities project run by Emma Goodwin, University of Oxford and Secretary, ISACA Luxembourg Chapter, and is based on a text in a thirteenth-century French manuscript, Hatton 77. This text is variously known as the Song of the First Crusade, the Siège d’Antioche, the Estoire d’Antioche and La Chanson de la Première Croisade. The narrative follows the progress of the First Crusade from the Council of Clermont in November 1095 to the battle of Ascalon in August 1099. It includes a number of historical figures, including the Frankish knight Godfrey of Bouillon, who has links to Luxembourg and the surrounding area as the Duke of Lower Lorraine from 1087, and went on to become the first ruler of the Kingdom of Jerusalem.

The text is preserved in two manuscripts and two fragments; however, this competition will focus on the version in Hatton 77, which is the oldest and most complete version of the text.

Objectives (academic and community)

  • Working on a culturally relevant text and fostering wider community engagement in citizen science.
  • Observing the efficacy of a “competition” in encouraging wider engagement in the project. Motivation is a known challenge in crowdsourcing projects, so it will be interesting to see how effective a competition is in increasing participation.
  • The event will be combined with some general awareness-raising of ISACA and issues around cybersecurity and online safety.

Despite its cultural importance, the text will remain largely under-researched by historians and literary scholars until it is available in full as an edition. Join us and become a citizen scientist!

To learn more, to register to get involved in the ISACA Luxembourg Chapter volunteer project, or to track your participation in the activity, visit https://engage.isaca.org/communityday. (Poster)  

     

DELAD: How to share and access sensitive datasets with language and speech disorders via CLARIN

Henk van den Heuvel

 

In the CLARIN context, the DELAD task force (https://delad.ruhosting.nl) organised a workshop in Cork in November 2017 to set up a plan to collect existing and new Corpora of Disordered Speech (CDS) and to include these in the CLARIN infrastructure. A second workshop was organised in Utrecht in January 2019. Its goal was to review the status of the actions set out in the first workshop, exchange deeper insights on ethical and legal aspects of CDS collection and sharing against the background of the GDPR, and come up with a plan for the primary special needs of the CLARIN infrastructure in hosting CDS. This is now taking shape in a CLARIN Knowledge Center for Atypical Communication Expertise with strong collaboration with the CLARIN Data centres at the MPI in Nijmegen and TalkBank at CMU. In the poster we present the outcomes of the workshop and the current state of affairs. (Poster)

See also: https://www.clarin.eu/blog/clarin-workshop-delad-database-enterprise-language-and-speech-disorders.

     

NLP for Historical Documents

Bryan Jurish, Martin Wynne

 

A workshop on NLP for Historical Documents was held in Berlin in early September 2019. The workshop brought together people who are creating or working with NLP tools (especially tokenizers, normalizers, morphological analyzers, part-of-speech taggers and lemmatizers) for historical language varieties, especially European languages in the period 1500-1800. This workshop focussed on the adaptation of NLP tools trained on or designed for modern language varieties, as well as custom tools designed specifically for particular historical varieties. The outputs were a draft resource guide, models of workflows, and a set of recommendations for integrating tools into the CLARIN infrastructure.

     

Automatic Identification of Lithuanian Multi-word Expressions

Jolanta Kovalevskaitė, Tomas Krilavičius, Erika Rimkutė

 

The poster presents the outcomes of the research project “Automatic Identification of Lithuanian Multi-word Expressions (PASTOVU)” (http://mwe.lt/), funded 2016-2018 by grant No. LIP-027/2016 from the Research Council of Lithuania: 1) the DELFI.lt corpus; 2) the experimental tool Colloc for automatic MWE identification; 3) a lexical database of Lithuanian MWEs (collocations and idioms); 4) the Lithuanian Collocation Dictionary.

     

Documenting Language Actors

Alexander König, Monica Pretti

 

We present the notion of “local language actors” (LLAs)—prototypically small-size institutions collecting, managing and hosting language data and/or services of historical and cultural value for their geographical area (in our case exemplified by South Tyrol, an autonomous province in northern Italy)—and argue that they can be regarded as an ecosystem at the local level, i.e. a regional infrastructure, which each establishment benefits from in terms of synergy exploitation. 

The project which this concept stems from is called DI-ÖSS (“Digital Infrastructure for the Ecosystem of South Tyrolean Language Data and Services”). It draws inspiration from two main sources, Latour's Actor-Network Theory and Deleuze et al.'s Assemblage Theory, which the adopted bottom-up approach and envisioned multiple-connection framework respectively derive from. 

In order to enable the surfacing of resource-based, competence-oriented and need-related cooperation opportunities, transparency is required when dealing with LLAs, thus making the development of a methodologically well-grounded and operationally functional documentation workflow crucial. We, therefore, propose a threefold process which has already been implemented to gather information about a reference scenario consisting of eleven South Tyrolean organizations. Firstly, pertinent LLAs were identified and documentation criteria defined. Secondly, data about the selected LLAs were collected and organized; thirdly, the information obtained is in the process of being formalized into CMDI profiles to allow for a future integration into CLARIN via the VLO software. 

By so doing, we intend to compile a comprehensive overview of each chosen establishment so as to build a wide-reaching, local-bound system which can offer follow-up opportunities and novel use cases while bringing European and regional infrastructures together. (Poster)

     

Enhancing European parliamentary data for Digital Humanities research: The role of CLARIN and a Hansard case study

Prof. Dr. Christian Mair 

 

Records of parliamentary debates are excellent material for Distant Reading (Moretti 2013) approaches in the Digital Humanities for several reasons. They are available in large quantities, usually also in digital format. They have (sometimes considerable) historical time-depth, making it possible to trace lexical, grammatical, stylistic and social change across time. Multilingual countries (e.g. Belgium) and the European Parliament provide transcripts of debates in two or more languages. In theory, we would expect such data to be studied intensively across the disciplinary spectrum of the humanities and social sciences – a potential which, however, has not been realised yet. Many currently available collections of digitised parliamentary debates are poorly annotated and allow only limited searches. Comparative research across languages and databases is further hindered by a great diversity of text and annotation formats.

As Fišer and Lenardič (2017) have shown, CLARIN initiatives in several countries have already added value to existing parliamentary data by providing easier access and better annotation and search options. On the other hand, they also note clear deficits in the way these resources have been “marketed” in the relevant academic research communities. Two years on, their vision of integrated and interoperable databases “readily accessible for researchers from different disciplines as well as for cross-border and cross-lingual projects” (p. 84) remains an attractive goal rather than a reality. 

Empirical case studies on the topics of “Brexit”, “Europe” and “immigration/emigration” in the UK Hansard (https://www.clarin.ac.uk/hansard-corpus) will help us define realistic goals for further CLARIN activities.

References:

  • Alexander, Marc, and Andrew Struan. 2017. “Digital Hansard: Politics and the uncivil.” In Digital Humanities 2017, Montréal, QC, Canada, 08-11 Aug 2017, 378-380. https://dh2017.adho.org/abstracts/DH2017-abstracts.pdf
  • Baker, Helen, Vaclav Brezina, and Tony McEnery. 2017. “Ireland in British parliamentary debates, 1803-2005.” In Tanja Säily, Arja Nurmi, Minna Palander-Collin and Anita Auer, eds. Exploring future paths for historical sociolinguistics. Amsterdam: Benjamins. 83-108.
  • Fišer, Darja, and Jakob Lenardič. 2017. “Parliamentary corpora in the CLARIN infrastructure.” Selected papers from the CLARIN Annual Conference 2017, Budapest, 18 - 20 September 2017. Conference Proceedings published by Linköping University Electronic Press at www.ep.liu.se/ecp/contents.asp?issue=147.
  • Mollin, Sandra. 2007. “The Hansard hazard: Gauging the accuracy of British parliamentary transcripts.” Corpora 2: 187-210.
  • Moretti, Franco. 2013. Distant reading. London: Verso.
  • Truan, Naomi. 2016. “Parliamentary debates on Europe at the House of Commons (1998-2015) [Corpus].” ORTOLANG [Open Resources and TOols for LANguage]. https://hdl.handle.net/11403/uk-parl. (Poster)
     

Folk in Tuscany: the Caterina Bueno sound archive

Monica Monachini, Maria Francesca Stamuli, Silvia Calamai

 

Caterina Bueno (San Domenico di Fiesole, 2nd April 1943 – Firenze, 16th July 2007) was an Italian ethnomusicologist and singer. Her work as a researcher has been highly appreciated for its cultural value, as it allowed the collection of many folk songs from Tuscany and central Italy that had been passed down orally from one generation to the next until the 20th century (when this centuries-old tradition started to vanish). Her work as a singer has always been oriented towards research.

The daughter of a Spanish painter (Xavier Bueno) and a Swiss writer (Julia Chamorel), as a child Caterina was fascinated by Tuscan dialects, folk songs and peasant culture. At the age of twenty, she started travelling through the Tuscan countryside and villages recording Tuscan peasants, artisans, common men and women singing any kind of folk songs: lullabies, ottave (rhyming stanzas sung during improvised contrasts between poets), stornelli (monostrophic songs), narrative songs, social and political songs, and much more. These were the same songs that she sang in her performances, making them well-known and appreciated both in Italy and abroad in the second half of the 20th century, when she was at the pinnacle of her career. 

Caterina Bueno’s sound archive is composed of 476 carriers (audio reels and compact cassettes), corresponding to nearly 714 hours of recording, and was digitised during the PAR-FAS project Gra.fo (Grammo-foni. Le soffitte della voce, UNISI & SNS, http://sns.grafo.it). The archive was held by two different owners: part of it was stored at the house of Caterina’s heirs, while the rest was kept by the former culture counsellor of the Municipality of San Marcello Pistoiese, in the Montagna Pistoiese, where a multi-media library was supposed to be set up. Unfortunately, disagreements and misunderstandings between the two parties have so far left the archive fragmented and inaccessible to the community. Both owners independently turned to Silvia Calamai for the reassembly of the whole archive in the digital domain, in accordance with the artist’s wishes. After digitisation, the carriers were returned to their owners, who helped in finding an arrangement for the sound archive, which can be divided into the following categories:

  • field-research (investigations carried out in the Tuscan countryside from the late 50s to the end of the artist’s life);
  • live performances (recordings of concerts and events);
  • performances’ rehearsals (recordings of rehearsals with musicians).

In 2019 Regione Toscana decided to support the project of cataloguing and disseminating the Caterina Bueno Archive, and the following partners were involved: Università degli Studi di Siena (Silvia Calamai), Soprintendenza Archivistica e Bibliografica della Toscana (Maria Francesca Stamuli), CLARIN-IT (Monica Monachini), and Unione dei comuni del Casentino (Pierangelo Bonazzoli). Archivio Vi.vo will thus constitute a pilot study within CLARIN-IT to experiment with methods and offer services to disciplines interested in oral sources. The ILC4CLARIN Italian node offers archiving, preservation, access and tools for linguistic data of a written type; within Archivio Vi.vo. the repository will be improved through an experimental approach to the conservation and management of, and access to, audio and audio-video data and metadata. Archivio Vi.Vo. will develop a model which can be replicated on other audio-visual archives, even outside the context of Tuscany. The experimental activity will aim to adopt the model and high-performance computing and archiving services of the new GARR network infrastructure, built along the Cloud paradigm. This model will be disseminated both to the scientific community interested in accessing these data, and to the general public who enjoy ethnomusical materials produced in the territory. (Poster)

     

Create your own corpus - with WebLicht

Julia Müller, Christian Mair

 

The demonstration introduces WebLicht (https://weblicht.sfs.uni-tuebingen.de/weblicht/), a web-based suite of corpus tools provided within the CLARIN-D and CLARIN-EU infrastructures. WebLicht enables users to compile state-of-the-art corpora from their own data, with customised annotation and output formats. The demo takes the audience through the basic steps of corpus creation and annotation with WebLicht. It describes the available annotation types (PoS-tagging, lemmatization, morphology annotation, constituent and dependency parsing, and named entity recognition) and shows how to search the annotated corpus in order to find, for instance, specific parts of speech or syntactic structures.

Interested attendees are welcome to provide small samples of their own data. (Poster)
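As a rough illustration of what searching an annotated corpus for specific parts of speech can look like, here is a minimal Python sketch. It does not use WebLicht or its output formats: the Token structure, the tag names and the sample sentence are all assumptions made up for this example.

```python
from typing import Iterator, List, NamedTuple, Sequence


class Token(NamedTuple):
    """One annotated token (hypothetical structure, not a WebLicht format)."""
    form: str
    lemma: str
    pos: str


def find_pos_sequence(tokens: Sequence[Token], pattern: Sequence[str]) -> Iterator[List[Token]]:
    """Yield every run of consecutive tokens whose PoS tags match the pattern."""
    n = len(pattern)
    for i in range(len(tokens) - n + 1):
        window = list(tokens[i:i + n])
        if all(tok.pos == tag for tok, tag in zip(window, pattern)):
            yield window


# Invented sample sentence with invented tags, for illustration only.
SAMPLE = [
    Token("The", "the", "DET"),
    Token("old", "old", "ADJ"),
    Token("corpus", "corpus", "NOUN"),
    Token("contains", "contain", "VERB"),
    Token("rare", "rare", "ADJ"),
    Token("words", "word", "NOUN"),
]

# Find all adjective + noun pairs in the sample.
matches = [" ".join(t.form for t in m) for m in find_pos_sequence(SAMPLE, ["ADJ", "NOUN"])]
print(matches)
```

The same windowed match extends to longer tag patterns; searching for syntactic structures would need the parse annotations rather than a flat token list.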

     

Ellipsis Errors in Student Writing: A Language Resource for an Automatic Detection and Correction Tool

Laura Noreskal 

 

Our study is part of a larger social science research project with an educational purpose, focused on student writing: écri+. The aim of our research is to set up an assessment, course and certification system to improve the written expression and comprehension of French students. Our contribution to this project is to develop automatic tools for the detection and correction of errors, among them errors in Ellipsis constructions.

At this early stage, the issue lies in how we build our corpus in order to choose the most appropriate Natural Language Processing (NLP) method for detecting and correcting faulty Ellipsis in these specific writings. While previous research on Ellipsis detection and resolution (among others Nielsen, 2005; Bos & Spenader, 2011; Gandón-Chapela, 2017) has been applied to annotated English data using the British National Corpus (BNC) and the Wall Street Journal (WSJ), none of it addressed a French dataset. Moreover, no existing corpus currently corresponds to what we are dealing with, namely errors in student writing.

Therefore, our first task is to constitute a language resource on Ellipsis errors, not only to test and find the adequate NLP treatment but also to find out which types of Ellipsis are subject to errors. We have currently collected 164 errors in Ellipsis constructions in different kinds of student writing (exams, homework, internship reports…). At the end of this first step, we aim to constitute a corpus of about 250-300 faulty Ellipsis constructions, which we consider to be a linguistically representative size, enabling us to test symbolic and machine learning methods such as Support Vector Machines (SVM) or artificial neural networks.

Our goal is thus to present the key steps in the constitution of our corpus by explaining the methodology adopted and some first analyses that we have done.

     

Oral History Interviews as Research Data - Digital Interview Collections at CeDiS

Cord Pagenstecher

 

The Center for Digital Systems (CeDiS) at Freie Universität Berlin creates and curates digital interview collections. Since 2006, several major collections with audiovisual testimonies have been made accessible for research, teaching and education. Besides focusing on World War II and Nazi atrocities, some new topics are the Cold War, Latin America or university history. The interview archives contain corpora of multimodal, multilingual, informal spoken texts, encouraging interdisciplinary research by oral historians and linguists. Curating narrative interviews as audiovisual research data, the CeDiS team addresses various challenges, ranging from speech recognition and audio-mining to cross-collection search, privacy concerns and long-term preservation.

     

The OLKi platform for federated resource sharing and communication

Pierre-Antoine Rault, Christophe Cerisara

 

The vision of OLKi is a federation for scientists, i.e. a decentralized network composed of many University servers, each one managing its own community.

As it implements the W3C ActivityPub standard, this scientific network will also be part of the Fediverse, a community-managed multimodal social network that hosts 2 million citizens, hence enabling direct interactions between scientists and citizens.

Two services will initially be deployed on OLKi:

  1. Resource sharing: every user may upload (under the control of the hosting node) and share resources (datasets, papers, programs, models, videos...), or import them from data repositories (such as ORTOLANG, Zenodo, CLARIN, Dataverse, arXiv, HAL...) with OAI-PMH. Note that OLKi does not offer persistent storage, and thus is not a data repository. OLKi is complementary to data repositories: it makes their metadata visible on a global decentralized social network to scientists and citizens; and it handles sharing short-term resources.
  2. Instantaneous scientific communication, attached to resources: beyond traditional conferences, journals and emails, some scientists have expressed the need to communicate more quickly on social media (Twitter, Reddit, Researchgate, Academia...). OLKi provides all the tools for that (including math rendering, referencing resources...) over a global federated social network already used by citizens, while keeping all of their data under their control on their local node.

Facilities to easily implement and deploy new services over the OLKi platform will be provided, for instance for federated deep learning solutions. (Poster)
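To make the OAI-PMH import mentioned under resource sharing more concrete, the following Python sketch builds a ListRecords request URL and extracts identifiers and Dublin Core titles from a response. It is not OLKi code: the base URL, the function names and the sample response are assumptions invented for illustration, and no network request is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def parse_records(xml_text: str) -> list:
    """Extract the identifier and Dublin Core titles of each harvested record."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        titles = [el.text for el in rec.iterfind(".//dc:title", NS)]
        records.append({"identifier": identifier, "titles": titles})
    return records


# Hand-written stand-in for a repository response (hypothetical data).
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample corpus</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()

# The endpoint below is a placeholder, not a real repository.
url = list_records_url("https://repository.example.org/oai")
records = parse_records(SAMPLE_RESPONSE)
print(url)
print(records)
```

A real harvester would also need to follow resumption tokens for large result sets and handle OAI-PMH error responses, which this sketch leaves out.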

     

Reproducible open science with big data based on EGI and EOSC services

Gergely Sipos, Enol Fernandez, Philipp Wieder, Miroslav Ruda

 

This stall displays a practical and realistic implementation of the open science cycle by combining services from the EGI federation, from the European Open Science Cloud (EOSC) and from the broader open science domain. The presentation combines EGI Notebooks, EGI DataHub and EGI Binder services with GitHub and Zenodo to enable scientific communities to analyse and visualise big data in a reproducible way:

  • EGI Notebooks is a scaleable Jupyter Notebooks service hosted on the pan-European EGI Cloud provided by academic and commercial providers for science and education. EGI Notebooks can be used for data parsing, analysis and visualisation. EGI operates a central instance of the EGI Notebooks service for the ‘long tail of science’. Additional and customised instances can be requested by scientific communities on demand from EGI. Such instances can have, for example, fat VMs, GPUs, special libraries and file systems underneath.
  • EGI DataHub is a data access platform that provides access to distributed datasets from various repositories and can cache them to compute sites for data analytics. The EGI DataHub deployment hosts a growing set of open datasets from various scientific communities and makes those accessible from EGI Notebooks. Scientific communities can deploy DataHub on their own premises to expose existing datasets to users at remote compute resources. DataHub presents and organises such data as a distributed file system.
  • GitHub is a public service on the internet, widely used to version, store and share software code. GitHub complements the cycle by providing a platform to store versioned Notebooks together with the requirements concerning their executability (for example library dependencies).
  • Zenodo is a scientific data and publication repository from OpenAIRE. Zenodo is used in the cycle to generate Digital Object Identifiers (DOIs) with permanent, shareable links that point to versioned Notebooks.
  • EGI Binder is a pilot service from EGI that accepts pre-defined Notebook applications and makes them re-executable with 'one click' by anyone. Binder completes the cycle by enabling researchers to re-execute each other’s Notebook applications referenced by their DOIs. Binder generates application containers from the linked software on the fly, and starts those containers on scaleable cloud resources which are attached to EGI DataHub. By tapping into the EGI Cloud and EGI DataHub, those containerised applications can re-run the data analyses and visualizations that have been captured by the original notebook authors.

The presented cycle covers the steps of setting up a big data analysis, executing it on a scaleable cloud compute platform, sharing the analysis application in a permanently identifiable way (and via e.g. a scientific publication), discovering the analysis code and re-executing it with minimal effort.

     

Media Ecology Project (MEP)

Prof. Mark Williams (Dartmouth College, USA)

 

The Media Ecology Project (MEP) is a digital resource at Dartmouth that enables researchers across disciplines to access moving image collections online for scholarly use. MEP promotes the study of archival moving image collections, enhances discovery of relevant corpora within these archives, and develops new, cross-disciplinary research tools and methods. MEP thus helps ensure the survival of endangered moving image collections via new published scholarship plus contributions of metadata and research on studied corpora back to the archival community. 

The virtuous cycle of access, research, and preservation that MEP realizes is built upon a foundation of technological advance (software development) plus large-scale partnership networks that result in new practical applications of digital tools. MEP is fundamentally a sustainability project that develops literacies of moving image and visual culture history. It functions as a collaborative incubator that fosters new research questions and methods ranging from traditional Arts and Humanities close-textual analysis to computational distant reading.

With support from the National Endowment for the Humanities, MEP has developed several digital tools that support and sustain the creation of new networked scholarship and pedagogy about archival moving image materials. These include:

  • The Semantic Annotation Tool (SAT), which enables the creation of time-based annotations for specific geometric regions of the motion picture frame.
  • Onomy.org, which is a vocabulary-building tool that helps to grow and refine shared vocabularies for tags applied to time-based annotations. Among its vocabularies, Onomy will host an international dictionary of film terms, beginning with English-to-Mandarin. 

Together, these two tools support close textual analysis of moving pictures based on time-based annotations. Annotations denote a start time and stop time for a subclip, a description and tags related to that clip, and attribution for its creator. This granular approach to media literacy and scholarly annotation is flexible enough to be applied to many types of study.  For example, two current NEH-funded projects use these tools to study visual culture patterns in early film history (e.g., emerging performance styles) and TV newsfilm coverage of the civil rights era in the US.
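The annotation structure described above can be illustrated with a minimal sketch. This is not SAT's actual data model: the class name `Annotation`, all field names, and the helper `annotations_with_tag` are hypothetical and chosen only to mirror the elements named in the text (start time, stop time, description, tags, attribution).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Annotation:
    """Hypothetical time-based annotation; field names are illustrative, not SAT's schema."""
    start_seconds: float   # start time of the subclip
    stop_seconds: float    # stop time of the subclip
    description: str       # free-text description of the clip
    creator: str           # attribution for the annotation's author
    tags: List[str] = field(default_factory=list)  # tags from a shared vocabulary

    def __post_init__(self) -> None:
        # A subclip needs a non-negative start and a stop that comes later.
        if self.start_seconds < 0 or self.stop_seconds <= self.start_seconds:
            raise ValueError("stop time must come after a non-negative start time")

    @property
    def duration(self) -> float:
        """Length of the annotated subclip in seconds."""
        return self.stop_seconds - self.start_seconds


def annotations_with_tag(annotations: List[Annotation], tag: str) -> List[Annotation]:
    """Return the annotations that carry the given tag, in their original order."""
    return [a for a in annotations if tag in a.tags]
```

Under these assumptions, a researcher studying performance styles could collect every subclip tagged with a given term by calling `annotations_with_tag(all_annotations, "gesture")`.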

At the other end of the methodological spectrum, MEP’s work with computer scientists has produced new tools supporting machine-reading of moving images. One direction of this research produces feature extraction (isolating specific formal and aesthetic features of moving images), while another uses deep learning approaches employing convolutional neural networks to identify objects and actions in motion pictures. Data from these tools can be merged with the “manual” (human-produced) annotation tools mentioned above to create synthetic and iterative research workflows across the disciplines. SAT enables real-time playback of all annotations.                       

New research questions in relation to these workflows will literally transform the value of media archives and support the development of inter-disciplinary research and curricular goals (e.g., media literacy) regarding the study of visual culture history and its legacies in the 21st century. (Poster)

     

The Impact of PARTHENOS

Frank Uiterwaal (PARTHENOS)

 

The PARTHENOS project empowers digital research in the fields of history, language studies, cultural heritage, archaeology, and related fields across the digital humanities (DH), through a thematic cluster of European Research Infrastructures. After 4.5 years, the PARTHENOS project, in which CLARIN has also played a very important role, is coming to a close at the end of October. At our stall at the bazaar, we will showcase the products and services that we have developed over the years and demonstrate how we believe they will have a lasting impact. Examples of PARTHENOS' output are the Training Suite for people involved in DH research and Research Infrastructure (RI) management, the Standardization Survival Kit, which supports those who wish to learn about standards in humanities research, and the PARTHENOS policy guidelines leaflet, which offers information on research data management.

     

CLARIN Committees and Initiatives

Title of stall   Description

CLARIN Standards Committee

Piotr Bański, Leif-Jöran Olsson, Dieter Van Uytvanck on behalf of the Standards Committee

 

The role of the Standards Committee, in a nutshell, is to gather and publish the information on the use of standards in CLARIN, and advise the Board of Directors on all standards-related matters. We have several initiatives completed or under way, and the 2019 Bazaar stall addresses the most sought-after and at the same time the most difficult of them, namely the creation and maintenance of a unified list of standards in use among the CLARIN (mostly B-) centres. Our poster outlines a synergistic path out of the present state of affairs and mentions some of the difficulties that still need to be overcome. (Poster)

     

Meet the User Involvement Committee

Darja Fišer, Jakob Lenardič

 

Visit the stall of the newly established CLARIN ERIC User Involvement Committee to find out about its tasks, responsibilities and members. We will show our results achieved in 2019 and discuss plans for 2020.

     

CLARIN Ambassadors

Francesca Frontini, Maciej Maryl, Toine Pieters

 

The aim of the CLARIN Ambassadorship programme is to raise awareness about and encourage participation in CLARIN ERIC in disciplines and communities that are not yet fully integrated in CLARIN. The initiative was launched in May 2019 and our first Ambassadors will present their first activities and discuss their achievements, findings and ideas on how to best support the communities they have interacted with.

     

Let's CLIC together!

Aleksei Kelli, Penny Labropoulou, Pawel Kamocki

 

Would you like to find out what CLIC does or what it can do for you? Would you like to get involved and contribute to CLIC work? Come and talk with us! We'll be happy to show you around the work done by the "Committee for Legal and Ethical Issues", discuss your ideas and needs, and make new plans ahead with you. 

CLICers' tasks include the study and publication of analyses related to legal and ethical issues (copyright, licensing, privacy, data protection, ethics, etc.), organization of and participation in awareness and training events, collection and dissemination of related material, etc.

As a highlight of our work, you can check out our publications from previous CLARIN Annual Conferences, the CLIC White papers series, and the Legal Information Platform. We will also present our experience from the exciting workshop we held this year in Vilnius, Hacking the GDPR to Conduct Research with Language Resources in Digital Humanities and Social Sciences. There, legal experts, digital humanists and language technology experts sat down together and worked through real use cases to see how the GDPR affects research based on Language Resources, and which legal and technological measures can and/or should be deployed to ensure legal access to and processing of personal data under the GDPR regime. We will also share information from the two workshops we co-organized this year: Towards a Data Portal with DELAD and the DH2019 pre-conference workshop Copyright and humanities research: A global perspective with ELDAH. (Poster)

     

Knowledge Sharing Infrastructure (KSI)

Steven Krauwer (on behalf of the CLARIN Knowledge Sharing Infrastructure Committee)

 

All players in CLARIN, whatever their role is, need knowledge and expertise to do their jobs. Infrastructures such as CLARIN, as well as its users, are distributed all over Europe (and even beyond), in dozens or maybe even hundreds of locations. The mission of the CLARIN Knowledge Sharing infrastructure is to ensure that the available knowledge and expertise does not exist as a fragmented collection of unconnected bits and pieces, but is made accessible in an organized way to the CLARIN community and to the Social Sciences and Humanities research community at large.

Find out more about CLARIN's Knowledge Sharing infrastructure, the CLARIN Knowledge Centres, the possibility to apply for a CLARIN mobility grant and more at the KSI stall. (Poster)

     

CMDI Task Force: Infrastructure – Resources – Use Cases

Andreas Nolda (on behalf of the CMDI task force)

 

At this bazaar stall, the CMDI task force is available to answer any questions you have about the use of component metadata in CLARIN. We are also glad to discuss the CMDI Best Practices Guide, which is under continuous development in close cooperation with the Metadata Curation task force. We appreciate any feedback or suggestions in this regard. Last, but not least, you will be able to fill in an online questionnaire on common use cases for metadata description in CLARIN. Your input will help to determine typical usage scenarios and specific needs for support. (Poster)

     

B Center Assessments - Checklist and Process - with a FAIR view

Lene Offersgaard, Jozef Misutka, CLARIN Assessment Committee

 

The CLARIN Assessment Committee is available for feedback, questions and discussions about the assessment process and the new FAIRified version of the checklist.

     

DH Course Registry API

Hendrik Schmeer, Tanja Wissik

 

The DH Course Registry, a joint effort of DARIAH-EU and CLARIN ERIC designed to showcase DH-related teaching activities, is evolving. We will present the fully featured data API of the DH Course Registry. The API provides various query scenarios and filters. Furthermore, it enables research on historical data ingested into the registry since its start five years ago. This was not possible before with the statistics available on the website, which provide information only on active training activities. The metadata model and overall documentation comply with the OpenAPI 3.0 standard. Come to our stall and find out more about our data and the new API. (Poster)

     

Goals for CLARIN participation in SSHOC

CLARIN@SSHOC

 

SSHOC is a cluster project bringing together 5 ERICs - CESSDA, CLARIN, DARIAH, ESS, and SHARE.
CLARIN will bring to SSHOC its expertise in Language Technology and some of its core technical infrastructure components, generalizing and adapting our services for processing data from the social sciences and cultural heritage where needed. The SSH will profit from this, as CLARIN will from the assets and expertise of the other SSHOC stakeholder infrastructures. Together we will also secure the position of the SSH in the EOSC, which is yet to be developed. (Poster)

     

For general information on the Conference, see the event page.