
Metadata Categories - Process

22.2.2009 - Peter Wittenburg, Iris Vogel, Elina Desipri

A metadata expert meeting was recently held in Athens to begin specifying the data categories required to describe the various language resources. The meeting was based on the document CLARIN-EB-5/2008, "Metadata Infrastructure for Language Resources and Technology", which can be found on the CLARIN web-site under documents.

 

Members of this group currently are: Viktoria Arranz, Nuria Bell, Daan Broeder, Thierry Declerck, Elina Desipri, Bertrand Gaiffe, Maria Gavrilidou, Erhard Hinrichs, Jean-Claude Martin, Nelleke Oostdijk, Wim Peters, Florian Schiel, Iris Vogel, Peter Wittenburg.

 

A first list of categories was selected for the following resource types: media, annotations, texts, lexica and lists. This document was created in the following way:

  • the required elements were discussed for each resource type independently of the others
  • elements that had already been mentioned were not necessarily repeated, since the primary goal of the meeting was to determine the scope of the required elements, not to create a useful component structure
  • no abstraction was made across resource types; only in the final spreadsheet were all elements brought together to indicate semantic overlap
  • structural blocks were used only to structure the discussion and are not meant as suggestions for components
  • special care was taken to achieve semantic compliance with the existing created metadata descriptions ( ,

The spreadsheet "Data Categories" includes all elements determined so far. Simple first definitions were added so that others can understand what is meant. The corresponding IMDI and OLAC elements are also listed. This spreadsheet is now open for comments within CLARIN.

The following steps were agreed:

  • investigations are under way to understand what kinds of header elements are used by the various institutions (first lists have been received)
  • the results of the discussions of the expert group meeting should be cleaned up (see spreadsheet) and be discussed at the Oxford meeting
  • special attention should be paid to metadata descriptions for tools/web services and schemas in the coming weeks, i.e. first proposals should be worked out and suggested
  • the following special groups [1] were formed to work on the definitions and vocabularies:
    • media: Daan, Peter, Jean-Claude, Florian, Martin, Sven
    • annotations: Erhard, Daan, Thierry, Bertrand, Viktoria, Dieter
    • texts - lists: Maria, Bertrand, Viktoria, Nelleke, Elina, Iris, IDS, Martin, Remco
    • lexica: Wim, Maria, Elina, Erhard, Peter, ILC, BBAW
    • ontologies: Erhard, Wim (not yet clear whether CLARIN should take care of this)
    • tools: Peter, Thierry, Daan, Nuria, Marc, Volker, Fabienne, Thomas, USFD
    • schemas: Dieter, Peter, Daan
  • sample components should be created based on the selected elements

The final task is to produce high-quality element definitions so that they can be entered into the ISOcat registry and approved by the ISOcat boards.

Other CLARIN members interested in participating in these groups should send an email to Peter.