CMDI 1.2

Information for software developers

This page is aimed at those who work on (or are responsible for) software that processes and/or produces CMDI records and/or profile schemata or specifications.

What is CMDI 1.2?

See the page on CMDI 1.2 for general information about CMDI 1.2.

How does CMDI 1.2 affect me?

How much your software is affected by CMDI 1.2 depends largely on the functionality it provides: in particular, whether it processes or produces CMDI, and whether it uses CMDI as a core format for storing or processing metadata or merely as one of several representations derived from some other format.

The most basic case, and the simplest to deal with, is an application or service that produces CMDI based on some other (core or internal) type of metadata representation. In such a case there is no immediate need for action, as CMDI 1.1 is still supported by the core infrastructure as well as by most other tools in the ecosystem. In this type of case, adding basic support for CMDI 1.2 will also be relatively easy or even trivial, depending on the way the CMDI is generated.

The impact can be somewhat larger if your application processes more or less arbitrary CMDI, in which case it may have to deal with incoming CMDI 1.2 records. Your software should at least distinguish between incoming CMDI 1.1 and CMDI 1.2 records (see "Should I take action immediately?"). You may choose to support submission of only one version of CMDI and reject the other, or to accept both; for example, you can accept CMDI 1.1 and convert it to CMDI 1.2 before further processing if you prefer to store only records that adhere to a single CMDI version.

Should I take action immediately?

If your software produces CMDI but does not process incoming CMDI, there is no strict need for immediate action. If your software processes more or less arbitrary CMDI metadata, you are advised to at least implement CMD version detection so that incoming CMDI 1.2 metadata is handled gracefully. This can easily be done on the basis of the 'CMDVersion' attribute on the root node of a record. In addition, the namespace of the common CMDI "envelope" differs: CMDI 1.1 uses http://www.clarin.eu/cmd/, while CMDI 1.2 uses http://www.clarin.eu/cmd/1.
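
The version check described above could look roughly like the following. This is a minimal sketch using only the Python standard library. The function name detect_cmdi_version is made up for illustration, and treating a missing 'CMDVersion' attribute as acceptable (while rejecting a contradictory one) is an assumption, not something the specification prescribes.

```python
import xml.etree.ElementTree as ET

CMD_11_NS = "http://www.clarin.eu/cmd/"   # CMDI 1.1 envelope namespace
CMD_12_NS = "http://www.clarin.eu/cmd/1"  # CMDI 1.2 envelope namespace


def detect_cmdi_version(xml_text):
    """Return '1.1', '1.2', or None if the input is not recognisable CMDI.

    Hypothetical helper: combines the root namespace with the
    'CMDVersion' attribute and rejects records where the two disagree.
    """
    root = ET.fromstring(xml_text)
    # ElementTree stores a qualified tag as '{namespace}localname'.
    if root.tag.startswith("{"):
        namespace, local_name = root.tag[1:].split("}", 1)
    else:
        namespace, local_name = "", root.tag
    if local_name != "CMD":
        return None
    declared = root.get("CMDVersion")  # the attribute itself has no namespace
    # Assumption: a missing attribute is tolerated, a contradictory one is not.
    if namespace == CMD_12_NS and declared in (None, "1.2"):
        return "1.2"
    if namespace == CMD_11_NS and declared in (None, "1.1"):
        return "1.1"
    return None
```

An application that supports only one version can use the result to reject or convert the other; a return value of None signals input that should not be processed as CMDI at all.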

How can I switch to CMDI 1.2?

The method of adapting your software to CMDI 1.2 depends on the kind of operation(s) it supports with respect to CMDI. The following sections describe the options for introducing CMDI 1.2 support for different types of functionality, of which one or more may apply to your software. If you choose to adapt your software to CMDI 1.2, please consult the CMDI 1.2 specification after reading the relevant section(s) below for full details and useful XML examples.

Production of CMDI through conversion/transformation

If your software currently produces CMDI by converting (by whatever means) data stored in another form of representation to CMDI, adapting that process to (include) CMDI 1.2 should be straightforward in most cases. Here we will consider two separate paradigms for producing CMDI: XML transformation and programmatic generation using XML marshalling. Even if your software does not precisely match one of these scenarios, you may still find pointers here for designing your own solution.

XML transformation is very suitable for producing CMDI if the metadata is already available in another XML-based format. If you already have a transformation pipeline set up, it makes sense to use it for the conversion to CMDI 1.2 as well. There are two ways of doing so.

The first and easiest way is to apply an additional transformation step, using the "CMDI 1.1 to 1.2 record upgrade" stylesheet from the CMDI toolkit. Applying this to a valid CMDI 1.1 document takes care of all required changes. The stylesheet has a number of parameters, which you may want to review. Note that there are some (rare) cases in which CMDI 1.1 records cannot be transformed to CMDI 1.2 automatically; in those cases the stylesheet generates an error message and terminates.

The second way, which requires more effort but gives more control and possibly better performance, is to transform directly to CMDI 1.2 from your 'source' format. For this approach you can probably reuse most of your existing stylesheet(s) for producing CMDI 1.1. Assuming you will use the same profile for the CMDI 1.2 versions of your records (which is by no means required: you can also use or create another profile, in particular if you would like to make use of one or more of the new features), you will have to make the following adaptations:

  • Use the http://www.clarin.eu/cmd/1 (suggested prefix: cmd) namespace for the envelope, starting at the root element CMD.
  • Use the profile specific namespace http://www.clarin.eu/cmd/1/profiles/{profileId} (suggested prefix: cmdp) for the profile specific elements, starting at the child of the "cmd:Components" element.
  • For resource proxy references, use the "cmd:ref" attribute, which replaces the no-namespace "ref" attribute from CMDI 1.1.

Make sure to have a look at the CMDI 1.2 version of the schema for the profile(s) that you are using (a link can be obtained via the component registry's UI), the specification and some of the examples to verify that you are generating valid CMDI 1.2 metadata.
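
Whatever tool produces the output (XSLT or otherwise), the three adaptations listed above determine the namespace layout of the result. The following Python sketch illustrates that layout with the standard library. It is deliberately not a schema-valid record: the envelope sections are left empty, the profile ID is only an example value, and the names ExampleProfile and rp1 are hypothetical.

```python
import xml.etree.ElementTree as ET

CMD_NS = "http://www.clarin.eu/cmd/1"
PROFILE_ID = "clarin.eu:cr1:p_1203456789123"  # example ID, not a real profile
PROFILE_NS = "http://www.clarin.eu/cmd/1/profiles/" + PROFILE_ID


def qname(namespace, local_name):
    """Build the '{namespace}localname' form that ElementTree expects."""
    return f"{{{namespace}}}{local_name}"


def build_skeleton():
    """Return an incomplete CMDI 1.2 skeleton showing the namespace layout."""
    # Suggested prefixes only; consumers must not rely on them.
    ET.register_namespace("cmd", CMD_NS)
    ET.register_namespace("cmdp", PROFILE_NS)
    # Envelope: everything up to and including 'Components' uses CMD_NS.
    root = ET.Element(qname(CMD_NS, "CMD"), {"CMDVersion": "1.2"})
    ET.SubElement(root, qname(CMD_NS, "Header"))
    ET.SubElement(root, qname(CMD_NS, "Resources"))
    components = ET.SubElement(root, qname(CMD_NS, "Components"))
    # Payload: the child of 'Components' and everything below it uses
    # PROFILE_NS. 'ExampleProfile' is a hypothetical component name.
    ET.SubElement(
        components,
        qname(PROFILE_NS, "ExampleProfile"),
        {qname(CMD_NS, "ref"): "rp1"},  # namespaced 'cmd:ref', hypothetical ID
    )
    return root


if __name__ == "__main__":
    print(ET.tostring(build_skeleton(), encoding="unicode"))
```

In a real record the value of "cmd:ref" has to match the ID of a resource proxy declared in the resources section, and the payload has to follow the profile's schema.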

XML marshalling, i.e. serialising a programmatic data structure into a pre-defined XML format, is useful when CMDI for one or more specific profiles is generated from other representations, such as databases, via in-memory structures. Various frameworks, such as JAXB for Java, PyXB for Python and Jsonix for JavaScript, provide ways of mapping "native" data structures to predefined XML formats (and vice versa). These frameworks generally allow the developer to derive a mapping from a schema, which, applied to CMDI, makes it easy to create metadata records without having to generate XML explicitly. The Component Registry offers schemata for both CMDI 1.1 and CMDI 1.2 for each profile. Because these do not have overlapping namespaces, mappings can easily be generated for both versions of CMDI within one application. Keep in mind that, apart from the namespace differences, there are some differences between the document structures of CMDI 1.1 and CMDI 1.2 records, which will be reflected in the generated object hierarchy.

Processing incoming CMDI

If your application or service allows users to provide CMDI from disk or from the web as input, either completely arbitrary or adhering to a predefined CMDI profile or set of CMDI profiles, you will have to consider the fact that users might offer either CMDI 1.1 or CMDI 1.2 metadata. You can choose to explicitly deny one of these versions, which you can enforce on the basis of the namespace (see below) and/or the 'CMDVersion' attribute of the root node. Alternatively, you can incorporate conversion of CMDI 1.1 metadata to CMDI 1.2 (which can be done in a straightforward way using the CMDI toolkit) into your pipeline and process CMDI 1.2 from that point on. If your software allows for the manipulation of metadata records (i.e. acts as an editor), it may make more sense to maintain parallel processing pipelines for CMDI 1.1 and CMDI 1.2, or to implement conversion from CMDI 1.2 to CMDI 1.1 for the profile(s) that you support; a generic method for downgrading records is not provided by the toolkit. See below for some information on manipulating CMDI 1.2 documents.

Here are some general instructions for adapting your application for processing CMDI 1.2 metadata (which may occur in addition to processing CMDI 1.1, or take place after a conversion step). The first thing to account for is the namespaces. While a single namespace (http://www.clarin.eu/cmd/) was used for all CMDI 1.1 metadata, CMDI 1.2 has one common namespace (http://www.clarin.eu/cmd/1) for the envelope (the 'CMD' root element, the 'Header' and 'Resources' sections and the 'Components' element) and for a number of 'general' attributes ('ref' and 'ComponentId' at the component level and 'ValueConceptLink' at the element level). Everything below the 'Components' element should be in the profile specific namespace, which is defined as the concatenation of the base URI http://www.clarin.eu/cmd/1/profiles/ and the profile's ID from the Component Registry, e.g. http://www.clarin.eu/cmd/1/profiles/clarin.eu:cr1:p_1203456789123. If your application is not namespace aware, either add the logic required to deal with these namespaces or ignore namespaces while parsing CMDI. Be aware that you are more likely to encounter namespace prefixes with CMDI 1.2 than with its predecessor, due to the use of multiple namespaces. Suggested (but not mandatory, so don't rely on these!) prefixes are 'cmd' for the general namespace and 'cmdp' for the profile specific namespaces.
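
Because prefixes cannot be relied upon, it is safer to match on namespace URIs. As an illustration, the following Python sketch (standard library only) derives the profile ID from the namespace of the first element below 'Components'. The function name profile_id_from_record is made up for this example, and the sketch assumes that the payload root is the first child of 'Components'.

```python
import xml.etree.ElementTree as ET

CMD_NS = "http://www.clarin.eu/cmd/1"
PROFILE_NS_BASE = "http://www.clarin.eu/cmd/1/profiles/"


def profile_id_from_record(xml_text):
    """Return the profile ID encoded in the payload namespace, or None.

    Hypothetical helper: matches on namespace URIs only, so the prefixes
    used in the record are irrelevant.
    """
    root = ET.fromstring(xml_text)
    components = root.find(f"{{{CMD_NS}}}Components")
    if components is None or len(components) == 0:
        return None
    payload_root = components[0]  # assumption: first child is the payload root
    if not payload_root.tag.startswith("{" + PROFILE_NS_BASE):
        return None
    namespace = payload_root.tag[1:].split("}", 1)[0]
    # The profile namespace is the base URI followed by the profile ID.
    return namespace[len(PROFILE_NS_BASE):]
```

A record that uses arbitrary prefixes, for instance 'a' and 'b' instead of 'cmd' and 'cmdp', yields the same result as one that uses the suggested prefixes.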

Other changes that may need to be reflected in processing of CMDI by your software are:

  • The "ref" attribute on resources for referencing resource proxies is in the http://www.clarin.eu/cmd/1 namespace (previously it had no namespace) and is of type "idref", which means its value has to be exactly one existing resource proxy ID.
  • The "IsPartOfList" element is no longer a child of "Resources" but is now placed after the resources section.
  • Elements that link to an external vocabulary in the component definition have an optional "ValueConceptLink" attribute (in the http://www.clarin.eu/cmd/1 namespace) that can contain the URI of a CLAVAS vocabulary item.
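
The first of these changes can be checked programmatically. The sketch below (Python standard library) collects every namespaced "ref" value that does not match exactly one declared resource proxy ID. It assumes that resource proxies are 'ResourceProxy' elements in the envelope namespace carrying a no-namespace 'id' attribute. The function name unresolved_refs is made up for this example.

```python
import xml.etree.ElementTree as ET

CMD_NS = "http://www.clarin.eu/cmd/1"


def unresolved_refs(xml_text):
    """Return 'cmd:ref' values that do not match exactly one proxy ID.

    Hypothetical helper. Assumes resource proxies are 'ResourceProxy'
    elements in the envelope namespace with a no-namespace 'id' attribute.
    """
    root = ET.fromstring(xml_text)
    proxy_ids = {
        proxy.get("id") for proxy in root.iter(f"{{{CMD_NS}}}ResourceProxy")
    }
    ref_attr = f"{{{CMD_NS}}}ref"
    problems = []
    for node in root.iter():  # all elements, in document order
        value = node.get(ref_attr)
        # The whole attribute value must be one known ID, so a
        # space-separated list of IDs is reported as well.
        if value is not None and value not in proxy_ids:
            problems.append(value)
    return problems
```

A schema validator reports the same problems, so a check like this is mainly useful in applications that do not validate every record they handle.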

CMDI manipulation and information from the profile XSD

Software that handles CMDI "natively", for example by means of DOM manipulation, needs to take into account all of the aspects mentioned above (section 'Processing incoming CMDI'). When generating CMDI, it is mandatory to respect the CMDI 1.2 namespaces, i.e. http://www.clarin.eu/cmd/1 for the envelope and general attributes, and the profile specific namespace for the document payload below the 'Components' element.

Chances are that your application or service extracts information from the profile schema. In addition to the 'regular' schema information (in the XMLSchema namespace), XSD files obtained from the Component Registry also contain some CMDI specific "annotations", most of them in the http://www.clarin.eu/cmd/1 namespace. As in CMDI 1.1, there are concept links for components, elements and attributes; these were previously labelled as data categories in a separate namespace and are now expressed through the "ConceptLink" attribute. Depending on the profile, the schema document may now also contain URIs of external vocabularies ('Vocabulary' attribute), value derivation instructions ('AutoValue' attribute) and cues for tools (http://www.clarin.eu/cmd/cues/1 namespace). Detailed information can be found in the last chapter of the CMDI specification.
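
Extracting concept links from a profile schema could be done along the following lines. This Python sketch (standard library only) assumes that the "ConceptLink" annotation appears as a namespaced attribute directly on the xs:element and xs:attribute declarations; check this against an actual schema from the Component Registry and the specification before relying on it. The function name collect_concept_links is made up for this example.

```python
import xml.etree.ElementTree as ET

CMD_NS = "http://www.clarin.eu/cmd/1"
XS_NS = "http://www.w3.org/2001/XMLSchema"


def collect_concept_links(xsd_text):
    """Return (kind, name, concept link) tuples found in a profile schema.

    Hypothetical helper. Assumes 'ConceptLink' is a namespaced attribute
    placed directly on xs:element and xs:attribute declarations.
    """
    schema = ET.fromstring(xsd_text)
    concept_link_attr = f"{{{CMD_NS}}}ConceptLink"
    found = []
    for kind in ("element", "attribute"):
        for declaration in schema.iter(f"{{{XS_NS}}}{kind}"):
            link = declaration.get(concept_link_attr)
            if link is not None:
                found.append((kind, declaration.get("name"), link))
    return found
```

The same loop can be adapted to the 'Vocabulary' and 'AutoValue' attributes, again subject to where the schema actually places them.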

Vocabulary URIs can be used with CLAVAS to retrieve vocabulary information in SKOS format. More information and usage examples will be made available at a later stage.

A new feature of the CMDI toolkit is a set of Schematron rules, which can be processed using a Schematron implementation to perform a number of checks on a CMDI record. These checks go beyond schema validity and give feedback with respect to a number of best practices and common mistakes. Although not required, it is highly recommended to check documents on save or before publication, as not adhering to these best practices might negatively affect the degree to which the metadata can be processed within the CLARIN infrastructure.